Why internal tech documentation doesn’t work

How I used internal documentation at my previous project and why it didn’t work.

At my previous company I worked as a software developer for a few years. At one point I wanted to jump into an existing project that was new to me. There was quite a lot to grasp: the technology was different from the one I had been using, and the project concerned infrastructure, so it was quite complex. Fortunately, there were docs written by our CTO! That was great. So I cloned the project, set it up according to the instructions, and everything worked fine.

I wanted to learn more about the architecture, the design decisions, and the important parts of the project, so I went to the documentation. First, I opened the documentation in the browser on the left (it was not in Markdown then) and the IDE with the project on the right. Then, while reading, I copied every code snippet the author had marked as inline code (e.g. `some_part_of_code`) and pasted it into the IDE search. I did this because I wanted to see each notion in the context of the code: where it was in the project, in which module, and what it contained. There I encountered two problems:

Problem 1. Searching was cumbersome.

Search returns every place where a particular thing is mentioned. You can narrow it down, but you need to know how, which is hardly possible in a project that is new to you. If I do not know the project, I cannot tell which occurrence the author meant, especially when the text is not an API reference but a description in some specific context. When search returns 100 occurrences, I might not find the right place at all. In another small project of mine, I picked a random notion from the code and the search returned 38 results in 12 files (see illustration).

cumbersome search for a newbie

If someone described this notion in the docs without explicitly indicating which place was meant, I wouldn’t know which one to pick. The author could theoretically link from the documentation to the code (e.g. a link to specific lines on GitHub), but the problem with this approach is the following…

Problem 2. Documentation gets out of date.

Docs can become outdated at any point in time. No one knows when, but it can happen within the next five minutes, for example if someone makes a commit that renames a file. The worst thing is that we do not know when it will happen, so we cannot set a reminder to update the docs after six months. Some parts will be outdated after five minutes, others after five years. I personally wouldn’t write any internal documentation knowing that it could be outdated the next day. In my mind, it would be a waste of time. Also, the more docs I write, the sooner some part of them gets outdated. So this doesn’t scale well, rather the opposite. There are solutions for API references, but usually not for other parts of the codebase.

That is why the internal documentation at my company was small and partially outdated. Developers had no incentive to write it or maintain it. Developers also won’t read it, because why would you, if you know that it may contain false information? You do not know which parts are false and which are true, so basically the whole documentation for a particular project is useless… That is the paradox: if some of the docs are outdated, how does the reader know which parts can be relied upon? It is not a matter of one place being outdated. What matters is my perception as a consumer of the documentation (internal in this case, but the same goes for external docs). If I cannot rely on the docs, I would rather ask support, skip them, use the product without docs, or use other tools with better documentation. And that is what I did. I stopped reading, because I had found a few things which were no longer there. What was the point of reading on if I might learn incorrect things? And why would anyone write docs if no one is going to read them?
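The kind of drift described above, such as a renamed file or function, can at least be detected mechanically. The following is only a rough Python sketch under stated assumptions: the docs mark code references with backticks, the sources are files matching a glob pattern, and a reference counts as alive if its text appears anywhere in any source file. The function name `stale_references` and its parameters are hypothetical, and this is not a description of how any particular tool works.

```python
import re
from pathlib import Path

# Matches inline code spans such as `some_part_of_code` (assumed doc convention).
BACKTICKED = re.compile(r"`([^`\n]+)`")


def stale_references(doc_text: str, source_root: Path, pattern: str = "*.py") -> list[str]:
    """Return backticked snippets from doc_text that appear in no source file.

    Hypothetical helper: a plain substring check, so it only detects
    references that vanished entirely (e.g. after a rename).
    """
    # Read every matching source file once.
    sources = [
        path.read_text(encoding="utf-8", errors="ignore")
        for path in source_root.rglob(pattern)
        if path.is_file()
    ]
    # De-duplicate the snippets while keeping their order of appearance.
    snippets = dict.fromkeys(BACKTICKED.findall(doc_text))
    return [s for s in snippets if not any(s in src for src in sources)]
```

A check like this only tells you that a reference has disappeared. It cannot tell you which of the remaining occurrences the author meant, which is exactly Problem 1.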

A practical example

Now let me give a practical example. You have probably heard about the bitcoin limit of 21 000 000 coins. Everyone talks about it, yet it is not part of the public API. If I searched for the number 21 000 000 in the bitcoin repo, I would not find the code that produces this limit, because the maximum supply is actually a consequence of the bitcoin halving algorithm, which is here: https://github.com/bitcoin/bitcoin/blob/fe03f7a37fd0ef05149161f6b95a25493e1fe38f/src/validation.cpp#L1133-L1144. The code is obviously self explanatory, right? ❤ Without someone pointing me to it, and without this article https://ma.ttias.be/dissecting-code-bitcoin-halving/ (which, by the way, contains an outdated link to the halving algorithm), I would not be able to tell you where the code responsible for the maximum bitcoin supply is or how it works. I would think that I am dumb because I cannot find such an obvious thing in the bitcoin repo, one of the most popular open-source projects. If this were an internal project and your boss asked you “can you estimate how quickly you can change the maximum bitcoin supply from 21 000 000 to 22 000 000?”, you would probably imagine changing some number in the code. How long can that take? Well, much longer than expected if you cannot find the code responsible for it. I suppose many developers are in this position with their projects. The advantage of the bitcoin project is that it is well known, so there are articles about its internals. If your project is not that popular, or if it is simply private, you do not have that luxury.
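To make the point concrete, here is a minimal Python sketch of the idea behind the halving code linked above. It is a re-expression for illustration, not the C++ from the repo. The constants are the commonly cited mainnet values, taken here as assumptions: an initial subsidy of 50 coins, a halving every 210 000 blocks, and 100 000 000 base units per coin. The function names are hypothetical.

```python
COIN = 100_000_000           # assumed: base units (satoshi) per coin
HALVING_INTERVAL = 210_000   # assumed: blocks between halvings
INITIAL_SUBSIDY = 50 * COIN  # assumed: reward for the earliest blocks


def block_subsidy(height: int) -> int:
    """Reward, in base units, for the block at the given height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        # Guard mirroring 64-bit arithmetic; in Python the shift would give 0 anyway.
        return 0
    # Integer right shift halves the subsidy and rounds down each time.
    return INITIAL_SUBSIDY >> halvings


def total_supply() -> int:
    """Sum the subsidies of all blocks until the subsidy reaches zero."""
    total = 0
    height = 0
    while (subsidy := block_subsidy(height)) > 0:
        # Every block within one halving period pays the same subsidy.
        total += subsidy * HALVING_INTERVAL
        height += HALVING_INTERVAL
    return total


if __name__ == "__main__":
    print(total_supply() / COIN)  # total supply expressed in coins
```

Under these assumptions the total comes out just below 21 000 000 coins, and no constant in the sketch equals that number. Changing the limit to 22 000 000 would therefore mean changing the subsidy schedule, not editing one value.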

Basically, that is why no one wants to write internal documentation…

That is why I created…

Time for the classic blog post summary…

Yes, exactly. Considering the above, that is why I created a tool which:

  1. brings the code closer to the docs by letting you make connections between the two,
  2. helps to keep the docs up-to-date: automatically when possible, and manually when it cannot be done automatically (or when the documentation itself requires modification).

The advantage of the tool is that you can connect public docs to the code as well (it probably makes sense to connect everything apart from the API reference) and:

  1. create internal documentation for developers out of public docs,
  2. keep your public documentation up-to-date.

You might think that “public docs are far from code and cannot be connected to the code.” So my question is — why does public documentation get outdated, if it is not related to the codebase? Why is there a process in your company that once a PR is made you have to check what is being changed in public documentation? Why do Technical Writers and Product Managers need to work closely with the development team? When does the documentation change then, if not once the code changes? Obviously there are some settings which are not in the code but in the tools which are being used, however most of the docs probably describe the code — maybe on a completely different abstraction level. Still, if you describe some logic, there might be a test for that or a sequence of tests, or a configuration field, etc. If you are not describing code in public docs, what are you describing then?

One last thing: in the tool you can also browse the code next to the documentation, so if you are a developer trying to understand a new feature, you do not have to copy-paste notions into the IDE search. You get a clear entry point when working on a new feature or when you are a new developer at the company.

Please share your thoughts on this topic, drop me a note if you think there is a blind spot in my reasoning, and write to me if you would like to create a PoC with me on your project (I am a developer, so I can help you make the connections if you allow it).

PS. My product page: https://www.hastydocs.com.