Developing Programmers .com

Thursday, 8 December, 2005

Documentation For Programmers  

We all know we should document our code. But how much documentation is enough?

There are a few schools of thought on the subject, but in general, if what you’re writing is more than a trivial applet, you will need at least three distinct kinds of documentation:

  • In-code comments that outline the algorithm you’ve used.
  • API documentation explaining class layout and what each function is for.
  • High level architecture documentation, outlining how you view the system and the overall approach you have taken.

There are other sorts of documentation, for managers, the legal department and so on, but for now we’ll focus on documentation aimed at other programmers. This kind of documentation helps other programmers get up to speed with the project; it acts as a reference for looking up API calls, so that those calls can properly be treated as “black boxes”; and it will even help you a year down the track, when you’ve started to forget your early decisions or when you want to clean up the system a little and need to step back and look at an overview.

In-code documentation is aimed at someone who has to understand, and possibly modify, the body of a function you’ve written. It is basically an outline of the main steps in the algorithm you have used. Some people argue that you don’t need this kind of documentation because the code is self-explanatory. Sometimes they’re right. As a rule of thumb, though, you should explain yourself well enough that even the most junior programmer on the team can follow what your code is doing; that way you get fewer bugs from people changing things they didn’t understand. Also, good comments are not simply a repeat of what the code says. The code says what is being done, and the comments say why it is being done. This adds information that the steps on their own don’t offer, and it also provides a kind of cross-validation: a peer reviewing your code can check “did your actions match your stated intentions?”
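To make the “what” versus “why” distinction concrete, here is a small C++ sketch (C++ because that is the language of the Doxygen example later in this article). The function name `find_sorted` is purely illustrative. The comment on the midpoint line doesn’t restate the arithmetic; it records the reason for writing it that way, which a reviewer can then check against the code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example function: returns the index of `target` in the
// sorted vector `values`, or -1 if it is not present (binary search).
long find_sorted(const std::vector<int>& values, int target)
{
    std::size_t lo = 0;
    std::size_t hi = values.size();  // search the half-open range [lo, hi)
    while (lo < hi) {
        // A "what" comment would just say "compute the midpoint".
        // The "why": lo + (hi - lo) / 2 is used instead of (lo + hi) / 2
        // because lo + hi can overflow for very large ranges.
        std::size_t mid = lo + (hi - lo) / 2;
        if (values[mid] < target) {
            lo = mid + 1;  // target can only be in the upper half
        } else if (values[mid] > target) {
            hi = mid;      // target can only be in the lower half
        } else {
            return static_cast<long>(mid);
        }
    }
    return -1;  // range is empty, so target is absent
}
```

If someone later “simplifies” the midpoint to `(lo + hi) / 2`, the comment tells a reviewer exactly what was lost.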

API documentation has two main roles. First, it gives a detailed outline of your program which glosses over how things work and just tells you what every entity in the program is for. Why does this class exist? What is this method for? This documentation allows people to work on the code without having read all of it. The system is still made of “black boxes”, but the boxes are all labelled and it’s reasonably clear which one would contain a particular feature if you were looking for it. This kind of summary is useful both for locating a section of code you want to work on and for using objects from other sections of code without having to know all about how they work. Want a message displayed? Find the function whose job it is to display messages. How it does it is someone else’s problem (provided that you test that it really worked). Second, this kind of documentation gives you a high-level “feel” for all the parts of a program, which can help when refactoring a project.
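Here is a sketch of the “labelled black box” idea, assuming a hypothetical `message_display` interface (none of these names come from a real library). The doc comments state what each entity is for, and the caller `report_done` depends only on that stated contract, not on where the text actually ends up.

```cpp
#include <ostream>
#include <sstream>  // std::ostringstream, handy for testing stream_display
#include <string>

/// Displays messages to the user.
///
/// Callers only need to know what this does; whether the text goes to a
/// console, a dialog box or a log file is the implementation's business.
class message_display
{
public:
    virtual ~message_display() = default;

    /// Show one message to the user.
    /// \param text The message, without a trailing newline.
    virtual void show(const std::string& text) = 0;
};

/// A message_display that writes each message on its own line of a stream.
class stream_display : public message_display
{
public:
    /// \param out The stream to write to; it must outlive this object.
    explicit stream_display(std::ostream& out) : out_(out) {}

    void show(const std::string& text) override { out_ << text << '\n'; }

private:
    std::ostream& out_;
};

/// Hypothetical caller: relies only on the documented contract of show().
void report_done(message_display& display)
{
    display.show("Processing finished.");
}
```

Swapping `stream_display` for, say, a dialog-box implementation would not require any change to `report_done`, which is exactly what a clearly documented contract is meant to allow.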

High-level documentation shows the “grand plan”: a bird’s-eye view of how you have defined the project’s purpose and nature, and of the few most important modules you’ve broken the problem into. This pretty much has to be written by hand, and it would be a good first document to hand to a new developer on the project. Ideally, managers should be able to comprehend at least half of this document, because it is so high-level and doesn’t get its hands dirty with trivial details.

Automatic Documentation

In-code documentation is written by hand into the program comments, and that’s pretty straightforward. High-level documentation is too abstract to really automate, so there’s no escaping the work of deciding what’s important and exercising your communication skills to illustrate it well. API documentation, on the other hand, is simply a statement of purpose for each entity in the program. We can automate the process of listing all the entities in a program, and even of drawing diagrams of obvious relationships like inheritance. All we need to go one step further and list the purpose of each entity is to read the comment that precedes each entity. And that’s exactly what most automated documentation tools do.
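To show how little magic is involved in the basic idea, here is a toy extractor in C++. It is a deliberately naive sketch, not how Doxygen or Javadoc are actually implemented: it assumes `///` comments, treats the next non-blank line as the documented declaration, and does no real parsing. `doc_entry` and `extract_docs` are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <vector>

/// One documented entity: the declaration line and the comment before it.
struct doc_entry
{
    std::string declaration;
    std::string comment;
};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s)
{
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

/// Toy extractor: collects "///" comment lines and attaches them to the
/// next non-blank line, which is assumed to be a declaration.
std::vector<doc_entry> extract_docs(const std::string& source)
{
    std::vector<doc_entry> entries;
    std::istringstream in(source);
    std::string line;
    std::string pending;  // comment text still waiting for its entity
    while (std::getline(in, line)) {
        const std::string t = trim(line);
        if (t.rfind("///", 0) == 0) {  // line starts with "///"
            const std::string text = trim(t.substr(3));
            if (!text.empty()) {
                if (!pending.empty()) pending += ' ';
                pending += text;
            }
        } else if (!t.empty() && !pending.empty()) {
            entries.push_back({t, pending});  // attach comment to this line
            pending.clear();
        }
        // Blank lines and undocumented lines are skipped.
    }
    return entries;
}
```

A real tool parses the language properly, so it knows each entity’s kind, scope and relationships (which is how it can draw inheritance diagrams); this sketch only pairs comments with lines.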

There are other approaches to automated documentation, for example Literate Programming. Compared to most of the systems I describe here, Literate Programming is “inside out”. Instead of extracting documentation from a program and its comments, Literate Programming tools extract a program from a large document about the program. Basically, you write up the documentation in such detail that it makes sense to embed pieces of source code in it; the Literate Programming tools then extract those pieces and assemble them into a program for you. I’ve never seen Literate Programming used in industry, only in academia.

Popular Documentation Systems

Javadoc popularized the idea of automated documentation systems, although I suspect it wasn’t the first system out there. If you know something about the history of documentation systems, email me or leave a comment; it’d make a great article, and I haven’t found much on Google.

These days there is at least one, and typically many, documentation tools for any given programming language. Here is a list of some of the more popular systems and their details:

  • Doxygen: GPL, Dimitri van Heesch, around since 1997.
    • C++ (including Qt extensions), C, Java, Objective-C, Python, IDL (CORBA and Microsoft flavors) and, to some extent, PHP, C#, and D.
  • Javadoc: commercial but free, Sun Microsystems, around since 1993? (unsure).
    • Java only, but Java was made with Javadoc in mind.
  • Perldoc: “Artistic” or GPL, Larry Wall, around since 1987.
    • Perl only, but Perl was made with Perldoc in mind.
  • Pydoc: custom open license, Guido van Rossum, around since ????.
    • Python only, but Python was made with Pydoc in mind.
  • kdoc: GPL, Sirtaj S. Kang, around since 1997.
    • C++ with Qt extensions.
  • fortodoc: GPL, an old and largely abandoned project of mine, around since 2001. OK, I know I said I’d only list popular systems and this one isn’t popular, but it does demonstrate that even older languages like Fortran have been retrofitted with documentation tools.
    • Fortran only.
  • Doc++: GPL, Dragos Acostachioaie, around since 1999.
    • C, C++, IDL and Java.

And of course there are many more systems. Feel free to leave a comment listing others.

A Quick Example

The ideas behind these systems are all pretty similar, so I’ll just pick one for the example: Doxygen, because I’ve been using it most recently.

The basic idea is to mark up source files with comments saying what each entity is for. Doxygen, like most documentation systems, uses a special variation on the usual comment operators to signify a comment that is intended for producing API documentation.

Here’s a slightly contrived example:

/// A "square" is a four-sided shape.
///
/// This implementation stores the side length and lets you query the area.
class square
{
protected:
    /// The length of one side of the square, in centimeters.
    float side_length;
public:
    /// Create a square given the length of a side.
    /// \param new_side_length The length of a side in centimeters.
    explicit square(float new_side_length) : side_length(new_side_length) {}
    /// Calculate the area of the square.
    /// \returns the area of the square in square centimeters.
    float area() const { return side_length * side_length; }
};

With a few tweaks to the Doxygen configuration file, this produces HTML and LaTeX output. The HTML output is viewable here.

Variations you can choose between include the comment markers that signal special API documentation comments ("///", "/**", or "//!"), how the “brief” description of each element is extracted from its full description, graphical class hierarchies, call graphs, include-file dependency graphs, and more.
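For reference, here are the main comment markers Doxygen recognises, applied to a few small, hypothetical functions in the spirit of the `square` example. `///` and `//!` are the C++-style markers, `/** ... */` is the JavaDoc style and `/*! ... */` is the Qt style; `JAVADOC_AUTOBRIEF` is the configuration option behind the brief-description behaviour mentioned in the JavaDoc-style comment. Treat this as a sketch and check the Doxygen manual for your version before relying on the finer details.

```cpp
#include <cmath>

// The same kind of documentation written in the comment styles Doxygen
// recognises. Which one a project uses is mostly a matter of house style.

/// C++ style, three slashes: area of a square.
/// \param side Side length in centimeters.
/// \returns Area in square centimeters.
inline double square_area(double side) { return side * side; }

//! C++ style, Qt flavour: perimeter of a square.
//! \param side Side length in centimeters.
//! \returns Perimeter in centimeters.
inline double square_perimeter(double side) { return 4.0 * side; }

/**
 * JavaDoc style: length of the diagonal of a square.
 *
 * With JAVADOC_AUTOBRIEF enabled, the first sentence above becomes the
 * brief description; otherwise an explicit brief command is needed.
 *
 * @param side Side length in centimeters.
 * @return Diagonal length in centimeters.
 */
inline double square_diagonal(double side) { return side * std::sqrt(2.0); }

/*!
 * \brief Qt style: side length of a square with the given area.
 * \param area Area in square centimeters; must not be negative.
 * \returns Side length in centimeters.
 */
inline double square_side_from_area(double area) { return std::sqrt(area); }
```

Doxygen accepts both the `\param` and `@param` spellings of its commands, so the JavaDoc-style block can keep the `@` form Java programmers are used to.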

Chances are you’ve used this sort of API documentation before; it’s considered pretty standard in commercial programming and now you know how it’s created!

The key with all these systems is to learn how to use them early and to start documenting your program as you write it rather than having to catch up later. If you document as you go then it needn’t take significant time.

Roundup

  • I’ve described three kinds of documentation for programmers:
    • In-code comments
    • API documentation
    • High level documentation
  • I’ve listed several systems that can automatically generate API documentation from carefully commented program source; and listed license details and links for more information for each of them.
  • I’ve offered a “short and sweet” example of Doxygen to give an idea of what these systems’ inputs and outputs look like.

Chances are you will have to learn the specific system of choice for a new employer when you get there, but having learnt a couple of other systems will give you a good head start. The last thing you want is to have to ask at an interview “Documentation can be generated!?”

Posted by sarah at 5:01 pm in: Documentation , Tools (2167 views)

1 Comment

  1. I would guess that (like a lot of language features) automatic documentation came from LISP, in the form of docstrings. However, I too would have to look it up somewhere…

    Comment by jiri — On 8-12-2005 at 6:42:14 PM
