Tuesday, February 01, 2011

Annotated source code

We programmers are told that reading code is a good idea. It may be good for you, but it's hard work. Jeremy Ashkenas has come up with a simple tool that makes it easier: docco. Ashkenas is also behind underscore.js and coffeescript, a dialect of javascript in which docco is written.

Interesting ways to mix prose and code have appealed to me ever since I first discovered Mathematica's live notebook, which lets you author documents that combine executable source code, typeset text and interactive graphics. For those who remember the early 90's chiefly for their potty training, running Mathematica on the Next pizza boxes was like a trip to the future. Combining the quick cycles of a Read-evaluate-print-loop with complete word processing and mathematical typesetting encourages you to keep lovely notes on your thinking and trials and errors.

Along the same lines, there's Sweave for R and sage for Python.

Likewise, one of the great innovations of Java was Javadoc. Javadoc doesn't get nearly enough credit for the success of Java as a language. It made powerful API's like the collections classes a snap and even helped navigate the byzantine complexities of Swing and AWT.

These days, automated documentation is expected for any language. Nice examples are: RubyDoc, scaladoc, Haddock (for Haskell). Doxygen works with a number of languages. Python has pydoc, but in practice seems to rely more on the library reference. Anyway, there are a bunch, and if your favorite language doesn't have one, start coding now.

The grand-daddy of these ideas is Donald Knuth's literate programming.

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: "Literate Programming."

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

Indeed, Ashkenas references Knuth, calling docco "quick-and-dirty, hundred-line-long, literate-programming".

This goodness needs to come to more language. There's a ruby port called rocco by Ryan Tomayko. And for Clojure there's marginalia.

I love the quick-and-dirty aspect and that will be the key to encouraging programmers to do more documentation that looks like this. I hope they build docco, or something like it, into github. Maybe one day there will be a Norton's anthology of annotated source code.

Vaguely related


  1. You forgot Roxygen for R (http://roxygen.org/)

  2. And org-mode, a plain text authoring environment for Emacs that supports ~ 30 languages, including R, for evaluation on export to PDF or HTML.


  3. There's PyLit (http://pylit.berlios.de/) and Sphinx (http://sphinx.pocoo.org/) for Python. I've been toying with making a two column layout to allow for docco style documentation out of RestructuredText.