I like programming – it’s what I do and I am blessed in that I get to spend most of my waking hours developing software. Like a lot of programmers I obsess over how good my code is and how I can get better at it.

Over the years there have been reading a lot of articles and books on software development. There has been a lot of ink spent (both physical and virtual) on ways to improve your “programming foo” and become a super ninja programmer ! There are some common pearls in all this ink and one of them is the advice on reading code. This advice, is usually a one liner couched in the midst of a bunch of other recommendations and usually along the lines of – find some great open source software or any piece of software that you admire, open up the source code (or print it out) and read it. While, this is on the whole, great advice there are some problems with actually putting it in practice. In this post I endeavor to give some practical suggestions on reading code, but first let us enumerate the problems.

The usual impression conveyed (in the posts that advise one to read code) is that the dispenser of the advice is a programming guru who can literally sit back in their chair with a page of code and read it like a novel. Well, I am sure there are some superb programmers out there who enjoy looking at pages of cryptic English-like statements over a cup of coffee and can hold entire class hierarchies and architectures in their heads. This post is not meant for them – this post is for poor slobs like me who find staring at reams of code a boring, frustrating and ultimately pointless exercise. Of course, it can be argued that one can learn simply by reading a single class or even a function of a the entire project code, but, IMO, except for the most simple problems, most software is interdependent. It is often impossible to appreciate the design decisions and the rationale behind a particular function or class layout without knowing the rest of the system…

The usual impression conveyed (in the posts that advise one to read code) is that the dispenser of the advice is a programming guru who can literally sit back in their chair with a page of code and read it like a novel. Well, I am sure there are some superb programmers out there who enjoy looking at pages of cryptic English-like statements over a cup of coffee and can hold entire class hierarchies and architectures in their heads. This post is not meant for them – this post is for poor slobs like me who find staring at reams of code a boring, frustrating and ultimately pointless exercise. Of course, it can be argued that one can learn simply by reading a single class or even a function of a the entire project code, but, IMO, except for the most simple problems, most software is interdependent. It is often impossible to appreciate the design decisions and the rationale behind a particular function or class layout without knowing the rest of the system… The next problem is getting code to read (actually before that you need to be able to identify code worth reading – check out this post for details on that). There is a lot of great software out there – both open source and freely available and licensed or proprietary. There are huge open source directories like Sourceforge and Google Code, and huge pieces of software like Open Office and Linux. If you are working in a software development company, you can probably get access to the proprietary code in your source control repository. A third common avenue are the programs distributed along with books on software development or as part of resources for education( Minix being the canonical example). Indeed we are actually spoiled for choice and from this universe of software identifying the ones that are good candidates for our purpose is a hard but essential task.

Another problem is the language in which the program is written – reading someone else’s code is tough enough as it is, adding the burden of familiarizing yourself with the quirks and syntax of a new language while doing this, is, IMO, a recipe for disaster and immense frustration . You need to find code written in a language that you are familiar with. This particular problem is not relevant if you are going through the code distributed as part of a book or as an educational resource, since you would have the book or your mentor to explain things and set out the context. If, despite this forewarning, you are planning to read code written in a different language than you are used to (without the benefit of having a book or a mentor) I would advise, that you at least learn enough of the language to create your own programs in it (“Hello World” does not count :-)) .

The bit about context brings me to the next problem – figuring out what the code is doing is a lot harder if you are not familiar with the software itself. For example, it is far more difficult to go through the Linux code and figure out the concept of runlevels if you don’t use Linux daily and see the Linux boot sequence. Using the software gives one a context with which to read the code – this context includes the common terminology used, the functionality and features of the software, even the quirks and bugs that you experience.

I have realized that for me ‘reading code’ does not really describe the activities that I undertake – a better phrase for what I do is ‘code comprehension’. It is quite difficult for me to sit back with a laptop screen (or a printout) full of code and simply read through it. I need a lot more than simply a piece of code – I like to be able to look at documentation, play with the software, step through the code and even write tests for it before I really appreciate it. This is a significant investment of my time and effort, so I have to be very picky about the software I want to “read” (comprehend).

The first filter I place on the code directory, when looking for code is the language filter – for me this means – C# or VB.NET or Python or Javascript(while I am familiar with C++, Ruby and F# as well I do not consider myself at a level where I can understand other people’s code in them). Next is to look for software that I have used – this allows me the a bit of a leg up since I know what the code is meant to do, cannot do and (if I am familiar enough) its limitations. Good candidates are open source software that you use in your day job (for eg. I use Cruise Control.NET, NANT and NUnit which are open source tools written in C#)

I happen to work in a software product company (a Microsoft shop), so one of the candidates for my reading list is the code in my companies source repository. If you happen to work in a software company, you can look at other projects, and even older versions of the software you are working on. In addition to providing insight on code, you get a pretty good idea of what was tried before and since. There are a few caveats though – First, if you don’t have direct access to other projects, you need to ask permission – some companies are very touchy about their “intellectual property”. Second, the quality of the software may not be as high as you think, since, in general, proprietary code does not get the kind of scrutiny open source code does. Warning signs to look out for are a lack of regular code reviews – if the software is not code reviewed the odds are that it would not be of good quality. Third (this point is inspired from feedback provided by my friend Praseed), if the code in your company is business software (HR, Finance, ERP, etc) there is a lot of business context that needs to be understood first. Also, since most of this code tends to be factored by business functionality, it generally seems less modular than utility code or APIs.

Look for well documented projects (this applies to open source as well as proprietary code). By this I mean, that the documentation should highlight the overall design, and rationale for the way the code is. Simply having auto-generated Java Doc type documents cannot be considered documentation :-). One useful avenue to explore is software created as educational resources (like Minix ). Since, the target is to teach through the software, they are usually quite clearly documented and have plenty of material explaining the design rationale behind the code. So, you have identified the software and downloaded the source code and documentation, so let’s get down and start spelunking ;-) Go through the design documentation and try to get a feel for the way the code has been built. Good software projects follow certain architectural patterns – these dictate the code organization. Once you get a handle on this, understanding the code becomes a whole lot easier. If you can create a class diagram of the code you can get a good idea of the layout.

The next thing to do is to compile it and run it. This can be straightforward or tough depending on the process followed in the project and it’s documentation.

Now it’s time to fire up your favorite IDE and go exploring. A good place to start your code exploration would be to try to trace a functionality of the project that you are familiar with. This would let you go through the various layers and sub-systems and get a handle on how they inter-connect. For example when I was exploring NUnit – I started by writing a test and looking at the code classes I needed to do that.

Try and identify the design patterns used in the code. If you do not know what design patterns are, then you need to stop reading this post right now and read this book. Familiarize yourself with design patterns – they form a great way to recognize and understand the design of well written code. This makes it easier to keep it in your head while reading code. It also helps you identify nuances and customizations made by the programmers more easily.

Try to write tests for the code to fully understand it – this is really useful way to understand the dependencies between different parts of the code. When you try to write a test for the code you first need to satisfy (mock) all its dependencies. Next you need to understand the possible entry points as well as the exit values for the code. This improves your understanding of the code and get you to the next level.

Finally, try to refactor the code. In this step you have moved from simply understanding the code to becoming familiar enough to be able to modify it. As the sophistication of your refactoring increases so too does your understanding. At this point you can if needed contribute your own code to the project :-)

“Code Reading” IMO is more than just reading – it is a distinct set of activities that together help one understand code. It might seem more intimidating than simply “reading code” but it is well worth then effort IMO.

Happy “code reading” :-)

Update: I came across this post by Joel Spolsky where he quotes Seth Gordon as saying code reading “Is just like reading the Talmud”… Yup, code reading is definitely not easy.

8.532389 76.955846