It seems to me that there are two programming modes, green fields and maintenance, and that their commenting requirements are different. In green fields mode I emphatically agree with the assertion that less is better with respect to comments. In maintenance mode, however, comments are often the most valuable contribution you can make to the code, for a number of reasons:

to correlate a bug with a code change

if today's maintainer needed a large effort to understand the code, so most likely will tomorrow's

a maintainer is often better placed to see where a comment provides the most bang for the buck

(sometimes) to distinguish a maintenance change from original code

(sometimes) to elucidate the maintainer's understanding of the code

That is not to say that every changed line of code requires a comment, or that every comment should be a story that includes the maintainer's state of mind and how much coffee had been consumed. The rules for succinct and sparse commenting still apply, but the bar is lowered somewhat in light of the need to compensate for deficiencies (perceived or otherwise) in the original code. The alternative may often be a completely impractical refactoring of the code. If a back story is required for the code changes, it should be in the bug tracking system, with a synopsis in the check-in comment in the revision control system.

If even half an hour is spent trying to understand a section of code, and that investigation could be averted in the future by the addition of a brief comment, then the comment is a good investment. Most times just a couple of words are enough to provide the hint required for understanding. Anything more than a few words should point to documentation elsewhere - five minutes reading documentation is still way better than a half hour investigation.

Non-trivial bug fixes require a higher degree of commenting simply because, if the issue wasn't obvious to the original programmer, it probably won't be obvious to whoever follows.
Very nice node btw! DWIM is Perl's answer to Gödel

I agree with you on this, in every detail. Especially if the maintenance programmer's comments take one of the following forms:

## See rt #1234

## See perlmonks node: 123456

## See change log item buk 08-05-2007

## A new element (TIME_OF_DAY) was added to the array

## See file DB_object for a description of the elements of the array

## The data is denormalised (fields are duplicated) for efficiency.

Those are just a few that come to mind.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Well, green field projects usually enter maintenance(ish) mode fairly quickly, usually somewhere between three and six months in. This isn't a bad thing, but rather a fact of life, and so should be embraced and dealt with. If they don't take on a more maintenance kind of style, it is because of a lack of one or more of:

Short iterations (shorter than a month)

Proper testing (and the resulting lack of code quality feedback)

Deployment, or at least decent show-and-tell sessions with the stakeholders (and the resulting lack of design and requirements quality feedback)

/J

++ for your thoughtful insights.



I dislike coming across code that is more comment than code.



I disagree, though, that our inability to make up for deficiencies in the knowledge of those who follow us in maintenance is a reason to comment tersely or not at all.



A programmer, especially a skilled one, can hold a great many variables in their head at once. And a programmer using a succinct language such as Perl can produce a mountain of functionality from a few lines of code - code that often can't be written more clearly even if it is spread across more lines. A maintenance programmer coming in later will have to absorb all of the variables, and all of the logic - JUST to get an idea of what is going on in a local area. Code split into paragraphs, with a simple descriptive comment at the top of each paragraph, is easy to digest, and it is easy to figure out what is going on. Such comments are also potentially searchable.



Looking at some code I wrote six months ago, I came across the following "code paragraph headings":

### check if we are in the delete grace period
### adjust check time if the last order was a renew or a transfer
### determine what the status will be if there are no errors

I was able to dive right in to what is going on with the code without reading any code. Each of those comments heads a section of 5 to 10 lines of code that I COULD figure out if I wanted to. But I didn't have to expend the thought to do so.
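To make the "code paragraph" idea concrete, here is a minimal sketch of what such a routine might look like. Only the three heading comments come from the post; the sub name, the hash keys, the five-day grace period and the one-year extension are all invented for illustration and are not the original code.

```perl
use strict;
use warnings;

# Hypothetical order-status routine. Every name and number here is an
# assumption made for the example, not the code the post refers to.
sub next_status {
    my (%order) = @_;
    my $now        = $order{now};
    my $check_time = $order{expires};
    my $status;

    ### check if we are in the delete grace period
    my $grace_seconds = 5 * 86_400;
    my $in_grace      = defined( $order{deleted_at} )
        && $now - $order{deleted_at} <= $grace_seconds;

    ### adjust check time if the last order was a renew or a transfer
    if ( $order{last_action} =~ /^(?:renew|transfer)$/ ) {
        $check_time += 365 * 86_400;
    }

    ### determine what the status will be if there are no errors
    if    ($in_grace)            { $status = 'grace' }
    elsif ( $now > $check_time ) { $status = 'expired' }
    else                         { $status = 'active' }

    return $status;
}

print next_status(
    now         => 1_000_000,
    expires     => 900_000,
    last_action => 'renew',
), "\n";
```

Each heading can be found with a plain text search for `###`, and the three paragraphs share the local variables without any of them having to be passed around.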



One important caveat: you cannot account for programmers who lie or who comment incorrectly - but it quickly becomes obvious which comments don't correlate with the code at all.



So I would add some addenda to your conclusions:



No programmer can expect to account for all maintenance programmer shortcomings, but that does not free you from the obligation to try.



More often than not, you are the maintenance programmer.



Keep your logic in discrete logical chunks with defined purpose - defined either with a comment or in external documentation.



Some code requires no comments.



No amount of documentation (too much, too little, or just right) will ensure that the maintenance programmer will read your documentation. Use comments and documentation.



my @a=qw(random brilliant braindead); print $a[rand(@a)];

A thought experiment: what if I changed a few words in your post?

Looking at some code I wrote six months ago I came across the following lines:

    if (in_delete_grace_period()) { ... }
    adjust_time( $check ) if last_order_needs_time_adjustment();
    determine_status() unless $errors;

I was able to dive right in to what is going on with the code without reading any comments.

That would be great... assuming that those subroutines could be passed the 20 variables that are in scope... and assuming that they could pass back the two or three variables that they modified.



You have given a nice contrived example that would certainly work under some conditions - but not this one ((sarcasm)which you "obviously" could've guessed without seeing the code(/sarcasm)). My example came from working code - a small random sample of tens of thousands of lines that are all broken into commented paragraphs. Some sections do look more like what you said - but not often.



Yes I could break into a subroutine every 10 to 20 lines, but then I have a different problem -- the code is arguably more fragmented and less easy to follow and has to find other ways to pass around variables. Some sections of logic cannot be broken up without severely hampering readability, or without serious code mangling - but they can be broken into paragraphs with comment headings.
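For readers weighing the two styles, here is a hedged sketch of what the extracted-subroutine version has to do about shared variables. It assumes the in-scope variables are gathered into a single hash reference; the sub names, keys and numbers are hypothetical and not taken from the code under discussion.

```perl
use strict;
use warnings;

# Hypothetical sketch: the state a long routine would keep in lexical
# variables is carried in one hash reference so that each extracted sub
# can read it and update it. All names and values are invented.
sub in_delete_grace_period {
    my ($state) = @_;
    return defined( $state->{deleted_at} )
        && $state->{now} - $state->{deleted_at} <= $state->{grace_seconds};
}

sub adjust_check_time {
    my ($state) = @_;

    # modifies the shared state in place instead of returning several values
    $state->{check_time} += $state->{renewal_seconds}
        if $state->{last_action} =~ /^(?:renew|transfer)$/;
    return;
}

sub determine_status {
    my ($state) = @_;
    return 'grace'   if in_delete_grace_period($state);
    return 'expired' if $state->{now} > $state->{check_time};
    return 'active';
}

my %state = (
    now             => 1_000_000,
    check_time      => 900_000,
    last_action     => 'transfer',
    grace_seconds   => 432_000,
    renewal_seconds => 31_536_000,
);
adjust_check_time( \%state );
print determine_status( \%state ), "\n";
```

The call sites read like the headings would, but the data flow now runs through `$state`, which is the fragmentation cost described above; which cost is smaller depends on the code in question.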



Update: made sure that sarcastic comment is taken as such.




"No programmer can expect to account for all maintenance programmer shortcomings, but that does not free you from the obligation to try."

I'm sorry, but we will probably have to agree to disagree on this. To my mind, that is exactly what it does do. There is simply no way for me to predict what gaps there will be in the knowledge of any programmer that will see my code. Or what they will find difficult to understand. Or what idioms they will eschew as 'too complex'. I do not know who they will be, or what their experience levels will be. Or how many of them there will be. Or what their personal coding preferences, prejudices or blind spots will be.

Given that uncertainty, the best I could hope to do is include the entire contents of perldoc in a comment at the top of every program. Ridiculous, you say, but it isn't. As far as understanding the vagaries, inconsistencies and idioms of Perl is concerned, that is the definitive answer. The idea that I could write it better is ludicrous. That it should be a part of my source files is more so.

"No amount of documentation (too much, too little, or just right) will ensure that the maintenance programmer will read your documentation. Use comments and documentation."

What makes you think he would read the comments? The first thing I do with almost every piece of other people's code I pick up is to throw away the comments (in a copy!). The second is to manually reformat the code to my personal preferences. The third is to 'fix' any obvious verbosities, by recoding them to use (my preferred) idiomatic constructs and discarding any extraneous (used-once intermediary) variables.

Only once I have been through the entire file, line by line, and any related files required to allow me to form an understanding of what the code actually does--not what the author thought it did--do I even consider going back and looking at the author's comments. It's often very illuminating with respect to the author's level of understanding of what he wrote :)

And only after that process is complete do I consider myself even vaguely competent to consider modifying that code. Even then, if this is real code that I am about to modify, I may well feel the need to add a bunch of trace statements and run it before I will consider attempting to modify it.

I never, ever take comments to be the truth. Take those three comments you posted above. From those comments I can derive quite literally zero information about what that code, or the program they were a part of, is designed to do. You might, but then you wrote them. But to me, they are entirely and utterly meaningless in the absence of further information.

However, if you had posted the code, minus the comments, I'll bet that I could derive a whole lot more information than is included in those comments. Often as not, if there is a code bug, or a subtle potential bug there, I'll find it. But if I read your description of what you think your code is doing before I read the code itself, then when I do read the code, I'll probably not question it as closely, and so come away with the assumption that what you described it as doing is correct.

Code is precise; words are not. Words are ambiguous and open to interpretation; code is not. The meaning of the words you use is subject to your education, your context, your prejudices, your mood, your experiences etc. etc. Your code suffers from none of these flaws.

Programmers tend to comment those things that they personally found difficult to code or understand. The comments reflect their own abilities and knowledge, not those of the people they are (supposedly) writing the comments for.

Even when you are your own maintenance programmer, you run the risk of influencing your own thought patterns back into the same (incorrect) groove you were in when you wrote that code. One of my favorite debugging techniques (for my own code, when time and circumstances permit) is to simply abandon it for a while. Leave it as is and go do something else for a while before coming back to it and trying to understand it again. The intervening period of time doing something else, preferably something very different, has the effect of washing my mind of all the assumptions and conclusions that I had previously reached about a problem and allows me to focus on it afresh. To come at it from a different angle. And so often that allows me to see the assumptions I was making. And see the errors in those assumptions.

If I comment the code with those assumptions, then when I return to that code, I'm likely to fall right back into making the same assumptions, and the same errors.

I'll say it again. Extensive comments are redundant and dangerous. I use occasional and sparse comments when I feel they truly clarify a piece of code, but in general, my choice to not comment my code has nothing to do with being lazy. It is an explicit and conscious decision based upon my personal experiences of the quality of comments I've encountered and the value I perceive they have.

knowing your audience

by doom (Deacon) on May 09, 2007 at 08:05 UTC

"There is simply no way for me to predict what gaps there will be in the knowledge of any programmer that will see my code. Or what they will find difficult to understand. Or what idioms they will eschew as 'too complex'. I do not know who they will be, or what their experience levels will be. Or how many of them there will be. Or what their personal coding preferences, prejudices or blind spots will be."

Well yeah. And there's no way for you to be able to predict whether the folks here on perlmonks can understand these sentences you wrote, and even after trading remarks back and forth on the subject several times, you may still have some doubts about whether we can follow what you're saying here. Difficulties like this arise in every single aspect of communication, right on down to choices of appropriate subroutine names... and yet we all must muddle through, as best we can.

You may be someone who can read code better if it has no comments: don't assume everyone is like that. (Would you like some help writing a comment-stripper script?)
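As for that comment-stripper, a deliberately naive sketch might look like the following. It is an assumption-laden illustration, not a robust tool: it removes only lines that consist entirely of a comment, and the sub name `strip_full_line_comments` is made up for the example.

```perl
use strict;
use warnings;

# Naive sketch: drops only lines that consist entirely of a comment.
# It keeps "#!" lines, leaves trailing comments alone, and stops stripping
# at __END__ or __DATA__. It does not understand heredocs, POD or
# multi-line strings, so a line starting with '#' inside those is lost too.
sub strip_full_line_comments {
    my ($source) = @_;
    my @out;
    my $past_end = 0;
    for my $line ( split /^/, $source ) {
        $past_end = 1 if $line =~ /^__(?:END|DATA)__\s*$/;
        next
            if !$past_end
            && $line =~ /^\s*#/
            && $line !~ /^#!/;
        push @out, $line;
    }
    return join '', @out;
}

my $code = <<'PERL';
#!/usr/bin/perl
### check if we are in the delete grace period
my $x = 1;    # trailing comments are kept
    # an indented comment line
print $x;
PERL

print strip_full_line_comments($code);
```

Anything meant for real use would probably want a proper parser (the CPAN module PPI is one candidate) rather than a line-by-line regex.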


"Comments serve a different purpose to documentation. And documentation does not belong in source code files."

Are you talking about Perl here? Are you saying that using POD in perl code is wrong in some way? I cannot agree with that. (And just where should the documentation be?)

Okay, programming is hard, and writing proper documentation is at least as hard -- often harder: striking the right balance of natural language coherence and logical/procedural accuracy is really another form of the common tasks in programming: problem-solving, puzzle-wrestling, optimizing, getting things into the right operational sequence, working out the best way to refer to things, and so on.

(Let's face it, every higher-level programming language is really a sort of subset or reduced adaptation of a human language, designed to make control of computers possible for a wider range of humans, and easier for the ones who really understand the machines. Comparing this to mathematics, chemistry, etc, is disingenuous -- people in those fields need text books, in a human language that they know, in order to learn the symbols and the syntax for combining them. And that's a lot of work!)

So doing both the code and the documentation -- and maintaining both of them to keep them in sync -- is really hard. But that's a lousy reason for not doing it, and based on experience, I tend to believe that the code is easier to write (and to get right more quickly) when the documentation has been written first, and is as clear and unambiguous as the code needs to be. What I mean is: the coding is less work when the documentation is done first.

An added nice feature about that, when you're able to do it, is that you can show the docs to people who don't know how to program (e.g. a customer or sponsor), and assuming you are dealing with people who know how to use the same human language that you are using, they can understand it, react to it, make suggestions for improvements, and so on. Even though the population proficient in any given human language is "limited", it's a lot bigger than the population proficient in a given programming language -- English, French, etc are good for something, even to programmers.

And a nice thing about doing it as POD in your Perl script is that it's always right there for you, the programmer, and it's also really easy to present to others, the non-programmers, in a clear, human-readable form.

When time/money are limited, and you can only do one thing, then obviously it's better to write the code rather than the documentation. But then you get what you pay for, which is probably going to be about half of what you really need.

In general, I'm afraid that this piece strikes me as an extended rationalization for being lazy about documentation and writing uncommented code... my personal experience is that if the programmer is under the delusion that the code is "self-documenting", you might as well throw it away and re-write it rather than attempt to figure out what's going on with it.

One of the strengths of the perl programming culture is that so many of its programmers are also fluent in English, and have no problem with writing comments, pod, web pages, articles, books, etc... One of my slogans these days is that "perl is the best documented language in history". If this approach bothers you, maybe you should be looking into a language with a culture that's more suspicious of words.

BrowserUk wrote:

"One of the high priority goals in programming is the removal/avoidance of codependencies. We avoid using parallel data structures (eg. parallel arrays), because it becomes a nightmare to maintain those parallel arrays in synchronisation as algorithms and projects evolve. One of the key attributes of well designed classes (and other abstractions), is that they are as fully independent of their peers as possible. As decoupled as is possible to achieve."

This is a clever line of argument, but I don't think it applies to the subject at hand. Nearly every workable methodology for verifying the correctness of code involves comparisons between multiple implementations of the logic: the code isn't complete without documentation (and specs?), and these days most of us would say that it isn't complete without automated tests. Note that automated tests have the same problems you're complaining about with comments and documentation: when you make changes it's likely you're going to need to make changes in all of these roughly-parallel structures: code, comments, docs and tests.

(Update: Fixed attribution of quote to BrowserUk.)

"In general, I'm afraid that this piece strikes me as an extended rationalization for being lazy about documentation and writing uncommented code..."

Sorry, but that first sentence indicates that you have not read what I wrote, but rather skimmed a few bits and reached a conclusion based upon what you think I probably wrote. Eg.

"Comments are not, and should not be, documentation. Documentation of code is vitally important for successful, ongoing projects, but comments are not documentation."

I have no problem with you holding a different opinion to myself, but don't put words in my mouth, or draw conclusions based upon things I haven't said, much less things I specifically and deliberately already countered.

Eg.2 I (BrowserUk) wrote the article, but you've addressed your reply, and attributed my words, to graff. If your comments and documentation are as good, you are welcome to them :)

Comments and docs are issues. But with automated tests, you have to treat them as the running specs of the system. If you change the specs, then you have to change the test cases, and only then can you change the code to conform to the tests again. In fact, you should never be able to successfully compile your entire project if a test fails. That way the tests serve their purpose and never get out of sync.

"Are you talking about Perl here?"

Yes.

"Are you saying that using POD in perl code is wrong in some way?"

It depends, but essentially: yes. Firstly, I am not a fan of POD--leastwise, not in its current Perl 5 form--but that's a different issue. Secondly, there are many forms of documentation. Comments are (should be) explicitly confined to annotating the code. That is, this particular piece of code, in this file, preferably on this line. Anything more than this should be dealt with in documentation--somewhere else.

At the very minimum, documentation should be moved to the bottom of the source file. Interleaving it with the code detracts from the code in ways that go far beyond just interrupting the programmer's overview of the code. It also creates dependencies.

"I cannot agree with that."

Fair enough.

"(And just where should the documentation be?)"

In a separate file with the same name as the source file but with a different extension. That is, the unit documentation. System documentation should live at the level of the system development directory structure.

"Okay, programming is hard, and writing proper documentation is at least as hard -- often harder: striking the right balance of natural language coherence and logical/procedural accuracy is really another form of the common tasks in programming: problem-solving, puzzle-wrestling, optimizing, getting things into the right operational sequence, working out the best way to refer to things, and so on."

Sorry to be pedantic, but my meditation was addressed to comments, not documentation. And as I said: comments are not (and should not be) documentation.

Simplifying, for the purposes of discussion, some questions to ask when deciding what is comment and what is documentation are:

Would this comment be useful to anyone other than another programmer working on this piece of code?

Would this comment have any meaning to anyone, in the absence of the code?

If this code is ported to another language, would this comment still make sense?

If the answer to any of these questions is yes, then that comment is not a comment, but documentation. As such, it should be a part of the documentation set, not tucked away in a comment where it will never be seen outside of the source file.

"(Let's face it, every higher-level programming language is really a sort of subset or reduced adaptation of a human language, designed to make control of computers possible for a wider range of humans, and easier for the ones who really understand the machines. Comparing this to mathematics, chemistry, etc, is disingenuous --"

Sorry, but I strongly disagree that it is disingenuous to compare these notations.

"people in those fields need text books, in a human language that they know, in order to learn the symbols and the syntax for combining them. And that's a lot of work!)"

And programmers do not? They need text books (or their electronic equivalents) to learn how to program.

"So doing both the code and the documentation -- and maintaining both of them to keep them in sync -- is really hard. But that's a lousy reason for not doing it, and based on experience,"

Again, I am not questioning the need for, nor resisting doing, documentation. I am saying that comments -- # words more words etc. -- are the wrong place for documentation.

"I tend to believe that the code is easier to write (and to get right more quickly) when the documentation has been written first, and is as clear and unambiguous as the code needs to be. What I mean is: the coding is less work when the documentation is done first."

I totally agree with you, but writing documentation in comment cards is wrong.

"An added nice feature about that, when you're able to do it, is that you can show the docs to people who don't know how to program (e.g. a customer or sponsor), and assuming you are dealing with people who know how to use the same human language that you are using, they can understand it, react to it, make suggestions for improvements, and so on. Even though the population proficient in any given human language is "limited", it's a lot bigger than the population proficient in a given programming language -- English, French, etc are good for something, even to programmers."

But not for describing algorithms. It is too imprecise. Too shaded with (mis)interpretable meanings. Too complex and vague.

"And a nice thing about doing it as POD in your Perl script is that it's always right there for you, the programmer, and it's also really easy to present to others, the non-programmers, in a clear, human-readable form."

If you place your documentation in (for example) POD, that's true, but if you place it in comments, the only way anyone sees it is if they look in the source file. Again, you are failing to recognise the distinction I make between comments and documentation.

"When time/money are limited, and you can only do one thing, then obviously it's better to write the code rather than the documentation. But then you get what you pay for, which is probably going to be about half of what you really need."

I find this to be an artificial distinction. I've never yet worked on a commercial project where time and money were not limited.

I also think that no plan survives first contact with the enemy. That is to say, documentation written at a level that prescribes the implementation of any given subroutine, function or class, in advance of the first attempt to implement it, is usually so far from reality that it is next to useless either in writing that first implementation, or for maintaining it once the code is working. Without serious work, it rarely if ever reflects the realities of the implementation.

But in any case, you are still confusing my meditation on comments with something relating to documentation. Nothing I said precludes or denigrates the writing of documentation. Indeed, I said in my meditation that "documentation was vital". All I said that related to documentation is that comments should not be used for documentation purposes. And that source files are not the right place for it.

I will concede that, if you think that embedding documentation in code (POD) is a useful concept, then keeping that documentation in the same file as the source has some merit. Not much, but some. However, I strongly feel that the practice of interleaving code and documentation is a bad one. If you must keep it in the same source file, at least keep it all together. Preferably at the bottom after the __END__ mark, though that is impossible if the code uses a __DATA__ section.

Personally, I think that POD has it backward. If you want to go with the idea of containing code and documentation within the same file, I much prefer the Literate Programming concept of having the interpreter/compiler simply ignore everything outside of some equivalent of <code></code> blocks, as is used by Haskell, D and others. This allows you to write the documentation first, in whatever markup is adopted by the project, and using the full power of that chosen markup.

"At the very minimum, documentation should be moved to the bottom of the source file. Interleaving it with the code, detracts from the code in ways that go far beyond just interrupting the programmers overview of the code."

You're in agreement with Damian Conway on this, but myself I think interspersed pod-style has some strong DRY advantages. Most subs require some verbal description to go with them, and there's some overlap between what you need to say for the benefit of a maintenance programmer working on your code and a client programmer interested in using your code, and it's not at all a bad idea to put that inside an "=item" that goes with the sub. (And I would hope that it is obvious that remarks that are only of interest to a maintenance programmer should be confined to comments.)
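A minimal sketch of the interspersed style being argued for, with the caveat that the sub, its name and its wording are invented purely for illustration: the `=item` text sits next to the sub and serves both audiences, while the maintainer-only remark stays in a `#` comment.

```perl
use strict;
use warnings;

=head1 FUNCTIONS

=over

=item seconds_to_hms( $seconds )

Returns the given number of whole seconds as an "HH:MM:SS" string.
(Hypothetical function, used only to show where the POD sits.)

=cut

sub seconds_to_hms {
    my ($seconds) = @_;

    # maintainer-only note: %02d zero-pads each field, so 5 becomes "05"
    return sprintf '%02d:%02d:%02d',
        int( $seconds / 3600 ), int( $seconds % 3600 / 60 ), $seconds % 60;
}

=back

=cut

print seconds_to_hms(3725), "\n";
```

Running a POD viewer such as perldoc over the file should show only the `=item` text; the `#` comment is visible only to someone reading the source.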

"It also creates dependencies."

The dependencies between the documentation and the code do not go away if you move the documentation elsewhere.

"Personally, I think that POD has it backward. If you want to go with the idea of containing code and documentation within the same file, I much prefer the Literate Programming concept of having the interpreter/compiler simply ignore everything outside of some equivalent of <code></code> blocks, as is used by Haskell, D and others. This allows you to write the documentation first, in whatever markup it adopted by the project, and using the full power of that chosen markup."

I'm once again having trouble figuring out what you're getting at here. Nothing about the interspersed pod style I'm arguing for prevents you from writing the documentation first, and in fact I often work that way. Of course, if you use this style you have to present the code in the same order that you want it to appear in the documentation, and that's a problem that would go away in a true "literate programming" language.

By the way: you do understand that you can embed other markup inside of pod if you want to, right? From perlpod:

    =begin html

    <hr> <img src="thang.png">
    <p> This is a raw HTML paragraph </p>

    =end html

"If you must keep it in the same source file, at least keep it all together. Preferably at bottom after the __END__ mark, though that is impossible if the code uses a __DATA__ section."

Actually, you should get out of the habit of using __END__ at all: it breaks mod_perl code.

I agree with you completely, BrowserUk, but especially with two points (my paraphrasing):

The best course of action in examining code is to strip out everything but code and format it consistently.

Documentation should be detailed and use every tool we humans have come up with for enhancing readability.

I, too, am a fan of Literate Programming, though I have yet to find the right tool to give me both views of a project. We should be able to switch between pure-code view and documentation view at will, without the messy comment-stripping and PerlTidy steps in the middle. Even better, I'd like three views: pure code, syntax-highlighting with hyperlinked definitions and where-used lists, and documentation view with full text enhancement. Ed Ream's Leo is a step in the right direction, but it isn't all the way there. Mindmaps have documentation advantages, too, as do some CASE environments. I think doxygen is also a great example of the direction we should be heading in this arena. The Xerox PARC Star environment (also Smalltalk/V, Squeak, etc.) with classes and clickable browsers also had excellent usability features, although Smalltalk, the language, was a syntactic mess.



We have powerful computers these days, with incredibly powerful graphics. I grew up with 7-segment displays and a hex keypad, and I spent much of the first half of my career burning EPROMs for microcontrollers. I have come to believe that, no matter what the ultimate target, fancy IDEs that enhance the programming (and debugging) experience are essential to rapid and effective code development. The need grows exponentially more urgent as teams expand and the underlying hardware, OS, and system interfaces become more complex and critical.

Don Wilde

"There's more than one level to any answer."

Very well said. Thank you for the time you put into it. And most especially, thank you for saying something that needs to be said over and over again until everyone in the universe understands it: Programmers should not expect to be able to open a source file at random, move to some arbitrary point within that file and then instantly be able to modify the code there from just a cursory glance at the code and comments in the locality. I cannot tell you how many times I've heard "This program worked fine until the last time we had it modified by our usual programmer, and ever since it works fine most of the time but every now and then doesn't work at all". It's always always always because their "usual programmer" modified the thing without bothering to gain sufficient understanding of the program as a whole entity. When I hear that story from a potential new client, I won't even look at the code until I'm on the clock. Too often, that other guy in whom they've lost confidence has been given one last chance and is out there feverishly kludging together yet another bandaid pseudofix. His desperation will always move faster than my diligence.

I cannot tell you how many times I've heard "This program worked fine until the last time we had it modified by our usual programmer, and ever since it works fine most of the time but every now and then doesn't work at all"

My usual follow-up to conversations like these is something like: "Good thing you now have extensive unit tests around your code. You do have unit tests around your code, right?"



Any recent system I've worked on has a test suite. "I" understand the code "I" have written and "I" have broken the code when "I" thought "I" knew what "I" was doing. Commenting or not has little to do with the situation. It is good that I've had revision control and unit tests to figure out what I broke, how to fix it, and how to check that everything still works.



That really should be done in maintenance mode as well - find all of the existing use cases, wrap them up in a unit test file. Make sure that test file works as you progress in your changes.
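A minimal sketch of what such a test file might look like, using the core Test::More module. The routine format_price and its cases are hypothetical stand-ins for whatever legacy code is being maintained; the point is only that the existing behaviour is pinned down before anything is changed.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical legacy routine standing in for the code under maintenance.
sub format_price {
    my ($cents) = @_;
    return sprintf '$%d.%02d', int($cents / 100), $cents % 100;
}

# Existing use cases, recorded before any change is made.
# Each entry is [ input, output the current code produces ].
my @cases = (
    [ 0,      '$0.00'    ],
    [ 5,      '$0.05'    ],
    [ 1999,   '$19.99'   ],
    [ 100000, '$1000.00' ],
);

for my $case (@cases) {
    my ($input, $expected) = @$case;
    is( format_price($input), $expected, "format_price($input)" );
}

done_testing();
```

Re-running this file after each step of the change shows immediately when a use case that used to work stops working.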



Oh - and be sure to leave a comment for why you chose the new algorithm that you implemented. :)



my @a=qw(random brilliant braindead); print $a[rand(@a)];

It's not often that I consent to inherit someone else's problem code, but when I do there's an economic reality to be considered. If the client's cap is at four hours of labor (billed), I'm not going to spend 40 hours writing tests. I don't provide charity to commercial enterprises aside from that extorted by the IRS to be doled out as corporate welfare. I always comment non-obvious design decisions, by the way. Those are comments that might actually be useful to the competent maintenance programmer who's been tasked with refactoring my code. I frequently include URIs for in-depth discussions of algorithms that might not be known to programmers at large. If I've optimized the code, I'll frequently include benchmark results to show that the code has been optimized at the right point in the development cycle. But I'm not going to take on the role of educator to explain to a maintenance programmer why I've chosen to implement a database connection as a singleton, or memoized a recursive method that can benefit from it. He already knows why, or can do his own homework, or can take the job he's best suited for down at the car wash. I can't save the code or the client from him and it's not my job. While I value good code and sound professional practices, I'm really not going to lose any sleep over what happens two years down the road when some cheap client hands my code over to some high school kid who's only worked through the first four chapters of Learning Perl. Not my problem, not my job. If I'm for some reason deemed not quite good enough to do the work, then that squeaky-voiced kid who calls to ask "Yo, Gee, w'sup widdis singleton shizit?" had darn well better be. He dialed the wrong number and is on his own.
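For readers who haven't met the two techniques named above, here is a rough sketch of each. The class My::Connection and the sub fib are invented for illustration (no real database is involved); Memoize is a core module.

```perl
use strict;
use warnings;
use Memoize;

# --- Singleton: every caller shares one object ------------------------
package My::Connection;    # hypothetical stand-in for a DB handle wrapper

my $instance;              # the single shared object, created lazily

sub instance {
    my ($class) = @_;
    # Build the object on first use; return the same one ever after.
    $instance //= bless { opened => time() }, $class;
    return $instance;
}

package main;

# --- Memoized recursion: cache results of repeated subproblems --------
sub fib {
    my ($n) = @_;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

# Replace fib with a caching wrapper; the recursive calls go through
# the wrapper too, so each fib($n) is computed only once.
memoize('fib');

print fib(30), "\n";
print My::Connection->instance == My::Connection->instance
    ? "same object\n"
    : "different objects\n";
```

Whether either decision deserves a comment in real code is exactly the judgment call being argued about here.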

I enjoyed your node very much. One observation - in the case of the sciences, we spend a great deal of time bringing people up to speed so that they can manipulate the abstractions we use to describe the world. I don't see this happening in the IT world of *** certified technicians, developers and the other short cuts that are used to force feed basic knowledge and practices into someone who then calls themselves a certified programmer, admin or whatever will sell a certificate. A PhD can take three to seven years and even then we regard the output of these programs as beginners - and this is someone who has been constantly exposed to and practising the discipline since graduating secondary school. It takes several more years to really know what is going on and to be able to manipulate it in a peer-acceptable way. On top of that we promote our research and ideas by writing down and publishing the results of our research in a specially crafted format that allows someone to reproduce what we have done. If it's not published and reproducible, it's simply not science. Nowadays if it's good science you count the number of times it is cited by others as support for their own work. The closest I think IT comes is in the likes of APIs, RFCs and formal documentation systems. Although certification is very in vogue, really the products are still beginners - very like moving from printed letters on ruled pages to joining the letters up in scripts on unlined pages, but still really not knowing how to compose an essay very well. The IT equivalent of peer review is either passing and failing test suites or use cases. Not many organizations do code reviews from what I can see. And each of these does it differently - I have never really come across a standard for writing code, or seen one enforced in the way the scientific community uses peer review to at least standardize the transmission of knowledge in papers.
The closest I have ever seen to the process of teaching and learning how to do the trade in IT was in my company before it was acquired by a larger corporation. They brought in people, had them work under more senior managers and they learned through tutoring and experience how to manipulate the codebase in a way that was acceptable to groups of coders. We had documentation of the code both in the source and as formal documentation. We had test suites, though not to the level I see advocated by the Monks. The process was successful enough that we could offshore to India, but the documentation was not enough - what was still missing was the simple ability to ask a question and get an answer. Once the original coders were gone, development slowed to a fraction of what it used to be. It's not a problem of the new developers - they are very skilled. It's a problem of how much time it takes to develop enough knowledge to understand an abstraction in the code base that is not documented well or at all. From what I can see we may all speak Perl but there are dialects and accents in how it is used. This is much the same as in natural languages where there are dialects of English I can't begin to understand and accents that I find more pleasing than others. Going back to your original point, I think what you are advocating is great if it's your own stuff, you are working within a stable group or you are very experienced in all idioms of your language/development platform. The main problem is the level of documentation needed for projects where the personnel are not stable and the next developer is someone who is not a script kiddie but is not as fluent as the last guy to comprehend and be able to make the modifications necessary to complete a job. What does some fellow with a small business really need to have in order to keep on using the code you developed for his web site, his application or whatever?
The closest he can understand to coding and development is going to his local garage and getting work done on his car. The proof of the mechanic is how well the car runs afterwards. An upgrade (new seats, new stereo) either works or doesn't work - the question for the mechanic is how well he can make modifications based on his understanding of what appears to be necessary to modify the car. And this difference between what you meant and what he thinks he needs to do to make a modification is where the dragons lurk. Just because he is not the world's greatest mechanic shouldn't stop him from changing the brakes, replacing spark plugs or doing a new paint job. The question is how to help him recognise that the job will be more difficult than first or second glance shows and then help him understand enough to either do it or pass it on to someone more experienced. Sometimes the manufacturer's manuals just don't do it and you wish you could quiz the systems mechanic on what the hell he was thinking. I hold that if you write something you should keep a mind to the next maintainer and provide enough documentation or test suites to help them understand what the code does. Sometimes it has to be code comments. Promoting standardized coding practices à la Conway is a good way of reducing dialect confusion. But if you think the code needs it, then put in the comment. The new guy will soon see whether they are useful or not, but at least you gave him the chance to understand what you were thinking of at the time you wrote the comment. If nothing else it shows some fairness to the boyo that paid you to do the work in the first place, even if he was a tightwad.

MadraghRua

yet another biologist hacking perl....

There is a serious flaw in comparing programming to scientific research. Programming in the large is not science.



To equate programming in general to fields of science is like equating architecture and construction of a building to physics. Programming uses CS as a basis the way architecture and construction use physics as a basis.



There are a couple of simple rules at work here that often get overlooked in the programming realm: The size and complexity of the project determines how sophisticated and informed your methods need to be. Also, the more levels of knowledge being built upon your contributions, the more fundamental and universal your contributions need to be.



Let's face it. A cook is a chemist of sorts, and so is a candle maker. Neither is going to rewrite textbooks about chemistry. A carpenter or bricklayer has some sort of physics knowledge, but they are not going to split the atom or need a particle accelerator. At one time, a cab driver or a cashier actually had to know some math, but they weren't in the business of proving the number of dimensions in the universe.



We don't have any solid, universally accepted differences in title between a sysadmin who's really talented with shell scripts and someone who writes frameworks for generalizing facial recognition software. Just because both might be called "programmer" or "software developer" does not mean the two are equivalent. I think this is a large part of the confusion in the software industry. I also think it's part of what causes so much friction between the highly qualified, very method-oriented people doing lofty things and the people who write small applications for small clients in small markets.



The truth is, you don't want everyone to be a scientist. You don't even want everyone to be an engineer. You won't wait for a Pasteur or a Salk to mix your cough syrup. You won't wait for a Hawking or an Einstein to build your house, or for a Tesla or Westinghouse to wire it. What you want, down in the trenches, is people who take the results of research and engineering and figure out how to apply those components and best practices to the project at hand. Researching fundamentals is important, but so is having homes, food, clothing, and a community site like Perlmonks to debate the point. If the research and experiments have already given us evidence that a particular way of doing things is a good way, why can't some people just use those good ways while others further the research?



Just because some people don't use best practices when they should doesn't mean everyone should be in the realm of developing new fundamental knowledge. That's where the line between technology and science lies. Technology benefits from science, but people as a whole benefit from the fruits of technology more than directly from the fruits of science.



Day-to-day programming is, like building construction, more a technological trade than a science or research topic. It's a method of combining established parts using established methods to further one of a set of fairly common goals. Sometimes a building contractor finds an innovative and truly better way to build part of a house. An architect is more likely than the contractor to find a new and better way to design the whole house. That same architect might develop a new and better way to fasten two materials together, but an engineer is more likely to do so. The engineer might develop an altogether better material from which to make the fasteners, but a researcher is more likely to do that. It's this way in any field, really. I'm not sure why people get the idea that software should be different.



Another trend you'll notice if you look at more established technologies is that these mostly stratified layers become more established the longer a trade is around. We didn't use to have a designer, a spinner, a dyer, a weaver, a cutter, and a seam sewer for making garments. People used to design and build a car single-handedly, but now there are auto researchers, auto engineers, auto factory engineers, auto factory employees, and auto factory robots. In time, I think we'll see such lines drawn in the software industry, of course with some mobility among layers. I just hope the titles aren't still 'chief software designer', 'system analyst 3', and such meaningless chaff.



Christopher E. Stith

Excellent post. :-) I always liked the comparison between software and building construction. I think it's one of the best "real-world" analogies we can use to explain what we do, but there are differences. I view myself as being somewhere between an engineer and an architect. I like to "think with my hands" so to speak: think up a rough overall design, implement it and then move code around, rewrite parts until it works well and makes sense. The reason that works well (or at all) in software development is because of a pretty fundamental difference between building construction and software development; there is hardly any difference in software between the design and the product. It's only a matter of abstraction. Any really comprehensive software design is practically code (and you can argue that the reverse is also true). That's also a reason I don't like/believe in "automatic code generation from designs" - if your design is that comprehensive you're still programming, only chances are you're not using the right tools for the job. "What should it profit a man, if he should win a flame war, yet lose his cool?"

It's hard to pick out an appropriate quote from your first six paragraphs; it would require quoting the whole lot, I think. I completely agree with you. Programming is still a very young discipline when compared to most others. It has yet to establish itself in the same way, or to the same level of scientific reproducibility, as the older disciplines. It is still searching for its equivalent of the universally standard nomenclatures enjoyed by those other disciplines, and that's reflected in the constant strains between existing and new languages; existing and new methodologies; existing and new tool-sets; working practices; et al. Everything within the industry is still in flux. In part, this is because unlike the historic, slow boil evolution of the scientific methods involved in mathematics, and the other sciences, which evolved over centuries, through the hands, minds and publications of a few greats, working primarily alone, but in communication with their (few) peers; programming is evolving in a world of mass communication, widespread, low-level participation, and commercial pressure. It is not so much evolving as constantly revolving. (As in revolution, but also going around in circles. :) With each new generation comes at least one 'new' methodology, language, set of working practices. And usually many of each. Each will be held up by its proponents as the magic bullet that will revolutionise the industry. The Holy Grail for productivity, or reliability, or re-usability, or efficiency. And usually all of those. Invariably, each of these revolutions has some merits. Equally invariably, they have their flaws also. In this ever changing, and multi-frontiered world, it has become nigh impossible to be aware of all the developments as they happen, never mind keep pace with them sufficiently to be able to pick out those parts that will persist and make it into the future.
The historical debates between expert practitioners, often conducted over years by letter, or even the publishing of books, have no true equivalent in the modern world. Single ideas that would have formerly been debated over years (sometimes decades) come and go in fleeting electronic moments. Lost in the noise of internecine wrangling; interpersonal debates; ad hominem attacks; general and specific flame-fests of all kinds. Everyone has an angle; an axe to grind; an ego to defend; a buck to make. The software industry spends far too little time exploring and debating the outcome of ideas, methods and practices. Instead of looking back at what was done, and trying to determine what was effective and what was not; how small changes might have made big differences to outcomes; how small parts from several methodologies could be extracted and combined to produce something bigger than the sum of the parts; as an industry, we are always looking for the next magic bullet. The next revolution that will save our collective souls. We discard not just the bad parts of what we did before, but everything, and leap upon the next bandwagon like it was a number 32 bus headed for nirvana. The art of debate has been lost. The merit of seeing the other guy's point of view; the benefit of constantly re-evaluating one's opinions in the light of the other guy's thoughts and experience; is effectively dead within the programming community. If you use Perl, you must also be convinced that CamelCase is bad; must subscribe to the OSS ethic; must be one of the LAMs in LAMP. I'm a relatively new student of the history of mathematics, and I am aware that some of the historical greats were far from egoless. I know that, at various points in time when two or more of the greats coexisted on this earth, there were considerable and heated (if you can get heated when communication is by a letter back and forth every month or two :) disputes over big ideas.
But, from what I read, on those occasions when the owners of those egos and conflicting ideas got together in the same place, they would happily sit down and share a meal and a few bottles of wine before or after a face-to-face debate where they each held opposing views. Technology means that we are able to communicate ever more quickly and with ever wider groups, but the human part of the equation hasn't kept pace. We humans still fall into our historic roles of tribes and combatants. To try and bring this back on track: the software industry became an industry before it became a discipline. There are many, too many, languages, methodologies, sets of working practices, existing and coming on stream every day, for any one person or group of people to be able to say what is best and what is not. Where pure research (into software and development) exists, it tends to be very far removed from the everyday realities of commercial practice. To badly paraphrase a line from a sci-fi program: the software industry is still young; it has much to learn. Sometimes, the best way to move forward is to look backward. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

BrowserUK wrote:

I completely agree with you.

And I found that to be a very puzzling remark at first, because it was pretty clear that MadraghRua was disagreeing with you:

MadraghRua wrote:

Going back to your original point, I think what you are advocating is great if it's your own stuff, you are working within a stable group or you are very experienced in all idioms of your language/development platform. The main problem is the level of documentation needed for projects where the personnel are not stable and the next developer is someone who is not a script kiddie but is not as fluent as the last guy to comprehend and be able to make the modifications necessary to complete a job.

But I gather that this is the reason you mentioned the "first six paragraphs", because you were explicitly ignoring everything else. ("The art of debate"?)


I think you've muddied the waters somewhat by saying: Documentation of code is vitally important for successful, ongoing projects, but comments are not documentation, and it has no place in source files. If you're talking about POD here, then I think that's distinctly different to comments. POD (or JavaDoc or whatever you happen to be using) is documentation, not comments. It tells a user of a module how its interface works. The fact that it's kept in the same file as the source code doesn't make it a comment. In fact, I think it's much better to keep it there than in some separate file that someone may not even realise is there when they're updating the code (one reason why doing this has become an industry-wide standard).



As far as actual comments go, I think you tend to find roughly 3 types:

1. Those explaining what something does
2. Those explaining how something does what it's supposed to do
3. Those explaining why something does what it does

I tend to think #1 is OK in small doses. It can make code more readable, which to me trumps the synchronisation issues you're talking about. And at any rate - if kept high level enough - what a particular piece of code does isn't likely to change that much (at least without the whole block of code getting wiped or re-written, along with the comments), so it's not likely to get too out of date.



#2 is either there because whoever maintains the code is assumed to be stupid, or the code itself is obfuscated. In the latter case, the comments are really a red flag (i.e. not evil in and of themselves, but a good sign some refactoring may be needed).



#3 is often needed if the system-level design is (or has become) poor. This is true of quite a lot of the code I'm currently working with. In these cases, the ideal situation is to fix the design. However this is not always practical. This is where these comments can become useful. They shouldn't be a substitute for system documentation, but they can make it much easier to pick up the context when reading code (and even more so if they reference system docs).
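To make the three types concrete, here is a small made-up example carrying one comment of each kind. The sub unique and the "upstream feed" mentioned in the why-comment are hypothetical, invented purely for illustration.

```perl
use strict;
use warnings;

# (1) WHAT: return the distinct values of a list, in first-seen order.
sub unique {
    my %seen;
    # (2) HOW: %seen counts each value; the post-increment returns the
    # old count, so grep keeps a value only the first time it appears.
    return grep { !$seen{$_}++ } @_;
}

# (3) WHY: the (hypothetical) upstream feed resends records on retry,
# so duplicates have to be dropped here rather than at the source.
my @ids = unique(qw(a b a c b));

print "@ids\n";    # prints "a b c"
```

In this snippet the how-comment is the one most likely to go stale if the body is rewritten, while the why-comment carries context that cannot be recovered from the code at all.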



So on the whole, I don't really think extensive comments in and of themselves are evil, but they can often be pointers to other issues.

If (module/unit) documentation only ever consisted of interface specifications, then (perhaps), POD would be fine. But, all too frequently, it also has to contain a mountain of stuff that is essentially unrelated to the code. Whether it's algorithm explanations, or market research, or comparative studies or whatever else might be useful or required by the users of a module. But often this stuff is of no consequence to the programmer maintaining the module. To conflate user documentation with programmer documentation is a bad idea. Even if the module's users are also programmers at the next level up, mixing the two types of documentation together means that it serves neither group well. There's also the problem that purely textual changes, from typos to reattributions to rephrasing of prose can trigger RCS trails associated with the code that shouldn't be. Documentation also belongs in the RCS system, but as independent entities from the code, so that documentation changes do not affect the revision history of the code and vice versa.

I think you have some valid points, but there are some issues with having separate module documentation files. Apart from documentation not being directly under the maintainer's nose as I mentioned above, a lot of tools expect the docs to be in the same file as the code, so browsing it becomes an issue.



This could fairly easily be worked around, but on balance, I think it becomes a matter of personal choice, rather than one way being inherently superior to the other. (There are ways to work around the problems of having it all in one file, such as having a good standard for the format of each module, to keep out-of-place info from creeping in).



But I think this is a separate discussion to your original point, which was on the merit of code annotations.

There's also the problem that purely textual changes, from typos to reattributions to rephrasing of prose can trigger RCS trails associated with the code that shouldn't be. Documentation also belongs in the RCS system, but as independent entities from the code, so that documentation changes do not affect the revision history of the code and vice versa. This is actually a very good point (though I have to say I thought that I was the last person in the world using RCS for version control). Documentation and comment changes are more common than actual code changes (at least if you're someone like me who does re-write it to keep it up-to-date), and it might be better if those changes weren't cluttering the logs. Damian Conway recommends against putting project documentation in a dedicated *.pod file: he argues that there's no way to be sure the *.pod will get installed correctly in some place where it can be found, but if it's inside the *.pm file, you know it's always going to be with the code. In part, under the influence of Conway, I've been sticking to a documentation style where more tangential/higher-level material is confined to some pod at the bottom of the *.pm file (I'm a fan of "MOTIVATION" sections myself, they're often the key thing that makes clear whether you want to use a module at all). But in a multi-module project, I find I definitely have a problem with repeating myself too much, e.g. stating the project's purpose over-and-over again... There are some clear advantages with having a project overview pod file elsewhere.

in my experience most code begins its life as prose, whether it is pseudo-code in my head or the pure natural language of CEO/CTO-speak, project notes, task descriptions, PRDs, statements of work, feature requests, bug reports, brainstorming sessions, notebook scribbles, user stories, sprint objectives, and other human-sourced content. along the way, ideas are translated from the human realm to the human-computer shared-language realm of computer programming. a well-placed comment of intent by a well-meaning but slightly-ignorant ;) co-worker can help reveal that a paragraph of code is mis-implemented. in my opinion, the "coupling" of comments and code can be a good thing, as long as both are concise and local. but the purpose of the comments should not be to document the intricacies of your particular choice of implementation, but to give a general paraphrasing of the task at hand.

++ for the excellent node. I frequently find comments that violate the DRY principle. They just repeat the code. They cause the same maintenance problems that repetitive code creates, often worse since some developers read the comments more carefully than the code. I liked the comment on mathematicians, physicists, etc. avoiding textual descriptions of theorems and designs. I suppose testing is a good way to help us understand both code and theorems (even when they're well written).

Mathematicians, physicists, chemists and engineers go out of their way to avoid textual descriptions of their theorems and designs. Their nomenclatures have evolved over hundreds of years through hard won, practical experience. Much greater time periods than the computer industry, and computer code have existed. In every case, they have evolved not because someone decided that it would be a 'good idea'; or as a form of protectionism à la the use of Latin in religious ceremonies, legal work and matters of state; nor because they are too lazy to write their technical descriptions out in full in 'proper English' (French, German, Italian etc.). Hopefully you'll pardon a nit-picky digression, on the grounds that it's at least mildly interesting. Legal Latin belongs in the former group, not the latter. Legal writing has every reason to be precise, and the use of phrases from a dead language serves that end. They're useful because they aren't used or understood in other contexts. Take the phrase, per stirpes: literally, it means "by the stocks". If your will leaves money to your children per stirpes, you're invoking a recursive algorithm. Each of your children represents a branch, and the money will be divided evenly amongst those branches, if anyone is left alive in them. So, for instance, you leave $100,000 to your five children per stirpes, but at the time of your death only three of them are alive. Poor Gregor had a tragic mushrooming accident, and left no children, so his share is divided amongst the other branches. Thus, the three who are alive each get $25,000. Unfortunately, your boy Ike died before you did... but he had five children. Four of them get $5000... but sadly, Ike's daughter Prudence was in the boat with him when the leeches attacked, so her two children wind up with $2500 apiece. The legal definition of per stirpes accounts for more than this.
You could insert the definition in each place you wanted it, as if you were expanding a macro, but whether the result would be clearer is dubious at best. Even then, if your late son married, and then divorced, a woman with children, do they get shares? If second cousins from different branches married, had a daughter, and then died, does she inherit from both branches? A millennium of common law rulings clarifies a lot of unusual situations with per stirpes, but not the description of per stirpes you wrote. The very fact that phrases like per stirpes and corpus delicti are foreign ensures that those using them mean something very particular. They are, in any case, no more difficult than comparable English terms, such as "replevin" or "tortefeasor".
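For what it's worth, the simple case walked through above can be sketched as a recursive sub. This is only a toy under stated assumptions: the hash layout and the names of the unnamed heirs ("Child A", "Ike child 1" and so on) are invented for illustration, and nothing here models the divorces, cousin marriages or common law rulings just mentioned.

```perl
use strict;
use warnings;

# True if this person, or anyone descended from them, is alive.
sub has_survivor {
    my ($p) = @_;
    return 1 if $p->{alive};
    my $living_branches = grep { has_survivor($_) } @{ $p->{children} || [] };
    return $living_branches > 0;
}

# Split $amount evenly among the branches below $p that still have a
# survivor. A living child keeps the share; a dead child's share is
# passed down to that child's own branches by the same rule.
sub distribute {
    my ($p, $amount, $out) = @_;
    my @branches = grep { has_survivor($_) } @{ $p->{children} || [] };
    return unless @branches;
    my $share = $amount / @branches;
    for my $branch (@branches) {
        if ($branch->{alive}) { $out->{ $branch->{name} } = $share }
        else                  { distribute($branch, $share, $out) }
    }
    return;
}

sub living { return { name => $_[0], alive => 1 } }

my $prudence = { name => 'Prudence', alive => 0,
    children => [ map { living("Prudence child $_") } 1 .. 2 ] };
my $ike = { name => 'Ike', alive => 0,
    children => [ ( map { living("Ike child $_") } 1 .. 4 ), $prudence ] };
my $gregor = { name => 'Gregor', alive => 0, children => [] };

my $testator = { name => 'You', alive => 0,
    children => [ ( map { living("Child $_") } qw(A B C) ), $gregor, $ike ] };

my %out;
distribute($testator, 100_000, \%out);
printf "%-18s %9.2f\n", $_, $out{$_} for sort keys %out;
```

Run against the family described above, the three living children come out with 25,000 each, Ike's four living children with 5,000 each and Prudence's two with 2,500 each, which matches the figures in the post.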

"replevin" or "tortefeasor". I laughed, and I hope you will to. According to google, it should be "tortfeasor" (no 'e') by a margin of 322,000 to 6 :p None the less, your point is well taken. In the context of modern law, the use of these pointed, sparse remnents of Latin are indeed a shorthand notation with very particular meanings. In my defense, I was thinking more about the historical practice whereby all legal documents had to be enscribed entirely in Latin. A practice that came about, or rather survived, mostly because it meant that only the privileded few could produce, read or hope to benefit from them. Not to mention that it meant that laywers were required to produce and interpret them. To a great extent, the latter practice still persists albeit that the Latin content is not the root cause. For example, 25 years ago, when I purchased my first flat, it cost me £365 in laywers fees to complete the purchase of a 15% share of a £12,000 property. 2 years ago, I self-conveyanced the transfer and sale of a let property (£200,000;my previous home), and completed the entire transaction for £65. The only complications were due to the other party's lawyer raising and re-raising a series of totally pedantic points, in what the other party and I concluded was an attempt to put him off of the purchase, simply because I was not using a lawyer. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

I recommend reading Refactoring by Martin Fowler.