Notes from Underfoot

Style is Substance

by Ken Arnold

October 7, 2004



Summary

... wherein I decide that, with winter a-cumin in, it's time for a lot of heat, so I venture into the programming language equivalent of TV Wrestling: coding style...


I'm sure this will cause me no end of grief, but I'm about to confess publicly here that I am a heretic. (In this particular case I'm only confessing to heresy in computer language design. Other heresy confessions will have to await another time.)

I'll state it right out: For almost any mature language (C, Java, C++, Python, Lisp, Ada, FORTRAN, Smalltalk, sh, JavaScript, ...) coding style is an essentially solved problem, and we ought to stop worrying about it. And to stop worrying about it will require worrying about it a lot first, because the only way to get from where we are to a place where we stop worrying about style is to enforce it as part of the language.

Yup. I'm really saying that. I'm saying that, for example, the next ANSI C update should define the standard K&R C programming style into the language grammar. Programs that use any new features should be required to be in K&R style or be rejected by the compiler as syntactically illegal.

I'm gonna pause here. When I was talking about this on a mailing list I had to go through this several times. People didn't quite get me because they didn't quite believe someone was saying this. I mean this quite literally. For example, I want the next C grammar to define that a space comes between any keyword and an opening parenthesis. "if (foo)" would be legal, but "if(foo)" would not. Not a warning, not optionally checked, but actually forbidden by the language parser. Flat out illegal. Can't compile.
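
To make the rule concrete, here is a minimal sketch in Python of that kind of check. It is only an illustration: the function name and the keyword list are made up, and a regular expression stands in for a real grammar, so it does not skip string literals or comments the way an actual parser would.

```python
import re

# Hypothetical keyword list, for illustration only.
KEYWORDS = ("if", "while", "for", "switch")

# A keyword, then any run of spaces or tabs, then an opening parenthesis.
_KEYWORD_PAREN = re.compile(r"\b(%s)([ \t]*)\(" % "|".join(KEYWORDS))


def check_keyword_spacing(source):
    """Return (line_number, message) for each keyword not followed by exactly one space."""
    errors = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _KEYWORD_PAREN.finditer(line):
            if match.group(2) != " ":
                errors.append(
                    (lineno, "expected one space after '%s'" % match.group(1))
                )
    return errors
```

A compiler applying the rule would refuse to go on whenever the returned list is non-empty, rather than printing a warning.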

Here is the logic in its most simple form:

Premise 1: For any given language, there are one or a few common coding styles. Typically one is set by the founder(s) or earliest documenter, but others will evolve over time. But even for C there are only a handful of commonly used styles, ignoring trivial variations.

Premise 2: There is not now, nor will there ever be, a programming style whose benefit is significantly greater than any of the common styles. Get real. Discovering a style that improves your productivity or code quality by more than a few percent over the common styles is about as likely as discovering a new position for sex. [Astronauts need not apply, unless they want to invite me along.]

Premise 3: Approximately a gaboozillion cycles are spent on dealing with coding style variations. Think about it: How many reformatter/pretty-printer projects are there on SourceForge alone? How many options does any given IDE (including emacs) have for formatting code? How many cycles are spent deciding on a style, documenting it, enforcing it, and updating it? How many history logs for CVS, Clearcase, etc., have a lot of noise from varying format changes? How many brain cycles are spent on arguing about this topic?

Premise 4: For any non-trivial project, a common coding style is a good thing. I really think this is pretty well agreed on. How constraining the style is varies, but having several folks hacking on the same code with conflicting coding styles introduces more pain than any single style imposes on any single person. Every project I know of has a style, if not spelled out at least by custom.

Conclusion: Thinking of all the code in the entire world as a single "project" with a single style, we would get more value than we do by allowing for variations in style.

Think of it. All the programming examples in one style. Web pages, journals, papers, emails use one style. Reformatting issues gone. Arguments over whose style is better gone. Reformatters a quaint historical artifact.

And most of all: No More Style Wars! Really! Think of all those cycles that we could then plow into something more productive, like vi/emacs wars! Or world peace! Or a really good chocolate cookie recipe! You choose!

Of course, you will never enforce any style globally unless people have literally no choice. How many C programmers use "during" as a stylistic preference to "while"? (Preprocessor abusers need not apply. On second thought, please do: We need to identify you for our eugenics program.) Or skip the parens around an if clause? They don't because they can't. You know they would if they could. The thing that stops these "personal styles" is that the C compiler will not accept them. If you can't compile your code you fix it. It's so simple it's stupid. And therefore it works.

So I want the owners of standards for established languages to take this up. I want the next version of these languages to require any code that uses new features to conform to some style. Let the standards committees gnash and snarl and wring their hands over which of the common styles is the winner. Sell tickets. We all get to comment and the language lawyer standards geeks decide. We know where they'll go -- C will go to K&R; C++ will go with Bjarne's style (excuse me while I cringe); Java will go with the Sun style as shown in the language spec and most of the Java books from Sun (including mine); Lisp style is already mostly set in stone. Perl is a vast swamp of lexical and syntactic swill and nobody knows how to format even their own code well, but it's the only major language I can think of (with the possible exception of the recent, yet very Java-like C#) that doesn't have at least one style that's good enough.

Some things are either uncheckable (Hungarian notation, using "get" and "set" method prefixes) or not widely agreed upon (such as import/#include ordering). These can be left for future standards. Or not. The owners of the standard decide. But whatever they do, they should set the style and build it into the actual freakin' grammar.

This heresy encompasses one major sub-heresy: That whitespace should matter.

Most style rules have to do with the placement of whitespace: newlines before or after curly braces, white space around operators or not, etc. So I'm saying that languages should indeed care about whitespace. A lot. Yet one of the things we supposedly learned from languages like FORTRAN was that whitespace should only matter to separate tokens. This was accepted wisdom because FORTRAN had columns -- the first five columns were reserved for a statement number or a comment indicator, the sixth column with any character in it meant a continuation of the previous line, seven through 72 were language statements, and the last eight were reserved for sequence numbers useful for re-ordering the card deck if it was dropped. Yes, I mean cards, the physical type, with rectangular holes. Also, "DO10I=1,100" was the same as "DO 10 I = 1, 100" because DO was a keyword followed by a number and so the space wasn't required, although it made "DO10I=1" interesting, as that assigned 1 to a variable named DO10I.
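
A rough sketch of why that case was interesting, in Python. This is not how any real FORTRAN compiler worked; the function name is made up, and it assumes the statement body has already been pulled out of columns 7 through 72 and that a comma after the equals sign is what separates a DO loop from an assignment (it ignores commas inside parentheses). The point is only that once blanks are thrown away, you can't tell which kind of statement you have until the comma shows up, or doesn't.

```python
import re


def classify_fortran_statement(statement):
    """Classify a statement body as a DO loop or an assignment, ignoring blanks."""
    # Blanks carry no meaning, so drop them before looking at anything else.
    text = statement.replace(" ", "").upper()

    # DO <label> <variable> = <start> , <limit>
    loop = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)", text)
    if loop:
        label, var, start, limit = loop.groups()
        return ("do-loop", {"label": label, "var": var, "start": start, "limit": limit})

    # <variable> = <value>; without the comma, "DO10I" is just a variable name.
    assign = re.fullmatch(r"([A-Z][A-Z0-9]*)=(.+)", text)
    if assign:
        return ("assignment", {"var": assign.group(1), "value": assign.group(2)})

    return ("unknown", {})
```

Here classify_fortran_statement("DO10I=1,100") and classify_fortran_statement("DO 10 I = 1, 100") give the same do-loop result, while classify_fortran_statement("DO10I=1") comes back as an assignment to DO10I.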

I lived this ugliness, so I feel the pain. But all it really proved was that FORTRAN's whitespace rules sucked. License to put whitespace anywhere has proven to be expensive and cycle-wasting in practice. We're not editing on punched cards anymore, and reformatters are as common as spam. We can use this power -- type code however you want to but before you compile it, reformat it (or reformat on the fly, whatever).

In the end, this requires only that editors and IDEs used by coders will let the user type stuff and it will make it look right. This is basically just reformatting on the fly, which many editors already do. We don't need you to type zero, one or seventeen spaces between an if and its open paren, we just need the editor (assuming K&R C) to put exactly one space there. And getting even this right will be easier if there is only one style to worry about. It's one of those things that those reformatting or style adapting cycles can go to.
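
As a minimal sketch of that editor-side fixup, in Python: whatever the user typed between a keyword and its open paren gets replaced by exactly one space. The function name and keyword list are made up for illustration, and the regular expression does not know about string literals or comments, which a real editor would have to respect.

```python
import re

# Hypothetical keyword list, for illustration only.
KEYWORDS = ("if", "while", "for", "switch")

# A keyword followed by zero or more spaces or tabs and an opening parenthesis.
_KEYWORD_PAREN = re.compile(r"\b(%s)[ \t]*\(" % "|".join(KEYWORDS))


def normalize_keyword_spacing(source):
    """Rewrite the source so each keyword has exactly one space before its '('."""
    return _KEYWORD_PAREN.sub(r"\1 (", source)
```

Running it on already-normalized text changes nothing, so an editor could apply it on every keystroke or every save without creating new diffs.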

Basically, freedom for formatting style has proven extremely expensive, and does not deliver much value for cost. Think of it this way: Could you honestly fill in the following:

I, [insert name here], know of a programming style whose impact on programmer productivity and/or program quality is large enough that my freedom to choose it over any major common style justifies the programmer productivity and investment spent industry-wide in arguing about style, imposing style, and reformatting to match styles. That style is [insert style description here] and its benefits are [insert benefits here].

Or even the less demanding:

I, [insert name here], know of a programming style whose impact on programmer productivity and/or program quality is >= 5% when compared to any major common style. That style is [insert style description here] and its benefits are [insert benefits here].

I think you will mostly get snickers even suggesting that this can be filled out. And even on a single project you can spend 5% on coding style issues -- mostly up front, but it's a continuous bleeding as style wars crop up over things as yet undefined; new tools are suggested, written, or integrated; people forget to put code in the right style and then it gets corrected and pollutes the change history; new people have to be trained in the style; uncooperative engineers have to be disciplined; and there's just general bitching, whining and moaning.

So 5% doesn't even touch the opportunity and other costs associated with not having a mandated style across all the code in the world.

Or if you prefer the question the other way 'round: What benefits do we get from freedom of style that outweigh the cost we pay for it?

To me the answer seems obvious: Nowhere near enough.

About the Blogger

Ken Arnold is a recognized loose cannon in the software business, whose previous fusillades include being an inventor of Jini, designing JavaSpaces, writing books on Java and distributed systems, helping design CORBA 1.0, and (while at Berkeley on the BSD project) creating the curses library package, co-authoring rogue, and generally enjoying himself. His interests include designing APIs and programming languages using general principles of human factors design because of his radical hypothesis that programmers are human, and other applications of this same principle to software design, management, and production.

This weblog entry is Copyright © 2004 Ken Arnold. All rights reserved.