In the odd case that you are an experienced programmer who doesn’t have a preference between camel case and underscores for identifiers, try making up your mind now. Try choosing independently of (language) convention, habit or type of the identifiers. If you are a Lisper and like dashes, just vote for your next favorite.



if ( thisLooksAppealing ) {
    youLikeCamelCase = true;
    votePoll( camelCaseFormatting );
} else if ( this_looks_appealing ) {
    you_like_underscores = true;
    vote_poll( underscore_formatting );
}

Take Our Poll

Did you vote? Good! Now it’s my turn to do some work, as I will try to take you through a semi-scientific explanation to prove which formatting is best suited for programming.

I wouldn’t have written this post if I hadn’t read Koen’s Tao of Coding. As an ex-colleague, he converted me to the underscores camp. The trigger for writing this post was a reply I read in a discussion about formatting.

“honestly the code is easier to read” Opinion or fact?

It inspired me to look for scientific resources. Surely, studies must have been done, right? As it turns out, not too many, but I found one. But first, in case you never had this discussion, here are the usual opinions and rebuttals. If you are looking for the facts, skip to round 3.

Round 1: The opinions

Pro underscores

Underscores resemble natural writing the most, and thus are more readable. Spaces are simply replaced with underscores. Extreme example: isIllicitIgloo vs is_illicit_igloo.

Consistency with constants. Underscores are still needed in all-caps naming conventions. E.g.: THIS_IS_A_CONSTANT

Abbreviations could still be kept uppercase easily. E.g.: TCP_IP_connection vs tcpIpConnection

Classes can be kept camel case, giving a clearer difference between them and identifiers/functions. E.g.: CamelRider.ride_camel() vs CamelRider.rideCamel().

Thank you, Yossi Kreinin, for the last two points, as discussed in IHateCamelCase.
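To make the "spaces are simply replaced with underscores" and abbreviation points concrete, here is a minimal Python sketch that converts between the two styles. The helper names camel_to_snake and snake_to_camel are made up for this illustration, and the rules are deliberately naive: every interior capital letter is treated as the start of a new word.

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert an underscore before every capital that is not the first
    character, then lowercase the result (naive rule, for illustration)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_to_camel(name: str) -> str:
    """Drop the underscores and capitalize every word after the first."""
    first, *rest = name.split("_")
    return first + "".join(word.capitalize() for word in rest)


print(camel_to_snake("isIllicitIgloo"))     # is_illicit_igloo
print(snake_to_camel("is_illicit_igloo"))   # isIllicitIgloo
print(snake_to_camel("tcp_ip_connection"))  # tcpIpConnection
print(camel_to_snake("TCPIPConnection"))    # t_c_p_i_p_connection
```

The last call shows the abbreviation problem: once uppercase abbreviations are packed into a camel case identifier, this simple rule can no longer tell where one word ends and the next begins, whereas TCP_IP_connection keeps the word boundaries explicit.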



Pro CamelCase

Camel case is easier to type, and underscores are hard to type.

Camel case makes paragraphs easier to read. my_first_variable=my_second_variable-my_third_variable vs myFirstVariable=mySecondVariable-myThirdVariable

Camel case is shorter.

Camel case is used by convention in a lot of major languages and libraries. (You weren’t allowed to use this argument when voting!)

Round 2: Rebuttals

Anti underscores

Underscores are ugly, camel case is more elegant.

Anti CamelCase

Underscores aren’t that hard to type. Seriously, as a programmer it is your duty to learn touch typing with all ten fingers. Learn qwerty, and save yourself the trouble of having to use the exotic AltGr button.

Use whitespace and an IDE with color coding to easily see the difference between operators and identifiers.

Round 3: The facts

When reading the abstract of the research paper, it seems science is on the camel case side.

Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style.

Existing research

Natural language research in psychology found that replacing spaces with Latin letters, Greek letters or digits had a negative impact on reading. However, shaded boxes (similar to underscores) have essentially no effect on reading times or on recognition of individual words. Removing spaces altogether slows down reading 10-20%.

Experiment setup

The paper describes an empirical study of 135 programmers and non-programmers. Subjects had to correctly identify a matching phrase (maximum of 3 words long) out of 4 similar phrases. The important variables researched:

Correctness: whether the subject identified the correct phrase.

Find time: time taken to identify the phrase.

Training: how being a programmer affects the performance.

Results

1. Camel casing has a larger probability of correctness than underscores (odds are 51.5% higher).

2. On average, camel case took 0.42 seconds longer, which is 13.5% longer.

3. Training has no statistically significant impact on how style influences correctness.

4. Those with more training were quicker on identifiers in the camel case style.

5. Training in one style negatively impacts the find time for other styles.
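To get a feel for the size of these effects, the reported figures can be turned into rough absolute numbers. The Python sketch below is only back-of-the-envelope arithmetic: it assumes the 13.5% is measured relative to the average underscore find time, and the 90% baseline accuracy is a purely hypothetical value chosen for illustration, not a number from the paper.

```python
# Figures as quoted in the results above.
EXTRA_SECONDS = 0.42    # camel case took this much longer on average
EXTRA_FRACTION = 0.135  # ...which is stated to be 13.5% longer
ODDS_RATIO = 1.515      # odds of a correct answer are 51.5% higher


def implied_times(extra_seconds: float, extra_fraction: float) -> tuple[float, float]:
    """Back out both average find times, ASSUMING the percentage is
    relative to the underscore average."""
    underscore = extra_seconds / extra_fraction
    return underscore, underscore + extra_seconds


def apply_odds_ratio(probability: float, ratio: float) -> float:
    """Convert a probability to odds, scale the odds, convert back."""
    odds = probability / (1.0 - probability)
    scaled = odds * ratio
    return scaled / (1.0 + scaled)


underscore_s, camel_s = implied_times(EXTRA_SECONDS, EXTRA_FRACTION)
print(f"underscore: {underscore_s:.2f} s, camel case: {camel_s:.2f} s")

# Hypothetical baseline accuracy for underscores (NOT from the paper).
ASSUMED_UNDERSCORE_ACCURACY = 0.90
camel_accuracy = apply_odds_ratio(ASSUMED_UNDERSCORE_ACCURACY, ODDS_RATIO)
print(f"camel case accuracy under that assumption: {camel_accuracy:.3f}")
```

Under these assumptions the average find times come out at roughly 3.1 and 3.5 seconds. Note that "51.5% higher odds" is not the same as a 51.5% higher probability: with an already high baseline accuracy, the gain in probability is only a few percentage points.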

The paper concludes:

Considering all four hypotheses together, it becomes evident that the camel case style leads to better all around performance once a subject is trained on this style. Training is required to quickly recognize such an identifier.

Discussion

Personally, I find the conclusion flawed for a couple of reasons.

Correctness isn’t of much importance when programming. Correctness refers to being able to correctly see the difference between similar identifiers, e.g. startTime vs startMime. This is not a common scenario when programming. Additionally, modern IDEs offer autocompletion and flag identifiers that don’t exist. This makes me believe results (1) and (3) are irrelevant. As a sidenote, I believe the higher correctness of camel casing is due to the slowness of the reading. When you need to take more time to read something, you will read it more accurately.

When discussing possible threats to validity, the authors mention the following: “Essentially all training was with camel casing, it would be interesting to replicate the study with subjects trained using underscores.” Results (4) and (5) just seem unfair when taking this into account. Isn’t it obvious that people who are used to camel case are better at it? Additionally, according to result (5), that training has a negative impact on the “find time” for underscores.

So, only the slowness of reading camel case (2) remains. It takes 13.5% longer on average to read a camel case identifier than an underscore identifier. Multiply this across entire code blocks, and you have my semi-scientific opinion on the war between camel case and underscores!

For those brave enough to stick around until the end, what is your opinion now? Again, try choosing independently of convention, habit or the type of identifiers.

P.S.: If you still believe camel casing to be more appropriate for programming, it would be interesting if you left a comment with your arguments. 😉 I could update “Round 2: Rebuttals” to include your comments to make the article more balanced.

Take Our Poll

Update: I’ve discussed a follow-up study in a new post. They reproduced the study and measured it takes 20% longer on average to read a camel case identifier, and additionally using eye tracking they identified camel case identifiers require a higher average duration of fixations.
