Software Readability / Source Code Readability

Are classical readability measures relevant to software readability?

If you download the Unix program "style", you'll find that it can compute many English-text readability metrics on text you provide. We looked at these measures and found that many are frequency-based, with some kind of log scaling. This is remarkably similar to entropy, H = -sum_i p(x_i) log p(x_i), where p(x_i) is the probability of token x_i appearing. This got us wondering: is there something analogous for software?
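To make the analogy concrete, here is a minimal Python sketch of per-token entropy for a code snippet. The regular-expression tokenizer is an assumption made for illustration (it is not the lexer used in the study), and p(x_i) is estimated from token frequencies within the snippet itself. Logs are base 2, so H is measured in bits per token.

```python
import math
import re
from collections import Counter


def tokenize(source: str) -> list[str]:
    # Hypothetical, deliberately crude lexer (an assumption, not the paper's):
    # identifiers/keywords, integer literals, and any other single
    # non-whitespace character as its own token.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def token_entropy(tokens: list[str]) -> float:
    """Shannon entropy H = -sum_i p(x_i) * log2 p(x_i), in bits per token.

    p(x_i) is estimated as the relative frequency of token x_i in `tokens`.
    """
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


toks = tokenize("if (x > 0) { x = x - 1; }")
print(len(toks), round(token_entropy(toks), 3))
```

A snippet that reuses a few tokens heavily scores lower than one where every token is different, which is the frequency-with-log-scaling behaviour the text readability formulas share.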

We stumbled upon the readability data of Buse et al., who asked 100+ people to score the readability of 100+ code snippets. Buse et al. produced a few models of readability from this data, but we felt the data could say more. We noticed that Halstead volume seemed to measure a naive bit-wise encoding of the tokens of code (every token encoded with the same number of bits, log2 of the vocabulary size), and this was somewhat similar to entropy. Entropy, on the other hand, is the average number of bits needed to represent a token. We decided to apply these measures to source code to see how they related to human readability scores.
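The relationship between the two measures can be sketched as follows. Halstead volume is V = N log2 n, where N is the total number of tokens and n the number of distinct tokens, i.e. the size of a fixed-length encoding. N * H is the size under a frequency-aware encoding, and it can never exceed V because H <= log2 n. The tokenizer and the helper names below are hypothetical, and the sketch skips Halstead's split into operators and operands.

```python
import math
import re
from collections import Counter


def tokenize(source: str) -> list[str]:
    # Hypothetical crude lexer for illustration; not the paper's lexer.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def halstead_volume(tokens: list[str]) -> float:
    """Halstead volume V = N * log2(n).

    N is the program length (total tokens) and n the vocabulary (distinct
    tokens). Classic Halstead splits tokens into operators and operands;
    since N = N1 + N2 and n = n1 + n2, counting all tokens together gives
    the same V as long as each distinct token counts as exactly one of the two.
    """
    vocab = len(set(tokens))
    return len(tokens) * math.log2(vocab) if vocab else 0.0


def entropy_bits(tokens: list[str]) -> float:
    """Total bits under a frequency-aware code: N * H, with H in bits/token."""
    total = len(tokens)
    counts = Counter(tokens)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values()) if total else 0.0
    return total * h


toks = tokenize("if (x > 0) { x = x - 1; }")
volume = halstead_volume(toks)  # fixed-length code: log2(n) bits per token
info = entropy_bits(toks)       # never exceeds volume, since H <= log2(n)
print(round(volume, 2), round(info, 2))
```

The gap between the two numbers grows as token frequencies become more skewed; when every token is distinct they coincide.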

It turned out well. Size + entropy, or size + Halstead volume, tended to produce linear regression models that were simpler (fewer predictors, and therefore more residual degrees of freedom) and performed better than the larger readability models proposed by Buse et al.
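The fitting step can be illustrated with a plain two-predictor linear regression. This is a minimal sketch, not the paper's analysis: the data are synthetic, generated from made-up coefficients, and the helper names ols_fit and fit_summary are hypothetical. It shows what "simpler" means in terms of degrees of freedom: with n snippets and p predictors plus an intercept, the residual degrees of freedom are n - (p + 1).

```python
def ols_fit(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares with an intercept, via the normal equations
    (X^T X) b = X^T y solved by Gauss-Jordan elimination.
    Returns [b0, b1, ..., bp]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        piv = max(range(col, k), key=lambda row_idx: abs(a[row_idx][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fit_summary(X, y):
    """Coefficients, R^2 and residual degrees of freedom n - (p + 1)."""
    b = ols_fit(X, y)
    preds = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return b, 1.0 - ss_res / ss_tot, len(y) - len(b)


# Synthetic toy data from made-up coefficients (NOT the Buse et al. data):
# score = 5 - 0.02 * size - 0.5 * entropy, with no noise added.
size = [10, 20, 30, 40, 50, 60]
entropy = [3.1, 3.5, 3.2, 4.0, 3.8, 4.4]
X = [[s, h] for s, h in zip(size, entropy)]
y = [5 - 0.02 * s - 0.5 * h for s, h in zip(size, entropy)]

coefs, r2, df_resid = fit_summary(X, y)
print([round(c, 3) for c in coefs], round(r2, 3), df_resid)
```

Each additional predictor, as in a larger feature-based model, removes one residual degree of freedom; a two-predictor model such as size + entropy keeps more of them for the same number of snippets.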

In the end, we showed that code readability is related to the size of the code and to its information content (entropy or Halstead volume): the more information in the code, or the larger the code snippet, the less readable the code was perceived to be.

Daryl Posnett, Abram Hindle and Prem Devanbu described this work in their 2011 MSR paper, "A Simpler Model of Software Readability".

@inproceedings{posnett2011readability,
  title = "A Simpler Model of Software Readability",
  year = "2011",
  author = {Daryl Posnett and Abram Hindle and Prem Devanbu},
  booktitle = {Proceedings of the 8th International Working Conference on Mining Software Repositories, MSR 2011 (Co-located with ICSE), Waikiki, Hawaii, 2011, Proceedings},
}