A study of the effect of programming languages on software quality was reported in October's Communications of the ACM. In this most-read news item of 2017 we report some of its major findings relating to the prevalence of bugs.

Researchers Baishakhi Ray, Daryl Posnett, Premkumar Devanbu and Vladimir Filkov used data from GitHub for a large-scale empirical investigation into the ever-present debate among programmers as to which language is best for a given task. They combined multiple regression modeling with visualization and text analytics to study the effect on software quality of language features such as static versus dynamic typing and allowing versus disallowing type confusion.

The short version of their conclusions is given in the abstract:

Language design does have a significant, but modest effect on software quality. Most notably, it does appear that disallowing type confusion is modestly better than allowing it, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages.

The object of the exercise was to shed light on the idea that the choice of programming language affects both the coding process and the resulting program, with the emphasis on static versus dynamic typing:

Advocates of strong, static typing tend to believe that the static approach catches defects early; for them, an ounce of prevention is worth a pound of cure. Dynamic typing advocates argue, however, that conservative static type checking is wasteful of developer resources, and that it is better to rely on strong dynamic type checking to catch type errors as they arise. These debates, however, have largely been of the armchair variety, supported only by anecdotal evidence.

For this investigation the team chose the top 19 programming languages from GitHub, added TypeScript as a 20th, and identified the top 50 projects written primarily in each language. They then discarded any project with fewer than 28 commits (the first quartile) and, within multi-language projects, any language with fewer than 20 commits.
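The two thresholds can be sketched in a few lines of Python. This is only an illustration of the filtering rule as described here, not the researchers' code: the `Project` class and its `commits_by_language` field are hypothetical, and dropping a project whose languages all fall below the per-language threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_PROJECT_COMMITS = 28   # first-quartile cut-off quoted in the article
MIN_LANGUAGE_COMMITS = 20  # per-language cut-off within a project


@dataclass
class Project:
    name: str
    commits_by_language: Dict[str, int]  # hypothetical field: language -> commit count

    @property
    def total_commits(self) -> int:
        return sum(self.commits_by_language.values())


def filter_projects(projects: List[Project]) -> List[Project]:
    """Keep projects with at least 28 commits and, within them, languages with at least 20."""
    kept = []
    for project in projects:
        if project.total_commits < MIN_PROJECT_COMMITS:
            continue  # fewer than 28 commits: drop the whole project
        languages = {
            lang: count
            for lang, count in project.commits_by_language.items()
            if count >= MIN_LANGUAGE_COMMITS
        }
        if languages:  # assumption: a project left with no language is dropped
            kept.append(Project(project.name, languages))
    return kept
```

A project with 100 commits in C and 5 in Python would survive with only its C commits counted, while a project with 10 commits in total would be discarded outright.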

As the above table shows, this provided the study with 728 projects developed in 17 languages. The projects spanned 18 years of history and included 29,000 different developers, 1.57 million commits, and 564,625 bug fix commits.

Next the team defined language classes, distinguishing between three programming paradigms (procedural, scripting and functional); two categories of type checking (static and dynamic); whether implicit type conversion is allowed or disallowed; and whether memory is managed or unmanaged.
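One way to picture a language class is as a record with one field per dimension. The sketch below is an assumption about representation, not something taken from the study: the `LanguageClass` type and its field names are hypothetical, and the hyphenated label simply mirrors class names such as Functional-Dynamic-Explicit-Managed, which the article associates with Erlang.

```python
from typing import NamedTuple


class LanguageClass(NamedTuple):
    paradigm: str       # e.g. "Proc" or "Functional"
    type_checking: str  # "Static" or "Dynamic"
    conversion: str     # "Implicit" (allowed) or "Explicit" (disallowed)
    memory: str         # "Managed" or "Unmanaged"

    def label(self) -> str:
        # Join the four dimensions into one hyphenated class name.
        return "-".join(self)


erlang = LanguageClass("Functional", "Dynamic", "Explicit", "Managed")
print(erlang.label())  # Functional-Dynamic-Explicit-Managed
```

Treating the class as a single combined label, rather than four independent flags, is what lets bug categories be compared class by class.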

Using keyword search on 10% of the bug-fix messages to train a bug classifier, the researchers identified both cause and impact for each bug-fix commit.
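The keyword-search step might look roughly like the following. The category names come from the bug categories mentioned in this article, but the keyword lists and the `label_commit` function are illustrative assumptions rather than the study's actual vocabulary. The classifier that would then be trained on these labels is not shown.

```python
# Illustrative keyword lists; assumptions, not the study's exact vocabulary.
CATEGORY_KEYWORDS = {
    "concurrency": ("deadlock", "race condition", "synchronization"),
    "memory": ("memory leak", "null pointer", "buffer overflow"),
    "performance": ("slow", "performance", "optimization"),
    "failure": ("crash", "hang", "reboot"),
}


def label_commit(message: str) -> list:
    """Return every category whose keywords appear in the commit message."""
    text = message.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


print(label_commit("Fix memory leak that caused crash on exit"))
```

A message can receive more than one label, which fits the idea of recording both a cause (such as a memory error) and an impact (such as a failure) for the same commit.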

The first question to be addressed was "Are some languages more defect-prone than others?". To answer it, the team fitted a regression model of the number of defect-fixing commits that compares the impact of each language on the number of defects with the average impact of all languages.

At the top of this table are variables used as controls for factors that are likely to be correlated with the number of defect fixes. Project age is included because older projects will generally have a greater number of defect fixes; the number of developers involved, the raw size of the project and the number of commits are also expected to affect the number of bugs. All four were found to have significant positive coefficients.

The languages with the strongest positive coefficients, meaning that they are associated with a greater number of defect fixes, are C++, C and Objective-C, as well as PHP and Python. On the other hand, Clojure, Haskell, Ruby and Scala all have significant negative coefficients, implying that these languages are less likely than average to result in defect-fixing commits. With regard to language classes, functional languages are associated with fewer defects than either procedural or scripting languages.
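As a rough sketch, a model of this kind can be written as a count regression. The log link, the log-transformed controls and all symbol names are assumptions made here for illustration; the article only says that the four controls and a term per language enter the model, and that each language is compared with the average of all languages.

```latex
\log \mathbb{E}[\text{bug-fix commits}_i]
  = \beta_0
  + \beta_1 \log(\text{age}_i)
  + \beta_2 \log(\text{developers}_i)
  + \beta_3 \log(\text{size}_i)
  + \beta_4 \log(\text{commits}_i)
  + \sum_{\ell} \gamma_\ell \, x_{i\ell}
```

Here $x_{i\ell}$ codes the language of project $i$ in such a way that each $\gamma_\ell$ measures a deviation from the average across languages rather than from a single baseline language. Under that reading, a positive $\gamma_\ell$ corresponds to more defect-fixing commits than average once the controls are held fixed, and a negative one to fewer.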

The researchers next turned their attention to defect proneness, the ratio of bug-fix commits to total commits per language per domain, and produced a heat map in which a darker colour indicates greater proneness to bugs.
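The ratio itself is simple to compute. The following minimal sketch assumes each commit is available as a `(language, domain, is_bug_fix)` tuple; that input shape, the function name and the domain labels in the toy data are hypothetical.

```python
from collections import defaultdict


def defect_proneness(commits):
    """Map each (language, domain) pair to bug-fix commits divided by total commits."""
    totals = defaultdict(int)
    fixes = defaultdict(int)
    for language, domain, is_bug_fix in commits:
        totals[(language, domain)] += 1
        if is_bug_fix:
            fixes[(language, domain)] += 1
    return {key: fixes[key] / totals[key] for key in totals}


# Toy data with made-up domain labels.
sample = [
    ("C", "Database", True),
    ("C", "Database", False),
    ("C", "Database", True),
    ("Haskell", "Application", False),
]
print(defect_proneness(sample))
```

Applied to the whole data set rather than per language and domain, the same ratio is 564,625 bug-fix commits out of 1.57 million commits, or roughly 0.36.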

From the above heat map they conclude that there is no general relationship between application domain and language defect proneness. However, looking at the relation between language class and bug category indicates that:

Defect types are strongly associated with languages; some defect type like memory errors and concurrency errors also depend on language primitives. Language matters more for specific categories than it does for defects overall.

This heat map shows a strong relationship between the Proc-Static-Implicit-Unmanaged class and both concurrency and memory errors. It also shows that static languages are in general more prone to failure and performance errors, followed by Functional-Dynamic-Explicit-Managed languages such as Erlang.

Summing up the findings, the conclusions of the report are:

The data indicates that functional languages are better than procedural languages; it suggests that disallowing implicit type conversion is better than allowing it; that static typing is better than dynamic; and that managed memory usage is better than unmanaged. Further, that the defect proneness of languages in general is not associated with software domains. Additionally, languages are more related to individual bug categories than bugs overall.

More Information

A Large-Scale Study of Programming Languages and Code Quality in GitHub
