Making static analysis easier

It has been clear for some years now that static analysis tools have the potential to greatly increase the quality of the software we write. Computers are well placed to analyze source code and look for patterns which could indicate bugs. The "Stanford Checker" (later commercialized by Coverity) found a great many defects in a number of free software code bases. But within the free software community itself, the tools we have written are relatively scarce and relatively primitive. That situation may be coming to an end, though; we are beginning to see the development of frameworks which could become the basis for a new set of static analysis tools.

The key enabling changes have been happening in the compiler suites. Compilers must already perform a detailed analysis of the code in order to produce optimized binaries; it makes sense to make use of that analysis for other purposes as well. Some of that already happens in the compiler itself; GCC and LLVM can produce a much wider set of warnings than was once possible. These warnings are a good start, but much more can be done. That is especially true if projects can create their own analysis tools for project-specific checks; projects of any size tend to have invariants and rules of their own which go beyond the rules for the source language in general.
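To make the idea of a project-specific check concrete, here is a minimal sketch. It is not a GCC or LLVM plugin; it assumes, purely for illustration, a Python code base and uses the standard library's ast module to stand in for the compiler's parsed representation. The banned function names (including unsafe_copy) and the helper find_banned_calls are hypothetical.

```python
import ast

# Hypothetical project rule: these function names must not be called directly.
BANNED_CALLS = {"eval", "unsafe_copy"}


def find_banned_calls(source, filename="<source>"):
    """Return sorted (line, name) pairs for each direct call to a banned function."""
    tree = ast.parse(source, filename=filename)
    findings = []
    for node in ast.walk(tree):
        # Only calls by simple name are matched; attribute calls such as
        # obj.eval() are deliberately ignored in this sketch.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)


# Illustrative input only; unsafe_copy and safe_copy are made-up names.
SAMPLE = '''
def load(text):
    return eval(text)

def copy(dst, src):
    unsafe_copy(dst, src)
    safe_copy(dst, src)
'''

if __name__ == "__main__":
    for line, name in find_banned_calls(SAMPLE):
        print(f"<source>:{line}: call to banned function {name}()")
```

A check like this encodes a rule that no general-purpose compiler warning would know about. A compiler plugin would apply the same pattern to the compiler's own internal representation rather than to a Python syntax tree.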

The FSF was, for years, hostile to the idea of making it easy to plug analysis modules into GCC, fearing that a plugin mechanism would enable the creation of proprietary modules. After some years of deliberation, the FSF rewrote the license exception for its runtime modules in a way that addressed the proprietary module worries; since then, GCC has had plugin module support. Use of that feature has been relatively low so far, but there are signs that the situation may be beginning to change.

An early user of the plugin mechanism was the Mozilla project, which created two modules (Dehydra and Treehydra) to enable the writing of analysis code in JavaScript. These tools have seen some use within Mozilla, but development there seems to have slowed to a halt. The mailing list is moribund and the software does not appear to have seen an update in some time.

An alternative is GCC MELT. This project provides a fairly comprehensive plugin which allows the writing of analysis code in a Lisp-like language. This code is translated to C and turned into a plugin which can be invoked by the compiler. MELT is extensively documented; there are also slides from a couple of tutorials on its use.

MELT seems to be a capable system, but there do not appear to be a lot of modules written for it in the wild. One does not need to look at the documentation for long to understand why; the "basic hints" start with: "You first need to grasp GCC main internal representations (notably Tree & Gimple & Gimple/SSA)." MELT author Basile Starynkevitch's 130-slide presentation on MELT [PDF] does not get past the introductory GCC material until slide 85. MELT, in other words, requires a fairly deep understanding of GCC; it's not something that an outsider can pick up quickly. The lack of easy examples to work from is not helpful either.

More recently, David Malcolm has announced the release of a new framework which enables the creation of plugins as Python scripts which run within the compiler. His immediate purpose is to create tools for the development of the Python system itself; the most significant checker in his code tries to ensure that object reference counts are managed properly. But he sees the tool as potentially being useful for a number of other projects and even for prototyping new features for GCC itself.

At first glance, David's gcc-python-plugin mechanism suffers from the same difficulty as MELT: the initial learning curve is steep. It is also a very young and incomplete project; David has, by his own admission, only brought out the functionality he had immediate need for. The analysis code seems more approachable, though, and the mechanism for running scripts directly in the compiler seems more natural than MELT's compile-from-Lisp approach. It may be that this plugin will attract more users and developers than MELT as a result.

Or it may just be that your editor, being rather more proficient in Python than in Lisp, naturally likes the Python-based solution better.

In any case, one conclusion is clear: writing static analysis plugins for GCC is currently too hard; even capable developers who approach the problem will need to dedicate a significant chunk of time to understanding the compiler before they can begin to achieve anything in this area. The efforts described above are a big step in the right direction, but it seems clear that they are the foundations upon which more support code must be built. It's hard to say when we will reach the tipping point that inspires a flood of new analysis code, but it's easy to say that we are not there yet.

GCC is not where all the action is, though; there is also an interesting static analysis tool which has been built with the LLVM clang compiler. Documentation of this tool is scarce, but it appears to be capable of detecting some kinds of memory leaks, null pointer dereferences, the computation of unused values, and more. Some patches have been posted to add a plugin feature to this tool, but they do not seem to have proceeded very far yet.
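As a rough illustration of what "the computation of unused values" means, the following toy looks for values that are assigned and then overwritten or never read. It makes no claim about how the clang analyzer works internally; it is a sketch that assumes Python input, handles only straight-line function bodies, and uses a hypothetical helper name, find_dead_stores.

```python
import ast

# Statement types this toy understands; anything else (loops, branches,
# augmented assignment, nested functions) makes it skip the function.
SIMPLE_STATEMENTS = (ast.Assign, ast.Return, ast.Expr)


def find_dead_stores(source):
    """Return sorted (line, name) pairs for assignments whose value is never read."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        if not all(isinstance(stmt, SIMPLE_STATEMENTS) for stmt in func.body):
            continue  # control flow is out of scope for this sketch
        pending = {}  # variable name -> line of an assignment not yet read
        for stmt in func.body:
            # The right-hand side is evaluated first, so handle reads first.
            for node in ast.walk(stmt):
                if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                    pending.pop(node.id, None)
            # A store over a still-pending value means that value was wasted.
            for node in ast.walk(stmt):
                if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                    if node.id in pending:
                        findings.append((pending[node.id], node.id))
                    pending[node.id] = node.lineno
        # Anything still pending at the end of the function was never read.
        findings.extend((line, name) for name, line in pending.items())
    return sorted(findings)


# Illustrative input only.
SAMPLE = '''
def total(items):
    count = 0
    count = len(items)
    scratch = count * 2
    return count
'''

if __name__ == "__main__":
    for line, name in find_dead_stores(SAMPLE):
        print(f"line {line}: value assigned to '{name}' is never used")
```

Even this simple case shows why false positives are a perennial concern: once branches, loops, or aliasing enter the picture, deciding whether a value is truly unused requires much more careful reasoning than this sketch attempts.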

Back in May, John Smith ran the checker on several open source projects to see what kind of results would emerge. Those results have been posted on the net; they show the kind of potential problems that can be found and the nice HTML output that the checker can create. Some of the warnings are clearly spurious (always a problem with static analysis tools), but others seem worth looking into. In general, the clang static analyzer seems, like the other tools mentioned here, to be in a relatively early state of development. Things are moving fast, though; this tool is worth keeping an eye on.

Actually, that is true of the static analysis area in general. The lack of good analysis tools has been a bit of a mystery; given the number of developers we have, one would think that a few would scratch that particular itch. Your editor would not have minded living in a world with one less version control system but with better analysis tools. But the nature of free software development is that people work on problems that interest them. As the foundations of our static analysis tools get better, one can hope that more developers will find those foundations interesting to build on. The entire development community will benefit from the results.

