Static code analysis

By Andrey Karpov on 03/12/12 03:12:00 am

Static code analysis is the process of detecting errors and defects in a program's source code. It can be viewed as an automated code review process, so let us first look at code review itself.

Code review is one of the oldest and safest methods of defect detection. It consists of jointly and attentively reading the source code and giving recommendations on how to improve it. This process reveals errors, as well as code fragments that may become errors in the future. It is also generally held that the code's author should not explain how a certain part of the program works during the review. The program's execution algorithm should be clear directly from the program text and comments. If it is not, the code needs improving.

Code review usually works well because programmers notice errors in somebody else's code much more easily than in their own. To learn more about the code review method, see the wonderful book "Code Complete" by Steve McConnell [1].

The only crucial disadvantage of joint code review is its extremely high price: you need to gather several programmers at regular intervals to review fresh code or to re-review code after the recommended changes have been applied. The programmers also need regular rest, because attention weakens quickly when large code fragments are reviewed at a time, and the review then becomes useless.

So, on the one hand, you want to review your code regularly. On the other hand, it is too expensive. Static code analysis tools are a compromise solution. They can tirelessly process program source code and recommend which code fragments the programmer should look at more closely. Of course, a program can never replace a complete code review performed by a team of programmers, but the benefit-to-price ratio makes static analysis a rather good practice adopted by many companies.

The tasks solved by static code analysis software can be divided into 3 categories:

Detecting errors in programs. This is discussed in more detail below.

Recommendations on code formatting. Some static analyzers let you check whether the source code conforms to the code formatting standard adopted in your company. This means checking the number of indents in various constructs, the use of spaces or tabs, and so on.

Metrics computation. A software metric is a measure that gives a numerical value for some property of software or its specifications. There are many different metrics that can be computed with the help of certain tools.
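As a rough illustration of the formatting and metrics categories, the sketch below scans a source text and counts total lines, blank lines, and lines that begin with a tab. The SourceReport type and the inspect_source function are hypothetical names invented for this example. Real analyzers compute far richer metrics and formatting checks, so treat this only as a toy under those assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical report type, invented for this illustration.
struct SourceReport {
    std::size_t total_lines = 0;         // a very simple size metric
    std::size_t blank_lines = 0;         // lines with only whitespace
    std::size_t tab_indented_lines = 0;  // non-blank lines starting with a tab
};

// Scans the text line by line without compiling or running it.
SourceReport inspect_source(const std::string& source) {
    SourceReport report;
    std::istringstream input(source);
    std::string line;
    while (std::getline(input, line)) {
        ++report.total_lines;
        if (line.find_first_not_of(" \t\r") == std::string::npos) {
            ++report.blank_lines;
        } else if (line[0] == '\t') {
            // A formatting rule such as "indent with spaces only"
            // would be violated by this line.
            ++report.tab_indented_lines;
        }
    }
    // Sanity check: the two disjoint counters never exceed the total.
    assert(report.blank_lines + report.tab_indented_lines <= report.total_lines);
    return report;
}
```

A team that forbids tab indentation could, under these assumptions, treat any non-zero tab_indented_lines value as a formatting warning.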

There are also other ways of using static code analysis tools. For instance, static analysis can be used to supervise and train new employees who are not yet familiar enough with the company's programming rules.

There are many commercial and free static code analyzers. Wikipedia has a large list of them: List of tools for static code analysis. The list of languages supported by static code analyzers is long too (C, C++, C#, Java, Ada, Fortran, Perl, Ruby, ...).

Like any other error detection methodology, static analysis has its strong and weak points. You should understand that there are no ideal software testing methods. Different methods will produce different results for different software classes. Only combining various methods will enable you to achieve the highest quality of your software.

The main advantage of static analysis is that it enables you to greatly reduce the cost of eliminating defects in software. The earlier an error is detected, the cheaper it is to fix. For example, according to the data given in the book "Code Complete" by McConnell, fixing an error at the testing stage costs ten times more than at the code writing stage:

Figure 1. Average cost of fixing defects depending on when they were introduced and detected (the data for the table are taken from the book "Code Complete" by S. McConnell).

Static analysis tools allow you to quickly detect many coding-stage errors, which significantly reduces the development cost of the whole project. For example, the PVS-Studio static code analyzer can run in the background right after compilation and tell the programmer about potential errors, if there are any (see incremental analysis mode).

Other advantages of static code analysis are the following:

Full code coverage. Static analyzers check even those code fragments that get control very rarely. Such fragments usually cannot be tested by other methods. This allows you to find defects in exception handlers or in the logging system.

Static analysis doesn't depend on the compiler you are using or on the environment where the compiled program will be executed. This allows you to find hidden errors that may reveal themselves only a few years later, for instance undefined behavior errors. Such errors can show up when you switch to another compiler version or use different code optimization switches. Another interesting example of hidden errors is discussed in the article "Overwriting memory - why?".

You can easily and quickly detect misprints and the consequences of Copy-Paste usage. Detecting these errors by other methods is usually an inefficient waste of time and effort. It's a pity to spend an hour debugging just to find out that the error is in an expression of the "strcmp(A, A)" kind. People usually don't remember such troubles when discussing typical errors, but practice shows that detecting them takes a lot of time.
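To make the "strcmp(A, A)" case concrete, here is a minimal sketch of how such a Copy-Paste misprint might look in practice. The User struct and both function names are hypothetical and exist only for this illustration. The point is that the buggy version compiles cleanly and may pass casual testing, while the self-comparison is exactly the kind of pattern a static analyzer can flag without running the program.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record type, invented for this illustration.
struct User {
    const char* first_name;
    const char* last_name;
};

// Copy-paste misprint: the second line was duplicated from the first,
// but only one of the two arguments was updated. The expression
// strcmp(a.last_name, a.last_name) is always 0, so last names are
// never actually compared.
bool same_user_buggy(const User& a, const User& b) {
    assert(a.first_name && a.last_name && b.first_name && b.last_name);
    return std::strcmp(a.first_name, b.first_name) == 0 &&
           std::strcmp(a.last_name, a.last_name) == 0;  // misprint
}

// Intended version: each field of a is compared with the same field of b.
bool same_user_fixed(const User& a, const User& b) {
    assert(a.first_name && a.last_name && b.first_name && b.last_name);
    return std::strcmp(a.first_name, b.first_name) == 0 &&
           std::strcmp(a.last_name, b.last_name) == 0;
}
```

With two users who share a first name but differ in last name, same_user_buggy reports them as equal and same_user_fixed does not. A test suite that happens to use distinct first names everywhere would never notice the difference.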

Disadvantages of static code analysis

Static analysis is usually poor at diagnosing memory leaks and concurrency errors. To detect such errors, the analyzer actually needs to execute a part of the program virtually. This is very difficult to implement, and such algorithms take too much memory and processor time. Static analyzers therefore usually limit themselves to diagnosing simple cases. A more efficient way to detect memory leaks and concurrency errors is to use dynamic analysis tools.

A static analysis tool warns you about odd fragments, which means that the flagged code may actually be quite correct. Such reports are called false positives. Only the programmer can tell whether the analyzer points to a real error or it is just a false positive. The need to review false positives takes work time and weakens attention to those code fragments that really contain errors.
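A small sketch may help show why false positives are unavoidable. The function below compares a value with itself, which looks like a misprint and could plausibly be reported by a rule that flags identical operands. For IEEE 754 floating-point numbers, however, NaN is the only value that is not equal to itself, so the code is correct. The function names are hypothetical, and whether a particular analyzer warns here is an assumption that depends on the tool and its settings.

```cpp
#include <cassert>
#include <limits>

// The check below relies on IEEE 754 semantics for double.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// Looks odd, but is intentional: only NaN compares unequal to itself.
// Note that aggressive floating-point flags (for example a "fast math"
// mode) may break this idiom, which is why std::isnan is usually clearer.
bool is_nan_by_self_compare(double value) {
    return value != value;
}

// Small self-check that can be called from main().
void demo_false_positive() {
    assert(is_nan_by_self_compare(std::numeric_limits<double>::quiet_NaN()));
    assert(!is_nan_by_self_compare(1.5));
}
```

Only someone who knows the intent can decide that such a warning is a false positive, which is the review cost described above.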

Errors detected by static analyzers are rather diverse. See, for example, the list of diagnostics implemented in the PVS-Studio tool. Some analyzers focus on a certain area or type of defects, while others support certain coding standards, for instance, MISRA-C:1998, MISRA-C:2004, Sutter-Alexandrescu Rules, Meyers-Klaus Rules, etc.

The sphere of static analysis is actively developing: new diagnostic rules and standards appear, while some rules become obsolete. That's why there is no sense in comparing analyzers on the basis of the lists of defects they can detect. The only way to compare tools is to run them on a set of projects and count the number of real errors they find. This subject is discussed in detail in the article "Difficulties of comparing code analyzers, or don't forget about usability".

Examples of errors detected by static code analysis

Myths about static analysis

References

Copyright © UBM Tech, All rights reserved