Static analysis tools (SAST) are perhaps the most common tool for an AppSec team in the endless effort to move security to the left. They can be integrated into development pipelines in order to offer quick feedback to the developer to catch security bugs, resulting in faster remediation times and improved return on investment for developing secure software.

The SAST market is dominated by a small number of big players that charge high licencing fees for tools that do sophisticated analysis. One of the main features of such tools is the data flow analysis, which traces vulnerabilities from source to sink, potentially going across a number of files. More information about what such tools can do can be found in this Synopsis report. These tools typically can take hours or even days to complete a scan, and may use a large amount of memory. Furthermore, some of the tools have a large number of false positives, and potentially their own eccentricities. For organisations with deep pockets, the benefits outweigh the costs.

There are lower cost, more efficient SAST tools, but they typically do lack the sophistication in terms of quality of security bug findings. Perhaps the most popular low-cost alternative is SonarQube, which is quite popular amont developers. SonarQube does very quick scans, but lacks the data flow analysis capability that the more expensive tools have. As a consequence, security findings are represented in one place (not source-to-sink analysis).

In this blog, we’re going to talk about grep-based code scanning, which is an old fashioned way of SAST scanning — but we argue that good grep based scanning can do reasonably well compared to expensive SAST tools in terms of quality of bugs found. While grep based-scanning in the form we present it cannot do data flow analysis, we claim that the major shortcomings in what we miss are not extensive. We also show examples of things we found that our commercial tool at the time missed. While it is certainly capable that the commercial tools can build up their rule sets to catch everything we catch, it falls upon them to add those rules.

I have experience we three leading SAST tools, but considerable experience with only one of them. I’m not going to call out any names of tools, but I do generally see room for improvement in them.

In order to make this blog the most beneficial to the readers, we will provide a starting pack of grep-based rule sets that we have used. The tools we have developed are not in a state that they can be open sourced, and we do not have the time to fix them up. However, the rules we provide open the door for somebody with that time availability to do so. It takes very little effort to build a PoC in this way.

This is joint work with Jack Healy.

Tool philosophy

False positives seem part of the game in the SAST market. While I have heard of one tool trying to put considerable focus into removing false positives, I have not experienced such features in the tools that I have worked with — at least not “out of the box.” As a consequence, many tools seem to require manual inspection of the findings to remove the junk and focus on the issues that seem real. In our grep-based scanner, we play by the same rules: we assume that the results are going to be inspected manually, and the person running the tool has a basic understanding of security vulnerabilities and how to identify them in code. Thus, grep-based code scanning assumes a certain level of security competence and language expertise by the person running the code scan.

We take the philosophy that is okay to have a large number of false positives if we are looking at something that is very serious and too often a problem — for example, SQL injection. We take the philosophy that it is not okay to have a lot of false positives if the issues is not that serious (by CVSS rating). Generally, we are not trying to find everything, but instead trying to maximise the value we get out of our code scanning under a manual inspection time constraint. It’s worth spending extra time for catching the big fish, not for the small ones.

Depending upon how the tool is used, we may under some conditions want to report only things that are high confidence, and ignore everything else. A common example is in a CICD environment, where you want to break a build only if you are quite confident that something is wrong. So in our case, we have an option to only report very high confidence results. We also allow developers to flag false positives similar to how SonarQube does it, thus making it developer friendly.

What do we lose by not having data flow analysis?

While the expensive tools have the advantage of data flow analysis, our belief is that we can get a lot of what they get without it. The biggest exception is cross-site scripting, which our grep-based scanner does not find. We consider alternative tooling such as DAST to be an option for finding this serious issue.

Another example of an important issue that we are unlikely to find is logging of sensitive information. Typically that does require some type of data flow analysis — you’re almost never going to find code that directly does something direct like Log(password).

But something like SQL injection, we have found a number of times. The reason being — we report every SQL query we find. Yes, it’s noisy, but it takes a code reviewer at most 10 seconds of manual inspection time to see if there is a string concatenation in the query, and if that’s the case, it may be vulnerable. If not, the reviewer can quickly dismiss it as a false positive. Obviously more sophisticated code scanning can make this less noisy (example: regular expression code scanning), but this is our starting point.

Certainly tools with data flow analysis are more efficient for something important like SQL injection. However the benefit from that efficiency is often lost when weighed against how much manual time is lost on false positives for issues that are not that important and/or low confidence. Furthermore, some tools do not do so well in terms of rating the seriousness of a vulnerability: they can be over-zealous in calling too many issues critical or high risk (though this may be configurable). Because of this, the overall usage of our tool — at least for us — seems to require similar manual effort as some of the popular commercial tools despite the simplicity of the design.

Examples of things we found that our main commercial tool missed

Because of the simplicity of our design and because we assume that a competent reviewer is going to look a the code, we were able to put in rules that help the reviewer find things that big commercial tools tend to miss.

One very fruitful area in finding security bugs is bad crypto. As mentioned in Top 10 Developer Crypto Mistakes, bad crypto is more prevalent than good crypto. Part of the reason for this is bad crypto APIs. Another part of the reason is that the tools don’t help developers catch mistakes. Vendors of these tools could find a lot more problems if they were to listen to cryptographers. Our tool flags any use of “crypt” and any other key words that are indicative of bad cryptography, such as rc4, arcfour, md5, sha1, TripleDES, etc…. We go into more details about crypto bugs in the rules sets.

Another issue that we find surprisingly often that our main commercial tool missed was http://. Of course it should always be https://. Similarly one can look for ftp:// and other insecure protocols.

There are also a number of dangerous functions that we flag, and often these open up discussions about coding practices. For example, the Angluar trustAsHtml or React dangerouslySetInnerHTML. In our case, we had to educate the developers on the right approach to handle untrusted data, because they thought it was safe to sanitise data prior to putting in the database and then use trustAsHtml upon display. We did not agree with this coding style and worked to change it.

One thing that we like to always look at is MVC controller functionality because sometimes that’s where input validation should be applied (in other times you might do it in the model). We often find a lack of input validation, which does not necessarily imply a vulnerability but does show the need for improved defensive programming. Even more interesting, one time when we looked at a controller, we found that it was obviously not doing access control checking that it should be doing, and this was one of the most major findings that we caught. Normally access control problems are not something you would find with a SAST tool, but our differing approach to how to use the tool made it work for us.

Sample rules: starter pack

Rules can go on forever, so here I want to just focus on some of the more fruitful ones to get you started. Generally we do a case insensitive grep to find the key word, and then we grep out (“grep -v”) things that got in by chance but are not actually the key word we are looking for. We call these “exceptions” to our rules. For example “3des” is indicative of the insecure triple DES encryption algorithm, but we wouldn’t want to pull in a variable like “S3Destination”, which our developers might use for Amazon S3 storage (nothing to do with triple DES). We learn the exceptions over time.

A very common grep rule that we need to make exceptions for is “http:”. While we are looking for insecure transport layer communications, the problem is that “http:” is also used in XML namespaces that have nothing to do with transport layer security. This means we have to grep out things like “xmlns”. Also, writing logic to ignore commented code helps a lot — otherwise you will pull in many false positives like license example (http://www.apache.org/licenses) or stackoverflow links, which often occur in comments.

It may look like we omitted some obvious rules in some cases, but sometimes there are subtle reasons. For example, even though we do a lot of crypto rules, we don’t do an “ECB” (insecure electronic code book mode of operation) search simply because it is too noisy. Those characters “E”, “C”, and “B” all look like hexadecimal numbers, and we often got false positives from hexadecimal values in code. It was deemed not fruitful enough to keep such a rule (as explained in our design philosophy above).

We also try to group our rule by language to reduce false positives. However some rules apply to all languages. In our case we are primarily working on web software so we don’t specify languages like C/C++, etc…

Below is the starter pack of rules. Some rules are clearly more noisy than others — people can pick and choose the ones they want to focus on.