Static Analysis for PHP

Posted by Nick Galbreath on August 10, 2012

Note: In 2020 we updated this post to adopt more inclusive language. Going forward, we’ll use “allowlist/blocklist” in our Code as Craft entries.

At Etsy we have three tiers of static analysis on our PHP code that run on every commit or runs periodically every hour. They form an important part of our continuous deployment pipeline along with one-button deploys, fast unit and functional tests, copious amounts of graphing, and a fantastic development environment to make sure code flows safely and securely to production.

Sanity and Style Checking

These checks eliminate basic problems and careless errors.

A obvious rule is that syntax errors never make it make into the source repository, let alone production. So “ php -l ” is run as pre-commit check on each changed file.

We don’t use PHP’s native templating where code and HTML is mixed in one file. Our PHP files are purely code, and output is rendered via another templating system. To make sure this works correctly, we check that there is no text before the initial <?php tag. Otherwise that text is sent to the client and prevents us from setting HTTP headers. Likewise we make sure that PHP tags are balanced and there is no trailing text as well.

There are also a number of basic coding style checks. Nothing particularly exotic but they help make the code readable and consistent across an ever growing engineering department. Most of these are implemented using CodeSniffer.

Formal Checks using HpHp

Facebook’s HipHop for PHP is a full reimplementation of the PHP/Apache stack including a compiler, a new PHP runtime, and a web server. To make your application run under it will require some serious surgery since it’s missing many modules you might depend on. However, it has a fantastic static analyzer which can be used independently. It does global analysis of your entire code base for:

Too many or too few arguments in a function/method call

Undeclared global or local variables

Use of return value of a function that actually returns nothing

Functions that have a required argument after an optional one

Unknown functions, methods, or base classes

Constants declared twice

and a few others. It’s doesn’t always track exactly the latest PHP versions, so you’ll have to allowlist some of the errors, but overall it’s been wildly successful with almost no false positives.

Why not do all this in using PHP’s built-in tokenizer token_get_all or CodeSniffer? HpHp can analyze 10,000 files in a few seconds, while the built-in function is a few orders of magnitude slower. Since it’s so fast, we can put the static analysis as a pre-commit hook that prevents bugs from even being checked in. Which it does almost every day.

Security Checks

We run ClamAV and antivirus check our source repository. Has it found anything yet? No (phew). But at 1GB/sec scan rate and the low low cost of free, there is no reason not do it. We aren’t worried so much about PHP code, but occasionally Word and PDF documents are put into the source repo. ClamAV also scans for URLs and matches them against the Google Safe Browsing database for malware and phish sites.

The other security checks are alerts and triggers for code reviews. Files are scanned for commonly misused or abused functions involving cryptography, random numbers, and process management. If new instances are found, a alert for a code-review is done. Many times the code is just fine , but sometimes adjustments are needed or the goal can be achieved without using crytopgrahy. Likewise we alert on functions that “take a password” such as ftp_login. We want to avoid passwords being checked into source control. Some files are sensitive enough that any change triggers an alert for full review.

A lot more detail can be found in the presentation Static Analysis for PHP first given at PHPDay, Verona, Italy on May 19, 2012. They put on a great conference and highly recommend it for next year.