Web application scanning with skipfish


Web application security flaws are, as always, a hot area in security research, and it isn't surprising that a company that derives much of its income from the web would be interested in helping to secure it. Google has released several tools over the past couple of years—along with a Browser Security Handbook—many of which have been written by longtime security researcher Michal Zalewski. His latest release, skipfish, is an automated web application scanner that actively probes to find vulnerabilities.

Skipfish is a high-performance tool, capable of several hundred to several thousand requests per second. Each of those requests tests for a different kind of potential security flaw in an application. It spiders a web application and tries its tests on each of the pages it finds. For any complicated application, that will result in a huge number of requests (and probably errors), but thanks to the post-processing skipfish does on its results, the reported problems are summarized in a fairly manageable way.

The code itself is 12,000 lines of C, which builds from a simple make as long as libidn is available to handle internationalized domain names. The program is command-line driven, with top-like, continuously updating output (seen at right). Zalewski made some odd color choices for that output, making it hard to find a terminal color scheme where it is readable. The recommended 100x35 terminal size is decidedly non-standard as well. Those nits aside, it is quite easy to get started with skipfish.
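Getting that first scan going is, indeed, straightforward. A minimal sketch, in which the hostname and output directory are placeholders:

```shell
# Build skipfish; requires a C compiler and libidn's development headers.
make

# Run a basic scan: -o names a (new) directory for the report, -W
# selects a starting wordlist, and the final argument is the URL
# where the crawl begins. Both paths and the URL are examples only.
./skipfish -o output-dir -W dictionaries/minimal.wl \
  http://devel.example.com/
```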

Understanding what one should do with skipfish is another story entirely. A large number of tests are run; they are listed on the documentation page, which also provides some examples of using the tool. As one might guess, there are many options to handle different application needs: cookie values, HTTP authentication credentials, logout URLs to avoid, and so on. Before getting to that point, though, one must choose a dictionary.
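For an authenticated application, those options combine along these lines; the credentials, cookie value, and logout path below are made-up examples:

```shell
# -A supplies HTTP basic-auth credentials, -C passes a session
# cookie, and -X keeps the crawler away from the logout URL so the
# session is not terminated partway through the scan.
./skipfish -o output-dir -W dictionaries/minimal.wl \
  -A user:password \
  -C "SESSIONID=abcd1234" \
  -X /logout \
  http://devel.example.com/
```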

Dictionaries in skipfish provide a starting point for the scanner to find additional URLs, files, and parameters that are used by the web application. There are four different dictionaries distributed with skipfish (minimal, default, extensions-only, and complete), and the tool will add what it learns to the dictionary as it runs. The dictionaries/README-FIRST file describes each dictionary as well as how the dictionaries are used. The minimal.wl dictionary is suggested as a good, lightweight starting point for skipfish experimentation.
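Because skipfish extends the wordlist as it learns new keywords, it makes sense to work on a copy rather than the pristine dictionary shipped with the source; the file names here are placeholders:

```shell
# Copy the suggested lightweight dictionary; skipfish will update
# this file in place as the scan discovers new keywords.
cp dictionaries/minimal.wl my-app.wl

# Scan using the private copy.
./skipfish -o output-dir -W my-app.wl http://devel.example.com/
```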

And one gets the sense that a lot of experimentation will be required before any kind of skipfish mastery is achieved. That said, a fairly short run of skipfish against a local development version of a reasonably complex web application turned up several obvious, though relatively minor, problems. There is quite a bit more to go through in the report, so more problems likely await discovery, even from that small sample of skipfish's capabilities. One note of warning for those whose applications send email when significant errors occur: either disable that feature, or you may get a chance to stress-test your mail server and/or be subjected to an inbox denial of service.

The report that skipfish produces is a summary of the problems, or potential problems, that it found. It is in HTML format that, somewhat amusingly, requires JavaScript to be enabled to be useful. In fact, the "known issues" page mentions that due to "important security improvements" in Safari and Chrome, neither of those browsers will display the report via the file: protocol—"put the report in a local WWW root and navigate to http://localhost/... instead; or use Firefox".
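Any local web server will satisfy that restriction; one quick way, assuming Python is installed (the directory name is a placeholder):

```shell
# Serve the report directory over HTTP instead of file:, then point
# the browser at http://localhost:8000/
cd output-dir
python -m http.server 8000    # Python 3; SimpleHTTPServer on Python 2
```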

In the report, the various categories of problems found are listed with color-coded icons that estimate the severity of each problem. Clicking on a category exposes a list of the pages that exhibited the problem, and for each of those an HTTP trace can be examined (example shown at left). While some of the categories are fairly obvious, others are a bit more obscure and will require some investigation to determine whether there is truly a problem or not.

As with most, if not all, automated scanners, there will be plenty of false positives reported, which means that the results will have to be sifted to find the real problems. Skipfish aims to minimize false positives, but it will still require an iterative approach. Limiting the search to the "interesting" parts of the application, without missing something important in the portions deemed "unimportant", will be somewhat tricky to get right.

Most web applications have vast numbers of pages that are governed by the same underlying code, so picking a truly representative sample of one of those pages is important. Otherwise, skipfish will spend an awful lot of time repetitively testing the same kinds of things against "/ExampleContent/1", "/ExampleContent/2", and so on. The same problem exists for any automated web scanner, of course.
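Skipfish's scoping options can help here; a sketch using the -I option (crawl only URLs matching the given string), with placeholder paths standing in for whatever the application actually serves:

```shell
# Let one representative page stand in for thousands of
# near-identical ones by restricting the crawl with -I.
./skipfish -o output-dir -W my-app.wl \
  -I /ExampleContent/1 \
  http://devel.example.com/ExampleContent/1
```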

As the documentation points out, there are other tools that do similar jobs (Nikto and Nessus are given as examples), and skipfish is "not a silver bullet". But clearly a lot of thought has gone into it, and Zalewski has an excellent track record as a finder of security vulnerabilities. Skipfish is certainly a tool that is worth a long look.