Contributed by jj on 2015-06-19 from the bug-ate-my-manpage dept.

Ingo Schwarze (schwarze@) writes in with an analysis of the issues found by afl in mandoc:

After realizing that I have nine topics for my BSDCan talk and that I can't cover them all in the depth they deserve, here are a few more details about afl and mandoc than I can't cover in the talk. Not a spoiler, there is still plenty of material for the talk!

1. Introduction

During the half year from November 2014 to May 2015, Jonathan Gray (jsg@) repeatedly ran the American fuzzy lop security-oriented fuzzer program to scrutinize the mandoc documentation parser and formatter program for bugs causing crashes and hangs. Most of these runs took multiple days of around-the-clock operation on a modern amd64 computer. All bugs found were quickly investigated and fixed.

A grand total of 45 issues was found. Fortunately, this number is not large enough to support a full-blown statistical analysis, and anyway, it is unknown whether, and in which ways, the mandoc codebase may or may not be representative for application software in general.

But unfortunately, the number of issues is much larger than i would have anticipated, so large that looking at it from a quantitative angle does make some sense, even if the results are not going to be representative (for systematical reasons) nor precise (for statistical reasons). But at the very least, they may inspire some hypotheses and remind us of some well-known recommendations and best practices, so i do consider the exercise worthwhile.

2. Analysis of root causes

In a first step, let's classify the root causes of the issues found by afl, proceeding from lower-frequency to higher-frequency classes.

2.1. 1 case out of 45: abuse of assert() for input validation

It is really needless to say that you shouldn't do that.

What is interesting here is that

mandoc was in general written with security in mind

by people not completely inexperienced or unskilled

and by the time of the afl audit, the code had already matured for half a decade

Yet an error of this type still persisted.

Admittedly, it is not always possible to make all the details perfect right away when starting a new project from scratch. But be aware that every stupid shortcut is *very* dangerous, it may persist for a long time even in an environment like OpenBSD that tends to be unusually hostile to bugs. So think twice about what you are doing, even in the beginning, avoid as much as possible giving in to hurry and laziness right from the start.

2.2. 1 case out of 45: powerful language = exploitable language

The roff(7) language is Turing complete. In particular, it supports iteration and recursion. Recursively defining a user-defined string, then expanding it, tricked the parser into an infinite loop, eventually dying from exhaustion of memory.

This teaches us two things:

Some vulnerabilities are extremely hard to fix. Attempts at fixing them can easily degenerate into limiting functionality. Even if you manage to fix them, if the language is powerful enough, chances are it's not very difficult to construct a subtler variant of the exploit that the fix doesn't catch.

So if you have a choice, keep the power and complexity of languages down: Make them powerful enough to do the job, but no more. Otherwise, it may be impossible to ever design a secure parser for them.

2.3. 3 cases out of 45: use after free of an allocated pointer

In all three cases, a pointer was allocated, a copy of the pointer saved somewhere else, the original pointer freed, and the copy later used at the other place.

This teaches three lessons:

When freeing a pointer, think twice whether any copies may exist, and act accordingly, for example NULLing out the copies after freeing the original.

When using a pointer that is not stored at its canonical place but a copy from somewhere else, think twice where the original pointer might get freed and make sure the original copy is still valid.

Try to avoid copying allocated pointers if possible, in particular if they may get freed before the end of the program.

2.4. 5 cases out of 45: string parser advances beyond the terminating \0

All of these cases were triggered by grossly incomplete syntax at the end of a string coming from user input.

Again, there are three lessons:

When advancing a parse pointer, be vary of incomplete syntax and always make sure you are not on a \0 before advancing to the next byte. Apparently, this rather trivial point is much more easily missed than i would have thought.

Use double care when advancing a parse pointer that will be passed back to the caller upon return. Mere read buffer overflows in the called subroutine can easily turn into even more dangerous write buffer overflows in the calling function.

If possible, avoid writing to parse buffers. If you have to manipulate the input data during parsing, consider copying the respective part, then modify that, and throw it away when done. That reduces the risk of write buffer overruns.

2.5. 6 cases out of 45: missing elementary input validation

The cases that came up were:

dividing by user input without checking for 0

incomplete checks for the validity of input bytes (specifically, in UTF-8 parsing, accepting some invalid sequences, resulting in forbidden bytes in the ASCII parse buffer, resulting in forbidden bytes in ASCII output buffers, eventually triggering an assertion - without that assertion, the consequences might have been even worse)

unreasonably large positive integers in user input (twice)

unexpectedly negative integers in user input

inconsistent claims of the user about the size of the provided data (hello, heartbleed...)

Of course, many of the more complicated problems in this list can - in one way or another - also be classified as input validation issues. But i was surprised that there were so many rather elementary cases of missing input validation. So be sure to *systematically* check your input code for all corner cases that you can think of, and repeat the check some time later or ask somebody else to repeat that particular check, even if you think you are not all that stupid. Apparently, such errors are easier to miss than one might think.

2.6. 14 cases out of 45: exceptions from invariants

Whenever writing code, you need to assume that your data structures maintain some invariants. If you would assume that the content of any variable could be invalid in any possible way at any point in the program, you would get utterly bogged down in error handling.

In mandoc, assuming some invariant holds, relying on it, and then finding out that there can be an exception to that particular invariant at some particular point in the code was one of the two main root causes for crashes found by afl. Most of these cases were related to syntax tree node structures.

In about half of the cases, it was general invariants supposedly applying to each and every instance of a given type that could get violated when specific macros were used in invalid ways.

The other half of the cases were specific invariants applying to specific macros only that could get violated when these macros were used in invalid ways.

A possible lesson is that you ought to explicitely write down all the invariants you intend to use, make sure they are really maintained at all places where the respective data structures are created and changed, and make sure you assume nothing about the content of the data except those explicitly specified invariants. I haven't found the time yet to do that for mandoc.

2.7. 15 cases out of 45: logic errors arising from complexity

This class is particularly disturbing: It is the top root cause for crashes found by afl in mandoc, one third of all. These bugs arose from code complexity that is hard or impossible to describe in terms of invariants of data structures. There were two subclasses.

For one subclass (block rewinding), causing 6 crashes, it turned out the complexity was needless. It was possible to reorganise this code in a much simpler way. Obviously, complexity that isn't present cannot cause logic oversights, so you should look at your code, identify the most complicated parts, and try to find ways to simplify them.

But the most disturbing subgroup, containing 8 out of 44 crashes and the only endless loop that wouldn't eventually die from memory exhaustion, is caused by the complexity of the implemented language itself (specifically, support for bad nesting of mdoc(7) blocks), not by a clumsy implementation. Even after considerable simplification, that code is still frighteningly complex and afl found two new crashes even after the simplification and associated partial audit was completed.

So, for writing secure software, the key really is to already design the user-visible functionality for simplicity and clarity. Complexity introduced on that level will reach all the way through to the implementation details. It is obvious that those issues will be hardest to fix, and the example of the mandoc afl audit demonstrates that these issues may even end up being quantitatively dominant.

3. Analysis of severities

Ordered by decreasing severity, the issues found had the following consequences:

2 write buffer overruns (both advancing a pointer past \0 and returning it)

3 read buffer overruns (all advancing a pointer past \0 without returning it)

2 use after free (both free a pointer, keep a copy, and use it)

2 unsigned integer underflows (by unchecked subtraction and unvalidated input)

2 cases of effectively infinite output (by not checking input for large numbers)

2 infinite loops (one by user input, one by complicated internal logic)

1 division by zero (by unvalidated user input)

11 NULL pointer accessess

20 assertion failures

This looks like a negative correlation between the difficulty to avoid classes of issues and the typical severeness of the consequences of failure. The easier to avoid, the more dangerous a failure appears to be. This indicates that it really pays off to get the obvious things right:

Be particularly careful when passing around parse pointers. Make sure you don't parse beyond \0. Watch out for use after free. Don't forget to validate any input. Be careful with arithmetic operations (/, *, -, +, etc.)

In this study, the hardest problems only caused assertion failures, NULL pointer accesses, and only one infinite loop. But don't rely on that. For one thing, i don't see any theoretical reason why issues arising from excessive complexity could not cause issues of arbitrary severity. Besides, the dominance of assertion failures over other modes of failure seen in this study might be a benefit of security-conscious coding practices being used in the first place, catching problems relatively early in most cases, while the average severity of vulnerabilities caused by similar issues in code written with less internal safeguards could be considerably more serious.

4. Bug distribution across modules

It is interesting to note that the number of bugs found per line of code is constitent in all modules with the average density across all modules, see the second appendix for some numbers. For almost all modules, the numbers agree within one standard deviation, and only for one, there is a difference of two standard deviations. Of course, statistics are small, so this is not a very strong statement.

In particular, contrary to what might have been expected, the largest and most complicated module, the mdoc(7) parser, doesn't have a higher density of bugs than other, smaller and simpler modules. The reasons are only accessible to speculation, i don't see any way to safely establish why this might be so. It could be, though, that the reason for the mdoc(7) parser to not be more bug-ridden is that it received more scrutiny in the past because it is used most of all the languages, so functional defects are more visible and annoying to lots of people compared to functional defects in less commonly used languages like tbl(7) and eqn(7).

5. Total number of bugs

In retrospect, my surprise at the number of bugs found may surprise you. Looking at studies like the following, finding 45 crashes and hangs in over 30.000 lines of code - less than 1.5/KLOC - doesn't seem all that out of proportion for a project with 1.5 part-time developers and no formal QA department. That, yet again, emphazises how people tend to over-estimate their own abilities (myself included). Unless you are a real genius, your code is likely to contain more bugs than you think, too. https://books.google.de/books?id=I-83BAAAQBAJ&pg=PA652 http://www.net-security.org/secworld.php?id=14871

In addition to that, Kristaps Dzonsons pointed out in private communication that cleanly architecturing the mandoc codebase was unusually difficult. The languages mandoc(1) parses are not designed to any strict paradigm but rather use a mixture of different syntactic and semantic concepts in pragmatic, sometimes historically evolved ways, and none of the languages has a formal definition. Existing manuals were often vague and sometimes contradictory. That certainly doesn't help strict parsing. Many parser requirements were only discovered after the parsers were already deployed in production, requiring changes to existing logic and to existing invariants. In particular, support for badly nested blocks was added a long time after the design of all basic data structures was complete, and even normal block rewinding rules weren't clear at all at the outset. Originally, there was no intention to include any low-level roff(7) support, and the whole roff(7) parser was edged in below existing higher-level parsing code at a later stage, when it became clear that many legacy manuals rely quite a bit on low-level roff features.

6. Conclusion

Even in a reasonably mature open-source project developed with security in mind, you may expect on the order of one remaining bug of some importance per thousand lines of code. For optimal return on investment while fighting the worst consequences, the present example suggests to watch out for buffer overflows in string parsers, use after free errors, missing input validation, and integer overflow in arithmetic operations. In addition to that, two particularly hard but quantitatively important measures are to explicitly specify invariants and to avoid excessive complexity of both interface design and implementation, even though those more difficult topics, in the present study, were only related to issues of smaller severity.

Appendix - chronological list of issues found by jsg@ with afl in mandoc ======================================================================== date root technical code location and fixed cause consequence brief problem description Nov 20 GI/decen ass term.c term_flushln left>right margin Nov 24 GI/exc NULL mdoc_term.c termp_sh_pre Sh w/o body (Sh Xo Sh) Nov 26 SI/exc ass mdoc_validate.c Sm with multiple args Nov 27 SI/exc ass mdoc_validate.c Db with multiple args Nov 27 compl/rew ass mdoc_macro.c Eo rewinding Nov 28 SI/exc NULL mdoc_term.c termp_nm_pre meta->name == NULL Dec 15 pass \0 r ov roff.c parse error, next try after EOL Dec 15 pass \0 w ov roff.c parse point one too far, passed back Dec 16 compl/rew corr/ass man_macro.c block closure in next-line Dec 17 compl/br corr/NULL mdoc_macro.c add item to breaking list Dec 18 input val /0 roff.c divide by user input Dec 18 GI/exc NULL mdoc_validate.c post_it item w/o body (It Xo no-Xc) Dec 18 SI/exc ass mdoc_validate.c post_bl Sm in Bl Dec 18 SI/exc corr/ass mdoc_macro.c add Ta to closed column list Dec 18 input val ass preconv.c bare BS in text from invalid UTF-8 Dec 19 ass=inval ass term.c font stack limit impl by assert Dec 19 compl/rew ass mdoc_macro.c rewinding Dec 19 compl/br corr/ass mdoc_macro.c only Bl end bef begin; only extend Bl with It Dec 22 compl/br loop mdoc_macro.c block break 2 others (Po Ao Pc Bo Pc Ac Bc) Dec 24 input val output man_term.c large indent or par dist Dec 24 input val unsi unde tbl_term.c number too wide for tbl cell Dec 24 input val unsi unde man_term.c large neg indent Dec 24 input val output mdoc_term.c large indent Dec 25 turing loop roff.c recursive expansion of user-def strings; limit length Jan 01 GI/exc NULL eqn_term.c missing args Jan 01 pass \0 r ov mandoc.c trailing \s- \s+ Jan 01 pass \0 w ov roff.c trailing backslash at EOF in cond; pass \0, returned Jan 29 GI/exc NULL tbl_layout.c line w/o cell Jan 30 GI/exc NULL tbl_data.c tblcalc not called if first span in .TP head Jan 30 *alloc ass term.c don't keep absolute pointers into area that is realloc'ed Feb 01 compl/br corr/ass mdoc_macro.c Fo Nm Fc; opening body of non-existant block Feb 01 compl/rew ass mdoc_macro.c rewinding Feb 03 compl/br ass mdoc_macro.c expl broken by It; close twice Feb 07 *alloc use free mdoc_validate.c delete node in tree iteration Feb 10 pass \0 r ov tbl_layout.c f layout mod at eol Feb 10 SI/exc NULL mdoc_validate.c St no-arg child-mac; empty St Feb 11 compl/rew corr/NULL mdoc_macro.c exp blocks close Nd; syntax tree corr Feb 11 SI/exc NULL mdoc_*.c Eo w/o tail Feb 12 compl/br corr/ass mdoc_macro.c Bo Bl It Bd Bc It; pending pointer corrupt Feb 12 compl/br ass mdoc_macro.c Bl It Bl -column It Bd El text text El; miss ENDBODY check Apr 29 *alloc use free tbl_layout.c last line empty Apr 29 compl/br corr/NULL mdoc_macro.c bogus MDOC_BROKEN by stray end macro Apr 29 SI/exc ass term.c \z\o'ab'c at BOL Apr 29 compl/br corr/ass mdoc_macro.c body broken, don't extend end May 01 compl/rew corr/ass mdoc.c/roff.c roff_node_unlink clobbers ROFF_NEXT Abbreviations of classes of root causes: - GI/decen - attempt to enforce general invariant decentrally - GI/exc - exception from supposedly generic invariant - SI/exc - exception from macro-specific supposed invariant - *alloc - use after free of an allocated pointer - ass=inval - abuse of assert() for input validation - compl/br - excessive complexity in language design (badly nested blocks) - compl/rew - excessive complexity in the implementation of macro rewinding - input val - missing elementary input validation - pass \0 - string parser advances beyond the terminating \0 - turing - Turing-complete language allows the user to trigger infinite loops Abbreviations of classes of technical consequences: - NULL - directly caused NULL pointer access - corr/NULL - data corruption indirectly causing NULL pointer access - ass - direct assertion failure - corr/ass - data corruption indirectly causing assertion failure - loop - infinite loop - output - excessive amount of output - r ov - buffer overflow (read access only) - unsi unde - unsigned integer underflow - use free - use after free - w ov - buffer overflow (may write beyond end of buffer) - /0 - division by zero Appendix - Bug distribution across modules ========================================== In the following tables, the numbers preceding the keywords are: - loc: number of lines of C code in the respective module - bugs: number of bugs found in the respective module - exp: expected number of bugs assuming a Poisson distribution with a mean of 1.5 bugs/kloc

The first four lines refer to functional modules, in particular separating parsers and formatters, and include all 45 bugs.

The second group of three lines refers to source languages, partitioning the same code in a different way, summing parsers and formatters for the each language. For mdoc(7), no line is included because the mdoc(7) code is fully dominated by the large mdoc(7) parser, the small mdoc(7) formatter doesn't change the picture. Similarly, the second group doesn't include issues of the low-level roff(7) language because there is no roff(7) formatter.

low-level parsers 5400 loc 7 bugs 8+-3 exp mdoc(7) parser 17782 loc 22 bugs 27+-5 exp other parsers 3430 loc 5 bugs 5+-2 exp formatters 5504 loc 11 bugs 8+-3 exp man(7) p/f 2543 loc 3 bugs 4+-2 exp tbl(7) p/f 1373 loc 5 bugs 2+-1.5 exp eqn(7) p/f 1248 loc 1 bug 2+-1.5 exp

Our thanks to Ingo for his detailed write-up and for working on making on manpages and the tooling around them so much better.