Grammar based fuzzing PDFs with Domato

Introduction

Welcome back to another fuzzing blog post. This time let’s talk about grammar based fuzzing! I will be writing about how I tried to fuzz a few PDF software such as Foxit and Adobe.

In order to do that, I used the following tools:

domato, grab it from its repo while it’s fresh!

Debenu Quick PDF Library, for my campaign the current version as of writing this is 17.11 but YMMV, please note that you need to register in order to request a trial.

BugId to help us triage any crashes/save crashers.

Your favourite PDF parser/software!

So here’s the idea: We will be installing the Debenu Quick PDF library and taking advantage of its SDK and functions. Why grammar based on a massive complex format such as a PDF you say? Remember that the PDF file format includes text, images, multimedia, JavaScript and has very complex parsing code. As such, although a smart guided fuzzer such as Checkpoint’s research can be used, we can take advantage of this library which provides a ton of features from messing with HTML objects to adding images, fonts, or even adding custom javascript!

Grammar Based Fuzzing

From the wiki: A smart (model-based, grammar-based,or protocol-based fuzzer leverages the input model to generate a greater proportion of valid inputs. For instance, if the input can be modelled as an abstract syntax tree, then a smart mutation-based fuzzer would employ random transformations to move complete subtrees from one node to another. If the input can be modelled by a formal grammar, a smart generation-based fuzzer would instantiate the production rules to generate inputs that are valid with respect to the grammar. However, generally the input model must be explicitly provided, which is difficult to do when the model is proprietary, unknown, or very complex.

In short, grammar based is aware of input structure, and instead of dumb fuzzing where we simply mutate bytes without having any knowledge of the target/file/network protocol specification we do have knowledge of the structure (such as the API presented here) and we will be generating test cases based on that specification.

There are many tutorials out there, but I recommend having a look at domato’s page, where you can fully understand how it works. As mentioned earlier, we will be creating a grammar so the function

int DPLDrawHTMLText(int InstanceID, double Left, double Top, double Width, wchar_t * HTMLText)

can be called with bogus; yet valid input such as the following:

DrawHTMLText ( 1 , 2 , 1 , "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" ); DrawHTMLText ( 0 . 285839975231 , 4 . 0 , 10000000 . 0 , "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" ); DrawHTMLText ( 5 . 0 , 4294967295 . 0 , 2147483647 . 0 , "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" ); DrawHTMLText ( 65 . 862385207 , 9 . 2399248386 , 8 . 01 963632388 , "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" );

Getting started with Debenu Quick PDF Library

Once you obtain your trial and install it, you need to register the ActiveX DLL. This can be done by either running %systemroot%\System32\regsvr32.exe targeting the 64-bit version of the DLL (DebenuPDFLibrary64AX1711.dll) or %systemroot%\SysWoW64\regsvr32.exe to register the 32-bit version (DebenuPDFLibraryAX1711.dll) While you are there make sure to note down the TRIAL_LICENSE_KEY.TXT as you’ll need it later for generating the files.

Exploring the library and reading the documenation we can see that the library offers a variety of bindings: From C#, C++, Delphi, Objective-C to Perl, PHP, VB6, VBScript and Visual Basic (.NET). If you want to experiment, go ahead and check this page! The library moreover, provides many function groups that can be targeted:

For my case, I ended up using the Visual Basic and Perl bindings. Once you create a grammar it’s very easy to modify the template and use another language, and that’s they beauty of grammar based fuzzing!

Let’s use this following Visual Basic example:

' Debenu Quick PDF Library Sample ' * Remember to set your license key below ' * This sample shows how to unlock the library, draw some ' simple text onto the page and save the PDF ' * A file called hello-world.pdf is written to disk WScript . Echo ( "Hello World - Debenu Quick PDF Library Sample" ) Dim ClassName Dim LicenseKey Dim FileName ClassName = "DebenuPDFLibraryAX1711.PDFLibrary" LicenseKey = "" ' INSERT LICENSE KEY HERE FileName = "hello-world.pdf" Dim DPL Dim Result Set DPL = CreateObject ( ClassName ) WScript . Echo ( "Library version: " + DPL . LibraryVersion ) Result = DPL . UnlockKey ( LicenseKey ) If Result = 1 Then WScript . Echo ( "Valid license key: " + DPL . LicenseInfo ) Call DPL . DrawText ( 100 , 500 , "Hello world from VBScript" ) If DPL . SaveToFile ( FileName ) = 1 Then WScript . Echo ( "File " + FileName + " written successfully" ) Else WScript . Echo ( "Error, file could not be written" ) End If Else WScript . Echo ( "- Invalid license key -" ) WScript . Echo ( "Please set your license key by editing this file" ) End If Set DPL = Nothing

Executing it with the 32-bit version of the DLL yields the following output:

Opening it with Foxit we can confirm that our file has been generated!

Success! Within few minutes, we managed to set up the library, get some sample code and generate a valid PDF. Let’s move on!

Creating the grammar

To demonstrate domato’s capabilities, let’s target the following sample function:

As you can see, this function expects four parameters: double Left, double Top, double Width, wchar_t * HTMLText)

As such, the SDK expects the following call:

DrawHTML(200.0, 400.0, 800.0,"my text")

Forming the above function call with domato and creating a grammar is straightforward, we simply need to define a symbol and assign its corresponding value. The value can be something like MAX_INT or MIN_INT interesting values, common values that they may lead to common signed/unsigned integer overflows/underflows or undefined behaviour.

< fuzzstring > = " AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA " < interestingint > = 32768 < interestingint > = 65535 < interestingint > = 65536 < interestingint > = 1073741824 < interestingint > = 536870912 < interestingint > = 268435456 < interestingint > = 4294967295 < interestingint > = 2147483648 < interestingint > = 2147483647 < interestingint > = - 2147483648 < interestingint > = - 1073741824 < interestingint > = - 32769 < fuzzdouble > = - 1.0 < fuzzdouble > = 0.0 < fuzzdouble > = 1.0 < fuzzdouble > = 2.0 < fuzzdouble > = 3.0 < fuzzdouble > = 4.0 < fuzzdouble > = 5.0 < fuzzdouble > = 10.0 < fuzzdouble > = 1000.0 < fuzzdouble > = 10000.0 < fuzzdouble > = 10000000.0 < fuzzdouble > = < double min = 0 max = 10 > < fuzzdouble > = < double min = 0 max = 100 > < fuzzdouble > = < largedouble > < fuzzdouble > = < interestingint > < LeftTopWidth > = < fuzzdouble > , < fuzzdouble > , < fuzzdouble > < HTMLText > = DrawHTMLText ( < LeftTopWidth > , < fuzzstring > )

Continuing, since we will be generating programming language code we have to include the !begin lines and !end lines keywords:

! begin lines $QP ->< HTMLText > ; ! end lines

Following the API specification and creating the HTMLText method can be formed within literally a few lines:

Creating the template.pl

Once you have the basic grammar, how are we going to call these functions within our binding? In fact, looking at previous github code, we simply need to provide the sample code we were given with slightly modifications as seen below:

From the screenshot above, you can see that the code within the <DPLFuzz> will get substituted with the $QP-><HTMLText> generated cases! Here’s a sample of how it looks like once domato has done its magic:

Now our next step is to create a file where it actually generates this grammar (called a generator). This can be achieved by using the already existing ones, such as Ivan’s generator.py, with a few modifications:

# Domato - main generator script # ------------------------------- # # Written and maintained by Ivan Fratric <ifratric@google.com> # # Copyright 2017 Google Inc. All Rights Reserved. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. from __future__ import print_function import os import re import random import sys parent_dir = os . path . abspath ( os . path . join ( os . path . dirname ( __file__ ), os . pardir )) sys . path . append ( parent_dir ) from grammar import Grammar _N_MAIN_LINES = 50 _N_EVENTHANDLER_LINES = 25 def generate_function_body ( jsgrammar , num_lines ): js = jsgrammar . _generate_code ( num_lines ) return js def GenerateNewSample ( template , jsgrammar ): """Parses grammar rules from string. Args: template: A template string. htmlgrammar: Grammar for generating HTML code. cssgrammar: Grammar for generating CSS code. jsgrammar: Grammar for generating JS code. Returns: A string containing sample data. """ result = template handlers = False while '<DPLFuzz>' in result : numlines = _N_MAIN_LINES if handlers : numlines = _N_EVENTHANDLER_LINES else : handlers = True result = result . replace ( '<DPLFuzz>' , generate_function_body ( jsgrammar , numlines ), 1 ) return result def generate_samples ( grammar_dir , outfiles ): """Generates a set of samples and writes them to the output files. Args: grammar_dir: directory to load grammar files from. outfiles: A list of output filenames. """ f = open ( os . path . join ( grammar_dir , 'template.pl' )) template = f . read () f . close () jsgrammar = Grammar () err = jsgrammar . parse_from_file ( os . path . join ( grammar_dir , 'DPL.txt' )) if err > 0 : print ( 'There were errors parsing grammar' ) return for outfile in outfiles : result = GenerateNewSample ( template , jsgrammar ) if result is not None : print ( 'Writing a sample to ' + outfile ) try : f = open ( outfile , 'w' ) f . write ( result ) f . close () except IOError : print ( 'Error writing to output' ) def get_option ( option_name ): for i in range ( len ( sys . argv )): if ( sys . argv [ i ] == option_name ) and (( i + 1 ) < len ( sys . argv )): return sys . argv [ i + 1 ] elif sys . argv [ i ]. startswith ( option_name + '=' ): return sys . argv [ i ][ len ( option_name ) + 1 :] return None def main (): fuzzer_dir = os . path . dirname ( __file__ ) multiple_samples = False for a in sys . argv : if a . startswith ( '--output_dir=' ): multiple_samples = True if '--output_dir' in sys . argv : multiple_samples = True if multiple_samples : print ( 'Running on ClusterFuzz' ) out_dir = get_option ( '--output_dir' ) nsamples = int ( get_option ( '--no_of_files' )) print ( 'Output directory: ' + out_dir ) print ( 'Number of samples: ' + str ( nsamples )) if not os . path . exists ( out_dir ): os . mkdir ( out_dir ) outfiles = [] for i in range ( nsamples ): outfiles . append ( os . path . join ( out_dir , 'fuzz-' + str ( i ). zfill ( 5 ) + '.pl' )) generate_samples ( fuzzer_dir , outfiles ) elif len ( sys . argv ) > 1 : outfile = sys . argv [ 1 ] generate_samples ( fuzzer_dir , [ outfile ]) else : print ( 'Arguments missing' ) print ( "Usage:" ) print ( " \t python generator.py <output file>" ) print ( " \t python generator.py --output_dir <output directory> --no_of_files <number of output files>" ) if __name__ == '__main__' : main ()

Saving the actual test cases

Before we continue, notice how on the provided sample code (hello-world.vbs) this line was responsible for saving the file name: FileName = "hello-world.pdf" . This one is hardcoded and certainly does not suit us. In order to solve this issue, I’ve coded something very simple, a python script which finds the “placeholder” which is the hardcoded value XXX, and replaces it with fuzz-<num>.pdf :

BugId and you!

If you haven’t read already the Fuzz in sixty seconds article blog, please spend some time and see how BugId can be integrated into your fuzzing workflow. The idea is very similar, but instead of fuzzing browsers, we are looping through the generated cases one by one; I have modified some parts to reflect those changes as seen below:

Essentially, here we are executing Domato’s generator, replacing the XXX marker with the actual filename, executing the perl generated cases from domato, and finally saving the generated PDFs to our test folder.

With the above modifications, once the BAT file is executed, it gives us the following screenshot:

Putting it all together

With all these steps combined, let’s run the cmd file, and see how this goes:

Et voila! By using open source tools, and with some effort we are now able to fuzz not only Foxit software, but pretty much any PDF parser out there!

The results

Surprisingly, although I put in a lot of effort from creating the grammar to modifying BugId, unfortunately the only crashes I managed to get were some meaningless NULL pointer dereferences. You’d expect that such software has been fuzzed to death, however as j00ru once said according to the bug hunter’s law… there is always one more bug :)

Caveats

Interestingly, I initially used the Visual Basic bindings, however once a very large integer was passed to these methods, Visual Basic would complain and fail to generate the case as seen below:

Please note how it also informs the user in case the parameters or the assignments are wrong. That’s very handy and can be used to your advantage!

Conclusion

In this blog post we’ve covered a very brief introduction to grammar based fuzzing. We have used the Quick PDF library where we could apply this knowledge and have demonstrated how we can create a grammar from scratch. We have also fuzzed a sample function within the API generating structure aware test cases. Finally, we’ve used BugId to iterate over our cases in case any crashes were found. The sky is the limit, this type of fuzzing can be used not only for this specific library, but for every file format which is text based or even programming languages!

I hope you enjoyed as much as I did! As always, any ideas, comments, feedback is welcome!