In this series of blog posts, I’ll explain how I decrypted the encrypted PDFs shared by John August (John wanted to know how easy it is to crack encrypted PDFs, and started a challenge).

Here is how I decrypted the “easy” PDF (encryption_test).

From John’s blog post, I know the password is random and short. So first, let’s check out how the PDF is encrypted.

pdfid.py confirms the PDF is encrypted (name /Encrypt):

pdf-parser.py can tell us more:

The encryption info is in object 26:

From this I can conclude that the standard encryption filter was used. This encryption method uses a 40-bit key (usually indicated by a dictionary entry: /Length 40, but this is missing here).

PDFs can be encrypted for confidentiality (requiring a so-called user password /U) or for DRM (using a so-called owner password /O). PDFs encrypted with a user password can only be opened by providing this password. PDFs encrypted with a owner password can be opened without providing a password, but some restrictions will apply (for example, printing could be disabled).

QPDF can be used to determine if the PDF is protected with a user password or an owner password:

This output (invalid password) tells us the PDF document is encrypted with a user password.

I’ve written some blog posts about decrypting PDFs, but because we need to perform a brute-force attack here (it’s a short random password), this time I’m going to use hashcat to crack the password.

First we need to extract the hash to crack from the PDF. I’m using pdf2john.py to do this. Remark that John the Ripper (Jumbo version) is now using pdf2john.pl (a Perl program), because there were some issues with the Python program (pdf2john.py). For example, it would not properly generate a hash for 40-bit keys when the /Length name was not specified (like is the case here). However, I use a patched version of pdf2john.py that properly handles default 40-bit keys.

Here’s how we extract the hash:

This format is suitable for John the Ripper, but not for hashcat. For hashcat, just the hash is needed (field 2), and no other fields.

Let’s extract field 2 (you can use awk instead of csv-cut.py):

I’m storing the output in file “encryption_test – CONFIDENTIAL.hash”.

And now we can finally use hashcat. This is the command I’m using:

hashcat-4.0.0\hashcat64.exe --potfile-path=encryption_test.pot -m 10400 -a 3 -i "encryption_test - CONFIDENTIAL.hash" ?a?a?a?a?a?a

I’m using the following options:

–potfile-path=encryption_test.pot : I prefer using a dedicated pot file, but this is optional

-m 10400 : this hash mode is suitable to crack the password used for 40-bit PDF encryption

used for 40-bit PDF encryption -a 3 : I perform a brute force attack (since it’s a random password)

?a?a?a?a?a?a : I’m providing a mask for 6 alphanumeric characters (I want to brute-force passwords up to 6 alphanumeric characters, I’m assuming when John mentions a short password, it’s not longer than 6 characters)

-i : this incremental option makes that the set of generated password is not only 6 characters long, but also 1, 2, 3, 4 and 5 characters long

And here is the result:

The recovered password is 1806. We can confirm this with QPDF:

Conclusion: PDFs protected with a 4 character user password using 40-bit encryption can be cracked in a couple of seconds using free, open-source tools.

FYI, I used the following GPU: GeForce GTX 980M, 2048/8192 MB allocatable, 12MCU

Update: this is the complete blog post series:

Cracking Encrypted PDFs – Part 1: cracking the password of a PDF and decrypting it (what you are reading now) Cracking Encrypted PDFs – Part 2: cracking the encryption key of a PDF Cracking Encrypted PDFs – Part 3: decrypting a PDF with its encryption key Cracking Encrypted PDFs – Conclusion: don’t use 40-bit keys

