My first instinct was to examine the page's source files. I will skip the part where I clicked randomly through every directory looking for the right files and go straight to the ones relevant to this tutorial. In Chrome, press Command+Shift+C (Ctrl+Shift+C on Windows and Linux) to open the developer tools, then switch to the Sources tab.

As you can see, there is a pdflord.com directory with a plugins folder under assets. If you scroll down, you will find a folder called pdfjs, which contains two files: pdf.js and viewer.js. It turns out that PDFLord uses PDF.js, Mozilla's open-source JavaScript library for parsing and rendering PDFs, which you can find here: https://mozilla.github.io/pdf.js/

Let’s dig through the viewer.js file a bit more. After some inspection we find a method which sounds like it deals with page rendering:

function webViewerPageRendered(evt)

Let’s add a breakpoint on line 2141 inside this method right after the pageView variable and reload the page. Our goal is to examine what the object pointed at by this variable represents.

After clicking through a bunch of object members… voilà! We finally stumble on what we have been looking for: an integer array that very likely represents the pixel data of the image of page 1 of the PDF.
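Clicking through nested members by hand gets tedious, so here is a rough sketch of how the same search could be automated from the console while paused at the breakpoint. The helper name findLargeNumericArrays, its thresholds, and the fakePageView object are all hypothetical stand-ins of mine; none of them come from pdf.js or PDFLord, and the real pageView object may be shaped quite differently.

```javascript
// Hypothetical helper (not part of pdf.js): walk an object graph and
// report the paths at which large numeric arrays are stored.
function findLargeNumericArrays(root, minLength = 1000, maxDepth = 6) {
  const hits = [];
  const seen = new Set(); // guards against cyclic references

  function visit(value, path, depth) {
    if (value === null || typeof value !== "object" || seen.has(value)) return;
    seen.add(value);

    // Typed arrays (Uint8ClampedArray, Int32Array, ...) are the usual
    // containers for pixel data; plain arrays of numbers count too.
    const isTyped = ArrayBuffer.isView(value) && !(value instanceof DataView);
    const isNumericArray =
      Array.isArray(value) && value.length > 0 && typeof value[0] === "number";

    if (isTyped || isNumericArray) {
      if (value.length >= minLength) {
        hits.push({ path, type: value.constructor.name, length: value.length });
      }
      return; // never descend into the individual elements
    }

    if (depth >= maxDepth) return;
    for (const key of Object.keys(value)) {
      let child;
      try {
        child = value[key]; // getters on host objects may throw
      } catch (err) {
        continue;
      }
      visit(child, `${path}.${key}`, depth + 1);
    }
  }

  visit(root, "root", 0);
  return hits;
}

// Made-up object standing in for whatever pageView really contains.
const fakePageView = {
  id: 1,
  canvas: {
    width: 20,
    height: 10,
    imageData: { data: new Uint8ClampedArray(20 * 10 * 4) },
  },
};
fakePageView.self = fakePageView; // a cycle, to show the walk terminates

console.log(findLargeNumericArrays(fakePageView, 100));
```

In the paused debugger you would pass the real variable instead, for example findLargeNumericArrays(pageView), and then inspect whichever paths it reports.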

Surely, now we can just write a script to go over every page in the PDF, extract the image data arrays, convert them to JPEGs, and end up with a sequence of images of the PDF file. To be honest, I wasn’t quite satisfied with this finding: I would still not be able to select any text or search through the images. I was looking for a better way.
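For completeness, the image route could look roughly like the sketch below. It assumes the array holds RGBA bytes in row-major order (4 values per pixel), which I have not verified against the viewer, and the function names toRgbaBytes and pixelsToJpegDataUrl are my own. The second function only works in a browser, since it relies on the canvas API.

```javascript
// Check that a flat array has the size expected for RGBA pixel data and
// copy it into the typed array that the canvas API requires.
// Assumption: 4 values per pixel, row-major order.
function toRgbaBytes(pixels, width, height) {
  const expected = width * height * 4;
  if (pixels.length !== expected) {
    throw new RangeError(`expected ${expected} values, got ${pixels.length}`);
  }
  // Uint8ClampedArray clamps out-of-range values into 0..255.
  return Uint8ClampedArray.from(pixels);
}

// Browser only: paint the pixels onto an off-screen canvas and encode
// the result as a JPEG data URL.
function pixelsToJpegDataUrl(pixels, width, height, quality = 0.92) {
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext("2d");
  const imageData = new ImageData(toRgbaBytes(pixels, width, height), width, height);
  ctx.putImageData(imageData, 0, 0);
  // JPEG has no alpha channel, so any transparency is lost on export.
  return canvas.toDataURL("image/jpeg", quality);
}
```

Calling pixelsToJpegDataUrl once per page would yield one image per page, which is exactly the limitation described above: the output is pictures of text, not selectable or searchable text.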

If we examine the viewer.js file a bit more, we find another interesting function:

In particular, there is this very intriguing line which looks like it deals with restricting downloads:

if (PDFViewerApplication && PDFViewerApplication.appConfig.allowdownload) {

And then we also find the following sequence, which binds events to button click listeners. It’s amusing how the “print” and “download” bindings are sloppily commented out, most likely so that print and download logic can be handled in a different part of the code.
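To illustrate why a commented-out binding matters, here is a toy sketch. It is not PDFLord's or pdf.js's code: the MiniEventBus class, the handler bodies, and the variable names are hypothetical, chosen only to show that an event with no registered listener silently does nothing.

```javascript
// Stand-in for a viewer's event dispatcher (hypothetical, for illustration).
class MiniEventBus {
  constructor() {
    this.listeners = new Map();
  }

  // Register a handler for a named event.
  on(name, handler) {
    if (!this.listeners.has(name)) this.listeners.set(name, []);
    this.listeners.get(name).push(handler);
  }

  // Run every handler for the event and return how many ran,
  // so a "dead" event is easy to spot.
  dispatch(name, payload) {
    const handlers = this.listeners.get(name) || [];
    handlers.forEach((handler) => handler(payload));
    return handlers.length;
  }
}

const demoBus = new MiniEventBus();
const demoLog = [];

demoBus.on("pagerendered", (evt) => demoLog.push(`rendered page ${evt.pageNumber}`));
// demoBus.on("print", () => demoLog.push("print requested"));
// demoBus.on("download", () => demoLog.push("download requested"));

demoBus.dispatch("pagerendered", { pageNumber: 1 }); // one handler runs
demoBus.dispatch("download"); // no handler is registered, so nothing happens
console.log(demoLog);
```

The design point is that the handler functions can still exist elsewhere in the file; commenting out the binding only disconnects them from the buttons.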

At this point our action plan is clear: