The Problem

Reside is a technology-based real estate brokerage. At times, we deal with documents over 500 pages long. We need to manipulate these documents and add signature locations, which means converting a potentially massive PDF into image files that meet specific resolution and size requirements so they work with other tools in our product.

The Solution

When a user uploads a PDF, we kick off a series of steps to convert each of its pages to an image. The first step creates a database event that triggers a Firebase Cloud Function, which then begins processing the PDF with Ghostscript.
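The series of steps can be outlined as one async function. This is a minimal sketch under stated assumptions, not Reside's code: the step functions passed in (prepare, download, convert, resize, upload, cleanup) are hypothetical names standing in for the stages described below, and the database trigger that would invoke it is omitted. The try/finally makes sure local files are removed even when a step fails.

```javascript
// Hypothetical outline of the pipeline. Each step is injected as an async
// function so the ordering can be shown on its own.
async function convertPdfToPages(job, steps) {
  const workDir = await steps.prepare(job); // create working directories
  try {
    const pdfPath = await steps.download(job, workDir);   // fetch the PDF from Cloud Storage
    const images = await steps.convert(pdfPath, workDir); // Ghostscript: one image per page
    await steps.resize(images);                           // mogrify: resize the page images
    return await steps.upload(job, images);               // upload page images to Cloud Storage
  } finally {
    // Runs whether the steps above succeeded or threw.
    await steps.cleanup(workDir);
  }
}
```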

Setting up Ghostscript

For conversion we use Ghostscript. However, Cloud Functions doesn’t come with Ghostscript installed, so we add it to our project ourselves as a git submodule:

git submodule add --name lambda-ghostscript -- https://github.com/sina-masnadi/lambda-ghostscript.git functions/lambda-ghostscript

This adds the submodule’s directory and creates a file called .gitmodules, which records each submodule’s path and URL so they are tracked under version control. For details on how the .gitmodules file is used in a team setting, see the Git documentation.

[Figure: directory structure]
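The function has to point the gs library at the bundled binary, so it can help to resolve and check that path once, up front. This is a sketch that assumes the submodule sits in functions/lambda-ghostscript and ships the executable at bin/gs (verify this against your checkout). resolveGhostscriptPath is a hypothetical helper name. If the binary is missing or not executable, the helper fails immediately with a clear message instead of failing partway through a conversion.

```javascript
const fs = require('fs');
const path = require('path');

// ASSUMPTION: the lambda-ghostscript submodule ships the binary at bin/gs.
// Check the checked-out repository before relying on this path.
function resolveGhostscriptPath(functionsDir) {
  const gsPath = path.join(functionsDir, 'lambda-ghostscript', 'bin', 'gs');
  try {
    // X_OK: the file exists and this process is allowed to execute it.
    fs.accessSync(gsPath, fs.constants.X_OK);
  } catch (err) {
    throw new Error(`Ghostscript binary missing or not executable at ${gsPath}`);
  }
  return gsPath;
}
```

Called as resolveGhostscriptPath(__dirname) from a file in the functions directory, the result is the path you would hand to .executablePath().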

Next, install two packages we’ll use later (--save also records them in package.json):

npm i --save child-process-promise fs-extra

Lastly, we update package.json manually to add a dependency from a GitHub link.

Creating the Firebase Function

Import the required dependencies into the module:

[Embedded code: importing dependencies]

We use @google-cloud/storage, instantiated inside the function’s scope so that each invocation gets a new instance of the Google Cloud client. This minimizes the chance of ECONNRESET errors that can arise from the underlying socket connections between Google Cloud services and Firebase. (Later versions of this package don’t have this connection-reset issue.)

[Embedded code: initializing @google-cloud/storage]
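The per-invocation pattern can be shown without Google Cloud credentials by injecting the client factory. In this sketch, makeHandler and the work callback are hypothetical names. In a real function the factory might be () => new Storage(), with const { Storage } = require('@google-cloud/storage') as in recent versions of the package.

```javascript
// Sketch of "a new client per invocation". createClient is injected;
// in production it might be () => new Storage().
function makeHandler(createClient, work) {
  return async function handler(event) {
    // Created inside the handler, not at module load, so a warm instance
    // never reuses a client (and its sockets) from an earlier invocation.
    const client = createClient();
    return work(client, event);
  };
}
```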

Next, create the directories the function needs: a place to download the PDF, write the generated images, and ultimately clean up. Ghostscript has to write to the filesystem because sending its output to stdout produces a single stream containing every page, whereas we need a separate image file for each page of the PDF.

[Embedded code: create the needed directories]
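Here is a sketch of the directory setup. It assumes the function runs on Cloud Functions, where the temporary directory (/tmp) is the only writable location. The layout (pdf/ for the download, pages/ for Ghostscript output) and the createWorkDirs name are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// jobId is assumed to be a safe identifier (e.g. a database key), since it
// becomes part of a filesystem path.
function createWorkDirs(jobId) {
  const root = path.join(os.tmpdir(), `pdf-${jobId}`);
  const dirs = {
    root,
    pdfDir: path.join(root, 'pdf'),     // downloaded source PDF
    pagesDir: path.join(root, 'pages'), // one image per page from Ghostscript
  };
  // recursive: true creates parent dirs and doesn't throw if they already exist
  fs.mkdirSync(dirs.pdfDir, { recursive: true });
  fs.mkdirSync(dirs.pagesDir, { recursive: true });
  return dirs;
}
```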

[Embedded code: wait for download from Google Cloud Storage]
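The download only needs to be awaited before conversion starts. This sketch assumes bucket behaves like a @google-cloud/storage Bucket, where file(name).download({ destination }) returns a Promise when called without a callback. downloadPdf is a hypothetical name.

```javascript
const path = require('path');

// Download remotePath into destDir and resolve with the local file path.
// `bucket` is assumed to behave like a @google-cloud/storage Bucket.
async function downloadPdf(bucket, remotePath, destDir) {
  const destination = path.join(destDir, path.basename(remotePath));
  await bucket.file(remotePath).download({ destination });
  return destination;
}
```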

Next, we wrap the gs call in a Promise so we can await the conversion; we don’t want to resize images until the whole PDF has been converted. I’ve described each option I pass to Ghostscript, but there are plenty of others that may suit your needs. For a complete list of flags, see the Ghostscript documentation. Notice the .executablePath option of the gs library, which tells it where to find the Ghostscript executable.

[Embedded code: Promisifying node-gs]
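The same idea can be sketched without the gs wrapper library by calling the binary through Node’s built-in child_process.execFile and wrapping that in a Promise. This swaps in a different technique from the article’s gs-based code. The flags are standard Ghostscript options. The resolution, device, and page-%03d.png naming pattern are illustrative assumptions, and buildGsArgs and run are hypothetical helper names. The %03d in -sOutputFile is what makes Ghostscript write one file per page.

```javascript
const { execFile } = require('child_process');
const path = require('path');

// Ghostscript arguments for one PNG per page (values are illustrative).
function buildGsArgs(pdfPath, outDir, { device = 'png16m', dpi = 150 } = {}) {
  return [
    '-dSAFER',            // restrict file access by the document being interpreted
    '-dBATCH',            // exit when done instead of entering the interactive prompt
    '-dNOPAUSE',          // don't pause at the end of each page
    `-sDEVICE=${device}`, // png16m = 24-bit color PNG
    `-r${dpi}`,           // output resolution in dots per inch
    // %03d is replaced by the page number: page-001.png, page-002.png, ...
    `-sOutputFile=${path.join(outDir, 'page-%03d.png')}`,
    pdfPath,
  ];
}

// Wrap a child process in a Promise so callers can `await` its completion.
function run(executable, args) {
  return new Promise((resolve, reject) => {
    execFile(executable, args, { maxBuffer: 10 * 1024 * 1024 }, (err, stdout, stderr) => {
      if (err) {
        err.stderr = stderr; // keep Ghostscript's error output for debugging
        reject(err);
      } else {
        resolve(stdout);
      }
    });
  });
}
```

Used as await run(gsPath, buildGsArgs(pdfPath, pagesDir)), the call resolves only after Ghostscript exits, so resizing cannot start early.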

Only after the conversion completes do we call spawn to invoke ImageMagick’s mogrify and resize the newly created image files. Again, there are many options you can pass to mogrify; for details, see the ImageMagick mogrify documentation.

[Embedded code: using mogrify]
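Here is a sketch of the resize step using Node’s built-in spawn rather than child-process-promise. The target size is a placeholder assumption, and buildMogrifyArgs and spawnAsync are hypothetical names. spawn does not run through a shell, so a wildcard like *.png would not be expanded. For that reason the sketch passes the explicit list of page files.

```javascript
const { spawn } = require('child_process');

// -resize WxH scales each image to fit within WxH while keeping its aspect
// ratio (append "!" to the geometry to force exact dimensions).
function buildMogrifyArgs(files, width, height) {
  return ['-resize', `${width}x${height}`, ...files];
}

// Resolve when the process exits with code 0; reject otherwise.
function spawnAsync(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'inherit' });
    child.on('error', reject); // e.g. the executable was not found
    child.on('close', (code) => {
      if (code === 0) resolve();
      else reject(new Error(`${command} exited with code ${code}`));
    });
  });
}
```

For example, await spawnAsync('mogrify', buildMogrifyArgs(pageFiles, 1275, 1650)) resizes the files in place, where pageFiles is the list of images Ghostscript wrote.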

Once mogrify finishes, the resized files can be uploaded to Google Cloud Storage. We first get the list of image files from the filesystem, then make the upload calls.

[Embedded code: upload each page image file to Google Storage]
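Listing and uploading can be sketched as two small functions. This sketch assumes bucket behaves like a @google-cloud/storage Bucket, whose upload(localPath, { destination }) returns a Promise. listPageImages, uploadPages, the destination prefix, and the batch size are all hypothetical. The sketch uploads in small batches rather than firing 500+ requests at once. This keeps the number of open connections small; it is a choice made for the sketch, not something the article prescribes.

```javascript
const fs = require('fs');
const path = require('path');

// List page images in page order. Assumes zero-padded names such as
// page-001.png so lexical sort matches page order (widen the padding for
// documents over 999 pages).
function listPageImages(dir, ext = '.png') {
  return fs.readdirSync(dir)
    .filter((name) => name.endsWith(ext))
    .sort()
    .map((name) => path.join(dir, name));
}

// Upload files batch by batch; resolve with the destination names.
async function uploadPages(bucket, files, prefix, batchSize = 10) {
  const uploaded = [];
  for (let i = 0; i < files.length; i += batchSize) {
    const batch = files.slice(i, i + batchSize);
    const dests = batch.map((file) => `${prefix}/${path.basename(file)}`);
    // Files within a batch upload concurrently; batches run one after another.
    await Promise.all(batch.map((file, j) => bucket.upload(file, { destination: dests[j] })));
    uploaded.push(...dests);
  }
  return uploaded;
}
```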

At this point the files are on Google Cloud Storage! From here you can generate download URLs, copy or move the files, and so on. The last step before exiting the function is to remove the local files from the filesystem.

[Embedded code: cleanup]
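Cleanup can be a single call. On Cloud Functions, files written to /tmp count against the instance’s memory and can persist between invocations on a warm instance. Skipping this step can therefore eventually exhaust memory. This sketch uses Node’s fs.rmSync, which requires Node 14.14 or later. fs-extra is already a dependency, and its remove(dir) does the same job and returns a Promise. cleanupWorkDir is a hypothetical name.

```javascript
const fs = require('fs');

// Delete the job's working directory and everything in it. force: true makes
// this a no-op if the directory is already gone, so it is safe to call from
// a finally block.
function cleanupWorkDir(dir) {
  fs.rmSync(dir, { recursive: true, force: true });
}
```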

I hope you found this useful. If you did, show some love by reposting, clapping, or otherwise sharing. Also, if you haven’t heard of Reside, you should check us out. We are a growing team with a great culture… and we are hiring.

For the whole scoop, including research findings on using the command line tools and a little more storytelling, stay tuned for Part 2.

Resources

npm modules

command line tools

Git submodules