Building and deploying to AWS

Before we get started, you will need Node.js 8.10 and npm installed on your machine, and an AWS account to deploy your code to. AWS Lambda has a reasonably generous free tier; see AWS Lambda Pricing.

Serverless

I’m going to use the Serverless Framework, which I find to be the easiest way to deploy to AWS. If you haven’t used it before, start by installing the CLI:

npm install -g serverless

You then need to set up your AWS credentials:

How to create AWS Access Keys

Once you’ve finished the setup, create your project.

serverless create --template aws-nodejs --path ./lambda-puppeteer

This will create the lambda-puppeteer folder containing a basic JavaScript Lambda deployment project.

My preference is to use TypeScript rather than plain JavaScript, so we will convert the project to TypeScript below. The serverless template aws-nodejs-typescript could be used above, but it creates a project that omits a number of useful comments and includes webpack, which we don’t need.

cd lambda-puppeteer

The serverless.yml file contains all the configuration necessary to deploy your project, and the template creates a project that can be deployed and tested straight away.

serverless deploy -v

Now test your function and look at the logs with these commands:

serverless invoke -f hello -l

serverless logs -f hello -t

Chromium and puppeteer-core

Lambda has a 50 MB deployment package limit (unless you use layers), but the community has provided an easy way to deploy everything needed in a package of about 35 MB. We will use the chrome-aws-lambda library to get the Chromium dependencies we need.

Initialise npm for the project:

npm init

Just accept the defaults for the project setup.

Add chrome-aws-lambda, which provides the Chromium binary:

npm i chrome-aws-lambda --save

and puppeteer-core, a version of Puppeteer that doesn’t download Chromium when it is installed:

npm i puppeteer-core --save

Using TypeScript

There are a number of ways to configure your project for TypeScript, such as the serverless-plugin-typescript plugin. In this case, we’re going to convert the project manually in five steps:

1. Install TypeScript:

npm i --save-dev typescript

2. Rename handler.js to handler.ts.

3. Install the Node.js type definitions:

npm i --save-dev @types/node

4. Add a tsconfig.json file with the following content:

5. Add these two scripts to package.json:

"scripts": {
  "build": "tsc",
  "deploy": "npm run build && serverless deploy",
  ...
},

Here we’ve added a deploy script that compiles the TypeScript and then runs serverless deploy. You could also run tests as part of the deploy by defining a test script and changing deploy to npm run build && npm run test && serverless deploy.

Implementing the service

Our PDF service will have the following interface:

export interface PdfService {
  getPdf(url: string): Promise<Buffer>;
}

We expose a single function that accepts a URL and returns a Promise resolving to a Buffer that contains a PDF of the page at that URL.

Create a file named pdf-service.ts and add the interface code above to it.

The implementation of the interface looks like this:

Add the implementation code above to pdf-service.ts so that it contains both the interface and the implementation.
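As a rough guide, here is a minimal sketch of what such an implementation might look like. It restates the interface so it compiles on its own, and it takes the browser launcher as a constructor argument so the class can be exercised without Chromium. ChromiumPdfService, LaunchBrowser and the BrowserLike/PageLike shapes are hypothetical names; the shapes cover only the puppeteer-core calls used here. The A4 format and printBackground setting are assumptions, not requirements.

```typescript
import { Buffer } from "buffer";

export interface PdfService {
  getPdf(url: string): Promise<Buffer>;
}

// Subset of the lifecycle events accepted by page.goto's waitUntil option.
type LifecycleEvent = "load" | "domcontentloaded" | "networkidle0" | "networkidle2";

// Minimal shapes covering only the puppeteer-core calls used below (assumption).
interface PageLike {
  goto(url: string, options: { waitUntil: LifecycleEvent[] }): Promise<unknown>;
  pdf(options: { format: string; printBackground: boolean }): Promise<Buffer>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Hypothetical: any function that resolves to a launched browser.
export type LaunchBrowser = () => Promise<BrowserLike>;

export class ChromiumPdfService implements PdfService {
  private readonly launch: LaunchBrowser;

  constructor(launch: LaunchBrowser) {
    this.launch = launch;
  }

  async getPdf(url: string): Promise<Buffer> {
    const browser = await this.launch();
    try {
      const page = await browser.newPage();
      // Wait until load, DOMContentLoaded and networkidle0 have all fired.
      await page.goto(url, { waitUntil: ["load", "domcontentloaded", "networkidle0"] });
      return await page.pdf({ format: "A4", printBackground: true });
    } finally {
      // Shut Chromium down even if navigation or printing fails.
      await browser.close();
    }
  }
}

// Wiring with chrome-aws-lambda inside the Lambda (not executed here):
//   const chromium = require("chrome-aws-lambda");
//   const pdfService = new ChromiumPdfService(async () =>
//     chromium.puppeteer.launch({
//       args: chromium.args,
//       defaultViewport: chromium.defaultViewport,
//       executablePath: await chromium.executablePath,
//       headless: chromium.headless,
//     }));
```

Injecting the launcher keeps the Chromium-specific setup in one place: in the Lambda it wraps chrome-aws-lambda, while in a unit test it can return a fake browser.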

This code expands on the simple example near the beginning of this post. One thing to note is the waitUntil option I have included. This setting determines when navigation is considered to have succeeded, and it defaults to load. When you specify an array of event strings, navigation is considered successful only after all of the events have fired.

load - consider navigation to be finished when the load event is fired.

domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.

networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.

So capturing the PDF does not begin until the last of these three events has fired.
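To make the array semantics concrete, here is a toy model (not Puppeteer code) of the rule just described: with several events in waitUntil, navigation counts as complete at the moment the last of them fires. navigationCompleteAt and the timeline values are made up for illustration.

```typescript
// Toy model, not Puppeteer code: when each lifecycle event fired,
// in milliseconds after navigation started.
type LifecycleEvent = "load" | "domcontentloaded" | "networkidle0";

// Hypothetical helper: with an array-valued waitUntil, navigation counts as
// complete once every listed event has fired, i.e. at the latest firing time.
function navigationCompleteAt(
  waitUntil: LifecycleEvent[],
  firedAtMs: Record<LifecycleEvent, number>,
): number {
  return Math.max(...waitUntil.map((event) => firedAtMs[event]));
}

// Made-up timeline for a single page load.
const timeline: Record<LifecycleEvent, number> = {
  domcontentloaded: 420, // DOMContentLoaded fires first
  load: 1350, // then the load event
  networkidle0: 1900, // then 500 ms pass with no network connections
};

console.log(navigationCompleteAt(["load"], timeline)); // 1350
console.log(navigationCompleteAt(["load", "domcontentloaded", "networkidle0"], timeline)); // 1900
```

Including networkidle0 therefore delays printing until late network requests, such as data fetched by scripts after the load event, have settled.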

Wiring up to an https endpoint

To make our service callable, we change the handler code to:

Here we convert the buffer returned from our PdfService to a base64 string.
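A minimal sketch of such a handler is below, under a few assumptions: with integration: lambda, serverless's default request template exposes query-string parameters as event.query; the PdfService is passed in so the handler can be tested with a fake; and makePdfReportHandler is a hypothetical name.

```typescript
import { Buffer } from "buffer";

// Same shape as the PdfService interface in pdf-service.ts.
interface PdfService {
  getPdf(url: string): Promise<Buffer>;
}

// Assumed event shape: serverless's default lambda-integration request
// template places query-string parameters under `query`.
interface PdfReportEvent {
  query?: { url?: string };
}

// Hypothetical factory: builds the Lambda handler around any PdfService.
export function makePdfReportHandler(service: PdfService) {
  return async (event: PdfReportEvent): Promise<string> => {
    const url = event.query && event.query.url;
    if (!url) {
      throw new Error("Missing required query parameter: url");
    }
    const pdf = await service.getPdf(url);
    // Return the PDF bytes as base64 text.
    return pdf.toString("base64");
  };
}
```

In handler.ts, the pdfReport function referenced in serverless.yml would then be exported as the result of calling makePdfReportHandler with your PdfService implementation.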

Finally, we add an https endpoint /pdf to call our function by replacing the functions section of serverless.yml with:

functions:
  pdfReport:
    handler: lib/handler.pdfReport
    events:
      - http:
          path: pdf
          method: get
          integration: lambda

Note that the lib prefix in the handler path matches the outDir specified in tsconfig.json above.

Deploy your service using the deploy script we defined in package.json:

npm run deploy

After the deployment has finished, we can call our PDF service by going to the URL reported by serverless deploy, for example:

https://<api id>.execute-api.<region>.amazonaws.com/dev/pdf?url=https://example.com

If all is well, this should return a long base64 text response. If we use an online base64-to-PDF converter (e.g. base64.guru) to convert the text of the response to a PDF, we can see the result.
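If you’d rather not paste the response into a website, a small helper can decode it locally. This sketch assumes the body may arrive either as bare base64 or as a JSON string wrapped in double quotes, and it checks for the %PDF- signature that PDF files begin with; decodePdfResponse is a hypothetical name.

```typescript
import { Buffer } from "buffer";

// Hypothetical helper: turn the endpoint's text response back into PDF bytes.
export function decodePdfResponse(body: string): Buffer {
  const trimmed = body.trim();
  // Unwrap a JSON-encoded string such as "JVBERi0x..." if present.
  const base64 = trimmed.startsWith('"') ? (JSON.parse(trimmed) as string) : trimmed;
  const pdf = Buffer.from(base64, "base64");
  // PDF files begin with the "%PDF-" signature.
  if (pdf.subarray(0, 5).toString("latin1") !== "%PDF-") {
    throw new Error("Response does not decode to a PDF");
  }
  return pdf;
}

// Usage (Node.js): write the decoded bytes to disk and open them in a viewer.
//   import { writeFileSync } from "fs";
//   writeFileSync("report.pdf", decodePdfResponse(responseText));
```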

Returning application/pdf

By changing some settings in API Gateway, you can have your endpoint return the correct Content-Type for the response to be displayed as a PDF. There is a serverless plugin that is meant to automate these settings.

I wasn’t able to get it to work, but it may work for you. I was able to make the change manually by following these instructions, although it’s not ideal to have configuration outside of your serverless deployment.

Update: A couple of weeks after this post was published, Serverless Framework v1.42.0 added support for binary media type responses. See this serverless blog post.
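With binary support available, one common approach is to switch the endpoint to lambda-proxy integration and register application/pdf (or */*) as a binary media type, which serverless exposes as provider.apiGateway.binaryMediaTypes. The handler then returns a proxy response with isBase64Encoded set, and API Gateway decodes the body back to binary. The sketch below assumes that setup; pdfProxyResponse and the default file name are hypothetical.

```typescript
import { Buffer } from "buffer";

// Shape of a Lambda proxy-integration response.
interface ProxyResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
  isBase64Encoded: boolean;
}

// Hypothetical helper: wrap PDF bytes in a proxy response that API Gateway
// can convert back to binary.
export function pdfProxyResponse(pdf: Buffer, filename = "report.pdf"): ProxyResult {
  return {
    statusCode: 200,
    headers: {
      "Content-Type": "application/pdf",
      // "inline" asks the browser to display the PDF rather than download it.
      "Content-Disposition": `inline; filename="${filename}"`,
    },
    // Proxy responses must be text, so the bytes travel as base64...
    body: pdf.toString("base64"),
    // ...and this flag tells API Gateway to decode them before responding.
    isBase64Encoded: true,
  };
}
```

API Gateway only performs the conversion when the request’s Accept header matches a configured binary media type, which is why */* is often used for endpoints opened directly in a browser.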

Adding header and footer

You can add your own HTML markup to create custom page headers and footers. Note that none of the page’s stylesheets are available to the header and footer, so any styling needs to be done inline.

The header and footer markup can contain the following classes used to inject printing values into them:

date - formatted print date

title - document title

url - document location

pageNumber - current page number

totalPages - total pages in the document

Here’s an example of adding a footer containing page numbers:
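A sketch of PDF options with a page-number footer might look like this. The pageNumber and totalPages spans are filled in by Chromium at print time, and all styling is inline because the page’s stylesheets don’t apply. The margin and font-size values are arbitrary assumptions; the options object would be passed to page.pdf.

```typescript
// Footer markup: Chromium fills elements with these class names at print time.
// Page stylesheets are not applied here, so every style is inline.
const footerTemplate = `
  <div style="width: 100%; font-size: 9px; text-align: center; color: #555;">
    Page <span class="pageNumber"></span> of <span class="totalPages"></span>
  </div>`;

export const pdfOptions = {
  format: "A4" as const,
  printBackground: true,
  // Required for headerTemplate/footerTemplate to be rendered at all.
  displayHeaderFooter: true,
  // An empty template suppresses the default header (date and title).
  headerTemplate: "<span></span>",
  footerTemplate,
  // Leave space at the top and bottom of each page for the header and footer.
  margin: { top: "20mm", bottom: "20mm" },
};

// Inside getPdf: return page.pdf(pdfOptions);
```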

This is how it looks on the page: