What are step functions?

AWS Step Functions lets you coordinate multiple AWS services into serverless workflows so you can build and update apps quickly. Using Step Functions, you can design and run workflows that stitch together services such as AWS Lambda and Amazon ECS into feature-rich applications.

The flow that we are going to build looks like this:

We will pass in the URL of the page that we want to convert to PDF, along with optional encryption parameters and the email address to send the result to. The workflow runs the encrypt step only if our parameters request it.

Why use step functions?

There are a number of reasons why I recommend using step functions. These include:

It’s an easy way to handle long-running workflows. If you’ve used API Gateway, you’ve probably come up against the 30-second timeout limit. With step functions, your entire business process is allowed up to a year to complete! Each individual lambda function is also allowed up to 15 minutes.

It encourages you to break up your code into a bunch of reusable micro-services that can then be composed in different ways for different business processes.

Each step can specify whether it should be re-tried on failure. You can specify the interval between retries, the maximum number of attempts and the back-off rate. This means that you can prevent your process from failing due to a temporary outage in one of your dependencies.

You get a visual display of your workflow that updates as the flow progresses, allowing you to inspect the inputs and outputs of each step, any exceptions and the full logs. You can also add alarms to notify you when problems occur.

You can add Choice steps where you can decide which step to run next based on the input data. In the flow above, IsEncryptionRequired is a Choice step.

You can also add a parallel step that is used to create parallel branches of execution.

Finally, you can trigger step functions directly or schedule them with cron-job like expressions.

The main downsides are:

once a step function flow has been started, there is currently no API support for checking on its progress.

you are limited to passing no more than 32K of data between steps. We’ll look at how to work around that limitation later in this post.

the free tier is not particularly generous, although at least it doesn’t expire at the end of your first year. You currently receive 4,000 free state transitions per month. See pricing. Note that each retry will be charged as an additional state transition.

Sending emails

Our service is going to email PDF files. To support that, let’s define an appropriate interface to support email sending:

export interface EmailService {
  send(
    to: string,
    subject: string,
    body: string,
    base64Attachment?: string,
    attachmentName?: string
  ): Promise<any>;
}

There are many ways to implement this interface, for example using SendGrid, but perhaps the simplest way is to use nodemailer:

npm i nodemailer

Here’s a possible implementation, using gmail as the sender:
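A minimal sketch of such an implementation might look like this (the class name and the lazy `require` are my assumptions, not the post's actual code; the credentials come from environment variables, which are covered in the next section):

```typescript
// A possible Gmail-backed implementation of the EmailService interface.
// Assumes nodemailer is installed (npm i nodemailer) and that credentials
// are supplied via environment variables rather than hard-coded.
export interface EmailService {
  send(
    to: string,
    subject: string,
    body: string,
    base64Attachment?: string,
    attachmentName?: string
  ): Promise<any>;
}

export class GmailEmailService implements EmailService {
  // Credentials are read from the environment at construction time.
  private readonly user = process.env.EMAIL_ADDRESS as string;
  private readonly pass = process.env.EMAIL_PASSWORD as string;

  // Build the nodemailer message object; the attachment is optional.
  buildMessage(
    to: string,
    subject: string,
    body: string,
    base64Attachment?: string,
    attachmentName?: string
  ) {
    return {
      from: this.user,
      to,
      subject,
      text: body,
      attachments: base64Attachment
        ? [
            {
              filename: attachmentName || 'attachment',
              content: base64Attachment,
              encoding: 'base64',
            },
          ]
        : [],
    };
  }

  async send(
    to: string,
    subject: string,
    body: string,
    base64Attachment?: string,
    attachmentName?: string
  ): Promise<any> {
    // Lazy require keeps the dependency off the module's load path.
    const nodemailer = require('nodemailer');
    const transporter = nodemailer.createTransport({
      service: 'gmail',
      auth: { user: this.user, pass: this.pass },
    });
    return transporter.sendMail(
      this.buildMessage(to, subject, body, base64Attachment, attachmentName)
    );
  }
}
```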

To allow nodemailer to connect to your mail server you will need to provide credentials, which should not be included in your code. The serverless framework offers a few choices for managing secrets. My favourite option is to make use of the AWS Parameter store (also known as SSM, for Simple Systems Manager).

Passing secrets to your lambda

Add an environment section to your serverless.yml

environment:
  EMAIL_ADDRESS: ${ssm:emailAddress-${opt:stage, self:provider.stage}~true}
  EMAIL_PASSWORD: ${ssm:emailPassword-${opt:stage, self:provider.stage}~true}

Note that the ~true suffix is needed where the value is encrypted, which it will be if you use the SecureString value type (see below). Also, by including the stage in the names of these values, you can easily set up different values for dev, test and prod stages.

The implementation above picks up these values from the environment with these lines:

const emailAddress = process.env.EMAIL_ADDRESS;

const emailPassword = process.env.EMAIL_PASSWORD;

Finally, use the AWS cli to set the values, e.g.

aws ssm put-parameter --name emailPassword-dev --type SecureString --region us-east-1 --value PLw-q6x-PdX-Ab1

Note that you can set different values in different regions so make sure you specify the region you are deploying to.

Storing files in S3

As mentioned in the downsides list above, you are limited to passing no more than 32K of data between steps. To pass the PDF files between steps, we will write them to S3 and pass the file path to the next step rather than the file’s content.

Step functions currently supports direct integrations with various AWS services such as DynamoDB, but at the time of writing there is no support for writing files to S3. To allow us to read and write S3 files, let’s add an S3 implementation to our FileSystemService.
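One possible shape for such an implementation is sketched below (the class name, method names and the lazy `require` of aws-sdk are my assumptions; the post's actual FileSystemService interface is not shown here):

```typescript
// A sketch of an S3-backed file store for passing pdf files between steps.
// The bucket name comes from the STEP_FUNCTIONS_DATA_BUCKET environment
// variable configured in serverless.yml.
export class S3FileSystemService {
  constructor(
    readonly bucketName: string = process.env.STEP_FUNCTIONS_DATA_BUCKET as string
  ) {}

  // aws-sdk v2 is available in the lambda runtime, so require it lazily.
  private s3Client(): any {
    return new (require('aws-sdk').S3)();
  }

  // Write the file contents to S3 under the given key.
  async writeFile(filePath: string, contents: Buffer): Promise<void> {
    await this.s3Client()
      .putObject({ Bucket: this.bucketName, Key: filePath, Body: contents })
      .promise();
  }

  // Read the file contents back from S3.
  async readFile(filePath: string): Promise<Buffer> {
    const result = await this.s3Client()
      .getObject({ Bucket: this.bucketName, Key: filePath })
      .promise();
    return result.Body as Buffer;
  }
}
```

Each step can then return the S3 key as its output, keeping the payload passed between steps well under the 32K limit.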

We also need to configure access to the S3 bucket where the files will be stored by adding the following to serverless.yml

environment:
  STEP_FUNCTIONS_DATA_BUCKET: ${self:service}-${opt:stage, self:provider.stage}-step-functions-data-bucket

resources:
  Resources:
    StepFunctionsDataBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${self:provider.environment.STEP_FUNCTIONS_DATA_BUCKET}

iamRoleStatements: # permissions for all functions are set here
  - Effect: Allow
    Action:
      - s3:PutObject
      - s3:GetObject
    Resource:
      Fn::Join:
        - ""
        - - "arn:aws:s3:::"
          - Ref: "StepFunctionsDataBucket"
          - "/*"

Here we define an environment variable to hold the bucket name (which includes the stage), create the bucket resource and allow PutObject and GetObject calls to it.

Defining a workflow

Now that we have all of our building blocks in place, we can move on to defining a step function state machine.

You define state machines using the JSON-based Amazon States Language. The serverless-step-functions plugin adds support for including such definitions in our serverless.yml file. The serverless-pseudo-parameters plugin is also useful for constructing ARNs.

In the terminal run:

npm install --save-dev serverless-step-functions

npm install --save-dev serverless-pseudo-parameters

Then add the following section to your serverless.yml

plugins:
  - serverless-step-functions
  - serverless-pseudo-parameters

Add the following state machine definition to your serverless.yml
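A definition along these lines matches the flow described below (a sketch: the step names, ResultPath attributes and lambda ARN construction are inferred from the rest of this post, with serverless-pseudo-parameters supplying the region and account id):

```yaml
stepFunctions:
  stateMachines:
    pdfWorkflow:
      events:
        - http:
            path: pdf/create
            method: POST
      definition:
        StartAt: GeneratePdf
        States:
          GeneratePdf:
            Type: Task
            Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:${self:service}-${opt:stage, self:provider.stage}-generatePdf
            ResultPath: $.pdfFilePath
            Next: IsEncryptionRequired
          IsEncryptionRequired:
            Type: Choice
            Choices:
              - Variable: $.encryptionRequired
                BooleanEquals: true
                Next: Encrypt
            Default: SendByEmail
          Encrypt:
            Type: Task
            Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:${self:service}-${opt:stage, self:provider.stage}-encrypt
            ResultPath: $.encryptedFilePath
            Next: SendByEmail
          SendByEmail:
            Type: Task
            Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:${self:service}-${opt:stage, self:provider.stage}-emailPdfReport
            ResultPath: $.email
            End: true
```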

The events section sets up an https endpoint, a POST to which will kick off the flow with the body of the POST providing the input data to the first step.

The definition defines 3 task steps: GeneratePdf, Encrypt and SendByEmail. These steps each define a ResultPath, e.g. $.pdfFilePath. This adds the output of the step as an attribute to the input object. In this way the data passed between steps can accumulate.

The definition also includes a Choice step which looks at the encryptionRequired attribute of the input. If true, the next step is Encrypt, otherwise SendByEmail.

We can define an interface to describe the parameters that can be passed to our workflow:

export interface CreatePdfRequest {
  url: string;
  reportName: string;
  toAddress: string;
  encryptionRequired: boolean;
  ownerPassword?: string;
  userPassword?: string;
  pdfFilePath?: string;
  encryptedFilePath?: string;
}

The last job is to modify handler.ts to define the lambda functions used by each task step

and add the mappings to the serverless.yml

functions:
  generatePdf:
    handler: lib/handler.generatePdf
  encrypt:
    handler: lib/handler.encrypt
  emailPdfReport:
    handler: lib/handler.emailPdfReport

Each of the functions follows the same pattern:

cast the incoming event to the request interface

read input data if required

call the service that supports the function of the step

construct the step’s output

pass the output to the callback.
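Sketched in code, a handler following this pattern might look like the following (the pdf-generation and file-storage service calls are hypothetical placeholders, shown only as comments):

```typescript
// A sketch of the generatePdf handler following the pattern above.
export const generatePdf = (
  event: any,
  _context: any,
  callback: (error: any, result?: any) => void
) => {
  // 1. cast the incoming event to the request interface
  const request = event as { url: string; reportName: string };
  // 2. (no input file to read for this first step)
  // 3. call the service that supports the function of the step, e.g.
  //    const pdf = await pdfService.generatePdf(request.url);
  //    await fileService.writeFile(`${request.reportName}.pdf`, pdf);
  const pdfFilePath = `${request.reportName}.pdf`; // placeholder for the S3 key
  // 4./5. construct the step's output and pass it to the callback; with
  //    ResultPath: $.pdfFilePath it is merged into the state for the next step
  callback(null, pdfFilePath);
};
```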

Now deploy your code with npm run deploy and if all is well, you will see the endpoint that has been created:



endpoints:
  POST - https://j80szm0d9i.execute-api.us-east-1.amazonaws.com/dev/pdf/create

To test the service using curl:
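For example (the endpoint is the one printed by the deploy above; substitute your own URL and email address, and note that the JSON body follows the CreatePdfRequest interface):

```shell
curl -X POST https://j80szm0d9i.execute-api.us-east-1.amazonaws.com/dev/pdf/create \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "reportName": "example-report",
    "toAddress": "you@example.com",
    "encryptionRequired": true,
    "ownerPassword": "owner-secret",
    "userPassword": "user-secret"
  }'
```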

The POST should return something similar to this:

{"executionArn":"arn:aws:states:us-east-1:175240006580:execution:lambda-puppeteer-dev-pdf-workflow:eaacf7ac-9bc4-11e9-8a16-abe9d3e2566a","startDate":1.561960566573E9}

Hopefully you will receive the email but if not, go to the Step Functions console and click on the execution. If you haven’t successfully set up your email server credentials you might see something like this:

If at first you don’t succeed…

Once we’ve got the mail server credentials right, what happens if the mail server is down when we try to send the email? We simply add a Retry section to our step definition, for example:

EmailPdfReport:
  Type: Task
  Resource: ...
  ResultPath: $.email
  End: true
  Retry:
    - ErrorEquals:
        - States.ALL
      IntervalSeconds: 30
      MaxAttempts: 4
      BackoffRate: 4

This configuration will retry at 30 seconds, 2 minutes, 8 minutes and 32 minutes.