Simple Workflow

Today we are introducing the Amazon Simple Workflow service, SWF for short. This new service gives you the ability to build and run distributed, fault-tolerant applications that span multiple systems (cloud-based, on-premise, or both). Amazon Simple Workflow coordinates the flow of synchronous or asynchronous tasks (logical application steps) so that you can focus on your business and your application instead of having to worry about the infrastructure.

Why?

We want to make it easier for you to build distributed, fault-tolerant, cloud-based applications! In our own work with systems of this type, we have learned quite a bit. For example:

The applications often incorporate a workflow – A series of steps that must take place in a predefined order, with opportunities to adjust the workflow as needed by making decisions and by handling special cases in a structured fashion.

The workflow often represents a business process – Think about all of the steps involved in processing an order on your favorite e-commerce site. Charging your credit card, updating your order history, arranging for the items to be shipped, shipping the items, tracking the shipment, replenishing inventory, handling returns, and much more.

Processes can be complex – Years ago, I was told that a single Amazon.com order needed to make its way through at least 40 different states or steps before it was considered complete. I am sure that the process has become even more complex over time.

Flexibility is key – Earlier attempts to specify and codify a workflow in declarative form have proven to be rigid and inflexible. At some point, procedural code becomes a necessity.

Ease of use is important – It should be possible to design and implement these applications without spending a lot of time acquiring specialized skills.

You can use Simple Workflow to handle many types of multi-stage operations including traditional business processes (handling an order or adding a new employee), setting up a complex multi-tiered application, or even handling the decision-making process for a multi-player online game.

Some Definitions

Let’s start by defining a couple of terms:

A Workflow is the automation of a business process.

A Domain is a collection of related Workflows.

Actions are the individual tasks undertaken to carry out a Workflow.

Activity Workers are the pieces of code that actually implement the tasks. Each kind of Worker has its own Activity Type.

A Decider implements a Workflow’s coordination logic.

Let’s say that we have an image processing workflow, and that it has the following tasks:

1. Accept uploaded file.
2. Store file in Amazon S3.
3. Validate file format and size.
4. Use Amazon Mechanical Turk to classify the image.
5. If the image is unacceptable, send an error message using Amazon SES and terminate the workflow.
6. If the image is acceptable, check the user’s balance in the accounting system.
7. Launch an EC2 instance.
8. Wait for the EC2 instance to be ready, and then configure it (keys, packages, and so forth).
9. Convert the image to PNG format and generate a series of image thumbnails.
10. Upload the PNG image and the thumbnails to Amazon S3.
11. Adjust the user’s balance in the accounting system.
12. Create an entry in the appropriate database table.
13. Send a status message to the user, again using Amazon SES.

Tasks 1 through 13 make up the Workflow. Each of the tasks is an Action. The code to implement each action is embodied in a specific Activity Worker.

The workflow’s Decider is used to control the flow of execution from task to task. In this case, the Decider would make decisions based on the results of steps 4 (Mechanical Turk) and 6 (balance check). There’s a nice, clean separation between the work to be done and the steps needed to do it.

Here’s a picture:

What Does Simple Workflow Do?

Simple Workflow provides you with the infrastructure that you need to implement workflows such as the one above. It does all of the following (and a lot more):

Stores metadata about a Workflow and its component parts.

Stores tasks for Workers and queues them until a Worker needs them.

Assigns tasks to Workers.

Routes information between executions of a Workflow and the associated Workers.

Tracks the progress of Workers on Tasks, with configurable timeouts.

Maintains workflow state in a durable fashion.

Because the Workers and Deciders are both stateless, you can respond to increased traffic by simply adding additional Workers and Deciders as needed. This could be done using the Auto Scaling service for applications that are running on Amazon EC2 instances in the AWS cloud.

Your Workers and your Deciders can be written in the programming language of your choice, and they can run in the cloud (e.g. on an Amazon EC2 instance), in your data center, or even on your desktop. You need only poll for work, handle it, and return the results to Simple Workflow. In other words, your code can run anywhere, as long as it can “see” the Amazon Simple Workflow HTTPS endpoint. This gives you the flexibility to incorporate existing on-premise systems into new, cloud-based workflows. Simple Workflow lets you do “long polling” to reduce network traffic and unnecessary processing within your code. With this model, requests from your code will be held open for up to 60 seconds if necessary.
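The poll, handle, and return cycle described above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: it assumes a client object shaped like the SWF client in the AWS SDK for Python (boto3, which is not part of this post), and the domain, task list, and worker names are hypothetical. It also assumes that a long poll that times out with no work comes back without a task token. A small stand-in client is included so that the sketch runs without AWS credentials.

```python
from typing import Any, Dict, Iterator, List


def poll_tasks(client: Any, domain: str, task_list: str,
               max_polls: int) -> Iterator[Dict[str, Any]]:
    """Yield activity tasks, skipping long polls that come back empty.

    `client` is assumed to expose poll_for_activity_task(), as the
    boto3 "swf" client does; any object with that method will work.
    """
    for _ in range(max_polls):
        # The service holds this request open (up to about 60 seconds)
        # and answers as soon as a task is available.
        response = client.poll_for_activity_task(
            domain=domain,
            taskList={"name": task_list},
            identity="example-worker",  # hypothetical worker name
        )
        # An empty poll carries no task token, so just poll again.
        if response.get("taskToken"):
            yield response


class FakeClient:
    """Stand-in for the real service so the sketch runs offline."""

    def __init__(self, responses: List[Dict[str, Any]]) -> None:
        self._responses = list(responses)

    def poll_for_activity_task(self, **kwargs: Any) -> Dict[str, Any]:
        # Return canned responses, then empty polls once they run out.
        return self._responses.pop(0) if self._responses else {}


fake = FakeClient([{}, {"taskToken": "t-1", "input": "photo.jpg"}, {}])
tokens = [task["taskToken"]
          for task in poll_tasks(fake, "images", "convert", max_polls=4)]
print(tokens)
```

In a real worker you would pass an actual SWF client in place of `FakeClient` and loop indefinitely rather than for a fixed number of polls.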

Inside the Decider

Your Decider code simply polls Simple Workflow asking for decisions to be made, and then decides on the next step. Your code has access to all of the information it needs to make a decision including the type of the workflow and a detailed history of the prior steps taken in the workflow. The Decider can also annotate the workflow with additional data.
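To make this concrete, here is a rough sketch of the decision logic for a cut-down version of the image processing workflow. It is written as a pure function over the execution history, which is why a Decider does not need to keep any state of its own. The event and decision dictionaries follow the shapes used by the Simple Workflow API as I understand them; the activity names (`classify_image`, `check_balance`), the result strings, and the version number are hypothetical, and a real Decider would cover all of the steps rather than just two.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]
Decision = Dict[str, Any]


def schedule(activity: str, activity_id: str) -> Decision:
    """Build a decision that schedules one activity task."""
    return {
        "decisionType": "ScheduleActivityTask",
        "scheduleActivityTaskDecisionAttributes": {
            "activityType": {"name": activity, "version": "1.0"},
            "activityId": activity_id,
        },
    }


def decide(history: List[Event]) -> List[Decision]:
    """Choose the next step by replaying the full event history."""
    scheduled = {}   # eventId of the scheduling event -> activity name
    pending = set()  # scheduled activities that have not completed yet
    last_done = last_result = None
    for event in history:
        kind = event["eventType"]
        if kind == "ActivityTaskScheduled":
            attrs = event["activityTaskScheduledEventAttributes"]
            scheduled[event["eventId"]] = attrs["activityType"]["name"]
            pending.add(event["eventId"])
        elif kind == "ActivityTaskCompleted":
            attrs = event["activityTaskCompletedEventAttributes"]
            pending.discard(attrs["scheduledEventId"])
            last_done = scheduled[attrs["scheduledEventId"]]
            last_result = attrs.get("result")

    if pending:
        return []  # work is still in flight, nothing to decide yet
    if last_done is None:
        return [schedule("classify_image", "classify-1")]
    if last_done == "classify_image":
        if last_result != "acceptable":
            return [{
                "decisionType": "FailWorkflowExecution",
                "failWorkflowExecutionDecisionAttributes": {
                    "reason": "image rejected"},
            }]
        return [schedule("check_balance", "balance-1")]
    return [{
        "decisionType": "CompleteWorkflowExecution",
        "completeWorkflowExecutionDecisionAttributes": {"result": "done"},
    }]


def classified(result: str) -> List[Event]:
    """Example history: the classification step finished with `result`."""
    return [
        {"eventId": 1, "eventType": "WorkflowExecutionStarted"},
        {"eventId": 2, "eventType": "ActivityTaskScheduled",
         "activityTaskScheduledEventAttributes": {
             "activityType": {"name": "classify_image", "version": "1.0"}}},
        {"eventId": 3, "eventType": "ActivityTaskCompleted",
         "activityTaskCompletedEventAttributes": {
             "scheduledEventId": 2, "result": result}},
    ]


print(decide(classified("unacceptable"))[0]["decisionType"])
```

Because `decide` depends only on the history, any Decider instance can pick up any decision task, which is what makes it easy to add more of them under load.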

Inside the Workers

Your Worker code also polls Simple Workflow, in effect asking for work that needs to be done. It always polls with respect to one or more Task Lists, so that one Worker can participate in multiple types of Workflows if desired. It pulls work from Task Lists, does the work, updates the Workflow’s status, and goes on to the next task. In situations that involve long-running tasks, the worker can provide a “heartbeat” update to Simple Workflow. Deciders can insert Markers into the execution history of a Workflow for checkpointing or auditing purposes.
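Here is a small sketch of what the inside of a long-running Worker might look like, with a heartbeat after each unit of work. It assumes a client with `record_activity_task_heartbeat` and `respond_activity_task_completed` methods, as found on the boto3 SWF client (an assumption, since this post does not cover any particular SDK). The thumbnail sizes and file names are made up for illustration, and the recording client simply stands in for the service.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def run_activity(client: Any, task: Dict[str, Any], items: Iterable[str],
                 process: Callable[[str], str]) -> List[str]:
    """Process items one by one, heartbeating after each, then report."""
    token = task["taskToken"]
    results: List[str] = []
    for item in items:
        results.append(process(item))
        # Tell the service that this worker is still alive and working.
        client.record_activity_task_heartbeat(
            taskToken=token, details=f"{len(results)} items done")
    # Hand the result back so that the Decider can choose the next step.
    client.respond_activity_task_completed(
        taskToken=token, result=",".join(results))
    return results


class RecordingClient:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def record_activity_task_heartbeat(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(("heartbeat", kwargs))
        return {"cancelRequested": False}

    def respond_activity_task_completed(self, **kwargs: Any) -> None:
        self.calls.append(("completed", kwargs))


client = RecordingClient()
thumbnails = run_activity(client, {"taskToken": "t-1"},
                          ["64", "128", "256"],
                          lambda size: f"thumb_{size}.png")
print(thumbnails)
```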

Timeouts, Signals, and Errors

Simple Workflow’s Timeouts are used to ensure that an execution of a Workflow runs correctly. They can be set as needed for each type of Workflow. You have control over the following timeouts:

Workflow Start to Close – How long an execution can take to complete.

Decision Task Start to Close – How long a Decider can take to complete a decision task.

Activity Task Start to Close – How long an Activity Worker can take to process a task of a given Activity Type.

Activity Task Heartbeat – How long an Activity Worker can run without providing its status to Simple Workflow.

Activity Task Schedule to Start – How long Simple Workflow waits before timing out a task if no workers are available to perform the task.

Activity Task Schedule to Close – How long Simple Workflow will wait between the time a task is scheduled to the time that it is complete.
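As a rough illustration, the six timeouts above can be collected into two small sets of settings, one for the Workflow and one for each Activity Type. The key names below are the default-timeout parameters that I believe the registration calls accept, with values given as strings of seconds; please check them against the API Reference before relying on them. Deriving Schedule to Close as the sum of the other two activity timeouts is just one reasonable choice made for this sketch, and the numbers are arbitrary.

```python
from typing import Dict


def workflow_timeouts(execution_s: int, decision_s: int) -> Dict[str, str]:
    """Timeout settings for a Workflow type (values in seconds)."""
    return {
        # Workflow Start to Close
        "defaultExecutionStartToCloseTimeout": str(execution_s),
        # Decision Task Start to Close
        "defaultTaskStartToCloseTimeout": str(decision_s),
    }


def activity_timeouts(start_to_close_s: int, heartbeat_s: int,
                      schedule_to_start_s: int) -> Dict[str, str]:
    """Timeout settings for an Activity Type (values in seconds)."""
    if heartbeat_s > start_to_close_s:
        raise ValueError("heartbeat should not exceed start-to-close")
    return {
        "defaultTaskStartToCloseTimeout": str(start_to_close_s),
        "defaultTaskHeartbeatTimeout": str(heartbeat_s),
        "defaultTaskScheduleToStartTimeout": str(schedule_to_start_s),
        # Assumption for this sketch: time in the queue plus time to run.
        "defaultTaskScheduleToCloseTimeout": str(
            schedule_to_start_s + start_to_close_s),
    }


# Hypothetical settings for the image conversion step.
convert = activity_timeouts(start_to_close_s=600, heartbeat_s=60,
                            schedule_to_start_s=300)
print(convert["defaultTaskScheduleToCloseTimeout"])
```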

Signals are used to provide out-of-band information to an execution of a Workflow. You could use a signal to cancel a Workflow, tell it that necessary data is ready, or to provide information about an emergent situation. Each Signal is added to the Workflow’s execution history and the Workflow’s Decider controls what happens next.

Simple Workflow will make sure that execution of each Workflow proceeds as planned, using the Timeouts mentioned above to keep the Workflow from getting stuck if a task takes too long or if there is no activity code running.

Getting Started

Here’s what you need to do to get started with Amazon Simple Workflow:

1. Write your Worker(s) using any programming language.
2. Write your Decider, again in any programming language.
3. Register the Workflow and the Activities.
4. Run the Workers and the Decider on any host that can “see” the Simple Workflow endpoint.
5. Initiate execution of a Workflow.
6. Monitor progress of the Workflow using the AWS Management Console.

The AWS Flow Framework

In order to make it even easier for you to get started with Amazon Simple Workflow, the AWS SDK for Java now includes the new AWS Flow Framework. This new framework includes a number of programming constructs that abstract out a number of task coordination details. For example, it uses a programming model based on Futures to handle dependencies between tasks. Initiating a Worker task is as easy as making a method call, and the framework takes care of the Workers and the Decision Tasks behind the scenes.

Simple Workflow API Basics

You can build your Workflow, your Workers, and your Decider, with just a handful of Simple Workflow APIs. Here’s what you need to know to get started:

Workflow Registration – The RegisterDomain, RegisterWorkflowType, and RegisterActivityType calls are used to register the various components that make up a Workflow.

Implementing Deciders and Workers – The PollForDecisionTask call is used to fetch decision tasks, and the RespondDecisionTaskCompleted call is used to signal that a decision task has been completed. Similarly, the PollForActivityTask call is used to fetch activity tasks and the RespondActivityTaskCompleted call is used to signal that an activity task is complete.

Starting a Workflow – The StartWorkflowExecution call is used to get a Workflow started.

The Amazon Simple Workflow API Reference contains information about these and other APIs.
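Here is a sketch of how the registration and start calls fit together. It assumes a client that exposes these APIs under the snake_case method names used by the boto3 SWF client (an assumption on my part, since no SDK is shown in this post). All of the names, versions, and timeout values are hypothetical, and the recording client only stands in for the service so that the sketch runs offline.

```python
from typing import Any, Dict, List


def register_and_start(client: Any, domain: str, workflow: str,
                       activity: str, workflow_id: str, payload: str) -> str:
    """Register a domain, a workflow type, and an activity type, then start."""
    # Registration is a one-time step; a real program would expect an
    # "already exists" error if it is repeated.
    client.register_domain(
        name=domain, workflowExecutionRetentionPeriodInDays="7")
    client.register_workflow_type(
        domain=domain, name=workflow, version="1.0",
        defaultTaskList={"name": "decisions"},
        defaultExecutionStartToCloseTimeout="3600",
        defaultTaskStartToCloseTimeout="30",
        defaultChildPolicy="TERMINATE")
    client.register_activity_type(
        domain=domain, name=activity, version="1.0",
        defaultTaskList={"name": "activities"})
    # With defaults registered, starting needs only the type and an id.
    response = client.start_workflow_execution(
        domain=domain, workflowId=workflow_id,
        workflowType={"name": workflow, "version": "1.0"},
        input=payload)
    return response["runId"]


class RecordingClient:
    """Stand-in client that records the name of each call made on it."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __getattr__(self, name: str) -> Any:
        def call(**kwargs: Any) -> Dict[str, str]:
            self.calls.append(name)
            return {"runId": "fake-run-id"}
        return call


client = RecordingClient()
run_id = register_and_start(client, "images", "process_image",
                            "convert_image", "upload-0001", "photo.jpg")
print(client.calls, run_id)
```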

AWS Management Console Support

The AWS Management Console includes full support for Amazon Simple Workflow. Here’s a tour, starting with the main (dashboard) page:

Register a workflow domain:

Register a workflow for a workflow domain:

Register an activity within a workflow:

Initiate execution of a workflow:

Provide input data to an execution:

See all of the current executions of a given workflow:

Pricing

Like all of the services in the AWS Cloud, Amazon Simple Workflow is priced on an economical, pay-as-you-go basis. First, all AWS customers can get started for free. You can initiate execution of 1,000 Workflows and 10,000 tasks per month and you can keep them running for a total of 30,000 workflow-days (one workflow active for one day is equal to one workflow-day).

Beyond that, there are three pricing dimensions:

Executions – You pay $0.0001 for every Workflow execution, and an additional $0.000005 per day if they remain active for more than 24 hours.

Tasks, Signals, and Markers – You pay $0.000025 for every task execution, timer, signal, and marker.

Bandwidth – There is no charge for data transferred in to Simple Workflow. There is no charge for the first Gigabyte of data transferred out, and the usual tiered AWS charges apply after that.
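To get a feel for the numbers, here is a small worked example that applies the prices and the free allowances quoted above to one month of usage. It is only a sketch: it leaves out bandwidth, it assumes that each free allowance is simply subtracted from the matching usage figure, and it expects you to supply the number of billable workflow-days yourself. The usage figures are invented.

```python
from decimal import Decimal

# Prices and free allowances as quoted in this post.
PRICE_PER_EXECUTION = Decimal("0.0001")
PRICE_PER_WORKFLOW_DAY = Decimal("0.000005")
PRICE_PER_TASK = Decimal("0.000025")  # tasks, timers, signals, markers
FREE_EXECUTIONS = 1_000
FREE_TASKS = 10_000
FREE_WORKFLOW_DAYS = 30_000


def billable(used: int, free: int) -> int:
    """Usage left over after the free allowance (never negative)."""
    return max(used - free, 0)


def monthly_cost(executions: int, tasks: int, workflow_days: int) -> Decimal:
    """Estimated monthly charge in dollars, excluding bandwidth."""
    return (
        billable(executions, FREE_EXECUTIONS) * PRICE_PER_EXECUTION
        + billable(tasks, FREE_TASKS) * PRICE_PER_TASK
        + billable(workflow_days, FREE_WORKFLOW_DAYS) * PRICE_PER_WORKFLOW_DAY
    )


# 49,000 billable executions  -> $4.90
# 390,000 billable tasks      -> $9.75
# 10,000 billable days        -> $0.05
total = monthly_cost(executions=50_000, tasks=400_000, workflow_days=40_000)
print(f"${total:.2f}")  # prints "$14.70"
```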

Amazon Simple Workflow in Action

Here are some ways that people are already putting Amazon Simple Workflow to use:

RightScale is using it to ensure fault-tolerant execution of their server scaling workflow. Read Thorsten von Eicken’s post, RightScale Server Orchestration and Amazon SWF Launch, for more information.

NASA uses Simple Workflow to coordinate the daily image processing tasks for the Mars Exploration Rovers. Read our new case study, NASA JPL and Amazon SWF, to see how they do it.

Sage Bionetworks coordinates complex, heterogeneous scientific workflows. Check out the new case study, Sage Bionetworks and Amazon SWF, for complete information.

Go With the Flow

I am very interested in hearing about the applications and Workflows that you implement with Amazon Simple Workflow. Please feel free to leave a comment or to send me some email.

— Jeff;