At DoorDash, providing a fast, on-demand logistics service would not be possible without a robust computing infrastructure to power our backend systems. After all, it doesn’t matter how good our algorithms and application code are if we don’t have the server capacity to run them. Recently, we realized that our existing Heroku-based infrastructure wasn’t meeting our needs, and that we would need to upgrade to Amazon Web Services. With the help of an up-and-coming technology called Docker, we were able to make this transition with far less time and effort than would otherwise have been possible.

DoorDash was originally hosted on Heroku because it was a simple and convenient way to get our app up and running. Instead of worrying about the low-level complexities of server infrastructure, we could focus our time on developing product features. However, using a “platform-as-a-service” was not without tradeoffs, and as our traffic scaled up, we started to face some serious problems and limitations with Heroku.

Performance: The most pressing issue was the lackluster performance of Heroku server instances (aka “dynos”). Each dyno is extremely constrained in its CPU performance and memory resources (which is not surprising considering that Heroku runs multiple dynos on a single Amazon EC2 instance). Even after extensive performance tuning of our Django app, we were still forced to run a lot more dynos than we would have liked and didn’t see how this would continue to scale.

Cost Efficiency: Heroku dynos were very expensive for the computing resources we were getting. For roughly the same price as a Heroku “2x” dyno with 1GB RAM, we could have rented an Amazon c3.large EC2 instance with 3.75GB RAM.

Reliability: Surprisingly, we found that Heroku was plagued by reliability issues. An outage in Heroku’s deployment API would seem to pop up every week or two. One memorable half-day incident prevented us from pushing a critical hotfix and permanently eroded our trust in the platform.

Control: With Heroku, you lose fine-grained control and visibility over your servers. Installing custom software in your server stack is far from straightforward, and it’s not possible to SSH into a server to debug a CPU or memory issue.

In order to overcome these issues, we knew that we needed to move off of Heroku and find a new hosting provider. The logical upgrade choice was Amazon Web Services and its Elastic Compute Cloud (EC2) service. Amazon EC2 virtual server instances come in a wide variety of CPU and memory configurations, feature full root access, and offer much better performance-per-dollar than Heroku.

While AWS looked like an attractive solution, migrating from a high-level, managed platform provider like Heroku can be a daunting task. Simply put, administering a cluster of servers and setting up continuous code deployment takes a nontrivial amount of work. To automate server setup, we would normally need “configuration management” software such as Chef or Puppet. However, these tools tend to be clunky and require learning a domain-specific language. We would need to write scripts for tasks such as installing all of our app’s third-party dependencies and configuring all the software in our server stack. And to test that this code worked, we would need to set up a local development environment using Vagrant. All of this added up to a significant amount of complexity that did not look appealing.

Docker

We wondered if there was an easier way to get onto AWS. We had been hearing a lot about a relatively new virtualization technology called Docker (www.docker.com). Docker allows you to package an application, complete with all of its dependencies, into a “container” environment that can run on any Linux system. Because Docker containers are simply snapshots of a known, working system state, it’s finally possible to “build once, run anywhere”.

Docker containers encapsulate a virtual Linux filesystem, providing much of the portability and isolation offered by a virtual machine. The big difference is in the resource footprint. Whereas a virtual machine needs to run an entire operating system, containers share the host machine’s Linux kernel and only need to virtualize the application and its dependent libraries. As a result, containers are very lightweight and boot up in seconds instead of minutes. For a more technical explanation about containers, check out this article from RightScale.
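To make the shared-kernel point concrete, here is a minimal sketch (assuming a machine with the Docker daemon installed) of starting a throwaway container from a public base image:

```shell
# Pull a public Ubuntu base image and run a single command inside a
# fresh, isolated container; --rm discards the container on exit
docker run --rm ubuntu:14.04 uname -r

# Note: the kernel version printed is the *host's* kernel. The container
# provides its own filesystem and libraries, but shares the host's Linux
# kernel instead of booting a full operating system -- which is why it
# starts in seconds rather than minutes.
```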

Implementation

After learning what Docker was capable of, we knew that it could play a key role in accelerating our AWS migration. The plan was to deploy Docker containers running our Django app onto Amazon EC2 instances. Instead of spending effort configuring EC2 hosts to run our app, we could move this complexity into the Docker container environment.

Building the Docker Image

The first step was to define the Docker image that would house our app. (To clarify some terminology, a Docker “image” is basically a template for containers, and a container is a running instance of an image). This mostly involves writing a simple configuration file called a Dockerfile. In contrast to complex Chef or Puppet scripts written in a custom DSL, a Dockerfile closely resembles a series of Bash commands and is easy to understand.

Docker images are composed in layers and can be based off of other images. As a starting point, you would typically use a stable Docker build of a popular Linux distro such as Ubuntu or CentOS. These “base images” are hosted on Docker Hub, a public repository of predefined Docker images.

In the case of our Django app, most of the work was figuring out how to get tricky third-party Python dependencies to install (particularly those with C-extensions) and setting up the more complex software components in our stack (such as our web server or a database connection pooler). Normally this process can be tedious and filled with trial and error, but being able to test everything in our local environment was a huge boon. It takes next to no time to spin up a new Docker container, allowing for a super fast development pace.
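As a rough illustration, a Dockerfile for a Django app along these lines might look like the following. This is a hedged sketch, not our actual configuration: the base image version, system package list, and the `myapp` module name are all hypothetical.

```dockerfile
# Start from a stable base image of a popular Linux distro on Docker Hub
FROM ubuntu:14.04

# System libraries needed to compile Python dependencies with C-extensions
# (the exact package list depends on your requirements)
RUN apt-get update && apt-get install -y \
    python python-dev python-pip \
    libpq-dev libxml2-dev libxslt1-dev

# Install third-party Python dependencies first, so this (slow) layer is
# cached across rebuilds when only application code changes
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

# Copy in the application code
COPY . /app
WORKDIR /app

# Serve the Django app (here via gunicorn, as one common choice)
EXPOSE 8000
CMD ["gunicorn", "myapp.wsgi:application", "--bind", "0.0.0.0:8000"]
```

Because images build up in layers, ordering the dependency install before the code copy means day-to-day code changes rebuild in seconds rather than reinstalling every package.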

Preparing the Docker Environment

The second step was to set up a Docker runtime environment on EC2. To save time, we decided to use AWS OpsWorks, a service that comes with a built-in Chef layer to help manage EC2 instances. While we couldn’t avoid Chef, we didn’t have to spend as much time wrangling with it because we had already defined the vast majority of system configuration inside our Docker image.

Our code deployment flow was straightforward, and mostly consisted of building a Docker image off our latest codebase and distributing it to our server instances, with the help of a Docker image server. We only needed to write a few short Chef scripts to download the latest Docker image build and start up a new Docker container.
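In outline, a deployment flow like this reduces to a handful of Docker commands; the registry hostname, image name, and ports below are hypothetical stand-ins, not our actual setup.

```shell
# On the build machine: build an image from the latest codebase and
# push it to a private image server (registry)
docker build -t registry.example.com/myapp:latest .
docker push registry.example.com/myapp:latest

# On each EC2 instance (the part our short Chef scripts automate):
# pull the new build, replace the old container with a new one
docker pull registry.example.com/myapp:latest
docker stop myapp && docker rm myapp
docker run -d --name myapp -p 8000:8000 registry.example.com/myapp:latest
```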

Our Docker container-based deployment flow.

Results

From conception to completion, our migration from Heroku to AWS took two engineers about one month. Amongst other things, this included learning Docker, AWS and Chef, integrating everything together, and testing (and more testing). What we appreciated most about Docker was that once we got a container working locally, we were confident that it would work in a production environment as well. Because of this, we were able to make the switch with zero glitches or problems.

On our new Docker+AWS environment, we achieved a more than 2x performance gain over our Heroku environment. DoorDash’s average API response time dropped from around 220ms to under 100ms, and our background task execution times were cut in half as well. With more robust EC2 hardware, we only needed to run half the number of servers, cutting our hosting bill dramatically. The extra degree of control over our software stack also proved useful, as it allowed us to install Nginx as a reverse proxy in front of our application servers and improve our web API throughput.

Final Thoughts

Needless to say, we were pretty happy about our decision to move to AWS, and Docker was a key reason why we were able to do it quickly and easily. We were able to greatly improve our server performance and gain much more control over our software stack, without having to incur a ton of sysadmin overhead. Being able to develop and tweak a Docker container locally and knowing it will run exactly the same in any environment is a huge win. Another big advantage is portability — rather than being tied to our existing hosting provider, we can easily deploy to any other cloud infrastructure in the future.

It’s clear that Docker is a powerful technology that should empower developers to take control over how they deploy their apps, while shrinking the marginal benefit that managed PaaS solutions like Heroku provide. We’re definitely looking forward to seeing what else we can do with it.

- Alvin Chow