Amazon Linux is a distribution that evolved from Red Hat Enterprise Linux (RHEL) and CentOS. It is available for use within Amazon EC2: it comes with all the tools needed to interact with Amazon APIs, is optimally configured for the Amazon Web Services ecosystem, and Amazon provides ongoing support and updates. You should not use this distribution for your EC2 instances, however, as the convenience of Amazon Linux does not outweigh this significant operational downside: there is no official distribution of Amazon Linux for use outside the Amazon cloud.

Why does this matter? The obvious first answer is that use of Amazon Linux creates additional migration costs when moving to another cloud service. By using it you are locking yourself in just that little bit more, and entirely unnecessarily. There are any number of other stable, well-supported distributions you can use in EC2, and installing the tools for working with Amazon APIs takes only a few trivial lines of shell script on any of them. So why make life difficult for yourself when you don't have to?

There is a much more important consideration resulting from the inability to run Amazon Linux outside EC2, however, and one that directly imposes significant additional costs on development. For any server application the development environment should replicate the deployment environment as closely as possible. Developing on distribution A and deploying to distribution B is just asking for trouble. The least of what you are letting yourself in for is spending twice as much time and money on critical devops tooling throughout your process. The much more likely outcome is a whole range of bugs and problems that would never have occurred if you stuck to the same distribution and configuration throughout your varied environments. The inability to run Amazon Linux outside EC2 means that if you use this distribution then you cannot have the same environment locally as in EC2, and all of these problems begin to rear their ugly heads.

A Note on Access to AWS APIs and Instance Metadata

A local environment cannot be exactly the same of course: it will most likely use different configuration parameters and service endpoints, and a different setup for access to the instance metadata service, given that this doesn't exist outside EC2. The locally running application will call all of the same AWS services as a production deployment, but:

Instance metadata is accessed through hand-rolled abstraction layers (such as Bash scripts or something more sophisticated) that are mocked in the local development environment.

AWS APIs are accessed via the AWS Command Line Interface or an AWS SDK, both of which allow environment-specific configuration.

For services such as ElastiCache that are inaccessible outside EC2, either proxy servers or local Redis or Memcached servers are used instead.
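As a rough illustration of the first and third points, here is a minimal Python sketch of a metadata abstraction layer that can be mocked locally, plus a helper for per-environment endpoint overrides. The environment variable names (MOCK_INSTANCE_METADATA_FILE and the _ENDPOINT_URL suffix) and both function names are hypothetical, invented for this example. The sketch also assumes that the metadata service answers plain unauthenticated GET requests at its usual link-local address; adjust it if your instances are configured otherwise.

```python
import json
import os
import urllib.request

# Link-local address of the EC2 instance metadata service.
METADATA_BASE_URL = "http://169.254.169.254/latest/meta-data/"

# Hypothetical environment variable naming a JSON file of mocked values.
MOCK_FILE_ENV_VAR = "MOCK_INSTANCE_METADATA_FILE"


def get_instance_metadata(key, environ=None, timeout=2.0):
    """Return one instance metadata value, for example key="instance-id"."""
    environ = os.environ if environ is None else environ
    mock_path = environ.get(MOCK_FILE_ENV_VAR)
    if mock_path:
        # Local development: read the value from the mock file instead.
        with open(mock_path, encoding="utf-8") as handle:
            return json.load(handle)[key]
    # Inside EC2: query the real metadata service.
    url = METADATA_BASE_URL + key
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def service_endpoint(service, environ=None):
    """Return an endpoint override for a service, or None to use the default.

    The <SERVICE>_ENDPOINT_URL naming scheme is an assumption of this sketch.
    """
    environ = os.environ if environ is None else environ
    return environ.get(service.upper() + "_ENDPOINT_URL")
```

In local development you would point MOCK_INSTANCE_METADATA_FILE at a small JSON file such as {"instance-id": "i-local"} and set, say, ELASTICACHE_ENDPOINT_URL to a local Redis or Memcached address. In EC2 neither variable is set and the real services are used.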

Is it Possible to Recreate Amazon Linux Outside EC2?

Opinions differ on whether it is possible as a practical matter to create and maintain a good-enough hand-rolled mock Amazon Linux server outside EC2. It is certainly going to be hard and an ongoing headache. In theory a fair replica for use with VMware or VirtualBox might be created by mashing together an appropriate Linux kernel with a copy of an Amazon Linux userspace obtained from a running instance. As a StackExchange post on the subject shows, it might be technically possible, but ending up with a stable outcome is going to require a fair degree of effort. To me this looks like a high-risk undertaking with a good chance of failure for any specific development use case: too many unknowns.

Assuming you get as far as a stable instance in a local VM, however, then it isn't hard in comparison to sort out other necessary details. Those include setting up a proxy server inside EC2 to provide access to Amazon Linux Yum repositories and altering the repository definitions in /etc/yum.repos.d to point to the proxy. The same goes for access to ElastiCache if you are using that: you will need to set up simple pass-through proxy instances inside EC2 to allow access to ElastiCache from outside EC2.
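One way to read "point to the proxy" is to set Yum's per-repository proxy option in each definition file. The Python sketch below does that for the text of a single .repo file. It is an illustration under assumptions rather than a tested recipe for Amazon Linux: the function name is hypothetical, the proxy is assumed to be a plain HTTP forward proxy, and any comments in the original file are lost when it is rewritten.

```python
import configparser
import io


def add_proxy_to_repo_definitions(repo_text, proxy_url):
    """Return repo_text with proxy=proxy_url set in every repository section."""
    # RawConfigParser performs no interpolation, so Yum variables such as
    # $releasever pass through untouched. Only "=" is treated as a delimiter.
    parser = configparser.RawConfigParser(delimiters=("=",))
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repo_text)
    for section in parser.sections():
        parser.set(section, "proxy", proxy_url)
    output = io.StringIO()
    parser.write(output, space_around_delimiters=False)
    return output.getvalue()
```

You would apply this to each file under /etc/yum.repos.d in the local VM, passing the address of the proxy instance running inside EC2, and keep a backup of the originals.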

Is CentOS a Suitable Substitute?

As of the time of writing, using CentOS 6.4 as a development environment substitute for Amazon Linux is a much more practical goal than trying to run Amazon Linux outside EC2. It isn't free from added cost, however. Amazon Linux now differs significantly from its nearest neighbor distributions such as CentOS and RHEL, using more recent versions of a range of important packages. Some of the more significant differences include:

Ruby version 2.0.0 rather than 1.8.7.

Glibc version 2.17 rather than 2.12.

As a result of these and other changes, the RPM packages in the Amazon Linux repositories for numerous common server applications are neither the same as nor compatible with those in the Extra Packages for Enterprise Linux (EPEL) repositories. This includes Nginx and Memcached, among many others.

Thus if you use CentOS as a substitute for Amazon Linux in your development environment then you will almost certainly have to create and maintain two distinct and different sets of instance provisioning and deployment configurations. If you maintain your own deployment-specific Yum repository then you will quickly find it necessary to maintain two Yum repositories, one for CentOS and one for Amazon Linux. This is going to be the case even if you are building web applications that don't use Ruby and don't otherwise much care about the underlying server technologies.
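To make the duplication concrete, the Python sketch below shows the kind of branching that ends up in provisioning code once two distributions are in play. The repository URLs and function names are hypothetical, and the sketch assumes that the contents of /etc/system-release begin with "Amazon Linux" on one distribution and "CentOS" on the other.

```python
# Hypothetical repository URLs, one per distribution.
REPOSITORY_URLS = {
    "amazon": "http://yum.example.internal/amazon-linux/",
    "centos": "http://yum.example.internal/centos-6/",
}


def detect_distribution(release_text):
    """Classify the contents of /etc/system-release (assumed format)."""
    lowered = release_text.strip().lower()
    if lowered.startswith("amazon linux"):
        return "amazon"
    if lowered.startswith("centos"):
        return "centos"
    raise ValueError("unrecognized distribution: " + release_text.strip())


def repository_url_for(release_text):
    """Pick the deployment-specific Yum repository for this distribution."""
    return REPOSITORY_URLS[detect_distribution(release_text)]
```

Every such branch is a place where the development and production environments can silently diverge, which is exactly the cost described above.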

Moving the Development Environment into EC2

If you really don't care about vendor lock-in then it is perfectly possible to move your development environments off local machines and into EC2. Every developer gains a personal deployment environment that replicates production: in effect everyone has their own staging environment in the cloud. This solves the Amazon Linux problem, but introduces a number of other issues and potential slowdowns to the development process.

Most of these issues are illustrated by the present state of the Vagrant AWS Provider plugin, which allows you to use local Vagrant with remote EC2 instances. The most pressing problems are that the plugin (a) isn't compatible with CloudFormation stack deployment and (b) doesn't offer full ongoing synced folder functionality. Whatever your development setup you will have to solve these problems in a way that doesn't introduce significant roadblocks and slowdowns to the process of developing new features and fixes. How do you efficiently deliver code to the development environment as it is being changed locally by the developer, and how do you update the provisioning setup without requiring a twenty minute break in work?

This may or may not be a mountain of work, depending on the details of your situation. For most applications these are not insoluble problems, but solving them does require significant additional work on development tools. Further, in my experience you are not going to escape without imposing some additional inconvenience and delay on the developers, and thus raising costs.
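As one example of the tooling involved, a low-tech way to deliver local code changes to a remote development instance is to wrap rsync over SSH. The Python sketch below is an assumption about how that might look, not a description of what the Vagrant AWS Provider plugin does. The function names are hypothetical, and the sketch assumes that rsync and ssh are installed locally and that the key path contains no spaces.

```python
import subprocess


def build_rsync_command(local_dir, remote_host, remote_dir, ssh_key_path=None):
    """Build an rsync command that mirrors local_dir onto the remote host."""
    ssh_command = "ssh"
    if ssh_key_path:
        ssh_command += " -i " + ssh_key_path
    return [
        "rsync",
        "--archive",    # preserve permissions, timestamps and symlinks
        "--compress",   # compress data in transit
        "--delete",     # remove remote files that were deleted locally
        "--exclude", ".git/",
        "-e", ssh_command,
        # The trailing slash copies the directory contents, not the directory.
        local_dir.rstrip("/") + "/",
        remote_host + ":" + remote_dir,
    ]


def sync_once(local_dir, remote_host, remote_dir, ssh_key_path=None):
    """Run one synchronization pass, raising if rsync reports failure."""
    command = build_rsync_command(local_dir, remote_host, remote_dir, ssh_key_path)
    subprocess.run(command, check=True)
```

Calling sync_once from a file watcher or an editor save hook gets close to ongoing synced folders, but it does nothing for the second problem of updating the provisioning setup quickly.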

Don't Use Amazon Linux

Alternatively you could just not use Amazon Linux, at which point all of these problems go away and your local development environments can be made exactly the same as the deployment environments in EC2. There will be no more issues than is the case with any cloud deployment setup, and you won't have added unnecessary costs and vendor lock-in. So why make life difficult for yourself?