1 Introduction

Today we're going to talk about a fairly hot topic these days: Continuous Deployment. We'll talk about what it is, what it isn't, and most importantly, how you can find the parts that work best for you and your organization, without necessarily having to deploy to production 50 times per day like Etsy or Flickr.

Over the last few years, there has been a lot of great progress in the development community around DevOps, Continuous Integration, Continuous Delivery, and Continuous Deployment. Technology companies have been embracing these ideas of smaller, faster development cycles, and getting code written, integrated, tested, and deployed as quickly as possible.

However, there is a growing opinion that in order to do continuous deployment "right", you need to meet some minimum requirement of developers deploying directly to production all day long. Unfortunately, this approach can be overly dogmatic for many companies and scares them off before they can find less extreme ways to improve their development process by incrementally adopting the continuous deployment principles that can help them the most.

2 Continuous Integration vs. Continuous Delivery

Continuous deployment is related to continuous integration and delivery, and could be considered the next step in that evolution.

Continuous integration is the process of having developers merge their code together on a near-constant basis to identify potential integration problems. This usually involves a build server that steadily watches the code repository for changes, pulling and compiling code and running automated tests. This gives great visibility into the ongoing health of your code and reduces the communication friction between teams working on the same code. It should probably be considered a minimum requirement for any large or distributed software project, regardless of your delivery and deployment needs.

Continuous delivery is the process of having your development flow steadily create a deployable version of the system. Where and when (and even if) you deploy that package depends on your situation, but the key point is to keep your code in an always-deployable state. While this does take a lot of work, it gives you great flexibility, letting you get fixes and features in front of your QA team or users as quickly as possible, and ensures that you are maintaining a high level of quality and stability at all times.

3 So What Is Continuous Deployment?

Continuous deployment is the next step past continuous delivery, where you are not just steadily creating a deployable package, but you are actually deploying it steadily.

Note that we did not say "deploying to production immediately". Several cutting-edge companies like Etsy, Flickr, and Facebook are doing this, having developers deploy to production several times per day. That is a great goal for many companies to work towards, but it is not an absolute requirement, and may even be counterproductive for some. There are a lot of benefits you can realize by pursuing continuous deployment principles without having to drink the whole gallon of deploy-to-production-10x-per-day Kool-Aid.

So we think a more thorough and welcoming definition of continuous deployment would be: consistently deploying code to production as features are completed, and as soon as you have met the release criteria for those features. The release criteria depend on your situation, and may include running automated tests, code reviews, load tests, manual verification by a QA person or business stakeholder, or just having another pair of eyes look at your feature and make sure it doesn't explode. Again, the specific criteria can vary, but the key idea is to have a steadily flowing pipeline pushing changes to production, always moving the code forward, and keeping the pipeline as short as realistically possible.

4 So Why Bother?

This may sound like a huge pain. Yes, it is indeed a lot of work, and requires a lot of discipline. So what do we get out of it? Some of the benefits include:

Developers must ensure consistent quality and releasable code at all times. There is nowhere to hide bad or lazy code, as it will probably cause problems right away. Too often in large release cycles a feature is "mostly" done, with a few cleanup items that everyone intends to get back to. Deep down we all know that the most efficient way to handle those items is to do them immediately before moving on to the next task, but we talk ourselves out of that because we are anxious to get started on the next thing. In a continuous deployment environment, that is not an option, and it forces your developers to keep themselves honest, because there is no "later in the sprint"; there is only completing the feature and getting it shipped right now. Even if you are pursuing continuous "delivery" and saying every build is technically releasable, if most of those builds are not actually deployed anywhere, it can get real easy to cut a few corners today, because you probably have time to clean them up before the real release is deployed later that week.

The business users (whoever they are in your company) can get the features they request in a minimum amount of time, and they can even change their minds after the fact. Granted, it takes time to define, implement, and test a feature, and much of that process you can't and don't want to rush. However, too often that part is 20% of the delivery time, and the other 80% is documentation, UAT signoffs, change management forms, and regression testing, leaving weeks or even months of unnecessary time between when the business decides they want a feature and when they actually get it. Heck, you're lucky if the same business people are even still working there by the time the feature is delivered. In a continuous deployment model, you are constantly working to minimize that time, squeezing out any unnecessary inefficiencies, so there is as little as possible standing between your code and production. The business folks also don't need to spend 3 weeks agonizing over the best way to add a specific feature, because they can try the best-looking option and adjust a few hours or days later if it's not working.

Developers are less scared of the code. It seems odd that experienced, professional developers would be scared of the code they work on every day, but the reality is that as systems grow larger and are steadily built upon, as other developers come and go, and as time goes on and brain cells float away, all code tends to get a little scary with age and neglect. In any large system, there is usually more than one section that nobody wants to touch, and won't touch if at all avoidable. Granted, if folks just sucked it up, dove in, and actively worked to improve it, it would all stay a lot less scary. The blistering visibility of a continuous delivery/deployment model enforces this, because nothing can be neglected or avoided for too long, or it will become too painful to make any incremental changes.

Fast production-critical bug fixes. So after 4 weeks of design meetings, development, regression testing, UAT testing, load testing, integrating feedback, and scheduling a 2:00 AM change management window, the morning after your feature ships someone notices that users' computers burst into flames when Gaelic users try to load your site using a homemade browser on a custom refabricated iPad. Sure, that fell a wee bit outside of your test coverage, but fire is bad, so you need to get a fix deployed. How quickly can you get that out? In some situations, you need to convince some Vice President who knows nothing about your application to approve an emergency "some developer is an idiot" change management ticket, and make all of the same operations folks come back in for yet another awful overnight deployment, because some developer is an idiot (did we mention that everyone now thinks the developer is an idiot? Because everyone does, and boy is that healthy for team morale). Or, in a continuous deployment model, we can do another deployment before most of the Gaelic users are out of bed, and everything is going to be fine.

The end of week-long end-to-end regression testing. To be clear, regression testing is great. It's really important, but it is so important and time-consuming that you really need to work to automate it. What you absolutely cannot afford in a continuous deployment model is to complete all of your development and testing, and then have a QA team spend the next 2 weeks doing regression testing of every feature in the application. Instead, you need a reasonably thorough level of automated testing around your application, sufficient logging and monitoring to know if something goes wrong, and the zen peacefulness (or intestinal fortitude) to accept that all code has bugs, and that when problems eventually arise, you can get them fixed in hours or days instead of weeks or months. Of course, this is not a complete abdication of testing, but when you can get another deployment out the door quickly, it removes much of the near-paranoid paralysis that comes from having to have everything perfect before it goes out the door. Most of us are not shipping CDs of our software or putting embedded software inside pacemakers; we're usually putting continuously updated code on centralized servers that we control, and our processes should optimize for that reality.

5 People Over Processes And Tools

One of the most important things we need to remember in this business is the very first value of the Agile Manifesto:

"Individuals and interactions over processes and tools"

There is a common pattern in software development where people try to imitate success by copying the tools or the processes that seem to have made other teams successful, rather than embracing the underlying ideas that originally drove that success.

In order to have a successful continuous deployment pipeline, you will definitely need some automated tools, but your success will not be directly dependent on exactly which tool you use. More important than the tools is the process that you use, but that is not the core deciding factor either. Instead, the single most important factor in a successful continuous deployment strategy is the buy-in, support, and collaboration of everyone involved.

Again, everyone's situation is different, and no amount of pre-packaged process and out-of-the-box tools is going to magically turn your team into a well-oiled continuous deployment machine. To be truly successful with it, and to maximize the benefit for your given situation, you need a fundamental cultural shift. Everyone on the team must believe that keeping the code in a shippable state is critically important, and must have the guts to actually ship it. Everyone must believe in leaving the code cleaner than they found it. Everyone must be focused on building the best software they can as efficiently and transparently as possible, instead of hiding behind status reports and bug counts and plausible deniability. If a problem gets out the door to production, you don't have a post-mortem meeting to determine whether it was the fault of QA or development or requirements; you focus on fixing the problem.

Without that buy-in from your teams, you will be fighting an uphill battle to install a process that everyone will hate because it creates more work without tangible benefits. But if you can get everyone to buy in to the ideas of continuous delivery and deployment, like any other aspect of shipping software, you will be consistently amazed at the seemingly impossible level of productivity and quality a development team can produce when they truly believe in what they are doing and are motivated to do the best job they can.

So don't agonize over finding the perfect tool. Don't analyze every detail of Facebook's and Flickr's deployment processes. You are not them, and they are not you. Instead, identify your goal, and start working incrementally to get there, building as much support from your team and company as possible. You will need them far more than any automation tool or strategy book.

6 Ok, So How Do I Do This?

Again, there is no one true formula, but there are a lot of important ideas, most of which you will want to incorporate at some point.

6.1 Automate Deployments From Step Zero

From the point that you first start setting up your project, start figuring out how you are going to deploy it, and what configuration parameters you will need. Before your project does anything useful, have some scripts in place to deploy it to a development server to ensure that you have a provable and repeatable process.

There are several reasons for this. The most obvious is that this is a step that usually gets put off until the end, and then you are racing to put something together, or you skip it altogether. That does not lead to happy deployments.

Another reason is so that your project is built with a DevOps mindset from day one. Too often it's easy for developers to get lost in the specific code they are building, losing track of the fact that the software needs to be released and deployed downstream. This may not seem like a big deal, but it can have a big impact on how you build a feature, especially when it involves configuration or third-party dependencies. Developers should always instinctively be asking, "How will this affect the deployment process?". To ask that, developers must be comfortable with that process. To be comfortable with it, most importantly, it must exist.

Lastly, coming up with a smooth deployment process is not something you get right the first time; it is a gradual process of identifying pain points and actually working to remove them. To do this, you must have the discipline and dedication to pursue it, but you also need the time. If you are slapping your deployment process together as a last step, it will not go through any iterative rounds of developer-approval testing, so it will never have the opportunity to mature into something that doesn't suck.
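To make this concrete, here is a minimal, hypothetical sketch of a day-zero deployment script in Python. The environment names, server names, and steps are all placeholders; the point is that the process is captured in code, parameterized by environment, and testable long before the commands behind each step do anything real:

```python
# Hypothetical day-zero deployment skeleton. Environments, servers, and
# step names are placeholders for whatever your project actually needs.

ENVIRONMENTS = {
    "dev":  {"servers": ["dev01"]},
    "qa":   {"servers": ["qa01", "qa02"]},
    "prod": {"servers": ["web01", "web02"]},
}

def deployment_plan(env: str, artifact: str) -> list[str]:
    """Return the ordered steps needed to deploy `artifact` to `env`.

    Keeping this a pure function makes the process itself provable and
    repeatable: the same inputs always produce the same plan.
    """
    target = ENVIRONMENTS[env]
    steps = [f"build {artifact}", f"run unit tests for {artifact}"]
    for server in target["servers"]:
        steps += [
            f"copy {artifact} to {server}",
            f"write config for {env} on {server}",
            f"smoke-test {server}",
        ]
    return steps

if __name__ == "__main__":
    for step in deployment_plan("dev", "myapp-1.0.zip"):
        print(step)
```

Even this toy version forces the right questions from day one: what configuration differs per environment, and what does "deployed" actually mean for this project?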

6.2 Automated Testing

This is probably one of the most common and obvious points. Your journey into continuous delivery and deployment will be greatly helped by having as much useful automated testing as is practical. Unit testing, integration testing, system testing: all of these can be very helpful in ensuring that your system is healthy.

There are endless arguments to be had about the right amount of testing, but many people would say that the more tests you have, the better. I would humbly say those people are gravely mistaken. Even more so than code comments that don't say anything useful, the very presence of unnecessary or unhelpful tests can be counterproductive. While tests are good for ensuring correct functionality, if they become too verbose, too specific, or too slow, they can slow down your release pipeline and make it very time-consuming to maintain and refactor your code. Pretty soon, you have a few tests failing even though the system is actually working correctly, so you start ignoring some failures, and all of a sudden your whole testing suite is virtually worthless because you can't trust the results. Instead, I would suggest taking all of the dollars you spent building those tests and burning them in a fire pit in your backyard, because at least you can roast some s'mores that way. Because s'mores are delicious. And go ahead and invite over your coworkers who have been trying to achieve 99% unit test coverage for the last two weeks, because they probably need to get out more.

Again, this is not to say that automated tests are bad; they just need to be used with restraint, and must be constantly and critically reviewed to ensure that they are still worthwhile. If a batch of tests is causing problems because they are not testing the correct functionality, remove or rewrite them, but definitely don't leave them sitting there polluting your test results.

Given that you do need to do some testing, unit tests are a great place to start. As a part of developing non-trivial features, developers should be writing tests that ensure that a component satisfies the business need. Make sure these are as simple as possible, run as fast as possible, and have as few dependencies as possible, so that any developer can download and build a copy of the application code, run the tests, and have them pass. From that point on, a really useful way for developers to learn how the system works, and how it is supposed to work, is to read the tests. If possible, run an automated test runner (like Guard for Ruby or NCrunch for .NET) to re-run your tests every time you make a change, so you know immediately when you've broken a test. Also, your build server should be running the tests constantly as a part of your continuous integration builds.
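As an illustration of the kind of fast, dependency-free unit test that can run on every change, here is a hypothetical example; the `bulk_discount` function and its pricing rules are made up for the sketch:

```python
# A hypothetical pricing rule and the tiny, fast tests that pin it down.
# No database, no network, no setup: any developer can run these instantly.

def bulk_discount(quantity: int) -> float:
    """Return the discount rate for an order of `quantity` items."""
    if quantity >= 100:
        return 0.10
    if quantity >= 10:
        return 0.05
    return 0.0

def test_no_discount_for_small_orders():
    assert bulk_discount(1) == 0.0

def test_five_percent_at_ten_items():
    assert bulk_discount(10) == 0.05

def test_ten_percent_at_one_hundred_items():
    assert bulk_discount(100) == 0.10
```

Tests like these double as documentation: reading them tells a new developer exactly where the discount tiers begin.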

But as soon as your tests are calling a whole bunch of objects, or talking to third-party systems, or talking to a database (that is not embedded right in your development environment), you are not building unit tests anymore, you are building integration tests. Integration tests are definitely useful as well, but due to their nature they should be run in a different way. If they depend on a database, a third-party system, or extensive scenario setups, they will probably be time-consuming. You still want to make it easy for developers to run them locally when they want to, but these tests are better run from a build server, which is specifically set up with the necessary database and configuration, maybe a few times a day or before a deployment, instead of constantly. Sometimes these integration tests can exercise code that has a bunch of dependencies, or sometimes they can test the actual UI of the application (using tools like Selenium or Watir/WatiN).
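One simple way to keep the slow tests out of the fast loop, sketched here with Python's standard `unittest` module, is to gate them behind an environment variable that developers leave unset and the build server turns on. The variable name and test content are hypothetical:

```python
import os
import unittest

# Hypothetical convention: integration tests only run when the build
# server (or a curious developer) explicitly opts in via an env var.
RUN_INTEGRATION = os.environ.get("RUN_INTEGRATION_TESTS") == "1"

@unittest.skipUnless(RUN_INTEGRATION, "set RUN_INTEGRATION_TESTS=1 to run")
class CheckoutIntegrationTests(unittest.TestCase):
    def test_order_is_written_to_database(self):
        # In a real suite this would talk to the build server's dedicated
        # test database; here it is just a placeholder body.
        self.assertTrue(True)
```

With this split, the default `unittest` run stays fast everywhere, while the build server exports the variable before its scheduled integration pass.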

Interestingly, integration tests are often dismissed as being useless or impractical, with the idea that if you have fast, well-written unit tests, you don't need to waste effort on integration tests, which are usually slower and more fragile. However, way too often, all of the unit tests are passing fine, but once the whole system is deployed it explodes because the perfect little components don't fit together correctly, or make different assumptions about how the system is going to work. Integration tests get you much closer to verifying what one of my colleagues calls "the #$% system actually #$% does what the #$% system is supposed to #$% do". In the end, a healthy mix of both unit tests and integration tests will probably serve you well, but again you must constantly be reviewing them to make sure they are testing the right thing and are providing value.

Lastly, you need a lot of production system-level testing. This can involve monitoring log files, exception reports, and response times, and can also involve running some of your integration tests against the production environment to ensure that everything is working correctly. New Relic is a great tool to get started with this very easily, but you should also be constantly looking for more ways to add monitoring to your system.

6.3 Excessive System Monitoring and Logging

So system testing leads us to the next point, which is system monitoring and logging. In addition to tools like New Relic and MiniProfiler, you should be building as many dashboards and health check jobs as possible to look for problems and notify someone who can look into it. There should never be a situation in your system where a user is encountering issues, or the system is slowing down, or data is being corrupted, but nobody knows about it.

This is a critically important point. When you start using continuous deployment, you are not doing crazy amounts of QA and regression testing and user acceptance testing before shipping features, so you really need to know if a new feature introduced a problem, what that problem is affecting, and exactly when it started. You also need to know that as soon as possible, so if something you deployed an hour ago introduced the issue, you can make an intelligent decision about how to solve the problem, whether it is disabling the new feature or racing to fix it.
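A health-check job does not have to be sophisticated to be useful. Here is a minimal sketch: compare recent metrics against thresholds and report anything that needs a human. The metric names and threshold values are hypothetical; in practice the numbers would come from your logs or an APM tool like New Relic:

```python
# A toy health-check job: flag any metric that crosses its threshold.
# Metric names and limits are hypothetical examples.

THRESHOLDS = {
    "error_rate":     0.01,   # more than 1% of requests failing
    "p95_latency_ms": 800,    # 95th percentile response time
    "queue_depth":    1000,   # background work piling up
}

def check_health(metrics: dict) -> list[str]:
    """Return an alert message for every metric over its threshold."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{name} is {value}, over threshold {limit}")
    return alerts
```

Run something like this on a schedule, wire the returned alerts to email or chat, and you have the start of the "somebody always knows" visibility this section calls for.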

6.4 Short Lived Code Branches

Arguments could rage forever about the right way to branch source code, especially around whether to use branches per feature, per release, per team, or per component.

In many companies with fixed release cycles (e.g. monthly or quarterly), a common approach is to have a branch for each release, so that patches can be created for that released code base without mixing in in-process development changes. However, with continuous deployment, the idea of release branches goes away, because you no longer have a few big fat deployments. Instead, if you have a problem in the current deployment that you need to patch, the next deployment is right around the corner.

Another common reason for branching is to create a branch for every feature. While this can be useful for small features, once you start having large, long-lived branches for a feature, you run the risk of some really expensive merges. Instead, it is usually best to develop features in the trunk, keeping each feature disabled with a feature toggle (which we'll discuss shortly), and saving feature branches for experimental features that you may just end up throwing away, so they don't clutter up the trunk code base.

6.5 Feature Toggles

Feature toggling is the idea that you can turn new features on or off by using configuration. This can control when and how functionality appears in the application.

Usually people think of this in terms of business-driven toggling, where a business user decides that we don't want to use a feature any more or that it should work differently, so they want to flip a switch to make that happen.

However, in terms of continuous deployment, the much more important usage is release-driven toggling: hiding a new feature from some or all users until that feature is ready to go. As we mentioned, we're trying to minimize our code branching and keep our branch lives short. To accomplish that, developers need to be comfortable gradually introducing changes to the trunk of the code repository, with the ability to keep those changes hidden until they are ready to go live. This allows them to build out whole features that can be interactively reviewed and approved by business users on an internal system, and only enabled once the business users are happy, without having to keep that codebase from going to production with other fixes, and without introducing expensive long-lived branches for those features. Once the decision is made to introduce a feature, you'll be well served by a feature toggling library that allows you to gradually introduce any invasive changes to subsets of users, so that you can test them with those users before unleashing them on the whole world.
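The core of release-driven toggling with gradual rollout is small enough to sketch directly. The feature names and percentages below are hypothetical, and a real project would likely use a toggling library, but the mechanism is just this: hash each user into a stable bucket and enable the feature for buckets under the rollout percentage:

```python
import hashlib

# Hypothetical toggle configuration; in practice this would live in a
# config file or database so it can change without a deployment.
FEATURES = {
    "new_checkout": {"enabled": True,  "rollout_percent": 25},
    "beta_reports": {"enabled": False, "rollout_percent": 0},
}

def is_enabled(feature: str, user_id: str) -> bool:
    """Deterministically assign each user to a bucket from 0 to 99.

    Hashing (feature, user) means a given user always gets the same
    answer for a given feature, so their experience is stable as the
    rollout percentage gradually increases.
    """
    config = FEATURES.get(feature)
    if not config or not config["enabled"]:
        return False
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < config["rollout_percent"]
```

Bumping `rollout_percent` from 25 to 100 over a few days is the "gradually introduce invasive changes to subsets of users" idea in miniature.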

To accomplish this, you really need to be disciplined about keeping your features small and focused. It can be very easy to misestimate the impact of turning a feature on and off, which can cause side effects in other components. Thankfully, if you are deploying early and often with several in-process features turned off, you are constantly validating that your disabled features are self-contained; if they are not, production will break tomorrow, and you will really need to fix that. Again, the searing light of transparency that continuous deployment gives you forces you to keep your code clean.

6.6 Zero Downtime Deployments

"Zero downtime deployments". That sounds like a pipe dream, right? But it is not actually all that hard, assuming you are always focused on following a few rules to make it possible. And if you are going to be deploying continuously, especially during business hours when your site is experiencing a lot of traffic, you need to be able to do it without the slightest hiccup for your users.

The idea is that you can steadily deploy changes without having to take your site down. For some types of sites, taking the site down for a few minutes or an hour or even a night is not a critical issue. However, many other types of sites start losing money immediately when they are offline, or downtime could have a crippling effect on the company's business or reputation.

The simplest case is web servers. First we'll assume that you have at least two web servers running your website (if not, GO FIX THAT NOW). So you have multiple servers in a web farm behind a load balancer, or better yet, multiple web farms. As you update the system, you pull individual servers out of the load balancer, update them, and then put them back. If you have multiple web farms, you can take farm A offline, update it, and then swap the load balancer to point at farm A while you update farm B. Definitely not rocket science.
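That rolling pattern is simple enough to sketch. The `LoadBalancer` class here is a stand-in for your real balancer's API (most have one, or at least a CLI you can script against), and `update` and `smoke_test` are whatever your deployment actually does:

```python
# A sketch of a rolling deployment: drain one server, update it, verify
# it, and only then put it back and move on. Never re-add a broken server.

class LoadBalancer:
    """Stand-in for a real load balancer's add/remove API."""

    def __init__(self, servers):
        self.active = set(servers)

    def remove(self, server):
        self.active.discard(server)

    def add(self, server):
        self.active.add(server)

def rolling_deploy(balancer, servers, update, smoke_test):
    """Update servers one at a time so the site never goes fully dark."""
    for server in servers:
        balancer.remove(server)     # stop sending it traffic
        update(server)              # push the new version
        if not smoke_test(server):  # verify before restoring traffic
            raise RuntimeError(f"{server} failed smoke test; deploy halted")
        balancer.add(server)
```

The key property is that a failed smoke test halts the deploy with the bad server still out of rotation, so users only ever hit healthy servers.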

Next are application servers. If your application servers are running web services, see the previous paragraph about updating load balanced web sites. If you are introducing any breaking changes to those web services, consider using one of the common API versioning techniques, like introducing a whole new copy under a versioned directory.

For any background processes on the application servers, you need to ensure that they are designed so that one or 1000 of them can be running at a time, and that whatever they are doing, they use a shared data source (such as a database or file share), and have proper queue processing logic to ensure that they are not conflicting with each other. Too often a service or process is built assuming that it is the only one running at a time, which makes it really awkward when it has to stop for whatever reason.

Lastly, your databases. Web and application servers are usually easy, as you usually have many of them running anyway. Databases are much more difficult, because you have a single source of data that is constantly changing. Even if you have multiple clustered database servers, there is usually only one active instance of the data at a given time. If you take a snapshot of a database and update it, you can't just add it right back, because the data could have changed in the meantime. Even if you are mirroring all of your changes to another database instance, there is always a lag, and the mirrored database will always be behind, just like a snapshot or backup. In the end, one of the only reliable ways to keep your application running the whole time while updating your database is to do nothing destructive. Always err on the side of adding new fields and tables, while altering and dropping fields as rarely as possible. Even then, if you need to do serious schema refactoring, consider using views or version-specific schemas to support the new and old versions at the same time; granted, this is quite a pain, but it can be a life-saver when you really need to change your database schema. Of course, consider your situation: if this is rare enough, it may be worthwhile to punt on these types of changes until you can schedule an explicit downtime to make them.
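This additive-first idea is often described as the expand/contract pattern, sketched below with hypothetical table and column names. The additive "expand" steps ship alongside the code that needs them; the destructive "contract" steps wait until no running version of the application touches the old columns:

```python
# Hypothetical expand/contract migration for renaming user name columns.
# Only additive steps run during normal continuous deployments.

EXPAND = [
    # Deploy 1: add the new column. Old code ignores it; new code
    # writes both old and new columns.
    "ALTER TABLE users ADD COLUMN full_name VARCHAR(200)",
    # Backfill existing rows (in batches on a big table, to avoid locks).
    "UPDATE users SET full_name = first_name || ' ' || last_name"
    " WHERE full_name IS NULL",
]

CONTRACT = [
    # Much later, once every running version reads only full_name,
    # the destructive cleanup can run (possibly in a scheduled window).
    "ALTER TABLE users DROP COLUMN first_name",
    "ALTER TABLE users DROP COLUMN last_name",
]
```

The discipline is in the ordering: nothing in `EXPAND` can break the currently deployed code, and `CONTRACT` never runs while any version still needs the old shape.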

6.7 Constant Smoothing Of Release Process

As we mentioned above, a key part of a solid continuous deployment process is to start early, so that you have as much opportunity as possible to identify and fix any impediments to the process. You must vigilantly look for any case where you need to manually copy files, send emails, run batch scripts, edit configuration files, or even manually exclude a changed file from your source code repository. There is always a next thing causing you pain in your deployment process. Find it. Find it and kill it. There is a saying that "the software is not finished until the last user is dead." In this case, you and your development/operations teams are the users of the deployment software, if that is at all comforting.

Most importantly, talk to the rest of the team. Constantly be asking them what is causing them problems and how you can solve those problems. Given how often you are going to be deploying, any slight annoyance in the process will add up to an enormous cost over time if you don't address it. If you find yourself saying that it would be great to automate some specific process but you don't have time, stop and do it anyway. Usually not automating ends up taking even more time in the long run.

Also keep an eye out for automation steps that didn't quite work out or just are not cutting it any more. There could be several places where the first attempt to automate something wasn't quite right, or was based on faulty assumptions, or your applications and environments have changed enough that it is no longer a good solution. While automating is helpful, your goal should be optimizing, not automating. Like automated tests, if an automated deployment task doesn't actually help, fix it or replace it, but definitely don't leave it there to continue to suck the life from your team.

And one more key point: this is not just about smoothing automation steps. Look for inefficiencies in every step of your process. Is it hard for QA to track what has been fixed and deployed? Does nobody understand which features are turned on or off in production? Do you need too many approvals to perform a deployment? Does Jimmy always mess up the deployments because he keeps checking in his wonky hard-coded connection string? Make sure you are reserving some time to review these things and come up with a plan to address them. Sprint retrospectives are a great time for this, if you're into that kind of thing.

6.8 Repeatability Is Critical

To deploy continuously and effectively, your deployments must be streamlined and effortless. One of the key rules for this is that you deploy and run the software the same way everywhere. You cannot have a fully automated deployment to DEV and QA, but a bunch of manual steps for production, and expect to be successful. That may sound obvious, but all too often we become used to saying, "Oh, that UAT environment is a little special, we need to remember to do some extra steps for that server." Every time you find yourself doing an extra manual step for a specific environment, remember Mooney's Law Of Guaranteed Failure (TM):

In the software business, every manual process will suffer at least a 10% failure rate, no matter how smart the person executing the process. No amount of documentation or formalization will truly fix this, the only resolution is automation.

6.9 Embrace Failure

One of the trickiest parts of this process is that none of us really know what we are doing. While we all bring our experiences and ideas to the table, at the end of the day you never know what will actually work until you try, which means you will fail, a LOT. Granted, you can frame this as experimentation and iterative learning, but a lot of people may see this as a string of abject failures. You must be OK with this. As we've covered, one of the most important parts of the smoothing process for automating your tests and your deployments is to admit when something is not working, and fix it, replace it, or just get rid of it, and you must be able to do this without looking to blame someone.

Of course, you must be working in an environment where you can experiment with ways to make your company better without worrying about getting a political knife in your back. If you are in one of those unpleasant situations, fix that first. We've said several times that you need to adapt your continuous deployment approach to fit your company and environment, but there are always limits. There is no sense figuring out how to get more fiber into your diet while someone is chasing you with an ax.

6.10 Document The Right Things At The Right Time

Many companies require extensive documentation and review before a feature can be built, and then detailed QA test plans are based on this, and the expectation is that the feature will be built exactly as the documentation describes, and any deviation from this design requires a Project Change Request form approved by 2 directors and a Vice President. It would be safe to say that this level of documentation is probably not the most efficient way to develop software that can rapidly respond to users' needs. Just about all of this documentation is out of date almost as soon as it's written, and everyone wonders why nothing gets done.

On the other hand, many companies misuse "Agile" as an excuse to eschew documentation altogether, and features are built based on one-line descriptions and 5-layer whisper-down-the-lane conversations between the developer building the code and the business user requesting the functionality. Even if you have a short and iterative feedback loop to ensure that the customer can verify the feature is working as they expect, you're putting a lot of faith in the customer's ability to completely remember what they want and why they want it. Too often you have an hour-long meeting about why we need to move the reporting functionality button to another screen, covering a whole list of considerations and opinions, and decisions are made, but nobody writes anything down. As someone who can barely remember what I had for breakfast on a given day, this is terrifying. Sending out a few notes on what was decided and why would be priceless, whether they go into a wiki or a OneNote/Evernote notebook, or even just sit and collect dust in a folder in your email account.

Either of these situations can cause some serious problems for a continuous deployment strategy. You probably want to err on the side of minimizing documentation to reduce the friction of building and shipping features, but you should have at least something that says when something was built, how it was supposed to work, and why it works that way. Many people say that the current working code is the best documentation, but that only tells part of the story. It does not capture why it works that way, or how it used to work, or whether it's even really supposed to be doing that in the first place.

A strategy that I've found works very well is to keep a OneNote notebook on a shared drive somewhere, and create a new entry for each meeting or feature. Then, anytime you are discussing how or why a feature works a certain way, just transcribe/summarize what is being said and what is being decided. Then in 6 months when someone asks why the user management screen is so ugly, you can search your notes for any mention of it, find the latest notes, and give an actual meaningful answer.

7 Conclusion

So that's probably a lot to take in. Again, the goal here is not to dictate a set of steps that you must follow to achieve continuous deployment nirvana, but rather to highlight a lot of the benefits of a continuous deployment approach, and some practical steps to work towards that goal. Don't worry too much about how you are measuring up to other companies; there is no prize for having the coolest deployment process (or actually there probably is, but that's not important right now).

In the end, never forget that the single most important thing is that you are building the best software that you can, as efficiently as possible, and delivering it to your users as effectively as possible, while flexibly responding to the needs of the business and users. Deploying to production 10 times per day may help you accomplish that, but for a lot of companies it would just be overkill and a waste of resources. You need to find what works best for your company. But the key point is that you would probably be better served by shortening your release cycles to the point where the very idea of a release cycle becomes obsolete, which means deploying a lot more often than you are now.