You need experienced people to support your infrastructure and solve complex problems. That said, it makes less sense to hire in-house experts to simply push the same buttons to do daily upgrades and to grow your infrastructure on an as-needed basis. And what happens one day when those in-house experts decide to leave?

That’s why automated infrastructure is key. It’s about saving time while having fewer dependencies. Let’s examine three areas in more detail: config management, deployment automation and auto scaling, along with the fault tolerance they make possible.

Config Management

Config management once focused on applications, but today your infrastructure needs to be under config management too, so you can build in fault tolerance. Using the AWS example, if your infrastructure is under config management and you can run it in a standard, automated way, then rebuilding it in another region (say, us-west-2) means pointing the same configuration at that region and letting the automation rebuild everything.
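To make the idea concrete, here is a minimal sketch of infrastructure described as data, so that one definition can be rendered for any region. It is a hypothetical Python model, not the API of any real config management tool: `ClusterSpec`, `render_plan`, the instance type and the region names are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical declarative description of a cluster."""
    name: str
    region: str
    node_count: int
    instance_type: str


def render_plan(spec: ClusterSpec) -> list[str]:
    """Turn a spec into an ordered list of build steps (illustrative only)."""
    steps = [f"create network for {spec.name} in {spec.region}"]
    steps += [
        f"launch {spec.instance_type} node {i} in {spec.region}"
        for i in range(1, spec.node_count + 1)
    ]
    return steps


# The same spec, pointed at a different region, yields the same build.
east = ClusterSpec("web", "us-east-1", 3, "m5.large")
west = replace(east, region="us-west-2")

for step in render_plan(west):
    print(step)
```

Because only the `region` field changes, the rebuilt environment has the same shape as the original, which is the property that makes a cross-region rebuild a decision rather than a project.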

If your infrastructure changes, your applications can break, and vice versa. In the past, a systems operator would make a change to the server, and developers might then deploy code that wouldn’t work. When the infrastructure isn’t under config management, code isn’t always tested alongside infrastructure changes.

The same principle applies in DevOps, where something changes every day, so it’s best to stay current. Without automation, you need to do upgrades by hand, which could take 30–40 minutes a day and be a huge drain on your resources.

Deployment Automation

If your cluster is under config management and you use a tool like kops (Kubernetes Operations, an open source tool for deploying and managing Kubernetes clusters from the command line), then upgrades can be automated, which saves a ton of time and reduces human error.

That brings us to a larger discussion about deployment automation.

Your application and your infrastructure have to mesh, which is where DevOps comes in. Automating your application deployment strengthens your infrastructure. If the application code is containerized and deployed to your chosen environments the same way every time, then you don’t have to wonder what went wrong — and where. Automation gives you a clear path, and if you follow it you’ll rapidly identify issues and track problems to their source.

Automated deployments also save onboarding time for new developers. Developers used to write applications, test them, load them on the server, configure them, run them and then fix them when they were broken. It was a time-consuming process. Now developers can work on the code, and automation takes care of the rest.

Automated Scaling

Automated scaling helps you ensure that you have the correct number of instances available to handle the load for your application. You can designate the minimum and maximum number of instances (or a specific number of instances) in auto scaling groups, and auto scaling automatically ensures that your group meets those criteria.
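The min/max rule can be sketched in a few lines. This is a simplified illustration of the bounding behavior only, not the AWS API; the function name `clamp_capacity` and its parameters are assumptions.

```python
def clamp_capacity(desired: int, minimum: int, maximum: int) -> int:
    """Keep the requested instance count inside the group's bounds."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(desired, maximum))


print(clamp_capacity(desired=1, minimum=2, maximum=10))   # raised to the floor
print(clamp_capacity(desired=15, minimum=2, maximum=10))  # capped at the ceiling
print(clamp_capacity(desired=5, minimum=2, maximum=10))   # already within bounds
```

Whatever asks for more or fewer instances, the group never drops below the floor you set and never grows past the ceiling.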

Let’s say you put all your application containers in one cluster, filling the cluster completely. If the process of adding new resources isn’t automated, an operator has to spin up another node in the cluster before you can start using those resources. This manual process takes time. If the process is automated, you fill your cluster with a bunch of containers, and when they no longer fit, the cluster knows more resources are needed and spins up a new node automatically.
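The "no longer fit, so add a node" behavior can be modeled as a toy first-fit placement loop. This is a sketch under stated assumptions, not how any real Kubernetes scheduler or autoscaler is implemented: the function `schedule`, the single CPU dimension and the millicore figures are invented for illustration.

```python
def schedule(pod_cpus: list[int], node_capacity: int, initial_nodes: int = 1) -> list[int]:
    """Place pods first-fit and return the free CPU left on each node.

    CPU is counted in whole units (for example millicores). When no
    existing node has room for a pod, a new node is added to the pool.
    """
    free = [node_capacity] * initial_nodes
    for cpu in pod_cpus:
        if cpu > node_capacity:
            raise ValueError("pod is larger than a whole node")
        for i, available in enumerate(free):
            if cpu <= available:
                free[i] = available - cpu
                break
        else:
            # No existing node has room: simulate spinning up a new node.
            free.append(node_capacity - cpu)
    return free


# Five 500-unit pods on 1000-unit nodes, starting from a single node.
print(schedule([500] * 5, node_capacity=1000))
```

Starting from one node, the pool grows to three nodes with no operator involved; the last node is left half empty and ready for the next pod.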

Kubernetes can also auto scale at the container level. Say you have 10 nodes in a cluster and you normally only need to run 10 replicas of your application. Your commercial just aired on Good Morning America, and all of a sudden your website visitors go through the roof. Those 10 containers of your application are no longer sufficient, but there is still plenty of space (resources) available on your 10-node cluster. Rather than launching a new instance, which can take 5–7 minutes, Kubernetes can start a new container in 15–30 seconds. You can grow your infrastructure as needed, and all it’s doing is putting more containers in the cluster. As long as you have enough space, it handles your growth needs for you.
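The replica count for this kind of scaling follows the ratio rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies that rule and bounds the result. It is deliberately simplified (it ignores the tolerance band, pod readiness and stabilization windows), and the function name and the CPU figures are assumptions for illustration.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Scale the replica count by the ratio of observed to target load."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Bound the result, as an autoscaler with min/max limits would.
    return max(min_replicas, min(raw, max_replicas))


# 10 replicas averaging 90% CPU against a 50% target (made-up figures).
print(desired_replicas(10, 90, 50, min_replicas=10, max_replicas=40))
```

With those hypothetical numbers the rule asks for 18 replicas, all of which fit on the existing nodes in the scenario above, so no new instance has to boot.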

Fault Tolerance

As previously mentioned, fault tolerance is important as well. With manual scaling, what happens if (and when) the operator doesn’t notice something? If you don’t know there’s a spike in traffic coming, you certainly can’t get ahead of it. An automated container cluster, such as one managed by Kubernetes, will notice changes in behavior for you.

Say you get loads of requests on a certain application container, and it runs out of memory and dies. If this scenario occurred on a regular server instance, your application would die too. If you have an automated cluster, then you also have automated fault tolerance. If your container is overwhelmed, runs out of memory and dies, the cluster will notice (it knows to watch for such occurrences), and it’ll spin up a new replica automatically.
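The "notice and replace" behavior amounts to reconciling the observed state against a desired replica count. The following is a minimal sketch of one reconciliation pass, not Kubernetes code: the `reconcile` function, the state strings and the replica names are hypothetical.

```python
from itertools import count


def reconcile(desired: int, replicas: dict[str, str]) -> dict[str, str]:
    """Return the replica set after one reconciliation pass.

    `replicas` maps a replica name to "running" or "oom-killed".
    Dead replicas are dropped and replacements are started until
    `desired` replicas are running. Scaling down is out of scope here.
    """
    survivors = {name: state for name, state in replicas.items() if state == "running"}
    ids = count(1)
    while len(survivors) < desired:
        name = f"app-replacement-{next(ids)}"
        if name not in survivors:
            survivors[name] = "running"
    return survivors


observed = {"app-1": "running", "app-2": "oom-killed", "app-3": "running"}
print(reconcile(3, observed))
```

Running the same pass repeatedly is harmless: once three replicas are running, it changes nothing, which is what lets a cluster run this kind of check continuously.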

Kubernetes can do the same thing at the instance level. Say you have a big pool of hardware resources and AWS terminates an instance. Without automation, if a server dies while a bunch of containers are running on it, somebody first has to notice and then figure out what to do (spin up a new cluster node and reschedule the containers onto it). With automation, Kubernetes will recognize the issue, start a new node and reschedule the containers onto it. Kubernetes will also wait to see if the original node comes back, and if it does, it will cancel the node scaling so as to not waste resources. This kind of self-healing means a lot of problems are solved efficiently and cost effectively, without manual effort.

Further, Kubernetes offers Cluster Federation, which enables users to group multiple clusters across different regions into a single logical compute federation. Such a federation simplifies the deployment of highly available, geographically distributed services. If an entire region were to go down, all of your traffic would shift to another region.
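The failover effect can be illustrated with a toy routing function. This is not the federation API or how any particular load balancer works; `route_traffic`, the weights and the region names are assumptions used only to show traffic shifting when a region drops out.

```python
def route_traffic(weights: dict[str, int], healthy: set[str]) -> dict[str, float]:
    """Split traffic across healthy regions in proportion to their weights."""
    live = {region: w for region, w in weights.items() if region in healthy and w > 0}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy region available")
    return {region: weight / total for region, weight in live.items()}


weights = {"us-east-1": 1, "us-west-2": 1}

# Both regions healthy: traffic is split evenly.
print(route_traffic(weights, healthy={"us-east-1", "us-west-2"}))

# us-east-1 goes down: everything shifts to us-west-2.
print(route_traffic(weights, healthy={"us-west-2"}))
```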