Since 2014, Insight has helped nearly 1,000 engineers transition to data careers. Recognizing the emergence of DevOps and the importance of site reliability engineering, Insight recently created a DevOps program that gives Fellows hands-on engineering experience with many of the tools and techniques they need to transition to DevOps roles.

In the last couple of years, the term DevOps has become ubiquitous in engineering and operations teams. The demand for agile yet fault-tolerant and scalable platforms has risen dramatically. Simultaneously, a plethora of new tools and best practices have emerged that help build such platforms. To adjust to these changes, more companies are restructuring their engineering and operations teams to incorporate a new culture and workflow.

But what really is DevOps? What skills are necessary to be a successful DevOps, site reliability, infrastructure or data platform engineer? The goal of this blog post is to shed light on the current DevOps landscape and serve as a first stepping stone toward learning more.

What is DevOps?

The term DevOps originated in Belgium in 2009. There is no shortage of definitions, many of which could be summarized as follows:

Rather than describing a role or a set of tools, DevOps is a holistic approach to unifying software engineering and software operations.

This culture was born out of necessity: When operations and software engineering were strictly separated, the different goals of operations (e.g. reliability) and development (e.g. features) led to conflicts. This problem was exacerbated by the popularity of agile software development. While software engineers wanted to iterate quickly and ship code frequently, system administrators could not manually deploy code quickly without sacrificing the stability of their platform. This issue led to internal conflicts and lots of unnecessary finger pointing.

One of the first companies to tackle this problem was Google, which coined the term site reliability engineer. Their philosophy was to let software engineers be responsible for operations and allow them to automate processes as much as possible. This includes automatic recovery of crashed web servers and automatic deployment of the newest versions of Google’s apps.

Action items:

Read this great blog post by Cindy Sridharan about the DevOps movement.

Read Part 1 of Google’s free SRE book.

The DevOps Landscape

We recommend first getting a bird’s eye view on the concepts and technologies that are considered part of DevOps. This includes:

Source Control/Versioning: Git is the de-facto standard for version control of source code. Source control is vital for enabling software engineers to work in a collaborative environment.

Infrastructure as Code (IaC): The main idea behind IaC is to write code that deploys your infrastructure. This allows the transfer of many useful concepts of software engineering (versioning, modularity, unit testing) to the realm of infrastructure. Terraform is the most popular tool for IaC.

Configuration Management (CM): Once we deploy our servers using IaC, we need to make sure they are properly configured. It is the goal of configuration management to automate and unify this process. Common tools include Puppet, Chef and Ansible.

Service Discovery: If we have an agile data platform where servers are created and destroyed on the fly, we need a service that keeps track of what is currently available. Service discovery tools such as Consul offer this functionality.

Container and Orchestration: Especially popular for microservice architectures, containers have emerged as a powerful alternative to deploying services on virtual machines. Container technologies such as Docker in combination with orchestration tools such as Kubernetes offer a powerful platform that includes CM and service discovery.

Monitoring, Observability, Distributed Tracing and Log Aggregation: When managing complex data platforms, monitoring the health of the systems becomes a difficult task. In the case of a failure or poor performance, it also becomes challenging to find the source of the issue. DevOps engineers use tools such as Prometheus, Honeycomb, Zipkin or the ELK stack to collect metrics and trace logs to understand the behavior of complex platforms.

Continuous Integration/Continuous Deployment (CI/CD): In order to enable an agile development environment, the integration of new code and its subsequent deployment into production needs to be automated. This process is subsumed under CI/CD. Some advanced topics include integration testing and canary deployment.
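To make the declarative idea behind IaC and configuration management a little more concrete, here is a toy sketch in Python. It compares a declared set of servers with what is currently running and lists the changes needed to reconcile the two. The `Server` class, the `plan` function and the server names are all hypothetical and exist only for illustration; this is not how Terraform or any CM tool is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical description of one server; not a real provider's schema."""
    name: str
    size: str


def plan(desired: list[Server], actual: list[Server]) -> dict[str, list[str]]:
    """Compare declared servers with running ones and list the changes needed."""
    want = {s.name: s for s in desired}
    have = {s.name: s for s in actual}
    return {
        # Declared but not running yet.
        "create": sorted(want.keys() - have.keys()),
        # Declared and running, but with a different size.
        "update": sorted(n for n in want.keys() & have.keys() if want[n] != have[n]),
        # Running but no longer declared.
        "destroy": sorted(have.keys() - want.keys()),
    }


# The "code" that describes the infrastructure we want.
desired = [Server("web-1", "small"), Server("web-2", "small"), Server("db-1", "large")]
# What is (hypothetically) running right now.
actual = [Server("web-1", "small"), Server("db-1", "medium"), Server("old-worker", "small")]

changes = plan(desired, actual)
print(changes)
```

Because the desired state lives in code, it can be versioned in Git, reviewed and tested like any other software, which is the point made in the IaC item above.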

Action item:

Read the following articles to get a deeper understanding of the concepts discussed above:

Systems Engineering Fundamentals

In order to manage a complex data platform, a fundamental understanding of operating systems (especially Linux-based ones) is essential. As a DevOps engineer, you will spend a lot of time debugging applications — this requires an understanding of threads, processes, memory management, etc.
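As a small illustration of the difference between threads and processes, the following Python sketch starts a few threads and one child process and records the process ID each of them sees. It is a minimal example and assumes a Python interpreter is available at `sys.executable`; the names `worker` and `results` are ours.

```python
import os
import subprocess
import sys
import threading

results = []  # lives in this process's memory, visible to all of its threads


def worker(label: str) -> None:
    # Threads belong to the process that started them, so they report its PID.
    results.append((label, os.getpid()))


threads = [threading.Thread(target=worker, args=(f"thread-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A child process gets its own PID and its own memory space.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    capture_output=True,
    text=True,
    check=True,
)
child_pid = int(child.stdout.strip())

thread_pids = {pid for _, pid in results}
print("parent PID:", os.getpid())
print("PIDs seen by threads:", thread_pids)
print("child PID:", child_pid)
```

All three threads write into the same `results` list and report the parent's PID, while the child process reports a different one. This distinction matters when you debug an application with tools that list processes and threads separately.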

Action item:

Work through this tutorial on operating systems.

As you will work on distributed systems, it is just as important to understand the fundamentals of networking. This is important for setting up a secure platform as well as building proficiency in debugging distributed platforms.
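A very common first step when debugging a distributed platform is checking whether a service is reachable at all. The sketch below shows one way to do this in Python with a plain TCP connection attempt. The helper name `port_is_open` is our own, and the example only talks to a throwaway listener on the local machine.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Demonstrate against a temporary listener on the local machine.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen()
port = listener.getsockname()[1]

open_while_listening = port_is_open("127.0.0.1", port)
listener.close()
open_after_close = port_is_open("127.0.0.1", port)

print(open_while_listening, open_after_close)
```

A successful connection only tells you that something is listening on that port. Firewalls, security groups and routing can all make a healthy service look closed from one machine and open from another, which is why the networking fundamentals matter.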

Action item:

Coding/Scripting Fundamentals

A core piece of DevOps is bringing software engineering principles and concepts into operations. That means it is vital for a DevOps engineer to be knowledgeable in software engineering and coding best practices. If you feel a little rusty on the basics, it is a good idea to brush up on these concepts.

Action items:

Read this blog post on coding fundamentals.

Code review a small codebase (preferably one of your own old codebases) and try to improve it.

Many DevOps tools are written in the Go programming language. While it is not necessary to be proficient in Go, it is a good idea to at least know the basics.

Action item:

Use A Tour of Go and Go by Example to learn the basics of Go.

Lastly, in order to efficiently parse and analyze log files, it is important to be proficient in scripting. The two most common languages for analyzing log files are Bash and Python, and you should be familiar with at least one of them. Regular expressions are an important tool for parsing log files, and you should know how to use them in Bash or in Python.
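As a minimal sketch of what this looks like in Python, the example below uses a regular expression with named groups to count log levels and collect error messages. The log format (`<date> <time> <LEVEL> <message>`) and the sample lines are assumptions made up for illustration; real log formats vary and the pattern would need to be adapted.

```python
import re
from collections import Counter

# Assumed line format: "<date> <time> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)$"
)

# Made-up sample data standing in for a real log file.
SAMPLE_LOG = """\
2018-05-01 12:00:01 INFO service started
2018-05-01 12:00:05 ERROR connection to db-1 refused
2018-05-01 12:00:06 WARN retrying in 5s
not a log line
2018-05-01 12:00:11 ERROR connection to db-1 refused
"""

level_counts = Counter()
error_messages = []
for line in SAMPLE_LOG.splitlines():
    match = LINE_RE.match(line)
    if match is None:
        continue  # skip lines that do not fit the assumed format
    level_counts[match["level"]] += 1
    if match["level"] == "ERROR":
        error_messages.append(match["message"])

print(dict(level_counts))
print(error_messages)
```

For a real file you would iterate over `open(path)` instead of the sample string. Named groups keep the pattern readable as it grows, and skipping non-matching lines keeps the script from failing on stack traces or partial writes.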

Action items:

Further Reading

A couple of useful resources for DevOps include: