By Andrew Chen and Dominik Tornow

Kubernetes is a Container Orchestration Engine designed to host containerized applications on a set of nodes, commonly referred to as a cluster. Using a systems modeling approach, this series aims to advance the understanding of Kubernetes and its underlying concepts.

For this blog post, a basic understanding of Kubernetes is recommended.

Kubernetes is a scalable and reliable Container Orchestration Engine. Here scalability is defined as responsiveness in the presence of load, whereas reliability is defined as responsiveness in the presence of failure.

Note that the scalability and reliability of Kubernetes do not imply the scalability and reliability of an application hosted on Kubernetes. Kubernetes is a scalable and reliable platform, yet any application hosted on Kubernetes has to take its own steps towards scalability and reliability and must therefore carefully avoid bottlenecks and single points of failure.

For example, if an application is deployed as a Kubernetes ReplicaSet or a Kubernetes Deployment, Kubernetes (re)schedules and (re)executes Pods affected by node failures. However, if an application is deployed as bare Pods, that is, Pods not managed by a controller, Kubernetes will take no action in the presence of node failures.

Therefore, while Kubernetes itself remains responsive, the responsiveness of your application depends on your design and deployment decisions.
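The difference between managed and bare Pods can be illustrated with a toy model. The sketch below is not how Kubernetes is implemented: the `Pod` class, the `managed` flag, and `handle_node_failure` are hypothetical names, and a real controller would create a replacement Pod with a new name rather than move the existing one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    name: str
    node: Optional[str]  # None means the Pod is not running anywhere
    managed: bool        # True if owned by a ReplicaSet or Deployment (toy flag)


def handle_node_failure(pods: List[Pod], failed_node: str, healthy_nodes: List[str]) -> List[Pod]:
    """Toy model: managed Pods are placed on a healthy node, bare Pods are lost."""
    for pod in pods:
        if pod.node != failed_node:
            continue
        if pod.managed and healthy_nodes:
            pod.node = healthy_nodes[0]  # stand-in for (re)scheduling and (re)executing
        else:
            pod.node = None              # no action is taken for bare Pods
    return pods


pods = [
    Pod("web-1", "node-a", managed=True),
    Pod("batch-1", "node-a", managed=False),
]
handle_node_failure(pods, "node-a", ["node-b"])
for pod in pods:
    print(pod.name, "->", pod.node)
```

In this model only `web-1` ends up on a healthy node, which mirrors the point above: the platform stays responsive, but whether the application does depends on how it was deployed.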

This blog post focuses on the reliability of Kubernetes and explores how Kubernetes maintains responsiveness in the presence of failure.

Kubernetes Architecture

Figure 1. Master & Worker

Conceptually, Kubernetes’ components are grouped into two distinct classes, the Master Components and the Worker Components.

Masters are responsible for managing everything except the execution of Pods. Master components include kube-apiserver, kube-controller-manager, and kube-scheduler.

Workers are responsible for managing the execution of Pods. Worker components include kubelet and kube-proxy.

Workers are trivially reliable: The temporary or permanent failure of any worker in the cluster does not affect the master or other workers in the cluster. If your application is deployed accordingly, Kubernetes (re)schedules and (re)executes any Pod that is affected by a worker failure.

Single Master Configuration

Figure 2. Single Master Configuration

In a single master configuration, the Kubernetes cluster consists of one master and multiple workers. Workers directly connect to and communicate with the master’s kube-apiserver.

In this configuration, Kubernetes' responsiveness depends on

- the single master, and

- the connection from the workers to the single master

Because the single master is a single point of failure, the single master configuration is not considered to be a high availability configuration.

Multiple Master Configuration

Figure 3. Multiple Master Configuration

In a multiple master configuration, the Kubernetes cluster consists of multiple masters and multiple workers. Workers connect to and communicate with any master’s kube-apiserver via a high availability load balancer.

In this configuration, Kubernetes does not depend on

- a single master, or

- a connection from the workers to a single master

Because there is no single point of failure, the multiple master configuration is considered to be a high availability configuration.
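To get a rough feel for why removing the single point of failure matters, the following back-of-the-envelope sketch computes the probability that at least one master is up. It rests on strong assumptions that are not part of this post: masters fail independently, the load balancer never fails, any single healthy master is sufficient, and any quorum requirement of the underlying data store is ignored. The function name and the 99% figure are illustrative only.

```python
def control_plane_availability(master_availability: float, masters: int) -> float:
    """Probability that at least one of `masters` independent masters is up."""
    if not 0.0 <= master_availability <= 1.0:
        raise ValueError("master_availability must be between 0 and 1")
    if masters < 1:
        raise ValueError("masters must be at least 1")
    # All masters are down with probability (1 - a) ** n; take the complement.
    return 1.0 - (1.0 - master_availability) ** masters


for n in (1, 2, 3):
    print(n, "master(s):", round(control_plane_availability(0.99, n), 6))
```

Under these assumptions each additional master shrinks the chance of a total outage by a factor of one hundred. Real clusters will not match these numbers, since failures are rarely independent.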

Kubernetes Leader vs Follower

In a multiple master configuration, there are multiple kube-controller-managers and kube-schedulers. Conflicts may arise if two instances of the same component modify the same objects.

Therefore, to avoid potential conflicts, Kubernetes implements the leader/follower pattern for kube-controller-manager and kube-scheduler. Each group elects one leader, then the other group members assume follower roles. At any point in time, only the leader is active; the followers are passive.

Figure 4. Redundant Deployment of Master Components In Depth

Figure 4 illustrates a detailed example where kube-controller-1 and kube-scheduler-2 are the leaders of the kube-controller-managers and kube-schedulers. Because each group elects its own leader, the leaders do not necessarily reside on the same master.
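The independence of the two groups can be mimicked with a small toy model. In the sketch below the winner of each group's election is simply drawn at random, which stands in for "whoever acquires the lease first"; the master names and the `elect_leaders` helper are hypothetical.

```python
import random
from typing import Dict, List, Optional


def elect_leaders(groups: Dict[str, List[str]], seed: Optional[int] = None) -> Dict[str, str]:
    """Pick one leader per group independently; all other members are followers."""
    rng = random.Random(seed)
    return {group: rng.choice(members) for group, members in groups.items()}


groups = {
    "kube-controller-manager": ["master-1", "master-2", "master-3"],
    "kube-scheduler": ["master-1", "master-2", "master-3"],
}
leaders = elect_leaders(groups)
for group, leader in leaders.items():
    followers = [m for m in groups[group] if m != leader]
    print(f"{group}: active on {leader}, passive on {followers}")
```

Running it a few times shows that the two leaders sometimes land on the same master and sometimes do not, since neither election knows about the other.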

Leader Election

On startup, or in case of a leader failure, a new leader is elected from the members of the group. The leader is the member holding the Leader Lease.

Figure 5. Master Component Leader Election Process

Figure 5 depicts the leader election process of kube-controller-manager and kube-scheduler. The logic of the leader election process is as follows:

'Try acquire lease' is successful if and only if

- the Leader Lease does not exist or

- the Leader Lease is timed out

'Try renew lease' is successful if and only if

- the Leader Lease exists and

- the Leader Lease is not timed out and

- the Leader Lease holderIdentity is 'self'
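The two conditions above can be written down as a minimal in-memory sketch. This is not the Kubernetes implementation: `LeaseStore`, `LeaderLease`, and the explicit `now` parameter are hypothetical, and "timed out" is assumed here to mean that more than the lease duration has passed since the last renewal. A real implementation would also need the check and the write to happen atomically, which this single-process model sidesteps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaderLease:
    holder_identity: str
    lease_duration_seconds: float
    acquire_time: float
    renew_time: float

    def timed_out(self, now: float) -> bool:
        # Assumption: the lease expires lease_duration_seconds after its last renewal.
        return now > self.renew_time + self.lease_duration_seconds


class LeaseStore:
    """In-memory stand-in for the single shared lease record of one group."""

    def __init__(self) -> None:
        self.lease: Optional[LeaderLease] = None

    def try_acquire(self, identity: str, duration: float, now: float) -> bool:
        # Succeeds if and only if the lease does not exist or is timed out.
        if self.lease is None or self.lease.timed_out(now):
            self.lease = LeaderLease(identity, duration, acquire_time=now, renew_time=now)
            return True
        return False

    def try_renew(self, identity: str, now: float) -> bool:
        # Succeeds if and only if the lease exists, is not timed out, and is held by the caller.
        lease = self.lease
        if lease is not None and not lease.timed_out(now) and lease.holder_identity == identity:
            lease.renew_time = now
            return True
        return False


store = LeaseStore()
print(store.try_acquire("scheduler-1", 15, now=0))   # no lease yet
print(store.try_acquire("scheduler-2", 15, now=5))   # lease held and still valid
print(store.try_renew("scheduler-1", now=10))        # holder renews in time
print(store.try_acquire("scheduler-2", 15, now=30))  # last renewal was at 10, lease expired at 25
print(store.try_renew("scheduler-1", now=31))        # former leader no longer holds the lease
```

Passing `now` explicitly keeps the example deterministic; a running component would read a clock instead, which is exactly where the timing problems discussed later in this post come from.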

Keeping Track of Leaders

Leader leases of kube-controller-manager and kube-scheduler are persisted in the Kubernetes Object Store as Kubernetes Endpoints Objects in the kube-system namespace. Because no two Kubernetes objects can have the same name, kind, and namespace at the same time, at most one Endpoints object named kube-scheduler and one named kube-controller-manager can exist in that namespace.

This can be demonstrated using the kubectl command line tool.

$ kubectl get endpoints -n kube-system

NAME ENDPOINTS AGE

kube-scheduler <none> 30m

kube-controller-manager <none> 30m
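The uniqueness rule that makes a named object usable as a lease can be modeled with a dictionary keyed by kind, namespace, and name. The `ObjectStore` class and `AlreadyExists` exception below are hypothetical and only illustrate the constraint; they say nothing about how the Kubernetes Object Store is built.

```python
class AlreadyExists(Exception):
    """Raised when an object with the same kind, namespace, and name is already stored."""


class ObjectStore:
    def __init__(self) -> None:
        self._objects = {}

    def create(self, kind: str, namespace: str, name: str, body: dict) -> None:
        key = (kind, namespace, name)
        if key in self._objects:
            raise AlreadyExists(f"{kind} {namespace}/{name} already exists")
        self._objects[key] = body

    def get(self, kind: str, namespace: str, name: str) -> dict:
        return self._objects[(kind, namespace, name)]


store = ObjectStore()
store.create("Endpoints", "kube-system", "kube-scheduler", {"holderIdentity": "scheduler-1"})
try:
    store.create("Endpoints", "kube-system", "kube-scheduler", {"holderIdentity": "scheduler-2"})
except AlreadyExists as err:
    print("rejected:", err)
print(store.get("Endpoints", "kube-system", "kube-scheduler"))
```

Because the second create is rejected, every group member that looks up the object sees the same record.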

The Endpoints kube-scheduler and kube-controller-manager store the leader information in the control-plane.alpha.kubernetes.io/leader annotation.

$ kubectl describe endpoints kube-scheduler -n kube-system

Name: kube-scheduler

Annotations: control-plane.alpha.kubernetes.io/leader=

{

"holderIdentity": "scheduler-2",

"leaseDurationSeconds": 15,

"acquireTime": "2018-01-01T08:00:00Z",

"renewTime": "2018-01-01T08:00:30Z"

}
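The annotation value is plain JSON, so it is easy to inspect programmatically. The sketch below parses the example record and works out when the lease would time out, assuming that "timed out" means more than leaseDurationSeconds have passed since renewTime. The helper names and the chosen `now` value are illustrative.

```python
import json
from datetime import datetime, timedelta, timezone

# The example annotation value from this post, as valid JSON.
ANNOTATION = """
{
  "holderIdentity": "scheduler-2",
  "leaseDurationSeconds": 15,
  "acquireTime": "2018-01-01T08:00:00Z",
  "renewTime": "2018-01-01T08:00:30Z"
}
"""


def parse_time(value: str) -> datetime:
    # The example timestamps are UTC with a trailing "Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def lease_status(annotation: str, now: datetime):
    """Return (holder, expiry time, whether the lease is timed out at `now`)."""
    record = json.loads(annotation)
    expires = parse_time(record["renewTime"]) + timedelta(seconds=record["leaseDurationSeconds"])
    return record["holderIdentity"], expires, now > expires


now = datetime(2018, 1, 1, 8, 0, 40, tzinfo=timezone.utc)
holder, expires, timed_out = lease_status(ANNOTATION, now)
print(holder, expires.isoformat(), timed_out)
```

With the example values, scheduler-2 last renewed at 08:00:30, so under the stated assumption another member could only acquire the lease after 08:00:45.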

Although the Leader Lease names at most one leader at any point in time, Kubernetes does not guarantee that two or more master components will not mistakenly believe themselves to be the leader at the same time, a situation commonly referred to as split brain.

An enlightening discussion about split brain and possible remedies can be found in Martin Kleppmann’s article How to do distributed locking.

Kubernetes does not employ any countermeasures to guard against split brain. Instead, Kubernetes relies on its ability to converge to the desired state over time, which mitigates the effects of conflicting decisions.
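A small numeric sketch can show how convergence softens a split brain episode. It assumes a hypothetical controller whose only job is to keep a desired number of Pods running; the `reconcile` and `simulate_split_brain` functions are illustrative and not taken from Kubernetes.

```python
def reconcile(desired: int, observed: int) -> int:
    """Return the correction to apply: positive means create Pods, negative means delete."""
    return desired - observed


def simulate_split_brain(desired: int, running: int):
    # Two components both believe they are the leader and act on the same stale observation.
    stale_observation = running
    running += reconcile(desired, stale_observation)  # first "leader" acts
    running += reconcile(desired, stale_observation)  # second "leader" repeats the action
    after_conflict = running

    # Later rounds observe the actual state and correct the overshoot.
    rounds = 0
    while running != desired:
        running += reconcile(desired, running)
        rounds += 1
    return after_conflict, running, rounds


after_conflict, final, rounds = simulate_split_brain(desired=3, running=2)
print("after conflicting decisions:", after_conflict)
print("after", rounds, "further round(s):", final)
```

The conflicting decisions briefly overshoot the desired state, and the next reconciliation round brings the system back. The cost of split brain in this model is temporary extra work rather than a permanently wrong state.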

Conclusion

In a multiple master configuration, Kubernetes is a scalable and reliable Container Orchestration Engine. In this configuration, Kubernetes ensures reliability by running multiple masters and multiple workers alongside each other. The kube-controller-managers and kube-schedulers on the masters operate according to a leader/follower pattern, whereas the workers operate concurrently. Kubernetes implements a custom leader election process, storing leader information in annotations on Kubernetes Endpoints Objects.

For information regarding the provisioning of a high availability Kubernetes cluster, refer to the official documentation.

About this post

This blog post is part of a collaborative effort between the CNCF, Google, and SAP to advance the understanding of Kubernetes and its underlying concepts.