Try-out

Three Kubernetes clusters are created in different regions. One cluster, ‘cluster-atlanta’, is the primary scheduler cluster (it runs multicluster-scheduler), while ‘cluster-seattle’ and ‘cluster-dallas’ are member clusters (each runs multicluster-scheduler-agent). Because some or all of the delegate pods are scheduled on different clusters, the service needs a way to route traffic to them.

Cilium cluster mesh and global services can perform global routing across clusters. Services that target proxy pods must be rerouted to their delegates, which may run in other clusters. Multicluster-scheduler annotates such services with ‘io.cilium/global-service=true’ and replicates them across clusters, so that traffic is load-balanced across the Cilium cluster mesh.
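As a minimal sketch (the service name, selector, and port are illustrative, not taken from the clusters above), a replicated service carrying the Cilium global-service annotation looks like this:

```yaml
# Illustrative service replicated across clusters; with the annotation set,
# Cilium load-balances traffic to matching endpoints in all mesh clusters.
apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    io.cilium/global-service: "true"   # added by multicluster-scheduler
spec:
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
```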

Multicluster/Multiregion Topology

Multicluster Topology with Cilium Cluster Mesh

A primary scheduler cluster runs multicluster-scheduler; an agent can also be deployed on the same cluster alongside the scheduler. In the topology above, ‘cluster-atlanta’ acts as the primary cluster:

Multicluster Scheduler and Agent on cluster-atlanta

When multicluster-scheduler is deployed on the cluster, a virtual-kubelet node is created alongside the actual Kubernetes node.

Virtual Kubelet Node created by Multicluster Scheduler


As shown above, cluster-atlanta contains a virtual-kubelet node called admiralty.

The multicluster-scheduler on the primary cluster must be able to talk to the member clusters’ Kubernetes API servers. Service account tokens are extracted from the member clusters as kubeconfig files and saved as secrets in the scheduler’s cluster.

Admiralty’s multicluster-service-account, part of the multicluster toolkit, enables users to call the Kubernetes APIs of other clusters: it imports remote service account tokens into local secrets and auto-mounts them inside annotated pods.
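A sketch of the custom resource that drives this import, modeled on multicluster-service-account’s v1alpha1 examples. The metadata name, namespace, and service account name are assumptions for illustration, and field names may differ between versions:

```yaml
# Hypothetical ServiceAccountImport: pulls the named service account's
# token from the remote cluster into a local secret.
apiVersion: multicluster.admiralty.io/v1alpha1
kind: ServiceAccountImport
metadata:
  name: cluster-seattle                    # illustrative
spec:
  clusterName: cluster-seattle             # remote cluster to import from
  namespace: admiralty                     # remote namespace (assumed)
  name: multicluster-scheduler-agent       # remote service account (assumed)
```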

cluster-atlanta, being the primary cluster, contains the service accounts of cluster-dallas and cluster-seattle as shown below:

Service Accounts of Member clusters exported as secrets using MCSA

Service Accounts of Member cluster as a Kubernetes Secret

A ConfigMap mounted into the multicluster-scheduler deployment holds the secret information, as shown below:

Scheduler ConfigMap with Service Account Information

Apart from the service accounts, all member clusters are joined to the primary cluster using invitations. An invitation creates (cluster) role bindings between a user named after the invited cluster, e.g., admiralty:cluster-seattle, and the multicluster-scheduler-agent-for-cluster cluster role. The scheduler impersonates these users when creating, updating, or deleting delegate pods and services on behalf of invited clusters.
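A sketch of the binding an invitation produces, using the user and cluster role names described above; the binding’s own name is illustrative:

```yaml
# ClusterRoleBinding granting the invited cluster's user the
# permissions the scheduler needs when impersonating it.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admiralty-cluster-seattle          # illustrative binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: multicluster-scheduler-agent-for-cluster
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: admiralty:cluster-seattle        # user named after the invited cluster
```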

Cluster Roles Created for all Cluster users

With multicluster-scheduler and agents deployed on the three clusters, create an nginx deployment with five replicas and the annotation ‘multicluster.admiralty.io/elect’ on one of the member clusters, ‘cluster-seattle’:

Pod annotation to enable multicluster-scheduling
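The deployment can be sketched as below; the annotation name, replica count, and image follow the description above, while the deployment name and labels are illustrative:

```yaml
# nginx deployment opting its pods in to multicluster scheduling.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        multicluster.admiralty.io/elect: ""   # enable multicluster scheduling
    spec:
      containers:
        - name: nginx
          image: nginx
```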

This configuration launches five proxy pods on cluster2-seattle, scheduled on the virtual-kubelet node:

Proxy pods scheduled on Virtual Kubelet Node

As seen in the proxy-pod configuration below, each proxy pod holds a reference to the delegate-pod manifest running in another member cluster, since proxy pods reflect the spec of delegate pods (the actual running containers). The pod spec below contains annotations specifying the delegate pod’s cluster and the sourcepod-manifest:

Proxy pod -> Delegate pod Reference

Proxy pod with sourcepod-manifest as an annotation
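Roughly, a proxy pod’s metadata carries annotations of the following shape. This is a hand-written sketch, not output from the clusters above: the annotation values are placeholders, and the exact serialized form of the manifest is version-dependent:

```yaml
# Illustrative proxy-pod metadata linking it to its delegate.
metadata:
  annotations:
    multicluster.admiralty.io/clustername: cluster-atlanta   # cluster running the delegate
    multicluster.admiralty.io/sourcepod-manifest: |          # original pod spec, serialized
      apiVersion: v1
      kind: Pod
      metadata:
        name: nginx-5466                                     # placeholder name
```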

Three delegate pods (nginx replicas) are scheduled on cluster-atlanta and the other two on cluster-dallas:

Proxy pods created on cluster-seattle and corresponding Delegate pods on cluster-atlanta and cluster-dallas

As shown above, delegate pods on the member clusters atlanta and dallas are scheduled on regular Kubernetes nodes, since these are the actual running, functional containers. Every delegate-pod configuration contains a mapping to the parent-uid (the proxy pod’s UID) and a controller reference, as shown below.

Three replicas (delegate pods) run on ‘cluster-atlanta’, scheduled on the Kubernetes node ‘cluster1-atlanta’ (all three clusters are all-in-one Kubernetes clusters with a single node).

Delegate pods created on cluster-atlanta (as it includes both scheduler and agent)

As shown below, all the delegate pods on cluster-atlanta carry a label referencing their parent (the proxy pod) running on cluster-seattle.

Delegate pod -> Proxy pod parent-uid mapping

The annotation ‘multicluster.admiralty.io/controller-reference’ holds the mapping information (delegate pod -> proxy pod).

Delegate pod controller-reference mapping to Proxy pod

As shown above, the delegate pod nginx-5466* running on cluster-atlanta contains the parent-uid of the proxy pod nginx-5466* running on cluster-seattle.

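Put together, a delegate pod’s metadata looks roughly like the sketch below. The UID and the serialized form of the controller reference are placeholders; the actual encoding is version-dependent:

```yaml
# Illustrative delegate-pod metadata mapping back to its proxy pod.
metadata:
  labels:
    multicluster.admiralty.io/parent-uid: 4f1c0000-0000-0000-0000-000000000000  # proxy pod UID (placeholder)
  annotations:
    multicluster.admiralty.io/controller-reference: |   # delegate pod -> proxy pod
      {"clusterName": "cluster-seattle", "name": "nginx-5466"}   # placeholder value
```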

Enforcing Placement — Cluster-specific Scheduling

Multicluster-scheduler lets users specify a target cluster rather than letting the scheduler decide. Placement can be enforced using the ‘multicluster.admiralty.io/clustername’ annotation.

An nginx deployment is created from cluster2-seattle with the clustername annotation set to cluster3-dallas as the target cluster:

Enforcing Placement using clustername in pod annotation
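The pinned deployment can be sketched as follows; the annotation names and target cluster come from the description above, while the deployment name and labels are illustrative:

```yaml
# nginx deployment with placement pinned to cluster3-dallas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        multicluster.admiralty.io/elect: ""
        multicluster.admiralty.io/clustername: cluster3-dallas   # force all delegates here
    spec:
      containers:
        - name: nginx
          image: nginx
```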

This configuration creates all five delegate pods (the running containers) on cluster3-dallas, while all five proxy pods are created on cluster2-seattle.

Enforcing placement — creating all delegate pods on cluster-dallas

Proxy pods on cluster2-seattle: