Throughout the lifecycle of your Kubernetes cluster, you may need to access a cluster worker node. This access could be for maintenance, configuration inspection, log collection, or other troubleshooting operations. Better yet, it would be nice if you could enable this access whenever it’s needed and disable it when you finish your task.

SSH Approach

While it’s possible to configure Kubernetes nodes with SSH access, this also makes worker nodes more vulnerable. Using SSH requires a network connection between the engineer’s machine and the EC2 instance, something you may want to avoid. Some users set up a jump server (also called a bastion host) as a typical pattern to minimize the attack surface from the Internet. But this approach still requires you to manage access to the bastion servers and to protect SSH keys. IMHO, managing supporting SSH infrastructure is a high price to pay, especially if you just want to get shell access to a worker node or run a few commands.

Kubernetes Approach

The Kubernetes command-line tool, kubectl, allows you to run different commands against a Kubernetes cluster. You can manipulate Kubernetes API objects, manage worker nodes, inspect the cluster, execute commands inside a running container, and get an interactive shell to a running container.

Suppose you have a pod named shell-demo. To get a shell to the running container on this pod, just run:

kubectl exec -it shell-demo -- /bin/bash

# see shell prompt
root@shell-demo:/#

How Does exec Work?

kubectl exec invokes the Kubernetes API Server, which “asks” the Kubelet “node agent” to run an exec command against the CRI (Container Runtime Interface), most frequently the Docker runtime.

The docker exec API/command creates a new process, sets its namespaces to a target container’s namespaces and then executes the requested command, also handling input and output streams for the created process.
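To see what this looks like on the node itself, here is a minimal illustration you could run directly on a worker node with Docker installed (the container name my-container is a placeholder): it resolves the container’s main process and lists the namespaces a docker exec process would be placed into.

# find the PID of the container's main process
pid=$(docker inspect --format '{{.State.Pid}}' my-container)

# list the namespaces a `docker exec` process would join
sudo ls -l /proc/${pid}/ns

# same effect as `kubectl exec`, but going through the Docker runtime directly
docker exec -it my-container /bin/bash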

The Idea

A Linux system starts out with a single namespace of each type (mount, process, ipc, network, UTS, and user), used by all processes.

So, all we need to do is run a new pod and connect it to the worker node’s host namespaces.
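For illustration, on any Linux host you can inspect the namespaces of the init process (PID 1); these are the “host namespaces” our new pod should join (lsns is part of util-linux):

# namespaces of the init process, i.e. the host namespaces
sudo ls -l /proc/1/ns

# or, with util-linux
sudo lsns -p 1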

A Helper Program

It is possible to use any Docker image with a shell on board as a “host shell” container. There is one limitation you should be aware of: it’s not possible to join the mount namespace of the target container (or host) this way.

nsenter is a small program from the util-linux package that can run a program with the namespaces (and cgroups) of other processes. Exactly what we need!
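As a quick illustration of what we are going to automate, this is roughly how nsenter would be used directly on a host (the --all flag requires a recent util-linux, such as the v2.34 used below): it enters all namespaces of PID 1 and starts a root login shell.

# enter all namespaces (and cgroups) of the init process and open a root shell
sudo nsenter --all --target 1 -- su -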

Most Linux distros ship with an outdated version of util-linux. So, I prepared the alexeiled/nsenter Docker image with the nsenter program on board. This is a super small Docker image, about 900K in size, created from the scratch image and a single statically linked nsenter binary (v2.34).
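A Dockerfile for such an image could look roughly like the sketch below; the exact build of the static binary is out of scope here (see the alexei-led/nsenter repository for the actual build), so treat this as an assumption for illustration.

# hypothetical sketch: an image built from the empty scratch base
# with a single statically linked nsenter binary built beforehand
FROM scratch
COPY nsenter /nsenter
ENTRYPOINT ["/nsenter"]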

Use the helper script below, also available in the alexei-led/nsenter GitHub repository, to run a new nsenter pod on a specified Kubernetes worker node. This helper script creates a privileged nsenter pod in the host’s process and network namespaces, running nsenter with the --all flag, joining all namespaces and cgroups and running a default shell as a superuser (with the su - command).

The nodeSelector makes it possible to specify a target Kubernetes node to run the nsenter pod on. The "tolerations": [{"operator": "Exists"}] parameter helps to match any node taint, if specified.

Helper script

# get cluster nodes
kubectl get nodes

# output
NAME                                            STATUS   AGE
ip-192-168-151-104.us-west-2.compute.internal   Ready    8d
ip-192-168-171-140.us-west-2.compute.internal   Ready    7d11h

# open superuser shell on specified node
./nsenter-node.sh ip-192-168-151-104.us-west-2.compute.internal

# prompt
[root@ip-192-168-151-104 ~]#

# pod will be destroyed on exit
...

nsenter-node.sh

#!/bin/sh
set -x

node=${1}
nodeName=$(kubectl get node ${node} -o template --template='{{index .metadata.labels "kubernetes.io/hostname"}}')
nodeSelector='"nodeSelector": { "kubernetes.io/hostname": "'${nodeName:?}'" },'
podName=${USER}-nsenter-${node}

kubectl run ${podName:?} --restart=Never -it --rm --image overriden --overrides '
{
  "spec": {
    "hostPID": true,
    "hostNetwork": true,
    '"${nodeSelector?}"'
    "tolerations": [{
      "operator": "Exists"
    }],
    "containers": [
      {
        "name": "nsenter",
        "image": "alexeiled/nsenter:2.34",
        "command": [
          "/nsenter", "--all", "--target=1", "--", "su", "-"
        ],
        "stdin": true,
        "tty": true,
        "securityContext": {
          "privileged": true
        }
      }
    ]
  }
}' --attach "$@"

Management of Kubernetes worker nodes on AWS

When running a Kubernetes cluster on AWS, either Amazon EKS or a self-managed Kubernetes cluster, it is possible to manage Kubernetes nodes with AWS Systems Manager (https://aws.amazon.com/systems-manager/). Using AWS Systems Manager (AWS SSM), you can automate multiple management tasks, apply patches and updates, run commands, and access a shell on any managed node, without the need to maintain SSH infrastructure.

In order to manage a Kubernetes node (AWS EC2 host), you need to install and start an SSM Agent daemon; see the AWS documentation for more details.
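For reference, the traditional host-level setup looks roughly like this on Amazon Linux 2 (a sketch, assuming the amazon-ssm-agent package is available in the distro repositories):

# install and enable the SSM Agent directly on the EC2 host
sudo yum install -y amazon-ssm-agent
sudo systemctl enable --now amazon-ssm-agent
sudo systemctl status amazon-ssm-agent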

But we are taking a Kubernetes approach, and this means we are going to run an SSM Agent as a DaemonSet on every Kubernetes node in the cluster. This approach allows you to run an up-to-date version of the SSM Agent without installing it on the host machine, and to do so only when needed.

Prerequisites

Option 1 (more secure)

It is possible to associate an AWS IAM role with a Kubernetes service account and use this service account to run the SSM Agent DaemonSet.

This is the most secure option. You assign the AmazonEC2RoleforSSM IAM role to the SSM Agent only, and create the SSM DaemonSet only when you need to access cluster nodes. You can also target specific nodes with a nodeSelector.
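For example, a nodeSelector in the DaemonSet pod template could restrict the agent to a subset of nodes (the node-access label below is hypothetical, applied only to the nodes you want to manage):

spec:
  template:
    spec:
      # hypothetical label applied only to the nodes you want to manage
      nodeSelector:
        node-access: ssm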

Create a new Kubernetes service account (ssm-sa, for example) and connect it to an IAM role with the AmazonEC2RoleforSSM policy attached.

$ export CLUSTER_NAME=gaia-kube
$ export SA_NAME=ssm-sa

# setup IAM OIDC provider for EKS cluster
$ eksctl utils associate-iam-oidc-provider --region=us-west-2 --name=$CLUSTER_NAME --approve

# create K8s service account linked to IAM role in kube-system namespace
$ eksctl create iamserviceaccount --name $SA_NAME --cluster $CLUSTER_NAME --namespace kube-system \
    --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM \
    --override-existing-serviceaccounts \
    --approve

[ℹ]  using region us-west-2
[ℹ]  1 iamserviceaccount (kube-system/ssm-sa) was included (based on the include/exclude rules)
[!]  serviceaccounts that exist in Kubernetes will be excluded, use --override-existing-serviceaccounts to override
[ℹ]  1 task: { 2 sequential sub-tasks: { create IAM role for serviceaccount "kube-system/ssm-sa", create serviceaccount "kube-system/ssm-sa" } }
[ℹ]  building iamserviceaccount stack "eksctl-gaia-kube-addon-iamserviceaccount-kube-system-ssm-sa"
[ℹ]  deploying stack "eksctl-gaia-kube-addon-iamserviceaccount-kube-system-ssm-sa"
[ℹ]  created serviceaccount "kube-system/ssm-sa"

Configure the SSM DaemonSet to use this service account.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ssm-agent
  labels:
    k8s-app: ssm-agent
  namespace: kube-system
spec:
  ...
  template:
    ...
    spec:
      serviceAccountName: ssm-sa
      containers:
      - image: alexeiled/aws-ssm-agent
        name: ssm-agent
        ...

Now, deploy the SSM DaemonSet and access your cluster nodes.

kubectl create -f daemonset.yaml

Option 2 (less secure)

First, you need to attach the AmazonEC2RoleforSSM policy to the Kubernetes worker nodes’ instance role. Without this policy, you won’t be able to manage Kubernetes worker nodes with AWS SSM.
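With the AWS CLI this can be done roughly as follows (the role name is a placeholder for your worker nodes’ instance role):

# attach the managed SSM policy to the worker nodes' instance role
aws iam attach-role-policy \
  --role-name <node-instance-role-name> \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM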

Setup

Then, clone the alexei-led/kube-ssm-agent GitHub repository. It contains a properly configured SSM Agent daemonset file.

The daemonset uses the alexeiled/aws-ssm-agent:<ver> Docker image that contains:

- AWS SSM Agent, the same version as the Docker image tag
- Docker CLI client
- AWS CLI client
- Vim and additional useful programs

Run the following command to deploy a new SSM Agent DaemonSet:

kubectl create -f daemonset.yaml

Once the SSM Agent DaemonSet is running, you can run any aws ssm command.
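For example, a one-off command can be pushed to a managed node with SSM Run Command (the instance id and region below are placeholders):

# run a shell command on the node via SSM Run Command
aws ssm send-command \
  --region us-west-2 \
  --instance-ids <instance-id> \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["uptime"]'

# fetch the command output, using the CommandId returned by send-command
aws ssm list-command-invocations --command-id <command-id> --details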

Run the following command to start a new SSM terminal session:

AWS_DEFAULT_REGION=us-west-2 aws ssm start-session --target <instance-id>

Starting session with SessionId: ...
sh-4.2$ ls
sh-4.2$ pwd
/opt/amazon/ssm
sh-4.2$ bash -i
[ssm-user@ip-192-168-84-111 ssm]$
[ssm-user@ip-192-168-84-111 ssm]$ exit
sh-4.2$ exit
Exiting session with sessionId: ...
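One way to find the EC2 instance id for a specific Kubernetes node is the node’s providerID field, for example:

# the providerID has the form aws:///<availability-zone>/<instance-id>
kubectl get node ip-192-168-151-104.us-west-2.compute.internal -o jsonpath='{.spec.providerID}'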

The daemonset.yaml file



apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ssm-agent
  labels:
    k8s-app: ssm-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: ssm-agent
  template:
    metadata:
      labels:
        name: ssm-agent
    spec:
      # use IAM role associated with K8s service account
      serviceAccountName: ssm-sa
      # join host network namespace
      hostNetwork: true
      # join host process namespace
      hostPID: true
      # join host IPC namespace
      hostIPC: true
      # tolerations
      tolerations:
      - effect: NoExecute
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      containers:
      - image: alexeiled/aws-ssm-agent:2.3.680
        imagePullPolicy: Always
        name: ssm-agent
        securityContext:
          runAsUser: 0
          privileged: true
        volumeMounts:
        # Allows systemctl to communicate with the systemd running on the host
        - name: dbus
          mountPath: /var/run/dbus
        - name: run-systemd
          mountPath: /run/systemd
        # Allows to peek into systemd units that are baked into the official EKS AMI
        - name: etc-systemd
          mountPath: /etc/systemd
        # This is needed in order to fetch logs NOT managed by journald
        # journal log is stored only in memory by default, so we need
        # If all you need is access to persistent journals, /var/log/journal/* would be enough
        # FYI, the volatile log store /var/run/journal was empty on my nodes. Perhaps it isn't used in Amazon Linux 2 / EKS AMI?
        # See https://askubuntu.com/a/1082910 for more background
        - name: var-log
          mountPath: /var/log
        - name: var-run
          mountPath: /var/run
        - name: run
          mountPath: /run
        - name: usr-lib-systemd
          mountPath: /usr/lib/systemd
        - name: etc-machine-id
          mountPath: /etc/machine-id
        - name: etc-sudoers
          mountPath: /etc/sudoers.d
      volumes:
      # for systemctl to systemd access
      - name: dbus
        hostPath:
          path: /var/run/dbus
          type: Directory
      - name: run-systemd
        hostPath:
          path: /run/systemd
          type: Directory
      - name: etc-systemd
        hostPath:
          path: /etc/systemd
          type: Directory
      - name: var-log
        hostPath:
          path: /var/log
          type: Directory
      # mainly for dockerd access via /var/run/docker.sock
      - name: var-run
        hostPath:
          path: /var/run
          type: Directory
      # var-run implies you also need this, because
      # /var/run is a symlink to /run
      # sh-4.2$ ls -lah /var/run
      # lrwxrwxrwx 1 root root 6 Nov 14 07:22 /var/run -> ../run
      - name: run
        hostPath:
          path: /run
          type: Directory
      - name: usr-lib-systemd
        hostPath:
          path: /usr/lib/systemd
          type: Directory
      # Required by journalctl to locate the current boot.
      # If omitted, journalctl is unable to locate host's current boot journal
      - name: etc-machine-id
        hostPath:
          path: /etc/machine-id
          type: File
      # Avoid this error > ERROR [MessageGatewayService] Failed to add ssm-user to sudoers file: open /etc/sudoers.d/ssm-agent-users: no such file or directory
      - name: etc-sudoers
        hostPath:
          path: /etc/sudoers.d

          type: Directory

Summary

As you can see, it’s relatively easy to manage Kubernetes nodes in a pure Kubernetes way, without taking unnecessary risks and managing complex SSH infrastructure.

References

- alexeiled/nsenter Docker image

- alexei-led/nsenter GitHub repository

- nsenter man page

- alexei-led/kube-ssm-agent SSM Agent for Amazon EKS