One thing that has been a mystery to me: how does Kubernetes know to remove load from an unresponsive Node? I vaguely thought it had something to do with the kubelet, as that is the component of Kubernetes that deals with the Node.

But exactly how does this process work?

We did some digging to find out.


Digging

https://kubernetes.io/docs/concepts/architecture/nodes/#condition

The third is monitoring the nodes' health. The node controller is responsible for updating the NodeReady condition of NodeStatus to ConditionUnknown when a node becomes unreachable (i.e. the node controller stops receiving heartbeats for some reason, e.g. due to the node being down), and then later evicting all the pods from the node (using graceful termination) if the node continues to be unreachable. (The default timeouts are 40s to start reporting ConditionUnknown and 5m after that to start evicting pods.) The node controller checks the state of each node every --node-monitor-period seconds.
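To make the two timeouts in that excerpt concrete, here is a minimal sketch that maps "seconds since the last heartbeat" to the phase the documentation describes. It only models the quoted defaults (40s and 5m). The helper name `classify_node`, the `NodePhase` enum, and the choice to treat exactly 40s as already Unknown are assumptions made for illustration, not the node controller's actual code.

```python
from enum import Enum

# Defaults quoted in the documentation excerpt above.
NODE_MONITOR_GRACE_SECONDS = 40          # silence before NodeReady becomes Unknown
POD_EVICTION_TIMEOUT_SECONDS = 5 * 60    # further silence before pods are evicted


class NodePhase(Enum):
    READY = "Ready"
    UNKNOWN = "Unknown"
    EVICTING = "Unknown, evicting pods"


def classify_node(seconds_since_heartbeat: float) -> NodePhase:
    """Hypothetical helper: map heartbeat silence to the phase in the quote."""
    if seconds_since_heartbeat < NODE_MONITOR_GRACE_SECONDS:
        return NodePhase.READY
    # Assumption: eviction starts 5m after the node was first marked Unknown.
    eviction_starts_at = NODE_MONITOR_GRACE_SECONDS + POD_EVICTION_TIMEOUT_SECONDS
    if seconds_since_heartbeat < eviction_starts_at:
        return NodePhase.UNKNOWN
    return NodePhase.EVICTING
```

With these defaults, a node that has been silent for a minute would be reported as Unknown, and only after roughly 5m40s of silence would eviction begin.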

After more digging and some command-line fu, this is the sequence we pieced together.

Steps:

1. The Node is tainted with Disk Pressure (or another condition)

2. The status of the Node changes from `Ready` to `Unknown`

3. Pods are scheduled for deletion

4. Pods are evicted

5. The Node fails the kubelet health checks and is removed from the cluster

6. Once the health check fails, the Node is kicked out (depending on where the health check is configured: EC2, ELB or ASG)

7. The kubelet has a /healthz endpoint, which is health checked in the ALB target group
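Steps 5 to 7 rely on an external health check deciding when a Node counts as unhealthy. Load balancer target groups typically do this by counting consecutive probe results against a threshold, and that idea can be sketched as below. The function name `evaluate_target` and the threshold values of 2 are illustrative assumptions, not AWS defaults or the kops configuration.

```python
from typing import Iterable


def evaluate_target(probe_results: Iterable[bool],
                    unhealthy_threshold: int = 2,
                    healthy_threshold: int = 2,
                    initially_healthy: bool = True) -> bool:
    """Replay probe results in order and return the final health state.

    The state flips only after enough consecutive results contradict it,
    so a single failed probe does not remove a node.
    """
    healthy = initially_healthy
    streak = 0  # consecutive results that contradict the current state
    for ok in probe_results:
        if ok == healthy:
            streak = 0
            continue
        streak += 1
        needed = unhealthy_threshold if healthy else healthy_threshold
        if streak >= needed:
            healthy = not healthy
            streak = 0
    return healthy
```

In this sketch an isolated failure between two successes leaves the target healthy, while two failures in a row mark it unhealthy until two successes in a row bring it back.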

https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/



admin@ip-10.10.10.10:~$ sudo netstat -tulpn | grep kubelet

tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 3861/kubelet

tcp 0 0 127.0.0.1:33717 0.0.0.0:* LISTEN 3861/kubelet

tcp6 0 0 :::10250 :::* LISTEN 3861/kubelet

tcp6 0 0 :::10255 :::* LISTEN 3861/kubelet

admin@ip-10.10.10.10:~$ curl 127.0.0.1:10248

404 page not found

admin@ip-10.10.10.10:~$ curl 127.0.0.1:10248/healthz

ok

And that’s it: the kubelet exposes a /healthz endpoint, which is evaluated by either EC2, ELB or ASG. In kops’s case, it is evaluated in the target group via the ASG.
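If you want to run the same check from a script instead of curl, a minimal sketch using only the Python standard library might look like this. The address comes from the netstat output above and may differ on your nodes. The names `is_healthy` and `probe` and the 2 second timeout are assumptions made for this example.

```python
import urllib.error
import urllib.request

# Address taken from the netstat output above; adjust for your node.
KUBELET_HEALTHZ_URL = "http://127.0.0.1:10248/healthz"


def is_healthy(status: int, body: str) -> bool:
    """Healthy means HTTP 200 with the body 'ok', as in the curl output."""
    return status == 200 and body.strip() == "ok"


def probe(url: str = KUBELET_HEALTHZ_URL, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers like a healthy kubelet, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return is_healthy(response.status, body)
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx replies (HTTPError).
        return False
```

Calling `probe()` on the node itself should return `True` while the kubelet is up. Pointing it at the bare port without `/healthz` returns `False`, matching the 404 seen earlier.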

One mystery solved!

We’ll publish more Kubernetes Mysteries and how it all works if you leave a comment below telling us which part of Kubernetes you understand least :)
