Erlang Liveness Checks in Kubernetes

I have an ever-increasing number of small projects and deployments that I use either internally or with some availability to the public, and have been relying on Kubernetes to make managing them easy. Not too long ago, I started adding a liveness probe to each pod definition as a contingency against a hung runtime. My pod definition at the time looked like this example from my Aflame project:

```yaml
containers:
  - name: aflame
    image: docker-registry:5000/erlang-aflame
    livenessProbe:
      exec:
        command:
          - /deploy/bin/aflame
          - ping
      initialDelaySeconds: 5
```

This worked fine, and I was able to verify that nodes that failed the ping test would be taken down. However, some weeks later I was poking around and realized that CPU utilization on my server was significantly higher than I would expect.

[Figure: CPU utilization graph; note the recent spike in CPU utilization]

Looking at htop, it was difficult to pin down a precise culprit, aside from a large number of processes called erl_child_setup coming into existence, pegging a CPU core, and disappearing again. After googling around, I landed on the source code for erl_child_setup and found this section of main:

```c
/* We close all fds except the uds from beam.
   All other fds from now on will have the CLOEXEC
   flags set on them. This means that we only have
   to close a very limited number of fds after we
   fork before the exec. */
#if defined(HAVE_CLOSEFROM)
    closefrom(4);
#else
    for (i = 4; i < max_files; i++)
#if defined(__ANDROID__)
        if (i != system_properties_fd())
#endif
            (void) close(i);
#endif
```

According to the ps output on my system, max_files was being set to 1,048,576 - so every time this program ran, it hot-looped over a million possible file descriptors and called close on each one! No wonder it was generating so much system time. But what was actually causing all of these erl_child_setup invocations? I had initially suspected misbehaving deployment code, but the spike in load lined up with when I added the liveness checks. It turns out that the relx ping command is surprisingly heavyweight; or at least ends up that way when run in an environment with a very high max file count, which is the case under Docker.
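For reference, a quick way to see the limit a container is actually running with is to ask a shell inside it; this is a sketch assuming kubectl access, with `<aflame-pod>` standing in for the real pod name:

```console
$ kubectl exec <aflame-pod> -- sh -c 'ulimit -n'
1048576
```

That value lines up with the max_files figure that erl_child_setup was iterating over.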

The fix

In order to fix this, I altered my liveness probe to run the ping check under a low ulimit on the number of open files. The close loop still runs, but it now iterates over a considerably more restrained number of descriptors. I also took the opportunity to increase the interval between checks, since the default period of ten seconds is more frequent than this check needs.

```diff
 containers:
   - name: aflame
     image: docker-registry:5000/erlang-aflame
     livenessProbe:
       exec:
         command:
+          - softlimit
+          - -o
+          - "128"
           - /deploy/bin/aflame
           - ping
       initialDelaySeconds: 5
+      periodSeconds: 60
```
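As a sanity check, the wrapped command can be run by hand inside the container; this assumes softlimit is already installed (see the note below), and pong is the standard relx reply from a healthy node:

```console
$ softlimit -o 128 /deploy/bin/aflame ping
pong
```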

Note that in order to get softlimit into your container, you may need to rebuild the image with the daemontools package installed, or with another package that provides a similar limit utility.
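As a minimal sketch, assuming a Debian-based image, that rebuild can be a single extra layer, since the daemontools package ships softlimit:

```dockerfile
# daemontools provides the softlimit utility on Debian/Ubuntu base images
RUN apt-get update && \
    apt-get install -y --no-install-recommends daemontools && \
    rm -rf /var/lib/apt/lists/*
```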

[Figure: CPU utilization graph; noticeably reduced CPU usage after the fix]