Today’s post is brought to you by the letter Z.

Z for Zombies

This post is about Linux subreapers: why they exist, how they behave, and how you can use them, via a practical example in Go. If you are comfortable with how the process table and process states work, you can skip the preliminary reading section. Otherwise, let’s go!

Preliminary Reading on Process States

In Linux, a process can move through various states. When you look at ps output, however, you will most commonly see only R (running) and S (interruptible sleep). Below is some ps output for a parent and its child. Both are in the S state, along with some extra information that you can read about on the man7 ps page.

Sleeping Processes

Less commonly, however, you may see processes in the Z (zombie) state. Zombie processes are those that have exited but have not yet been reaped by their parent via a system call from the wait family. They are no longer running, but they still occupy an entry in the kernel’s process table. In ps output they are often marked with the <defunct> string, as below:

Ahhh! Zombies

OK, so why are we seeing a zombie process here? Why do we ever see them in ps output? Well, ps output is produced by reading the files found under each /proc/<pid>/ directory. The procfs filesystem is a special beast: a virtual filesystem that exposes the kernel’s process table. You can read more about it on the man7 proc page. After a process exits and transitions to the Z state, it still has an entry in the process table until it is reaped, and so it still appears in the ps output. In this case we can point the finger at the ./main process for not reaping its child!
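If you would rather see this from code than from ps, here is a minimal sketch. It assumes a Linux machine with a `true` binary on the PATH, and the helper names `procState` and `observeZombie` are made up for this example. It starts a child, deliberately delays the wait, and reads the state letter straight out of /proc/<pid>/stat.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// procState returns the single-letter state (R, S, Z, ...) that the kernel
// reports for pid in /proc/<pid>/stat.
func procState(pid int) (string, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return "", err
	}
	// The line looks like "pid (comm) state ...". The command name may contain
	// spaces or parentheses, so the state is the first field after the last ')'.
	s := string(data)
	i := strings.LastIndex(s, ")")
	if i < 0 {
		return "", fmt.Errorf("unexpected stat format for pid %d", pid)
	}
	fields := strings.Fields(s[i+1:])
	if len(fields) == 0 {
		return "", fmt.Errorf("unexpected stat format for pid %d", pid)
	}
	return fields[0], nil
}

// observeZombie starts a short-lived child and returns the last state seen
// before reaping it, plus whether its /proc entry disappeared after the reap.
func observeZombie() (state string, gone bool, err error) {
	cmd := exec.Command("true")
	if err = cmd.Start(); err != nil {
		return "", false, err
	}
	pid := cmd.Process.Pid

	// Poll until the child has exited. We have not waited on it yet,
	// so it should sit in the Z state.
	for i := 0; i < 200; i++ {
		if state, err = procState(pid); err != nil {
			return "", false, err
		}
		if state == "Z" {
			break
		}
		time.Sleep(10 * time.Millisecond)
	}

	// Wait reaps the child, which removes its process table entry.
	if err = cmd.Wait(); err != nil {
		return state, false, err
	}
	_, err = procState(pid)
	return state, errors.Is(err, os.ErrNotExist), nil
}

func main() {
	state, gone, err := observeZombie()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Printf("state before reaping: %s, /proc entry gone after reaping: %v\n", state, gone)
}
```

The polling loop is only there because the child needs a moment to exit. The interesting part is that the entry survives the exit and only goes away once something waits on it.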

By default, when child processes become orphaned (because their parent process dies :( so sad), they are reparented to the init process (pid 1) in their pid namespace. The init process is responsible for cleaning up any debris, and will attempt to reap any zombies that have been reparented under it.

pid 6 putting up a fight

Sometimes reparenting to init is undesirable behaviour. Sometimes we want a process to be responsible for any and all of its descendants, even if intermediary processes in the tree have exited. The canonical example is a userspace supervisor process such as systemd, where services often double-fork to daemonize. With the default behaviour above, those services would be reparented directly to init, bypassing the supervisor process.

So what mechanism exists to allow supervisor processes to work as intended?

Don’t Fear The Subreaper

Since version 3.4, Linux has had the concept of a subreaper. No resource could say it better than the man7 page on prctl:

A subreaper fulfills the role of init(1) for its descendant processes. When a process becomes orphaned (i.e., its immediate parent terminates) then that process will be reparented to the nearest still living ancestor subreaper. Subsequently, calls to getppid() in the orphaned process will now return the PID of the subreaper process, and when the orphan terminates, it is the subreaper process that will receive a SIGCHLD signal and will be able to wait(2) on the process to discover its termination status.

So let’s get to the fun part, and play around with subreapers practically. I created a little library to work from at https://github.com/williammartin/subreaper, which provides three functions:

The Prctl function from the golang.org/x/sys/unix package allows interaction with a variety of attributes on a process. Depending on the value passed as the first argument, the following arguments can have different significance. For subreaper attributes, we care about the PR_SET_CHILD_SUBREAPER and PR_GET_CHILD_SUBREAPER values.

When setting the subreaper attribute, a nonzero second argument sets it and a zero second argument unsets it. When fetching the attribute, the second argument is a pointer to an integer into which the kernel writes the current value (0 or 1).
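To make the set and get semantics concrete, here is a rough sketch of what helpers like these can look like. It is not the code from the library above: the names `setSubreaper` and `isSubreaper` are invented for this example, and it calls prctl through the standard library's raw syscall interface (with the option numbers copied from linux/prctl.h) rather than through unix.Prctl, so that it compiles without external dependencies.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// prctl option values, as defined in <linux/prctl.h>.
const (
	prSetChildSubreaper = 36 // PR_SET_CHILD_SUBREAPER
	prGetChildSubreaper = 37 // PR_GET_CHILD_SUBREAPER
)

// setSubreaper marks the calling process as a child subreaper when enabled
// is true, and clears the attribute when it is false.
func setSubreaper(enabled bool) error {
	var arg uintptr // zero unsets the attribute
	if enabled {
		arg = 1 // any nonzero value sets it
	}
	_, _, errno := syscall.Syscall6(syscall.SYS_PRCTL, prSetChildSubreaper, arg, 0, 0, 0, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

// isSubreaper reports whether the calling process is currently a subreaper.
func isSubreaper() (bool, error) {
	// The kernel writes a C int (0 or 1) through the pointer in the second argument.
	var value int32
	_, _, errno := syscall.Syscall6(syscall.SYS_PRCTL, prGetChildSubreaper,
		uintptr(unsafe.Pointer(&value)), 0, 0, 0, 0)
	if errno != 0 {
		return false, errno
	}
	return value != 0, nil
}

func main() {
	if err := setSubreaper(true); err != nil {
		fmt.Fprintln(os.Stderr, "set failed:", err)
		return
	}
	on, err := isSubreaper()
	if err != nil {
		fmt.Fprintln(os.Stderr, "get failed:", err)
		return
	}
	fmt.Println("subreaper:", on)
}
```

Setting the attribute needs no special privileges, so you can try this as an ordinary user on any kernel from 3.4 onwards.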

Alright, now that we have some utilities for playing with the subreaper setting of a process, we can see the impact it has on the process tree and reaping!

This is a simple demonstration program using the previous library. Let’s break down the runtime structure. When executed with run as the first argument, this program does a double-fork as discussed earlier:

- The parent spawns and waits on the child (via cmd.Run()), then sleeps.
- The child spawns the grandchild, but does not wait (via cmd.Start()).
- The grandchild process goes to sleep.
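The three steps above can be sketched roughly as follows. This is a reconstruction under assumptions rather than the exact demo program: the `child` and `grandchild` argument names, the `selfCommand` helper and the sleep durations are all invented here, and the subreaper call is only marked with a comment. Each level re-executes the same binary through /proc/self/exe with a different first argument.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// selfCommand builds a command that re-executes the current binary
// (via /proc/self/exe) in the given role.
func selfCommand(role string) *exec.Cmd {
	cmd := exec.Command("/proc/self/exe", role)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: main run")
		return
	}

	switch os.Args[1] {
	case "run":
		// Parent. A call that marks this process as a subreaper would go
		// here, before anything is spawned.
		if err := selfCommand("child").Run(); err != nil {
			fmt.Fprintln(os.Stderr, "child failed:", err)
			return
		}
		// Stay alive long enough to inspect the process tree with ps.
		time.Sleep(60 * time.Second) // duration is an assumption
	case "child":
		// Child. Start the grandchild without waiting on it, then return.
		// Exiting here is what orphans the grandchild.
		if err := selfCommand("grandchild").Start(); err != nil {
			fmt.Fprintln(os.Stderr, "grandchild failed to start:", err)
		}
	case "grandchild":
		// Grandchild. Sleep for a while, then exit.
		time.Sleep(20 * time.Second) // duration is an assumption
	default:
		fmt.Println("unknown role:", os.Args[1])
	}
}
```

Build it, start it with run as the first argument, and look at the process tree with ps from another terminal while the parent is sleeping.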

It should be noted that /proc/self/exe is a special file that links to the currently executing binary. We can use ps to compare the process table with the default reparenting behaviour (by commenting the subreaper.Set() line out), versus setting the parent as a subreaper:

Default Reparenting Behaviour

The grandchild has been reparented to init

We can see here that the /proc/self/exe grandchild process has been reparented to the init process. If we check again later, it has been reaped and no longer appears in the ps output.

Subreaper Reparenting Behaviour

The grandchild has been reparented to our subreaper!

Wow! This time when the child exited, the grandchild was reparented to the subreaper rather than init! If we check back 20 seconds later the process will be a zombie…

The grandchild has become a zombie!

Now all our subreaper needs to do is wait on any descendants, and it can act like a real supervisor process!

The only change we’ve made here is to listen for SIGCHLD signals and call the Wait4 function with -1 as the first argument, which reaps any child process that has exited. You can read about the various arguments on the man7 wait page. Now our grandchild gets reaped successfully!

The grandchild has been reaped!

Note that if you were writing this for anything beyond a demo, you would probably want a loop that handles SIGCHLD signals from many descendants, rather than exiting after one. You would probably also want to catch the child process exiting in that loop, rather than via cmd.Run().

Conclusion

Hopefully, you now have a good understanding of how the Linux model works when it comes to process states, the process tree and reaping. If you have any questions, let me know!