So, okay. That's maybe a tad confusing, but not too bad. For those of you who don't write command shells in Unix all day, the call to fork is copying your Ruby process into two nearly-identical ones. There's a new process (called the "child") that will now only run the code inside that block, and your first process (called the "parent") will ignore the block and carry on as though the fork method didn't do anything. The "pid" above stands for "Process ID", and it's a big number like 32741. It's used to identify the process. You've probably seen them before in "ps" or "top" output or the Mac Activity Monitor.

The IO.pipe method returns two IO objects. If you write to one, the other can read it. We'll pass one into the child and keep one in the parent. Well, okay -- actually, we copied both sides of the pipe when we copied our process. So we'll close one side in the child, and close the other side in the parent.

The child does some work ("do_something" above) and then passes the result, as a string, through the pipe. The parent reads the result from the pipe. If necessary it will hang out doing nothing on that "read" for as long as necessary, or until the child dies.

You need the pipe because the two processes are completely separate. So you can't just assign a returned value in the child and have it show up in the parent. That's why we write in the child and read in the parent. It's also why we don't need to use a Mutex like we would with threads - the two processes are more separate than that, so they use a pipe instead.

The final Process.waitpid is telling your operating system, "when that child is finished, I'm done with it. You don't need to save me any more pipe output or whether the child succeeded or failed or anything. I'm all finished with that child process, you can clean it up." You can also call it without a process ID as the argument and it will just wait for any child to die.

That's not a coordinator. But it's the building block we'll be using to make one.

Mo Processes Mo Problems...

If you want many processes to each do work, you can do the above many times. That's the same basic principle behind app servers like Unicorn, or Puma's cluster mode, for instance. The traditional Unix "fork server" is exactly that. Normally you call the original process the "master" and the forked child processes "workers".

One difficulty of all this is cleanup. Normally a master process only manages the workers, because failures are hard. If you want to do other things in your master process (like Rails Ruby Bench does,) you'll want a coordinator process. Then the master process can calculate and handle input, the coordinator process can watch for dead children, and of course the children (a.k.a. "workers") will do the work. But you still have to keep track of everything to clean it up.

Unix and MacOS have a great tracking mechanism for watching and cleaning this stuff up: process groups. You may have used them before without knowing it. You know how if a process gets way out of hand you can use "kill -9" to kill it? The "9" part means the unblockable, uncatchable SIGKILL signal. Did you ever wonder what the minus is for?

It turns out the minus means "don't just kill this one process. Kill everything in its process group." A process group, by default, includes any other processes it forks off, so it cleans everything up, including all the child processes. This is magic that you, too, can use.

In other words: the top-level "master" process forks a coordinator. The coordinator sets up a new process group, then forks some (or many!) workers. When the task is done, terminate the coordinator's whole process group with extreme prejudice. Now you can be sure: it's all cleaned up.

Here's one way that can look: