Managing operating system processes in Java has often been a daunting task, because the tooling and the API available for it were poor. To be honest, that is not without reason: Java was not meant for that purpose. If you wanted to manage OS processes, you had shell scripts, Perl, or whatever else you preferred. Larger applications that faced more complex tasks were supposed to solve the problem in C or C++.

When you really had to manage processes from Java, you had to write operating-system-dependent code. It was possible: you could query some environment variables and then implement different behavior depending on the operating system. This approach works, but it has drawbacks. Testing costs more and development is more complex. As Java became more mature and widespread, the demand for this type of application grew. We can clearly see, for example, that this question put up on StackOverflow in 2011 had more than 100,000 views. Some applications, and thus some developers, needed a solution to this problem, not a workaround.

In this case, providing an API in the JDK is a solution. It will not make process handling OS independent. Operating systems differ, and process handling is an area very much tied to the OS. The system-dependent part of the code is, however, moved to the JDK runtime, where the Java development team tests it once, instead of every application testing it separately. That eases the testing burden on the application side. In addition, development becomes cheaper, as the API is already there and we do not need to program it separately for BSD, OSX, Linux, and Windows, not to mention OpenVMS. Finally, the application may run faster.

Again, let's see an example. If we needed the list of running processes, we had to start an external process that dumped the list of processes to its standard output. That output had to be captured and parsed as a string. Now, with the advent of Java 9, we have a simple call that invokes the appropriate operating system call. It needs neither the execution of a separate process nor the parsing of string output for information that was already there and was just not available in Java.

For all the details of process handling in Java 9, you can read the documentation currently available here, or you can soon read the book Mastering Java 9 from Packt, in which I wrote the chapter about process handling. In this article, I will talk about why we need the new class, ProcessHandle. The need may not be evident to developers who have little experience with operating system processes and with how the operating system works.

ProcessHandle

In short, an instance of ProcessHandle represents an operating system process. All operating systems identify live processes using PIDs, which is a TLA abbreviating Process Identifier. These are small (or not-so-small) integer numbers. Some operating systems could use something else, like names, or some cryptic strings, but they do not. There is no benefit, and it happens that all of them use numbers to identify processes.

When we program in an OO manner, we abstract the problem so that it better explains the problem we model. There is a rule, however, that we should not make our model more abstract than the problem itself. That just introduces unnecessary complexity to the application, increasing cost. In this case, it seems to be obvious (or rather oblivious) to use int to identify a process. If the operating system does not do it more abstractly then why should we? Just because in Java everything is an object? (Not true, by the way.)

The reason is that there is no one-to-one match between PIDs and ProcessHandle instances. Let’s re-read the first two sentences of this section:

“… ProcessHandle represents an operating system process. All operating systems identify live processes using PIDs …”

There is that little word, “live,” in the second sentence, and believe me that makes a difference. Being alive is very different from being dead, although we do not have a direct, firsthand comparison. A ProcessHandle instance may keep a reference to a process that is already wiped from memory.

Imagine that you list the processes on Linux with the command ‘ps -ef’ and see that Tomcat is eating the CPU and consuming ever-increasing memory, most likely because the application you deployed has a bug that makes it loop forever. You decide to kill the process, so you look at the PID displayed and issue the command ‘kill -9 666’, if the PID happens to be 666. By that time, the process has eaten up all the memory it could get from the OS and, because you did not configure any swap file on the machine, the JVM disappears without a trace. The kill command will complain that there is no process with the given PID.

It might also happen that the operating system has already started a totally different process that happens to have that PID. Has that ever happened to you? Now, you might shake your head because it has never happened in your experience. On Linux, the default value of pid_max is 32768, so PIDs wrap around after 32767. When will that ever wrap around? Sooner than you might think, but usually not so soon that the PID is reused between issuing the ‘ps’ and ‘kill’ commands. And what happens if a small embedded system sets /proc/sys/kernel/pid_max to something smaller? Say much smaller, like 16, that fits in four bits? It may not be a big problem when you issue the command interactively, because you are there. If the system crashes, you can restart the process or the whole system, if needed. You can take corrective action if you made a “mistake.” But Java applications are not that intelligent, and they should not have the chance, even in an embedded system, to kill a process they did not mean to kill.

Process handling based on PID

To handle that situation, Java has the interface ProcessHandle. Instead of PIDs, we have ProcessHandles. If we need the ProcessHandle of the currently running process (the JVM), then we can call the static method ProcessHandle::current (note that I used the nice Java 8 method reference notation). You can get the PID of the current process by calling pid() on that instance of ProcessHandle (early-access builds of Java 9 called this method getPid()), but after a while, you won't do that. Wanting the PID of a process is just an old habit. You do not need it when you have the handle.
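As a minimal sketch of the call described above, the following obtains the handle of the running JVM and reads its PID with pid(), the method name in the released Java 9 API. The class name CurrentProcessDemo and the helper currentPid() are made up for this illustration.

```java
// Requires Java 9 or later. The class and method names are illustrative only.
class CurrentProcessDemo {

    /** Returns the PID of the JVM that executes this code. */
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        ProcessHandle self = ProcessHandle.current();
        System.out.println("PID of this JVM: " + self.pid());
        System.out.println("Still alive: " + self.isAlive());
    }
}
```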

When you have a process handle, say processHandle, you can get a Stream of ProcessHandle instances by calling processHandle.children(). This will list the immediate offspring processes. If you want a “transitive closure”, that is, you want to list not only the children but also the children of children and so on, you have to call processHandle.descendants().
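The difference between the two calls can be sketched as follows. The class ProcessTreeDemo and its two helper methods are hypothetical names used only for this example. They collect the PIDs of the direct children and of all descendants of a given handle.

```java
import java.util.List;
import java.util.stream.Collectors;

// Requires Java 9 or later. The class and method names are illustrative only.
class ProcessTreeDemo {

    /** PIDs of the processes started directly by the given process. */
    static List<Long> childPids(ProcessHandle handle) {
        return handle.children()
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    /** PIDs of the children, the children of children, and so on. */
    static List<Long> descendantPids(ProcessHandle handle) {
        return handle.descendants()
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ProcessHandle self = ProcessHandle.current();
        System.out.println("children:    " + childPids(self));
        System.out.println("descendants: " + descendantPids(self));
    }
}
```

A JVM that has not started any process will usually print two empty lists.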

But what if you are really greedy and want to get a hand(le) on all processes? Then you should call the static method ProcessHandle::allProcesses.

Streams are famous for being populated lazily, creating the next element only when needed. In the case of a process list, where processes come and go while the stream is being consumed, that would lead to interesting results. For this reason, the dataset backing the Stream of processes is a snapshot created when children(), descendants(), or allProcesses() is called.
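A small sketch of listing every visible process, under the assumption that the operating system lets the JVM see at least its own process. The class AllProcessesDemo and the method snapshotPids() are names invented for this example. Collecting the stream into a list gives us a stable collection to work with, although any process in it may already have terminated by the time we look at it.

```java
import java.util.List;
import java.util.stream.Collectors;

// Requires Java 9 or later. The class and method names are illustrative only.
class AllProcessesDemo {

    /** Collects the PIDs of all processes visible to this JVM, sorted. */
    static List<Long> snapshotPids() {
        return ProcessHandle.allProcesses()
                .map(ProcessHandle::pid)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> pids = snapshotPids();
        System.out.println(pids.size() + " processes visible");
        System.out.println("own PID listed: "
                + pids.contains(ProcessHandle.current().pid()));
    }
}
```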

Now we have a handle to a process. What can we do with it?

We can processHandle.destroy() it, and we can also call processHandle.destroyForcibly(). That is what everybody wanted, as per the cited StackOverflow question. We can also check whether the process the handle refers to is still alive by calling processHandle.isAlive(). You can also get the handle of the parent process by calling processHandle.parent(). Note that not all processes have parent processes. Some might never have had any, and other processes might have been orphaned when their parent processes terminated. For this reason, the return value of this method is an Optional. Java 9 has new features in the Optional class as well, but that is a different story. Here, we're focusing on the processes.
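The following sketch puts these calls together. HandleOperationsDemo, describeParent(), and requestTermination() are hypothetical names, not part of the JDK. The helper refuses to act on the current process because, according to the API documentation, destroy() and destroyForcibly() throw IllegalStateException when called on the handle of the JVM itself. Both methods return a boolean that tells whether termination was successfully requested, not whether the process is already gone.

```java
// Requires Java 9 or later. The class and method names are illustrative only.
class HandleOperationsDemo {

    /** Describes the parent of a process, which may be absent. */
    static String describeParent(ProcessHandle handle) {
        return handle.parent()
                .map(parent -> "parent PID " + parent.pid())
                .orElse("no parent available");
    }

    /**
     * Requests termination of another process. Returns true only if the
     * request was issued successfully. The current process is never touched.
     */
    static boolean requestTermination(ProcessHandle handle, boolean force) {
        if (handle.equals(ProcessHandle.current())) {
            return false; // destroy() would throw IllegalStateException here
        }
        if (!handle.isAlive()) {
            return false; // nothing left to terminate
        }
        return force ? handle.destroyForcibly() : handle.destroy();
    }

    public static void main(String[] args) {
        ProcessHandle self = ProcessHandle.current();
        System.out.println(describeParent(self));
        System.out.println("termination requested: " + requestTermination(self, false));
    }
}
```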

If the process is still alive but we want to wait for its termination, we can do it in a modern, asynchronous way. Calling processHandle.onExit() returns a CompletableFuture that completes, with the process handle as its value, when the process terminates. Java 9 has new features in the CompletableFuture class as well, but that is a different story. Again, we're focusing on the processes. Am I repeating myself?
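Here is a minimal sketch of waiting for termination asynchronously. To have a short-lived child that exists on any machine running this code, the example assumes that the launcher of the current JVM can be found through info().command() and starts it with the -version option. That choice, like the names OnExitDemo and startShortLivedChild(), is only an illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;

// Requires Java 9 or later. The class and method names are illustrative only.
class OnExitDemo {

    /** Starts "java -version" with the launcher of the running JVM. */
    static ProcessHandle startShortLivedChild() {
        // Assumption: fall back to "java" on the PATH if the path is unknown.
        String java = ProcessHandle.current().info().command().orElse("java");
        try {
            return new ProcessBuilder(java, "-version")
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start()
                    .toHandle();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ProcessHandle child = startShortLivedChild();
        // The callback runs once the child has terminated.
        CompletableFuture<String> message = child.onExit()
                .thenApply(done -> "process " + done.pid() + " terminated");
        System.out.println(message.join());
    }
}
```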

There is an interface inside the interface ProcessHandle called Info. We can get an instance of the information from the process handle by calling processHandle.info(). Through this instance, we can access the arguments as an Optional string array, and the command line, the command, and the user the process belongs to as Optional strings. We can also get the time when the process was started and the total CPU time it has used, in the form of an Optional Instant and an Optional Duration. These new classes were introduced in Java 8, and Java 9 has new features, too… But okay, that's where it starts to be boring.
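Since every piece of information is an Optional, code that reads it has to decide what to do when a value is missing. The sketch below simply substitutes a question mark. The class ProcessInfoDemo and the method describe() are hypothetical names for this example.

```java
import java.time.Duration;
import java.time.Instant;

// Requires Java 9 or later. The class and method names are illustrative only.
class ProcessInfoDemo {

    /** Builds a one-line description; values the OS does not reveal become "?". */
    static String describe(ProcessHandle handle) {
        ProcessHandle.Info info = handle.info();
        return "pid=" + handle.pid()
                + " user=" + info.user().orElse("?")
                + " command=" + info.command().orElse("?")
                + " arguments=" + info.arguments()
                        .map(a -> String.join(" ", a)).orElse("?")
                + " started=" + info.startInstant()
                        .map(Instant::toString).orElse("?")
                + " cpu=" + info.totalCpuDuration()
                        .map(Duration::toString).orElse("?");
    }

    public static void main(String[] args) {
        System.out.println(describe(ProcessHandle.current()));
    }
}
```

Which values are actually present depends on the operating system and on the privileges of the JVM.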

Summary

What can we do with all these features? In the book, I mention that I created a simple process-controlling application. I had to create a similar one around 2006 in Perl. It starts processes as described in a configuration file and restarts any of them that fails. But this is only one example. There are other scenarios where process handling can be handy. You want to fill in forms and convert them to PDF? To do that, you start some word processor with command line parameters. The tasks are queued: they are started one after the other to keep performance reasonable. You convert at most n documents at a time in n processes, where n is configurable. If a process takes too long, you kill it, send a message about it to the person who sent the request to your conversion server, and schedule it to run during the night or some other less busy period.
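The "kill it if it takes too long" part of such a conversion server could be sketched like this. It is not the application from the book, only an illustration under stated assumptions: the names TimeLimitedRunner, sampleCommand(), and runWithTimeout() are invented, and the sample command runs the launcher of the current JVM with -version as a stand-in for a real converter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Requires Java 9 or later. The class and method names are illustrative only.
class TimeLimitedRunner {

    /** A harmless stand-in for the real conversion command. */
    static List<String> sampleCommand() {
        String java = ProcessHandle.current().info().command().orElse("java");
        return List.of(java, "-version");
    }

    /**
     * Runs the command and waits for it at most the given time.
     * Returns true if it finished in time, false if it had to be killed.
     */
    static boolean runWithTimeout(List<String> command, long timeout, TimeUnit unit) {
        ProcessHandle handle;
        try {
            handle = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start()
                    .toHandle();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            handle.onExit().get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            handle.destroyForcibly();  // took too long
            handle.onExit().join();    // wait until it is really gone
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            handle.destroyForcibly();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean inTime = runWithTimeout(sampleCommand(), 30, TimeUnit.SECONDS);
        System.out.println(inTime ? "finished in time" : "killed after timeout");
    }
}
```

A real server would also notify the requester and reschedule the job when the method returns false.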

We can develop such programs in Java without using external Shell, Python, or Perl scripts, and it simply makes the project simpler and cheaper.