Introduction

Deadlocks are caused by many threads locking the same resources

Deadlocks can also occur if thread pool is used inside a task running in that pool

Modern libraries like RxJava/Reactor are also susceptible

A deadlock is a situation where two or more threads are waiting for resources acquired by each other. For example thread A waits for lock1 locked by thread B, whereas thread B waits for lock2 , locked by thread A. In worst case scenario, the application freezes for an indefinite amount of time. Let me show you a concrete example. Imagine there is a Lumberjack class that holds references to two accessory locks:

import com.google.common.collect.ImmutableList; import lombok.RequiredArgsConstructor; import java.util.concurrent.locks.Lock; @RequiredArgsConstructor class Lumberjack { private final String name; private final Lock accessoryOne; private final Lock accessoryTwo; void cut(Runnable work) { try { accessoryOne.lock(); try { accessoryTwo.lock(); work.run(); } finally { accessoryTwo.unlock(); } } finally { accessoryOne.unlock(); } } }





Every Lumberjack needs two accessories: a helmet and a chainsaw. Before he approaches any work , he must hold the exclusive lock to both of these. We create lumberjacks as follows:

import lombok.RequiredArgsConstructor; import java.util.concurrent.locks.Lock; import java.util.concurrent.locks.ReentrantLock; @RequiredArgsConstructor class Logging { private final Names names; private final Lock helmet = new ReentrantLock(); private final Lock chainsaw = new ReentrantLock(); Lumberjack careful() { return new Lumberjack(names.getRandomName(), helmet, chainsaw); } Lumberjack yolo() { return new Lumberjack(names.getRandomName(), chainsaw, helmet); } }





As you can see, there are two kinds of lumberjacks: those who first take a helmet and then a chainsaw and vice versa. Careful lumberjacks try to obtain a helmet first and then wait for a chainsaw. A YOLO-type of lumberjack first takes a chainsaw and then looks for a helmet. Let’s generate some Lumberjacks and run them concurrently:

private List<Lumberjack> generate(int count, Supplier<Lumberjack> factory) { return IntStream .range(0, count) .mapToObj(x -> factory.get()) .collect(toList()); }





generate() is a simple method that creates a collection of lumberjacks of a given type. Then, we generate a bunch of careful and YOLO:

private final Logging logging; //... List<Lumberjack> lumberjacks = new CopyOnWriteArrayList<>(); lumberjacks.addAll(generate(carefulLumberjacks, logging::careful)); lumberjacks.addAll(generate(yoloLumberjacks, logging::yolo));





Finally, let’s put these Lumberjacks to work:

IntStream .range(0, howManyTrees) .forEach(x -> { Lumberjack roundRobinJack = lumberjacks.get(x % lumberjacks.size()); pool.submit(() -> { log.debug("{} cuts down tree, {} left", roundRobinJack, latch.getCount()); roundRobinJack.cut(/* ... */); }); });





This loop takes Lumberjacks — one after another — in a round-robin fashion and asks them to cut a tree. Essentially, we are submitting the howManyTrees number of tasks to a thread pool ( ExecutorService ). In order to figure out when the job was done, we use a CountDownLatch :

CountDownLatch latch = new CountDownLatch(howManyTrees); IntStream .range(0, howManyTrees) .forEach(x -> { pool.submit(() -> { //... roundRobinJack.cut(latch::countDown); }); }); if (!latch.await(10, TimeUnit.SECONDS)) { throw new TimeoutException("Cutting forest for too long"); }





The idea is simple. We will let a bunch of Lumberjacks compete over a helmet and a chainsaw across multiple threads. The complete source code is as follows:

import lombok.RequiredArgsConstructor; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.List; import java.util.concurrent.CountDownLatch; import java.util.concurrent.ExecutorService; import java.util.concurrent.TimeUnit; import java.util.concurrent.TimeoutException; import java.util.function.Supplier; import java.util.stream.Collectors; import java.util.stream.IntStream; @RequiredArgsConstructor class Forest implements AutoCloseable { private static final Logger log = LoggerFactory.getLogger(Forest.class); private final ExecutorService pool; private final Logging logging; void cutTrees(int howManyTrees, int carefulLumberjacks, int yoloLumberjacks) throws InterruptedException, TimeoutException { CountDownLatch latch = new CountDownLatch(howManyTrees); List<Lumberjack> lumberjacks = new ArrayList<>(); lumberjacks.addAll(generate(carefulLumberjacks, logging::careful)); lumberjacks.addAll(generate(yoloLumberjacks, logging::yolo)); IntStream .range(0, howManyTrees) .forEach(x -> { Lumberjack roundRobinJack = lumberjacks.get(x % lumberjacks.size()); pool.submit(() -> { log.debug("{} cuts down tree, {} left", roundRobinJack, latch.getCount()); roundRobinJack.cut(latch::countDown); }); }); if (!latch.await(10, TimeUnit.SECONDS)) { throw new TimeoutException("Cutting forest for too long"); } log.debug("Cut all trees"); } private List<Lumberjack> generate(int count, Supplier<Lumberjack> factory) { return IntStream .range(0, count) .mapToObj(x -> factory.get()) .collect(Collectors.toList()); } @Override public void close() { pool.shutdownNow(); } }





Now, let's take a look at the interesting part. If you only create careful Lumberjacks , the application completes almost immediately, for example:

ExecutorService pool = Executors.newFixedThreadPool(10); Logging logging = new Logging(new Names()); try (Forest forest = new Forest(pool, logging)) { forest.cutTrees(10_000, 10, 0); } catch (TimeoutException e) { log.warn("Working for too long", e); }





However, if you play a bit with the number of Lumberjacks , e.g. 10 careful and one yolo , the system quite often fails. What happened? Everyone in the careful team tries to pick up a helmet first. If one of the Lumberjacks picked up a helmet, everyone else just waits. Then, the lucky guy picks up a chainsaw, which must be available. Why? Everyone else is waiting for the helmet before they pick up a chainsaw. So far, so good. But, what if there is one yolo Lumberjack in the team? While everyone competes for a helmet, he sneakily grabs a chainsaw. But, there’s a problem. One of the careful Lumberjacks gets his safety helmet. However, he can’t pick up a chainsaw, because it’s already taken by someone else. To make matters worse, the current owner of the chainsaw (the yolo guy) will not release his chainsaw until he gets a helmet. There are no timeouts here. The careful guy waits infinitely with his helmet, unable to get a chainsaw. The yolo guy sits idle forever, because he can not obtain a helmet — a deadlock.

Now, what would happen if all the Lumberjacks were yolo , i.e., they all tried to pick the chainsaw first? It turns out that the easiest way to avoid deadlocks is to obtain and release locks always in the same order. For example, you can sort your resources based on some arbitrary criteria. If one thread obtains lock A followed by B, whereas the second thread obtains B first, it’s a recipe for a deadlock.

Thread Pool Self-Induced Deadlocks

This was an example of a deadlock, rather than a simple one. But, it turns out that a single thread pool can cause a deadlock when used incorrectly. Imagine you have an ExecutorService , just like in the previous example, that you use, as shown below:

ExecutorService pool = Executors.newFixedThreadPool(10); pool.submit(() -> { try { log.info("First"); pool.submit(() -> log.info("Second")).get(); log.info("Third"); } catch (InterruptedException | ExecutionException e) { log.error("Error", e); } });





This looks fine — all messages appear on the screen as expected:

INFO [pool-1-thread-1]: First INFO [pool-1-thread-2]: Second INFO [pool-1-thread-1]: Third





Notice that we block, see get() , waiting for the inner Runnable to complete before we display "Third." It’s a trap! Waiting for the inner task to complete means it must acquire a thread from a thread pool in order to proceed. However, we already acquired one thread, therefore, the inner will be blocked until it can get the second. Our thread pool is large enough at the moment, so it works fine. Let’s change our code a little bit, shrinking the thread pool to just one thread. Also, we’ll remove get() , which is crucial:

ExecutorService pool = Executors.newSingleThreadExecutor(); pool.submit(() -> { log.info("First"); pool.submit(() -> log.info("Second")); log.info("Third"); });





This code works fine, but with a twist:

INFO [pool-1-thread-1]: First INFO [pool-1-thread-1]: Third INFO [pool-1-thread-1]: Second





Two things to notice:

Everything runs in a single thread (unsurprisingly)

The "Third" message appears before "Second"

The change of order is entirely predictable and does not come from some race condition between threads (in fact, we have just one). Watch closely what happens: we submit a new task to a thread pool (the one printing "Second"). However, this time we don’t wait for the completion of that task. Great, because the very single thread in a thread pool is already occupied by the task printing "First" and "Third" . Therefore, the outer task continues, printing "Second." When this task finishes, it releases the single thread back to a thread pool . The inner task can finally begin execution, printing "Second." Now, where’s the deadlock? Try adding blocking get() to inner task:

ExecutorService pool = Executors.newSingleThreadExecutor(); pool.submit(() -> { try { log.info("First"); pool.submit(() -> log.info("Second")).get(); log.info("Third"); } catch (InterruptedException | ExecutionException e) { log.error("Error", e); } });





Deadlock! Step-by-step:

Task printing "First" is submitted to an idle single-threaded pool

is submitted to an idle single-threaded pool This task begins execution and prints "First"

We submit an inner task printing "Second" to a thread pool

to a thread pool The inner task lands in a pending task queue. No threads are available since the only one is currently being occupied

We block waiting for the result of the inner task. Unfortunately, while waiting for the inner task, we hold the only available thread

get() will wait forever, unable to acquire thread

will wait forever, unable to acquire thread deadlock

Does it mean having a single-thread pool is bad? Not really. The same problem could occur with a thread pool of any size. But, in that case, a deadlock may occur only under high load, which is much worse from a maintenance perspective. You could technically have an unbounded thread pool, but that’s even worse.

Reactor/RxJava

Notice that this problem can occur with higher-level libraries, like Reactor:

Scheduler pool = Schedulers.fromExecutor(Executors.newFixedThreadPool(10)); Mono .fromRunnable(() -> { log.info("First"); Mono .fromRunnable(() -> log.info("Second")) .subscribeOn(pool) .block(); //VERY, VERY BAD! log.info("Third"); }) .subscribeOn(pool);





Once you subscribe, this seems to work, but it is terribly non-idiomatic. The basic problem is the same. Outer Runnable acquires one thread from a pool , subscribeOn() in the last line, and at the same time, inner Runnable tries to obtain a thread as well. You need to replace the underlying thread pool with a single-thread pool, and this produces a deadlock. At least with RxJava/Reactor, the cure is simple — just compose asynchronous operations rather than blocking inside each other:

Mono .fromRunnable(() -> { log.info("First"); log.info("Third"); }) .then(Mono .fromRunnable(() -> log.info("Second")) .subscribeOn(pool)) .subscribeOn(pool)





Prevention

There is no 100 percent sure way of preventing deadlocks. One technique is to avoid situations that may lead to deadlocks, like sharing resources or locking exclusively. If that’s not possible (or deadlocks are not obvious, like with thread pools), consider proper code hygiene. Monitor thread pools and avoid blocking indefinitely. I can hardly imagine a situation when you are willing to wait an indefinite amount of time for a completion. And, that’s how get() or block() without timeout are working.

Thanks for reading!