November 26, 2015

Fibers are threads -- sequential processes that we can spawn and synchronize with others. However, usually when we say “thread” we mean those threads implemented by the operating systems, while fibers (AKA lightweight threads or user-mode threads) are implemented in user mode.

Actually, as we’ll see with JVM fibers (or, more accurately, fibers for the JVM) and threads, both the abstraction and the implementation are the same; the only difference is the use case: OS threads can be used in any language but require a lot of RAM and are slow to synchronize and to spawn, while fibers are specific to a certain language or runtime, but are very lightweight, and are synchronized with virtually no overhead.

In this post I’ll only cover plain JVM threads, which have been mapped 1:1 to the operating system threads since basically forever, and Quasar fibers, which are the JVM-specific implementation of lightweight threads.

How Is a Quasar Fiber Used?

A Quasar fiber is used exactly like a Java thread, and by that I mean it implements the thread API. Quasar abstracts both thread implementations -- Java’s Thread and Quasar’s Fiber -- into Strand. This allows fibers and threads to interoperate seamlessly.

For example your Java application will start in the main thread but if you want to use a fiber instead you can just spawn it from there and then join it; of course you can also do the opposite, that is spawning a thread from a fiber and then joining it. Or you could write methods that work generically with strands and then use them as either fibers or threads: in fact Quasar also includes a fiber-compatible, strand-based port of java.util.concurrent as well as Go-like channels, Erlang-like actors and Dataflow.

What Is a JVM Thread?

A thread is a continuation scheduled to run on a CPU core at the appropriate time by a scheduler. A continuation is nothing more than a program counter, marking our point in the sequence of instructions, and a stack, storing the value of our variables. Quasar fibers are implemented just like OS threads, only in JVM bytecode rather than in the OS kernel.

The OS sees and uses the hardware runtime: CPUs, application data registers, code registers (i.e. the program counter), memory registers (e.g. the stack pointer), virtual memory and CPU modes. in particular CPUs can switch between a limited user mode and a more powerful kernel mode when trap events happen or syscalls are executed. OS threads share most kernel resources (e.g. I/O descriptors) and inhabit the same address space: this makes them lighter than several single-thread OS processes and at the same time allows them to share data. They don’t share CPU registers though and have separate stacks, which allows them to execute distinct control flows concurrently.

JVM Threads and Scheduling

There can be more threads than there are available processors, so the OS needs to swap them in and out of CPUs by means of scheduling. A scheduling event can be triggered either by a preemption event trap (when the thread has consumed its CPU lease) or when the code explicitly invokes a syscall trap (to perform a kernel routine, like I/O). The CPU switches to kernel mode and transfers execution to a special memory area containing the trap handler: the kernel has now the power of snapshotting the CPU registers, including the program counter and the thread stack pointer, and package them into a Thread Control Block continuation.

After that the scheduler gets invoked to select the next thread to run, possibly in a different process; the kernel then restores all the relevant CPU registers and sets the processor up to switch back again into user mode and to resume execution at the restored program counter’s address.

Why Quasar Fibers?

OS threads give us everything we want, but for a heavy performance penalty: switching between threads involves jumping back and forth from user to kernel mode, possibly even across address space boundaries. These are expensive operations partly because they cause TLB flushes, cache misses and CPU pipelining havoc: that’s also why traps and syscalls can be orders of magnitude slower than regular procedure calls.

In addition, the kernel schedules threads (i.e. assigns their continuation to a CPU core) using a general-purpose scheduling algorithm, which might take into account all kinds of threads, from those serving a single transaction to those playing an entire video. Fibers, because they are scheduled at the application layer, can use a scheduler that is more appropriate for their use-case. As most fibers are used to serve transactions, they are usually active for very short periods of time and block very often. Their behavior is often to be awakened by IO or another fiber, run a short processing cycle, and then transfer control to another fiber (using a queue or another synchronization mechanism). Such behavior is best served by a scheduler employing an algorithm called “work-stealing”; this kind of scheduler is indeed employed by Erlang, Go and Quasar (by default). When fibers behave this way, work-stealing ensures minimal cache misses when switching between fibers.

The kernel can be our high-concurrency bottleneck. Suppose that thread switches were as fast as normal procedure calls, and we could avoid maintaining kernel data structures for threads: then we could gain a lot in terms of both memory footprint and switch efficiency. We want in-process, user-mode threads or… fibers.

Fibers for the JVM

Some language runtimes, such as Erlang’s, offer fibers natively but the JVM is not (yet) one of those. So how can we implement fibers on the JVM ourselves?

The Quasar approach is exactly the same used by the kernel to implement OS threads, only “reviewed” for the JVM runtime, which is stack-based rather than a typical register based machine: we “freeze” the fiber’s whole stack up to its root run method into a continuation task that we can re-schedule later on a regular Java executor. This process is called fiber suspension and I’ll now detail a bit more how it works.

The JVM has a standard, statically typed virtual instruction set (the bytecode format) that can be generated and edited with ASM at build-time, load-time or even run-time via agents. Each method in a call stack has an activation frame containing a current bytecode index (akin to the program counter), a local variables table and an operand stack. Thanks to the statically typed instructions and to the lexical scoping, the locals table and the operand stack can always be determined at compile-time for any method and any location in a method’s body.

Quasar’s Fiber.park suspends the current fiber and is used by all potentially fiber-suspending methods, which we call “suspendable”. These methods will throw a SuspendException to suspend the fiber. Exceptions undo the thread stack one frame at a time and they can be controlled via catch/finally blocks. Quasar contains several of them already, for example Fiber.sleep, channel and actor communication. Our code will include new suspendable methods built upon the existing ones and the Quasar instrumentor will:

Fully analyze the bytecode to find all the calls into suspendable methods. A method that (potentially) calls into other suspendable methods is itself considered suspendable, transitively.

Inject minimal bytecode in suspendable methods (and only them) that will manage an user-mode stack, in the following places: At the beginning we’ll check if we’re resuming the fiber and only in this case we’ll jump into the relevant bytecode index. Before a call into another suspendable method we’ll push a snapshot of the current activation frame, including the resume bytecode index; we can do it because we know the structure statically from the analysis phase. After a call into another suspendable method we’ll pop the top activation frame and, if resumed, we’ll restore it in the current fiber.



In general the suspendable methods can be found either through annotation of suspendable methods via @Suspendable or throwsSuspendExecution, or through bytecode graph analysis (which may instrument too much and adversely affect performance).

Quasar and OpenJDK

The Quasar team is currently working with the OpenJDK developers to make the instrumentation process completely transparent and not require manual annotation or bytecode graph analysis by having the JVM expose more information about the stack at runtime. This would also make Quasar automatically support all JVM languages (at present Quasar includes support for Java 7, 8 and Kotlin while Pulsar extends it to Clojure and also offers an idiomatic API).

It turns out that a suspended fiber, just like an unscheduled OS thread, is simply a stack that can be resumed, or a continuation, “packaged” as a task that can be re-scheduled: the only difference is that the fiber stack as well as the scheduler are user-mode and run in the JVM rather than in the kernel. Of course this is only the general idea and there are many more subtleties to take into account, for example:

How do we implement the fiber stack, which is clearly a performance-critical piece of the puzzle?

Which task executors are best for which scenarios, how do we choose them and how do we let the developer choosing them?

How do we change a developer’s catch clauses that would trap our “special” suspension exception?

Are there cases that can be optimized by avoiding (or reducing) an activation frame?

What About JVM Fiber Performance?

Managing user-mode stacks for fibers brings some overhead; how much exactly depends on how often instrumented methods are called and how deep the fiber call stack is. For example Quasar integrations such as Comsat's are often based on the FiberAsync class which parks the fiber after an async API is invoked and unparks it when the completion callback is invoked. In this case the stack is very shallow and the call frequency is very low, because fibers are used only to perform I/O operations (which are orders of magnitude slower than a method call).

You can find an Akka-Quasar performance analysis in this blog post: even for very small workloads Quasar performs comparably to Akka, which is async and thus has no user-mode stack management overhead.

This benchmark analysis shows that serving HTTP requests with fibers rather than threads also has the benefit of significantly increasing a server’s capacity and makes it much more resilient. As for the client side, an upcoming post will discuss the benefits of fibers through a comparative load test. Profiling the concurrency-intensive Spaceships Demo has revealed an overall Quasar overhead around 2%. Quasar itself includes some JMH benchmarks you can examine and run.

Our experience is that Quasar fibers work very well in many concrete situations: they allow you to write straightforward fiber-blocking code, yet at the same time develop highly (and finely) concurrent systems that simply couldn’t run on bulky OS threads. The overhead of fibers is higher but still very low even when compared to async and monadic APIs, which have the disadvantage of introducing a cumbersome, infectious programming style and don’t interoperate with imperative control flow constructs built into a language.

So Aren’t JVM Fibers Generators or async/awaits?

No, as we have seen fibers are real threads: namely a continuation plus a scheduler. Generators and async/awaits are implemented with continuations (often a more limited form of continuation called stackless, which can only capture a single stack frame), but those continuations don’t have a scheduler, and are therefore not threads.

Final Thoughts

I hope this deep-dive into fibers and threads has dissipated some doubts. I’ll look forward to feedback and questions! Ping me on twitter: @ftudone and I’ll try my best to respond swiftly.

Additional Resources

If you’re interested in continuations this recorded session from 2015’s Curry On and the subsequent blog post will give you a lot more insight.

Get more help with your Java projects by checking out our blog. We cover everything including unit testing, microservices, and tips and best practices for various Java tools.

Skip Redeploys During Java Development

And if you're working in Java, you need to try JRebel.

JRebel skips redeploys and rebuilds while maintaining application state. This allows developers to instantly check the results of their code changes, allowing them to stay in rhythm while developing.

Want to see how JRebel works on your project?

Try JRebel Free