Asynchronicity should be a property of how the program is executed, not what it does.

Ruby currently implements mutually exclusive threads and exposes both blocking and non-blocking operations. It also supports Fibers, which can be used to implement cooperatively scheduled event-driven IO. The cognitive burden of dealing with these different APIs is left as an exercise to the programmer, and thus we have a wide range of IO libraries with varying degrees of concurrency. Composability of components built on different underlying IO libraries is generally poor because each library exposes its own API and has its own underlying event loop. We present an approach to concurrency that scales well and avoids the need to change existing programs.

Improving Concurrency and Composability

Fibers are a negative overhead abstraction for concurrency, with each fiber representing a synchronous set of operations, and multiple fibers executing cooperatively in a single thread. This design provides concurrency with none of the overheads of parallel (multi-threaded) programming. Programmers can write their code as if it were sequential, which is easy to reason about, but when an operation would block, other fibers are given a chance to execute. Excellent scalability on Ruby is achieved by running multiple processes, each with its own event loop, and many fibers.
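As a minimal illustration of this cooperative model (the fiber names and the trivial round-robin "scheduler" here are invented for the example), two fibers can interleave their work by yielding whenever they would otherwise block:

```ruby
# Each fiber runs until it explicitly yields; the caller decides who runs next.
log = []

writer = Fiber.new do
  log << "writer: start"
  Fiber.yield # pretend the write would block; give up control
  log << "writer: finish"
end

reader = Fiber.new do
  log << "reader: start"
  Fiber.yield # pretend the read would block; give up control
  log << "reader: finish"
end

# A trivial round-robin "scheduler":
[writer, reader, writer, reader].each(&:resume)

log
# => ["writer: start", "reader: start", "writer: finish", "reader: finish"]
```

Within each fiber the code reads as straight-line, sequential logic; the interleaving is entirely decided by who calls resume.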

Basic Operations

Here is an example of a basic asynchronous read() operation. It is possible to inject such wrappers into existing code, and they will work concurrently without any further changes:

```ruby
class Wrapper
  # ... initialize, write, close, etc.
  def read(*args)
    while result = @io.read_nonblock(*args, exception: false)
      case result
      when :wait_readable
        @selector.wait_readable(@io)
      else
        return result
      end
    end
  end
end
```
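The wrapper relies on read_nonblock's exception: false mode, which returns :wait_readable instead of raising IO::WaitReadable when no data is available. This behaviour can be observed directly with a pipe:

```ruby
i, o = IO.pipe

# No data has been written yet, so instead of blocking (or raising),
# read_nonblock reports that we should wait for readability:
result = i.read_nonblock(10, exception: false)
# => :wait_readable

o.write("hello")

# Once data is available, the same call returns it:
data = i.read_nonblock(10, exception: false)
# => "hello"
```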

What does wait_readable() look like? Here is a simple select()-based implementation:

```ruby
class Selector
  # initialize, wait_writable, etc.
  def wait_readable(io)
    @readable[io] = Fiber.current
    Fiber.yield
    @readable.delete(io)

    return true
  end

  def run
    while @readable.any? or @writable.any?
      readable, writable = IO.select(@readable.keys, @writable.keys, [])

      readable.each do |io|
        @readable[io].resume
      end

      writable.each do |io|
        @writable[io].resume
      end
    end
  end
end
```

The problem with this design is that everyone has to agree on a wrapper and selector implementation. We already have a core IO layer in Ruby that practically everyone uses. Along with IO.select(...) we have a ton of options for event driven concurrency, including but not limited to: NIO4R (alive), Async (alive), LightIO (experimental), EventMachine (undead), ruby-io (experimental).

Extending Ruby

The best boundary for event-driven IO loops in Ruby is per-thread (or, taking the GIL into account, per-process). Event-driven IO is naturally cooperative, and scheduling operations across threads makes it needlessly complicated. We can leverage Ruby's existing IO implementation by intercepting calls to io.wait_readable() and io.wait_writable() and redirecting them to Thread.current.selector.
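The proposed Thread.current.selector accessor does not exist in Ruby today; the per-thread indirection it provides can be approximated in pure Ruby with thread variables (the PerThreadSelector module here is a hypothetical sketch, not part of any library):

```ruby
# Hypothetical sketch: a per-thread selector slot.
# Note: thread_variable_get/set is shared by all fibers within a thread,
# unlike Thread#[], which is fiber-local -- every fiber scheduled on a
# thread should see the same selector.
module PerThreadSelector
  def self.current
    Thread.current.thread_variable_get(:selector)
  end

  def self.current=(selector)
    Thread.current.thread_variable_set(:selector, selector)
  end
end

PerThreadSelector.current = :main_selector

# Each thread has its own slot, so a new thread sees no selector:
other = Thread.new { PerThreadSelector.current }.value
```

With this boundary, IO operations on a thread with no selector installed can simply fall back to the existing blocking implementation.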

We add an appropriate C API for Thread.current.selector and add a layer of indirection to int rb_io_wait_readable(int f) (and others):

```c
int rb_io_wait_readable(int f) {
    VALUE selector = rb_current_thread_selector();

    if (selector != Qnil) {
        VALUE result = rb_funcall(selector, rb_intern("wait_readable"), 1, INT2NUM(f));

        return RTEST(result);
    }

    /* existing implementation ... */
}
```

Here is an example of how this fits together:

```ruby
thread = Thread.new do
  selector = Selector.new
  Thread.current.selector = selector

  i, o = IO.pipe
  i.nonblock = true # this could be default
  o.nonblock = true

  Fiber.new do
    message = i.read(20)
  end.resume

  Fiber.new do
    o.write("Hello World")
    o.close
  end.resume

  selector.run # could be invoked implicitly
end
```

This design has a very minimal surface area and allows reuse of existing event loops (e.g. EventMachine, NIO4R). It's also trivial for other Rubies to implement (e.g. JRuby, Rubinius, TruffleRuby, etc).

Performance

While it's hard to make objective comparisons since this is a feature addition rather than a performance improvement, we can at least look at some benchmarks from async-http and async-postgres, which implement the wrapper approach discussed above.

Puma scales up to its configured limits. Falcon scales up until all cores are pegged.

Further Reading

The code is available here and the Ruby bug report has more details. There is a PR tracking changes.

The goal of this work is to improve the composability and performance of async. I've implemented the wrapper approach in async-io, and it has proven itself to be a good model in several high-level libraries, including: async-dns, async-http and async-http-faraday.