This month, I merged Ruby's first JIT compiler.

Then I gave a talk about it yesterday at my company's conference.

As it isn't ultra-fast yet, I hadn't talked about it in depth until the conference. But some people kindly wrote great articles about it.

Those articles are probably easier to understand than mine, but I found several things worth pointing out or announcing, so I decided to write this article.

What's "MJIT"?

Let me share a summary of recent JIT topics in Ruby. Those who have read the above articles may want to skip this section.

The RTL instruction project

In July 2016, Vladimir Makarov (Vlad), who optimized Hash in Ruby 2.4, filed the ticket "VM performance improvement proposal" (Feature #12589). It's a project to replace the Ruby VM's stack-based instruction set with a register-based one, named RTL (Register Transfer Language) instructions.
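For reference, the stack-based YARV bytecode that RTL instructions would replace can be inspected from Ruby itself. This snippet just disassembles a trivial expression:

```ruby
# YARV is stack-based: operands are pushed onto a value stack and
# consumed by instructions like opt_plus.
code = "a = 3; b = 4; a + b"
puts RubyVM::InstructionSequence.compile(code).disasm
```

The output lists instructions such as `putobject`, `getlocal` and `opt_plus`, each operating on the value stack rather than on named registers.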

He is also known as the author of GCC's register allocator and a maintainer of its instruction scheduler. I assume he wanted to apply optimization knowledge from GCC to Ruby.

While replacing so many parts of the Ruby VM is a big change, his RTL instruction implementation successfully passed `make test`. It improved some micro benchmarks, but a larger benchmark named Optcarrot wasn't improved.

MJIT: JIT compiler that runs on RTL instructions

To approach the problem, in March 2017 he also published a JIT compiler that compiles RTL instructions, named MJIT (MRI JIT).

He discussed some possible JIT approaches in rtl_mjit_branch and chose the one that writes generated C code to disk, has a C compiler build it into a .so file, and loads the machine code from that shared object.

The JIT made the Optcarrot result 2 to 3 times faster.

On the other hand, as a side effect of the large VM rewrite, while `make test` passes, a more severe test suite such as `make test-all` does not pass regardless of whether JIT is enabled, and Rails fails to start on the RTL instructions.

YARV-MJIT: Yet another JIT compiler that runs on YARV instructions

Thus the RTL instruction conversion was a somewhat risky change. My company provides a cloud service implemented as a Rails application, so very high availability is required. If high-risk changes were made to Ruby, upgrading Ruby would become very difficult.

After Vlad's session at RubyKaigi last year, I thought about how to reduce the risk of introducing a JIT compiler to Ruby. Then I decided to implement another approach called YARV-MJIT, which compiles the original stack-based YARV instructions instead of RTL instructions.

This JIT approach is very safe because it can be achieved without modifying the VM implementation at all. Even if you find a bug after the next Ruby release, you can ensure Ruby works as before by disabling the optional JIT compiler.

As it doesn't change the bytecode dynamically, in order to avoid risks, it can't reach the performance of the original MJIT. But with some improvements, I managed to make its performance comparable to MJIT's.

By removing some premature aggressive optimizations, YARV-MJIT could pass all Ruby test cases both with and without JIT enabled, while keeping some level of performance improvement. I proposed merging it into Ruby in late 2017, and it was successfully merged this month after fixing some potential bugs and random crashes.

Ruby's JIT status

Ruby 2.6 merged the following two components:

JIT Compiler: YARV-MJIT

JIT Infrastructure: MJIT

In source code and tickets, I call the infrastructure part "MJIT infrastructure", because "MJIT" originally referred to the combination of that part and its built-in JIT compiler. Sometimes people call both of them just "MJIT", but I think that's okay. Even Ruby committers don't say YARV-MJIT.

I'll share how Ruby's JIT implementation is going.

JIT compiler benchmarks

Even though the conservative JIT compiler aims to minimize the risks of introducing JIT, it must of course improve performance, because the change is for optimization. I'll share some benchmark results from my machine: Intel 4.0GHz i7-4790K 8 cores, 16GB memory, x86_64 Linux.

Optcarrot

This is the latest result of the Optcarrot benchmark. While YARV-MJIT achieved about 68fps previously, it became a little slower because it dropped some premature optimizations.

In my pull request to merge the JIT compiler, I attached a benchmark result of 63fps. Shannon's article seems to quote that result, but as I later noted in the PR's description, it has temporarily dropped to 58fps. I know the cause of the gap between 63fps and 58fps, and I'm now working on fixing it.

While we can't tell which commit contributes to VM performance because RubyBench has recently stopped being updated, shyouhei's recent VM improvements (like this) seem to make Ruby 2.6 much faster even without the JIT compiler.

It's a shame that my JIT compiler doesn't contribute that much to Ruby 2.6's performance improvements, but the VM performance improvements are certainly good news.

Benchmark in “Playing with ruby’s new JIT”

Is the JIT compiler's optimization limited to only about 11%?

The article "Playing with ruby's new JIT: MJIT" has a benchmark script, and the author of TruffleRuby shared an improved version of the benchmark:

```ruby
require 'benchmark/ips'

def calculate(a, b, n = 40_000_000)
  i = 0
  c = 0
  while i < n
    a = a * 16807 % 2147483647
    b = b * 48271 % 2147483647
    c += 1 if (a & 0xffff) == (b & 0xffff)
    i += 1
  end
  c
end

Benchmark.ips do |x|
  x.iterations = 3
  x.report("calculate") do |times|
    calculate(65, 8921, 100_000)
  end
end
```

Here is its result:

```
$ ruby -v
ruby 2.6.0dev (2018-02-15 trunk 62410) [x86_64-linux]
$ ruby bench.rb
Warming up --------------------------------------
           calculate    13.000 i/100ms
           calculate    13.000 i/100ms
           calculate    13.000 i/100ms
Calculating -------------------------------------
           calculate    1.800k (± 2.7%) i/s -  8.996k in 5.002504s
           calculate    1.785k (± 7.4%) i/s -  8.853k in 5.003616s
           calculate    1.802k (± 4.0%) i/s -  8.996k in 5.006199s
$ ruby --jit bench.rb
Warming up --------------------------------------
           calculate    13.000 i/100ms
           calculate    18.000 i/100ms
           calculate    27.000 i/100ms
Calculating -------------------------------------
           calculate    7.182k (± 9.1%) i/s - 35.397k in 5.000332s
           calculate    7.296k (± 2.9%) i/s - 36.450k in 5.001392s
           calculate    7.295k (± 3.1%) i/s - 36.450k in 5.002572s
```

1.802k improved to 7.296k: about 4x faster. It seems we've already achieved Ruby 3x3!
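The speedup quoted above is just the ratio of the two throughput numbers:

```ruby
# Ratio of JIT throughput (7.296k i/s) to VM throughput (1.802k i/s).
vm_ips  = 1.802e3
jit_ips = 7.296e3
printf("%.2fx\n", jit_ips / vm_ips)  # => 4.05x
```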

Obviously the benchmark looks designed as JIT propaganda, but it suggests the potential performance improvement in the real world if workloads are tuned for the JIT like this benchmark is.

Even for such an arbitrarily crafted benchmark, we should aim to improve CRuby on the order of hundreds of times, because TruffleRuby shows a 370x speedup on my machine.

Other benchmarks

The commit merging the JIT compiler shows other benchmark results in its commit message. Some micro benchmarks are about 2x faster compared to Ruby 2.6's VM execution.

It also includes Discourse's benchmark result. I wanted to use rails_ruby_bench by Noah Gibbs, but for now I'm failing to run it on my machine, so I'm using Discourse's script/bench.rb. For now, the JIT compiler even makes Rails a little slower. Sam Saffron pointed out that Unicorn forks, and MJIT is disabled in the child process. There are many things to be improved, and I'll continue to investigate Rails performance with JIT.

VM-Generated JIT Compiler

As the JIT compiler needs to generate code that behaves exactly like the VM implementation, a very naive JIT compiler would contain many copy-pastes of the VM implementation, which would be bad for maintainability.

The Ruby VM is generated from a special template format, "insns.def". ko1 suggested generating the JIT compiler from insns.def too. Just before merging the JIT compiler into Ruby trunk, I tried it and it worked fine.

It's simple enough to understand immediately. Ruby has mjit_compile.inc.erb like the following. (Lines starting with % use a special ERB trim mode. Did you know that?)

```erb
switch (insn) {
% RubyVM::BareInstructions.to_a.each do |insn|
  case BIN(<%= insn.name %>):
<%= render 'mjit_compile_insn', locals: { insn: insn } -%>
    break;
% end
}
```
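In case the %-prefixed lines are unfamiliar: ERB has a trim mode in which lines beginning with % are executed as Ruby instead of being printed. A minimal sketch (the instruction names here are just placeholders):

```ruby
require 'erb'

template = <<~ERB
  switch (insn) {
  % %w[nop getlocal].each do |name|
    case BIN(<%= name %>):
      break;
  % end
  }
ERB

# trim_mode: '%' makes ERB treat %-prefixed lines as Ruby code.
puts ERB.new(template, trim_mode: '%').result(binding)
```

This prints one `case BIN(...)` arm per instruction name, with the %-lines themselves suppressed from the output.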

Branching by VM instruction type, _mjit_compile_insn_body.erb prints the content of insns.def with the following code:

```erb
% expand_simple_macros.call(insn.expr.expr).each_line do |line|
%   if line =~ /\A\s+JUMP\((?<dest>[^)]+)\);\s+\z/
      /* Dynamic generation of JUMP code */
%   else
      fprintf(f, <%= to_cstr.call(line) %>);
%   end
% end
```
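To make the conversion concrete, here's a toy version of what this template does. The `to_cstr` below is a hypothetical stand-in for the real helper, using Ruby's `String#inspect` to produce a C-compatible string literal:

```ruby
# Hypothetical stand-in for the template's to_cstr helper: escape one
# line of a VM instruction body into a C string literal.
def to_cstr(line)
  (line.chomp + "\n").inspect
end

# Each line of the instruction body becomes an fprintf call that will
# emit that line into the generated C file at JIT time.
body = "reg_cfp->pc = original_body_iseq + 1;\n"
body.each_line do |line|
  puts "fprintf(f, #{to_cstr(line)});"
end
```

This prints `fprintf(f, "reg_cfp->pc = original_body_iseq + 1;\n");`, which is exactly the shape of code you see in the generated compiler below.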

It converts macros for the JIT beforehand, and the JUMP macro is generated dynamically because it requires special conversion. From this ERB file, the following C code is generated, which runs as the JIT compiler at runtime:

```c
switch (insn) {
  case BIN(nop):
    fprintf(f, "{\n");
    {
        fprintf(f, "    reg_cfp->pc = original_body_iseq + %d;\n", pos);
        fprintf(f, "    reg_cfp->sp = (VALUE *)reg_cfp->bp + %d;\n", b->stack_size + 1);
        fprintf(f, "    {\n");
        fprintf(f, "        /* none */\n");
        fprintf(f, "    }\n");
        b->stack_size += attr_sp_inc_nop();
    }
    fprintf(f, "}\n");
    break;
  case BIN(getlocal):
    /* ... */
}
```

It's just a listing of many `fprintf`s: easy to understand, right? When Ruby's main thread requests to JIT a hotspot method, the code in the MJIT worker thread generates the following code:

```c
VALUE
_mjit0(rb_execution_context_t *ec, rb_control_frame_t *reg_cfp)
{
    VALUE *stack = reg_cfp->sp;
    static const VALUE *const original_body_iseq = (VALUE *)0x5643d9a852a0;

    if (reg_cfp->pc != original_body_iseq) {
        return Qundef;
    }

label_0: /* nop */
    {
        reg_cfp->pc = original_body_iseq + 1;
        reg_cfp->sp = (VALUE *)reg_cfp->bp + 2;
        {
            /* none */
        }
    }
    /* snip... */
} /* end of _mjit0 */
```

`rb_execution_context_t` is a kind of thread context, and `rb_control_frame_t` is one frame of the call stack. The interface allows executing a frame that holds a method's bytecode on a specific thread context.

By compiling this and calling `dlopen` and `dlsym`, you can get a function pointer to it, and the VM calls that function instead of executing the bytecode on the VM.

Why does this make things fast? Please read the slides at the top of this article if you're interested.

How is the JIT infrastructure doing?

Support status of platforms

Using Ruby's existing thread abstraction layer, I could easily port the pthread-based implementation to Windows native threads.

On the other hand, it was quite hard to successfully build the C header file used for JIT compilation on some platforms. After merging the JIT infrastructure, the following platforms were reported as failing or broken on RubyCI:

NetBSD

Solaris

AIX

Intel C/C++ Compiler

Old Visual Studio

I had tested the JIT infrastructure on Linux, macOS, MinGW and newer Visual Studio, but that wasn't sufficient for Ruby's portability requirements. After installing some VMs and using CI environments, I succeeded in fixing them. Now we can build the JIT header on many platforms.

Then I added a `--jit-wait` option for JIT compiler testing and wrote unit tests for it. Linux+gcc usually works perfectly, but some tests fail on clang and MinGW environments. Besides the test failures, JIT performance with clang is a little worse than with gcc, so it may be hard to see improvements on macOS. And MinGW skips transforming the JIT header due to bugs in the transformation, which results in slow JIT compilation on MinGW. I'm working on fixing those problems.

Security

Other core committers (nobu, usa) have improved many parts of the JIT infrastructure. One of those areas is security.

The MJIT worker thread does `fopen` with a path like `/tmp/_ruby_mjit_p123u4` when writing a C file to disk. First of all, the initial implementation didn't control the permissions of the file (a known issue). It also didn't prevent opening an existing file, which means the opened file could have arbitrary permissions. It's now fixed to force creating a new file.
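The fix corresponds to opening the file with the equivalent of `O_CREAT | O_EXCL` and a restrictive mode. A sketch of the general technique in Ruby (the path here is illustrative only, not MJIT's actual naming scheme):

```ruby
require 'tmpdir'

# File::EXCL makes open fail with Errno::EEXIST if the path already
# exists, so an attacker can't pre-create the file; 0600 keeps it
# readable and writable by the owner only.
path = File.join(Dir.tmpdir, "_ruby_mjit_sketch_p#{Process.pid}u0")
File.open(path, File::WRONLY | File::CREAT | File::EXCL, 0o600) do |f|
  f.puts '/* generated C code goes here */'
end
File.delete(path)
```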

If you test the JIT in production before the release, please be careful and do so at your own risk.

Others

About "Startup Time"

The article Ruby's New JIT pointed out that:

Startup time is one thing to take into consideration when considering using the new JIT. Starting up Ruby with the JIT takes about six times (6x) longer.

You found a very important point. The cause is that the `--jit` option always starts precompiling the header for JIT in the initializer, and the Ruby process finalizer waits for it to finish. So unfortunately it takes at least the time needed to compile one C header.

We have the following two approaches to solve it:

Building precompiled header on Ruby's build, not on runtime

Immediately cancel JIT compiler thread regardless of its state

Even though it's named a "precompiled header", it's actually not precompiled. That's because I suspect the format of a precompiled header may change after upgrading the C compiler used for JIT. If that turns out to be wrong, we can take that approach; for now, I took the conservative strategy.

We could immediately cancel the JIT compiler thread regardless of its state, but that might be dangerous in some situations.

`--jit-cc` no longer exists

Sometimes a C compiler fails to compile a header generated by another C compiler. That's not Ruby's fault, and we're not going to support it.

Now Ruby trunk automatically uses the C compiler that was used to build Ruby itself. If you want to test performance with clang, you need to build Ruby with clang.

Acknowledgements

I'm still not sure if my JIT compiler will be in the Ruby 2.6 release, but I want to say "thank you" to the following people:

The inventor of MJIT: Vladimir Makarov

Ruby's father: Matz

Reviewers of YARV-MJIT code: ko1, mame

Many bug reports and fixes: wanabe

The first MinGW support patch: Lars Kannis

Maintainer of many build environments: hsbt

Core committers fixing MJIT infrastructure: nobu, usa, znz, knu

Recent VM improvements: shyouhei

What's next?

We're preparing to release a very early preview of Ruby 2.6.0. Performance may not be improved much in that release, but I hope you enjoy hacking on MJIT there.