Racket-on-Chez Status: January 2018

posted by Matthew Flatt with thanks to collaborators Kent Dybvig, Andy Keep, Gustavo Massaccesi, Sarah Spall, Sam Tobin-Hochstadt, and Jon Zeppieri

It’s been almost a year since we decided to try running Racket on Chez Scheme, and it has been almost six months since my last status report. As of RacketCon 2017, DrRacket could start up on Chez Scheme (but just barely, and just in time for the demo).

The current implementation is now considerably more complete. DrRacket still doesn’t run well, but Racket-on-Chez can build a full Racket distribution in reasonable time and space.

With the current Racket-on-Chez compilation strategy, runtime performance is plausible on traditional benchmarks, but cross-module optimization is needed to bring the results fully in line with our expectations. Startup and load times are much slower than we would like. Compile/build times are a factor of 4 slower than the current version of Racket.

Glossary of implementations:

- Racket — the current Racket release, version 6.x
- racket7 — mostly the same as Racket, but using a new, Racket-implemented macro expander; the source is in the https://github.com/racket/racket7 repo
- Chez Scheme — a completely different implementation of Scheme that we’re trying to use as a replacement for part of Racket
- Racket-on-Chez — Chez Scheme plus additional layers to make it implement the same language as Racket; the source is also in the https://github.com/racket/racket7 repo

Approach

Our overall goal is to improve Racket’s implementation and make it more portable to different runtime systems. To a first approximation, improving Racket and making it more portable means “less C code.” Building on Chez Scheme is a promising means toward that end.

The picture on the right illustrates the current Racket distribution. The parts in blue are implemented in Racket, while the part in red is implemented in C. The green block is Scribble documentation source. The numbers next to each block are rough lines-of-code counts, and the blocks are scaled to those counts. The collects block represents the content of the main "collects" directory; along with the core block, it’s roughly the code in a Minimal Racket distribution. The pkgs block is the part of the main distribution that is available in separate packages but bundled with the main distribution. The docs block is spread across the same packages as the pkgs code. Porting Racket to Chez Scheme is all about the core part. The goal is to make no changes to existing Racket and Scribble code, although we’ll inevitably have to make small changes to support multiple Racket implementations.

Let’s zoom into the core part. (From now on, all the pictures on the right are at this scale.) The picture breaks the C code in Racket’s current implementation into several major subsystems. In reality, the implementation is not organized into such tidy layers, but the picture is conceptually close to the truth. The large builtins block implements primitive data structures and their operations, such as numbers (fixnums, bignums, flonums, exact rationals, complex numbers), strings, characters, byte strings, lists, vectors, and hash tables. Most other blocks are self-explanatory. The expander block implements the macro expander, reader, and module system. More than any other part of the current stack, the expander is better implemented in Racket.

That was exactly the work of 2016: building a new expander to replace the C-implemented expander. You can use that racket7 variant of Racket now by building from the https://github.com/racket/racket7 repo. This new expander hasn’t made it into the release, because a few interactions haven’t been sorted out: the macro stepper, the demodularizer, and to some degree the decompiler. The expander also runs at about half the speed of the C-implemented expander, and that performance difference was part of the motivation to investigate a new runtime system. Although the compiler+JIT block is drawn the same as before in this picture, it’s a slightly different compiler, because it doesn’t know about modules or syntax objects. Instead, the compiler works in terms of linklets, which are lambda-like blocks of code that import and export variables. That separation of expansion/modules from compilation helps enable the transition to a different compiler and runtime system.
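For intuition, a linklet looks roughly like the following; the surface syntax shown here is an illustrative sketch, not the exact internal representation:

```racket
;; A linklet is a lambda-like unit of compilation: it imports and
;; exports plain variables, and it knows nothing about modules or
;; syntax objects. (Illustrative sketch only.)
(linklet ((x))        ; one import group, supplying the variable x
         (f)          ; one exported variable, f
  (define-values (f)
    (lambda (y) (+ x y))))
```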

And that brings us to Chez Scheme. The most obvious feature in the picture on the right is that Chez Scheme’s C-implemented kernel is a fraction of the implementation, while most of Chez Scheme is more sensibly written in Scheme. Chez Scheme’s expander is considerably simpler than Racket’s. Although Chez Scheme’s expander is hygienic and hugely influential on Racket’s expander, Racket’s macro and module system does a lot more. Chez Scheme’s compiler, meanwhile, does a lot more than Racket’s compiler+JIT. The builtins functionality in Chez Scheme is similar to Racket’s builtins. Chez Scheme’s GC (at the bottom) is tiny and elegant. It lacks some functionality of Racket’s GC, but mostly it lacks Racket’s historical complexity for trying to cooperate directly with C.

That brings us, finally, to the Racket-on-Chez picture. It starts with the Chez Scheme stack and adds a rumble block, which extends Chez Scheme to match the current Racket’s GC through control+structs blocks. Mostly, rumble is about immutable hash tables, delimited continuations, applicable structures, structure type properties, and impersonators/chaperones. We’ve given this layer of functionality the name Rumble, because it’s useful to have a name for the language that hosts other layers that are built using Racket. The threads, io, and regexp blocks are entirely new implementations of the corresponding blocks in the current Racket implementation. The rktio block is used by io, and it’s the same as in current Racket: a portability layer over OS facilities that is similar to (but more extensive than) code that is in Chez Scheme’s kernel.

The expander block (the same as for racket7) sits on a small schemify adaptor. The schemify layer converts Racket linklets into plain Scheme functions, adjusting expressions as needed to support applicable structures, to enforce left-to-right evaluation, and to implement Racket’s allocation model for local functions (i.e., lift to avoid closure allocation whenever possible).
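For intuition only, and not the literal output: given a linklet that imports x and exports f, schemify produces roughly an ordinary Scheme procedure whose argument supplies the import and whose result is the export. The shapes below are an illustrative sketch:

```racket
;; Input (illustrative linklet sketch):
;;   (linklet ((x)) (f)
;;     (define-values (f) (lambda (y) (+ x y))))
;;
;; Rough shape of schemify's output: imports become procedure
;; arguments, exports become results, and the body is adjusted
;; to enforce left-to-right evaluation and Racket's allocation
;; model for local functions.
(lambda (x)
  (define f (lambda (y) (+ x y)))
  (values f))
```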

Naturally, the Racket-on-Chez implementation is bootstrapped by applying the expander and schemify layers to themselves and the other Racket-implemented parts of the stack, and then compiling the schemified linklets using the Chez Scheme compiler. A thin main driver on top handles command-line arguments to load modules and start the read–eval–print loop.

Current Functionality

DrRacket starts up in Chez Scheme, and just starting DrRacket exercises many Racket language constructs and libraries. The largest Racket-on-Chez demonstration to date is building all of the packages and documentation of the main Racket distribution. Building packages and documentation, again, covers more Racket functionality than you may expect, because our macros do interesting things, such as typechecking and generating bitmap images. Also, the build process is too long and complex to tolerate leaks or major inefficiencies.

Still, many pieces are buggy, and closing the gap is mostly a matter of plowing through the Racket test suites. The main missing or incomplete pieces are: futures, places, and a racket executable that starts Racket-on-Chez.

Performance: Benchmark Sanity Check

The overall compilation story is not yet right in Racket-on-Chez, partly because there’s no cross-module optimization. Functions like map, memq, and assv need cross-module optimization to be compiled well in Racket-on-Chez. Benchmarks sometimes do not rely on cross-module optimization, however, and preliminary benchmark measurements can be useful as a sanity check on the ways that schemify mangles individual modules.

Benchmarks are in the racket-benchmark package whose source is in the racket7 repo. The benchmarks run in Chez Scheme as R6RS libraries. The difference between Racket and racket7 is in the noise.

The table below shows a handful of rough benchmark results. You should not take the results seriously, but they line up with my overall impression of where we are.

Rough and unscientific benchmark numbers (msecs)

                     deriv   dynamic   sboyer   maze    nfa
Chez Scheme            770       360     1270    930   2100
Racket-on-Chez        1560      1030     1470   1520   2200
Racket                2030       560     2230   1330   4100

All benchmarks were run in safe mode and without Gustavo’s work-in-progress addition to the Chez Scheme compiler that can eliminate some runtime checks and related dead code.

Performance: Startup

Startup time for Racket-on-Chez is determined by Chez Scheme’s base startup time (to load boot files) plus time to load the expander and other parts of the Racket-on-Chez stack. The load times shown below for racket/base and racket include the startup time and add the additional time needed to load the many compiled files that are parts of those libraries.

For Racket and racket7, bytecode parsing can be delayed at the granularity of a procedure, so that a procedure’s bytecode isn’t parsed until the procedure is called. Delayed parsing is enabled by default, and the -d flag disables it.

Startup/load time (msecs)

                     startup   -l racket/base   -l racket
Racket                    16               50         220
Racket -d                 16               80         500
racket7                   50              110         340
racket7 -d                50              190         650
Chez Scheme                               170
Racket-on-Chez           440              550        1200
Racket-on-Chez/JIT       440              550        1600

Compared to the current Racket version, starting up and loading takes longer in racket7, because racket7 has to load embedded bytecode for the expander’s implementation, and because the bytecode encoding of linklets has not been refined as much.

The Chez Scheme row in the table doesn’t actually load racket/base, but that’s the closest reasonable column. That row corresponds to just starting Chez Scheme with its compiler, so it’s a significantly smaller language than racket/base, but it’s more than the Racket startup rows. It’s also a lower bound on the Racket-on-Chez startup time.

Loading Racket-on-Chez takes significantly longer, still, due to larger and slower-to-load machine code for both the Racket-on-Chez stack and libraries compiled with it. Although individual linklets within a module (roughly, individual phases) are parsed on demand, there is no delayed parsing at the granularity of procedures, so the work performed by loading is closer to the Racket -d and racket7 -d rows.

The last row above, Racket-on-Chez/JIT, refers to a variant of Racket-on-Chez that does not compile whole linklets. Instead, it keeps individual lambdas in source form until called, and it compiles each called lambda on demand. Load times go up, mostly because code is compiled as needed, but also because the source-fragment representation is even larger than machine code right now. The Racket-on-Chez/JIT approach does not look like the most promising one so far, but it’s also not crazy. With more effort and tuning (especially in the representation of source fragments), it may be a viable approach and more in line with the way the current Racket implementation works.

Clearly, load time is a direction for further effort.

Performance: Expander and Compiler

The expander runs at about the same speed in Racket-on-Chez and racket7, but the linklet-compilation step takes much longer in Racket-on-Chez. As a result, compilation time for Racket-on-Chez dominates expansion proper for a task like racket -cl racket/base.

Reported times in this section use a Racket-on-Chez stack that is compiled in unsafe mode and run on a 2013 MacBook Pro 2.6GHz Core i5. Unsafe mode improves expand and read performance by about 10%.

Steps within racket -cl racket/base (msecs)

                     expand   schemify   compile   eval   read
racket7                2511                   662    235    275
Racket-on-Chez         2398        917       2852    390    227
Racket-on-Chez/JIT     2441        974       1044    515    229

For Racket-on-Chez/JIT, compile times go down, showing that about 1/3 of the code that is compiled during expansion is also run during expansion. The fraction would be lower for most programs; the racket/base library is unusual in building up layers of macros to implement itself. The schemify step is certainly a target for improvement, since its job is much simpler than the compile step’s job.

That factor-of-2 slowdown relative to racket7 is compounded by the Racket-implemented expander being twice as slow as the current Racket’s C-implemented expander:

Loading from source with racket -cl (msecs)

                     racket/base   racket
Racket                      1830    21700
racket7                     3490    38140
Racket-on-Chez              7060    70093
Racket-on-Chez/JIT          5400    55050

The gap between the Racket-implemented expander and the C-implemented expander is the one that we most hoped to close by moving to Chez Scheme. As part of the Racket-on-Chez stack, the expander is already compiled in a reasonable way (i.e., no cross-module optimization issues), so it’s not clear how to improve. Still, I think the expander can be made to run faster in Chez Scheme, and we’ll keep trying.

Performance: Distribution Build

The current version of Racket-on-Chez builds on a 64-bit machine with a peak memory use around 1.25 GB, which is on a similar scale to the current Racket release’s peak memory use around 1 GB.

The following plots show, on the same scale, memory use over time when building a distribution. Each plot has two black lines (easiest to distinguish in the last plot): the top black line describes peak memory use, since it’s before a major GC, while the bottom black line is closer to live-memory use, since it’s after a major GC. The orange region is for compiling libraries (each vertical line starts a collection), the blue region is for running documentation, the green region is for rendering documentation, and the final pink region is for re-rendering documentation as needed to reach a fixed point for cross-references. Memory use is expected to grow modestly during the blue region, since the build collects cross-reference information for use in the green region. Memory use should grow minimally in the orange and green regions (as a side effect of caches).

These graphs report builds on a Linux 3.4GHz Core i7-2600 machine with the Racket parts of the Racket-on-Chez stack compiled in safe mode.

[Plot: Racket (C-implemented expander)]
[Plot: racket7 (Racket-implemented expander)]
[Plot: Racket-on-Chez (same expander as racket7)]

The plots illustrate that, while memory use is similar, build times are much longer. Expander and compiler performance largely explains the difference.

The build time certainly can be improved some. Maybe build times can be improved significantly, or maybe slower build times will seem worthwhile for a more maintainable implementation of Racket and/or for faster generated code.

Outlook

It would be nice if porting to Chez Scheme made every aspect of Racket magically faster. It hasn’t done that, but we have plenty of room for improvement; the performance results to date are a lower bound on performance, not an upper bound.

Keep in mind that the original goal is not to have a faster Racket, but a better-implemented Racket with acceptable performance. The Racket-on-Chez implementation is far more maintainable and flexible than the current Racket implementation, so maybe we’re half-way there after one year of work.

You can try Racket-on-Chez from https://github.com/racket/racket7/. See racket/src/cs/README.txt in that repo for more information.