Racket-on-Chez Status: February 2020

posted by Matthew Flatt

Racket on Chez Scheme (Racket CS) is ready for production use. Racket CS now passes all of the tests for the main Racket distribution tests, and differences in compile and run times are much reduced. Overall, Racket CS tends to perform about the same as the traditional Racket implementation (Racket BC, “before Chez”)— sometimes better and sometimes worse, but typically using more memory due to larger code sizes.

Racket CS is not yet the default Racket implementation, but it is available as a download option alongside the regular Racket release at https://download.racket-lang.org/ (select CS from the Variant popup).

Run-Time Performance

Run-time performance for Racket CS has continued to improve. Benchmarks show the difference to some degree, but they understate the difference for a typical Racket application. Alex Harsanyi shared his initial experience with Racket CS in December 2018, and he has been kind enough to keep taking measurements. The plots below show his results for Racket CS and Racket BC, where lower is better, and the overall trend here seems typical for applications that I’ve measured.

As the trend lines may suggest, the overall improvement is from many small changes that add up. The plateau around June to October 2019 coincides with a push on correctness and compatibility, as opposed to performance, to make all Racket tests pass.

The plots below show current results for traditional Scheme benchmarks. The top bar is current and unmodified Chez Scheme, while the second bar is Chez Scheme as modified to support Racket. The third bar is Racket CS, and the bottom bar is Racket BC. The last two rows of benchmarks rely on mutable pairs, so they are run in Racket as #lang r5rs programs. There’s not a lot of difference here compared to one year ago, except that a faster path for integer division in the Racket variant of Chez Scheme has eliminated the collatz outlier.

The next set of plots compare Racket CS and Racket BC for Racket-specific implementations over years for The Computer Language Benchmarks Game. Compared to one year ago, the fraction of benchmarks where Racket CS wins over Racket BC is reversed. Much of that improvement happened in the thread and I/O layers that were newly implemented for Racket CS.

About the measurements: Alex Harsanyi’s measurements are for parts of the ActivityLog2 test suite; ActivityLog2 and the CI infrastructure it runs on have both changed over time, so the plots are approximate, but they are generally consistent with fresh runs with the current ActivitlyLog2 implementation on Racket versions 7.3 through 7.6.0.12. All other measurements use a Core i7-2600 at 3.4GHz running 64-bit Linux. Benchmarks are in the "racket-benchmark" package in the "common" and "shootout" directories. We used commit a20e3f305c of the Racket variant of Chez Scheme and commit cdd0659438 of Racket.

Startup and Load Times

Load times have improved for Racket CS. Loading the racket/base library:

Loading the full racket library:

Load times are, of course, directly related to memory use, and Racket CS load times improved primarily through reduced memory use.

About the measurements: These results were gathered by using time in a shell a few times and taking the median. The command was as shown.

Memory Use

Differences in memory use between Racket CS and Racket BC are mostly due to bytecode versus machine code, plus the fact that Racket BC can load bytecode more lazily. Various improvements, including changes to the Chez Scheme compiler to reduce code size, have decreased memory use in Racket CS by around 20% for typical applications.

The following plots show memory use, including both code and data, after loading racket/base or racket, but subtracting memory use at the end of a run that loads no libraries (which reduces noise from different ways of counting code in the initial heap). The gray portion of each bar is an estimate of memory occupied by code, which may be in machine-code form, bytecode form, or not yet unmarshaled.

Racket BC heap sizes here are larger compared to previous reports by about 5 MB. That difference reflects a more accurate measurement of the initial Racket BC heap.

On a different scale and measuring peak memory use instead of final memory use for DrRacket start up and exit:

About the measurements: These results were gathered by running racket with the arguments -l racket/base, -l racket, or -l drracket. The command further included -W "debug@GC" -e ’(collect-garbage)’ -e ’(collect-garbage)’, and reported sizes are based on the logged memory use before the second collection. For the Racket BC bar, the reported memory use includes the first number that is printed by logging in square brackets, which is the memory occupied by code outside of the garbage collector’s directly managed space. Baseline memory use was measured by setting the PLT_GCS_ON_EXIT environment variable and running with -n, which is has the same effect as -e '(collect-garbage)' -e '(collect-garbage)'. DrRacket was initialized with racket/base as the default language; also, background expansion was disabled, because measuring memory use is tricky on Racket BC. Code size was estimate using dump-memory-stats counting bytecode, JIT-generated native code, and marshaled code for Racket BC and machine code, relocations, and bytevectors (which are mostly marshaled code) for Racket CS.

Expand and Compile Times

Compile times have improved substantially for Racket CS. The main change was to use an interpreter for compile-time code within the module currently being compiled, because that’s a better trade-off in (meta) compilation time and (meta) run time. Compile-time code that is exported for use by other modules is compiled and optimized normally.

While the Racket CS interpreter and the one in Racket BC are both intended to be safe for space, the Racket CS one stands a much better chance of achieving that goal.

About the measurements: These results were gathered by using time in a shell a few times and taking the median. The command was as shown.

Build Time

The time and memory used to build a Racket distribution using Racket CS is now much closer to the time and memory used by Racket BC. The following plots are all on the same scale, and they show memory use plotted against time for building the Racket distribution from source:

In each plot, there are two lines, although they are often smashed together. The top line is memory use just before a major garbage collection, and the bottom line is memory use just after a major garbage collection.

Racket CS Racket BC

The Racket CS plot used to be more than twice as wide as the Racket BC plot. About half of the improvement came from fixing a cache that interacted badly with Racket CS’s more frequent minor garbage collections. The rest of the improvement was due to many small improvements.

About the measurements: These plots were generated using the "plt-build-plot" package, which drives a build from source and plots the results.

Run-Time Performance Redux

In some ways, the Racket CS project has validated the Racket BC implementation, because it turns out that Racket BC performs pretty well. With the notable exception of first-class continuations (where Racket BC use a poor strategy), the traditional, JIT-based Racket engine performs close to Chez Scheme.

Then again, development to date has been aimed at making Racket CS match the performance of Racket BC on a code base that was developed and tuned for Racket BC, and that obviously gives Racket BC an advantage. For example, Racket BC makes dormant code relatively cheap, so Racket libraries generate a lot of code. Future Racket libraries will likely shift to take advantage of Racket CS’s cheaper function calls and dramatically cheaper continuations. One day, probably, Racket BC will no longer be a viable alternative to Racket CS for most programs.

To make that prediction more concrete, consider these three ways of counting to 10 million (if N is 10000000):

; self-contained loop ( for/fold ( [ v #f ] ) ( [ i ( in-range N ) ] ) i ) ; indirect function calls for loop control ( for/fold ( [ v #f ] ) ( [ i N ] ) ; no `in-range` i ) ; continuations ( let ( [ g ( generator ( ) ( for ( [ i ( in-range N ) ] ) ( yield i ) ) #f ) ] ) ( for/fold ( [ v #f ] ) ( [ i ( in-producer g #f ) ] ) i ) )

Here are run times normalized to the first one, which is about the same in Racket CS and Racket BC:

While no one will start writing trivial loops the slow way, there’s a big difference between a ×5 and ×25 overhead when choosing how to implement a new abstraction and deciding how much to make static versus dynamic. There’s an even bigger difference between ×335 and ×2672— both irrelevant for a toy loop, but ×335 becomes relevant sooner than ×2672 for more interesting calculations. Overall, when implementation choices start to rely on the R/CS column, the R/BC column will sometimes be unacceptable.

Reflection and Outlook

It took three years to get Racket on Chez Scheme running well enough for production use, and it will take yet more time for Racket CS to fully replace Racket BC. But a certain amount of optimism is necessary to take on a large project like this, and if the timeline gets stretched beyond initial and early projections, then that’s only to be expected.

Racket CS will eventually outpace Racket BC for the reason that originally motivated porting Racket to Chez Scheme: it’s put together in a better way, so it’s easier to modify and improve. Maintainability is difficult to capture with the same clarity as performance benchmarks, but after spending one more year modifying both implementations, I remain as convinced as ever that Racket CS is much better.

This report is the last one for Racket CS. Here are a few things that are on the Racket CS roadmap— but, increasingly, we’ll just call it the Racket roadmap: