2017-06-14 | 1418 words | How devs are ensuring quality of Rakudo compiler releases

As some recall, Rakudo's 2017.04 release was something of a trainwreck. It was clear the quality assurance of releases needed to be kicked up a notch. So today, I'll talk about the progress we've made in that area.

Define The Problem

A particular problem that plagued the 2017.04 release was big changes and refactors in the compiler that passed all of the 150,000+ stresstests, yet still caused issues in some ecosystem modules and users' code.

The upcoming 2017.06 release has many, many more big changes:

IO::ArgFiles was entirely replaced with the new IO::CatHandle implementation

IO::Socket got a refactor and sync sockets no longer use libuv

IO::Handle got a refactor with encoding and sync IO no longer uses libuv

Sets/Bags/Mixes got optimization polish and op semantics finalizations

Proc was refactored to be in terms of Proc::Async

The IO and Proc stuff is especially impactful, as it affects precomp and module loading as well. Merely passing stresstests just wouldn't give me enough peace of mind about a solid release. It was time to extend the testing.

Going All In

The good news is I didn't actually have to write any new tests. With 836 modules in the Perl 6 ecosystem, the tests were already there for the taking. Best of all, they were mostly written without bias from implementation knowledge of core code, and they carry the personal style variations of hundreds of different coders. This is all perfect for catching regressions in core code. The only problem is running all that.

While there's a budding effort to get CPANTesters to smoke Perl 6 dists, it's not quite the data I need. I need to smoke a whole ton of modules on a particular pre-release commit, while also smoking them on a previous release on the same box, eliminating setup issues that might contribute to failures, as well as ensuring the results were for the same versions of modules.

My first crude attempt involved firing up a 32-core Google Compute Engine VM and writing a 60-line script that launched 836 Proc::Asyncs—one for each module.

Other than chewing through 125 GB of RAM with a single Perl 6 program, the experiment didn't yield any useful data. Each module had to wait for locks before being installed, and all the Procs were asking zef to install to the same location, so dependency handling was iffy. I needed a more refined solution...

Procs, Kernels, and Murder

So, I started to polish my code. First, I wrote the Proc::Q module, which let me queue up a bunch of Procs and scale the number of them running at the same time based on the number of cores the box had. The Supply.throttle core feature made the job a piece of cake.
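The queueing idea can be sketched roughly as follows. This is a Python illustration of the general technique, not Proc::Q itself and not how Supply.throttle works internally; the name `run_queue` and the stand-in commands are hypothetical. It runs a list of commands while keeping at most one process per core alive at a time.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_queue(commands, batch=None):
    """Run each command, keeping at most `batch` of them alive at once.

    Returns a list of (command, exit_code) pairs in input order.
    """
    # Assumption: default to one running process per available core.
    batch = batch or os.cpu_count() or 1

    def run_one(cmd):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return cmd, proc.returncode

    # The pool size is the throttle: a new command starts only
    # when one of the running ones has finished.
    with ThreadPoolExecutor(max_workers=batch) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Hypothetical stand-ins for per-module install commands.
    cmds = [[sys.executable, "-c", f"import sys; sys.exit({i % 2})"]
            for i in range(6)]
    for cmd, code in run_queue(cmds, batch=2):
        print(code)
```

Worker threads are enough here because each one spends its time waiting on a child process rather than computing.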

However, some modules are naughty or broken, and I needed a way to kill Procs that take too long to run. Alas, I discovered that Proc::Async.kill had a bug: trying to kill a bunch of Procs simultaneously was failing. After some digging, I found the cause. The $*KERNEL.signal method that .kill was using isn't actually thread safe, and the bug was due to a data race in the initialization of the signal table.
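Killing a process that overstays its welcome looks roughly like this in a Python sketch. It shows the general timeout-then-kill pattern, not Proc::Async.kill or Proc::Q; `run_with_timeout` is a hypothetical name.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd and kill it if it exceeds `timeout` seconds.

    Returns (exit_code, killed), where killed is True on timeout.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        proc.kill()  # forceful termination of the direct child
        proc.wait()  # reap it so it does not linger as a zombie
        return proc.returncode, True


if __name__ == "__main__":
    # A hypothetical hanging test run, cut off after one second.
    print(run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(60)"], 1))
```

Note that this sketch only signals the direct child; anything that child spawned itself is left alone.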

After refactoring Kernel.signal and fixing Proc::Async.kill, I released the Proc::Q module, my first module to require (at the time) the bleedest of bleeding edges: a HEAD commit.
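For readers unfamiliar with this class of bug: a race in lazily initializing a shared table is commonly avoided by guarding the initialization with a lock. The Python sketch below shows that general pattern only. It is not the actual Rakudo fix, and `signal_table` is a hypothetical name.

```python
import signal
import threading

_table = None
_table_lock = threading.Lock()


def signal_table():
    """Return a name -> number map, built once even under concurrent calls."""
    global _table
    if _table is None:              # fast path once the table exists
        with _table_lock:
            if _table is None:      # re-check: another thread may have won
                _table = {s.name: int(s) for s in signal.Signals}
    return _table


if __name__ == "__main__":
    print(signal_table()["SIGINT"])
```

The table is assigned only after it is fully built, so other threads see either nothing or the complete table.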

Going Atomic

After cooking up boilerplate DB and Proc::Q code, I was ready to toast the ecosystem. However, it appeared zef wasn't designed, or at least well-tested, for scenarios where up to 40 instances were running module installations simultaneously. I was getting JSON errors from reading ecosystem JSON, broken cache files (due to lack of file locking), and false positives in installations because modules claimed they were already installed.

I initially attempted to solve the JSON errors by looking at an Issue in the ecosystem repo about the updater script not writing atomically. However, even after fixing the updater script, I was still getting invalid JSON errors from zef when reading ecosystem data.
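"Writing atomically" usually means writing to a temporary file and renaming it over the target, so a reader sees either the old file or the new one and never a half-written one. A minimal Python sketch of that pattern follows. It is not the ecosystem updater script, and `write_json_atomically` is a hypothetical name.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Replace `path` with JSON for `data` without exposing a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the target directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk first
        os.replace(tmp_path, path)     # atomic swap into place
    except BaseException:
        os.unlink(tmp_path)            # do not leave temp debris behind
        raise


if __name__ == "__main__":
    # Hypothetical file name and payload.
    write_json_atomically("ecosystem.json", [{"name": "Some::Module"}])
```

This protects readers from torn writes; it does nothing about several writers racing each other.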

It might be due to something in zef, but instead of investigating it further, I followed ugexe++'s advice and told zef not to fetch the ecosystem in each Proc. The broken cache issues were similarly eliminated by disabling caching support. And the false positives were eliminated by telling each zef instance to install the tested module into a separate location.

The final solution involved programmatically editing zef's config file before a toast run to disable auto-updates of CPAN and p6c ecosystem data. The zef install command used in each individual Proc ended up being:

«zef --/cached --debug install "$module" "--install-to=inst#$where"»

Where $where is a per-module, per-rakudo-commit location. The final issue was flaky test runs, which I resolved by re-testing each failed module one more time to see if the new run succeeds.
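The command and the retry-once rule above can be sketched in Python like so. The zef flags are the ones quoted in the command; everything else is an assumption for illustration: the directory layout in `install_location`, the function names, and the injected `run` callback (used so the sketch can be exercised without zef installed).

```python
def zef_command(module, where):
    """Build the install command quoted above for one module."""
    return ["zef", "--/cached", "--debug", "install", module,
            f"--install-to=inst#{where}"]


def install_location(base, commit, module):
    """Hypothetical layout: one directory per (commit, module) pair."""
    return f"{base}/{commit}/{module.replace('::', '-')}"


def toast_module(run, module, where, attempts=2):
    """Call run(cmd) until it returns True, at most `attempts` times.

    Returns (passed, attempts_used).
    """
    cmd = zef_command(module, where)
    for attempt in range(1, attempts + 1):
        if run(cmd):
            return True, attempt
    return False, attempts


if __name__ == "__main__":
    where = install_location("/tmp/toast", "abc123", "Foo::Bar")
    print(zef_command("Foo::Bar", where))
```

Recording how many attempts a module needed keeps the flaky ones visible instead of silently counting them as clean passes.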

Time is Everything

The toasting of the entire ecosystem on HEAD and the 2017.05 release took about three hours on a 24-core VM when left unattended. Watching over it and killing the few hanging modules at the end, without waiting for them to time out, brings a single-commit run down to about 65 minutes.

I also did a toast run on a 64-core VM...

Overall, the run took me 50 minutes, and I had to manually kill some modules' tests. However, looking at CPU utilization charts, it seems the run sat idle for dozens of minutes before I came along to kill stuff.

So I think after some polish to avoid hanging modules, and after figuring out why (apparently) Proc::Async.kill still doesn't kill everything, the runs can be entirely automated and a single run can be completed in about 20-30 minutes.

This means that even with last-minute big changes pushed to Rakudo, I can still toast the entire ecosystem reasonably fast, detect any potential regressions, fix them, and re-test again.
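Detecting a potential regression boils down to comparing two runs on the same box: a module that passed on the previous release but fails on the pre-release commit is a candidate. A small Python sketch of that comparison, with a hypothetical function name and made-up module names (this is not the Toaster's actual database code):

```python
def find_regressions(baseline, candidate):
    """Modules that passed on the baseline release but failed on the candidate.

    Both arguments map module name -> True (passed) or False (failed).
    Modules missing from the candidate run are ignored rather than guessed at.
    """
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and candidate.get(name) is False
    )


if __name__ == "__main__":
    release = {"Mod::A": True, "Mod::B": False, "Mod::C": True}
    head = {"Mod::A": True, "Mod::B": False, "Mod::C": False}
    print(find_regressions(release, head))
```

Modules that fail on both sides drop out of the list, which filters out breakage unrelated to the commit under test.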

Reeling In The Catch

The Toaster database is available for viewing at toast.perl6.party. As more commits get toasted, they get added to the database. I plan to clear them out after each release.

The toasting runs I did so far weren't just a chance to play with powerful hardware. The very first issue was detected when toasting the Clifford module.

The issue had to do with Lists of Pairs with the same keys being coerced into a MixHash when the final accumulated weight was zero. The issue was introduced on June 7th, and it took me about an hour of digging through the module's guts to find it. Considering it's quite an edge case, I imagine it would have taken a lot longer to identify this bug without the toaster runs. lizmat++ squashed this bug hours after identification and it never made it into any releases.

The other issue detected by toasting had to do with the VM-backed decoder serialization introduced during the IO refactor, and jnthn++ fixed it a day after detection. One more bug had to do with the Proc refactor making Proc not synchronous enough. It was mercilessly squashed, along with a couple of longstanding issues with Proc.

None of these issues were detected by the 150,000+ tests in the testsuite. While an argument can be made that the tests are sparse in places, there's no doubt the Toaster has repaid the effort of making it by catching bugs that might've otherwise made it into the release.

The Future

The future plans for the Toaster are first to make it toast on more platforms, like Windows and macOS. Eventually, I hope to make toast runs continuous and entirely automated, on less-powerful VMs. An IRC bot would watch for any failures and report them to the dev channel.

Conclusion

The ecosystem Toaster lets core devs test a Rakudo commit on hundreds of software pieces, made by hundreds of different developers, all within a single hour. During its short existence, the Toaster has already found issues with ecosystem infrastructure and with highly multi-threaded Perl 6 programs, and has detected regressions and new bugs that we were able to fix before the release.

The extra testing lets core devs deliver higher-quality releases, which makes Perl 6 more trustworthy to use in production-quality software. The future will see the Toaster improved to test on a wider range of systems, as well as being automated for continued extended testing.

And most importantly, the Toaster makes it possible for any Perl 6 programmer to help core development of Perl 6, by simply publishing a module.

-Ofun