I don’t want to duel that much

In the last few weeks, I have been risking my life in the corridors at work. I have fit coworkers with a good sense of balance. Here is a picture of the horrible situations I’ve found myself in:

(Props to xkcd, the greatest webcomic in the universe. Read it regularly.)

Some of the code I’m dealing with, though, is not bound by CPU speed or the number of cores I can throw at it. In particular, the ant distpack-opt task of the Scala compiler, which compiles and packages the documentation, is very much limited by storage transfer speeds. I have state-of-the-art SSDs on some of my machines, yet I still found myself frustrated, waiting in front of the machine way too often.

To be fair, as far as I’m concerned, once is too often. I’m not so good when standing on a rolling chair.

The RAM disk solution and its imperfections

The obvious solution, then, was to use a RAM disk: insane transfer speeds would definitely help me, right? Except RAM disks have two well-known problems:

a RAM disk is, fundamentally, a piece of storage — a partition — carved out of your existing RAM. Assuming you have enough of it, it’s all fun and games until you lose power: RAM is volatile. Every time you shut down your machine, you need to copy the contents of your RAM disk back to your hard drive, and every time there is a sudden power outage, you basically lose the whole partition.

moreover, a RAM partition is inherently limited in size — and, more annoyingly, that’s a hard limit. What happens when you overflow that partition is — at least on Linux — a nightmare: because the partition is treated like a fixed storage device, not a piece of memory (which could overflow into swap), it basically blows up in your face when full.
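For reference, the simplest way to get such a partition on Linux is a tmpfs mount — which, to my knowledge, is also what the tools below use under the hood. A minimal sketch, where the mount point and the 4G size cap are arbitrary choices of mine:

```shell
# Create a mount point and mount a RAM-backed filesystem there,
# capped at 4 GB. (Both commands need root.)
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=4G tmpfs /mnt/ramdisk

# Anything written under /mnt/ramdisk now lives in RAM --
# and vanishes at the next reboot.
df -h /mnt/ramdisk
```

The size=4G option is exactly the hard limit discussed above: writes past it fail with "no space left on device".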

The solutions on a modern Arch

Thankfully, the tools to manage those two problems have come to maturity. At the time of this writing, my machines are running a Linux 3.8-rc7 kernel on an ArchLinux transitioned to systemd, and running RAM disks in that kind of environment is a breeze.

Ulatencyd (ArchLinux doc) is a daemon that uses cgroups, a recent-ish feature of the Linux kernel that lets you constrain the amount of resources allocated to a process. It watches over your processes and dynamically gives each a fair share of resources. Every second, it looks for memory pressure, and either relieves it if possible, or kills the guilty party if necessary. We have all had to face the swap of death: that frightening moment when a memory-leaking process overflows into the disk at a scary rate, slowing down — nay, taking down — your entire system. Ulatencyd helps you separate the processes on your system into cgroups that are resized dynamically, and handles memory pressure in a fashion that is simple and graceful by default, but amazingly configurable.

The ArchLinux install is amazingly simple. Install the AUR package, then activate it the systemd way (systemctl enable ulatencyd.service), optionally after editing the configuration (/etc/ulatencyd/ulatencyd.conf). The default maximal amount of physical memory given to a single process is 70%, and since I have 16GB of RAM on all my machines (with a RAM disk weighing between 3 and 5 GB), this limit is pretty safe. An added benefit? In the middle of a long compilation, my machine stays entirely responsive. I may be waiting for the compiler, but I can still surf the Internet in the meantime.
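Condensed into commands, the whole ulatencyd setup looks like this (assuming an AUR helper such as yaourt — substitute whatever you use to build AUR packages):

```shell
# Build and install ulatencyd from the AUR (any AUR helper will do).
yaourt -S ulatencyd

# Optionally tweak the defaults, e.g. the per-process memory cap:
sudoedit /etc/ulatencyd/ulatencyd.conf

# Enable it at boot and start it right away, the systemd way.
sudo systemctl enable ulatencyd.service
sudo systemctl start ulatencyd.service
```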

Anything-sync-daemon (ArchLinux doc) is a small daemon — in fact, a small bash script — aimed at hiding the day-to-day management of a RAM disk from the user. It uses rsync to synchronize the contents of your RAM disk to disk periodically, alleviating the risk that comes from your RAM needing to be powered at all times. It creates the partitions in your RAM and fills them up at boot time, and shelves them back to disk at shutdown. Again, the ArchLinux install is insanely simple. There is an AUR package, the activation is trivial with systemd (systemctl enable asd.service), and the configuration is also easy. I simply edited /etc/asd.conf and put the following directories under RAMDISK:

WHATTOSYNC=('/home/huitseeker/Scala' '/home/huitseeker/.eclipse' '/home/huitseeker/.m2' '/usr/share/eclipse' '/home/huitseeker/workspace' '/home/huitseeker/junit-workspace' '/usr/lib/jvm' '/home/huitseeker/runtime-EclipseApplicationwithEquinoxWeaving' '/home/huitseeker/.sbt' '/home/huitseeker/.ivy2')
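Under the hood, the trick asd performs for each of those directories can be sketched roughly like this. This is a simplified, hypothetical illustration — the directory and backup names are mine, and the real script adds locking, crash recovery, and permission handling:

```shell
# Hypothetical sketch of what asd does for a single directory.
DIR=/home/huitseeker/.ivy2     # one of the WHATTOSYNC entries
BACKUP="${DIR}-backup"         # illustrative name for the on-disk copy

# At boot: keep the real data on disk, expose a RAM-backed copy instead.
mv "$DIR" "$BACKUP"
mkdir -p "$DIR"
mount -t tmpfs tmpfs "$DIR"
rsync -a "$BACKUP/" "$DIR/"

# Periodically, and again at shutdown: flush the RAM copy back to disk,
# so a power outage only costs you the changes since the last sync.
rsync -a --delete "$DIR/" "$BACKUP/"
```

All reads and writes hit the tmpfs copy at full RAM speed; the periodic rsync back to disk is what tames the volatility problem from the previous section.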

This is more than you need to simply compile Scala. As you have probably noticed, this means:

I develop using Eclipse.

I test the Scala IDE using Equinox weaving.

I also work using sbt.

I develop on the JVM.

The Results

For the distpack-opt target of the Scala compiler, I have tried this method with:

a slow, 2010-top-of-the-line Lynnfield-core iMac (configuration) with a very well-used, slow hard drive.

a reasonable X220 Lenovo Thinkpad (config) with an i5-2540M and an SSD.

Both configurations run on full-disk-encrypted storage using LUKS (AVX-accelerated, though), which explains the poor baseline you’re about to see for this disk-bound compilation.

The slow iMac went from 124 minutes for compilation to 27 minutes.

The Lenovo went from 24 minutes to 8 minutes 30 seconds.

Yay! No more risking my life fencing in corridors!