The Emacs dumper dispute

The Emacs editor is, at its core, a C program, but much of the editor's functionality is actually implemented in its special "Elisp" dialect of Lisp. Starting the editor requires loading a great deal of Elisp code and initializing its state, a process that can take a long time. To avoid making users wait for this process, Emacs has long used a scheme whereby the Elisp code is loaded once and a memory image is written to disk; starting Emacs becomes a matter of reading the memory image back in, which is a much faster process. Supporting this "dumping" functionality (also known as "unexec") has never been easy; beyond the technical challenges, it now appears that it may lead to a significant split within the Emacs community.

As covered here in January, the Emacs dumping (and "undumping") mechanism has long depended on some low-level hooks in the GNU C Library's memory allocation subsystem. The Glibc developers would like to modernize and improve this code, improving the library overall but removing the hooks that Emacs depends upon. At the end of the January discussions, the Emacs developers had decided to move to a workaround implementation that allowed the dumper to continue to work in the absence of Glibc support.

Note: it seems the actual breakage happened with this commit for Glibc 2.24; thanks to Florian Weimer for the correction.

What nobody realized at the time is that the loss of the Glibc hooks, which happened in October for the 2.25 release (expected in February 2017), would affect existing Emacs releases in a surprising way. In particular, they fall back to an older interface called "ralloc", which does not perform well at all. The result is well summarized by Emacs co-maintainer Eli Zaretskii in October:

Based on what we've learned the hard way during the last couple of weeks, I'd say that all the Emacs versions before 25.2 (including 25.1) will be unstable on such GNU systems to the degree of making them almost unusable.

"Unstable" is the sort of behavior that users of text editors normally go well out of their way to avoid; it's also the sort of thing that could give vi a definitive advantage in the interminable editor wars. So something clearly needs to be done to make the Emacs dumping facility more stable and, preferably, more maintainable going forward. What that "something" would be is unclear, and the posting of a possible solution appears to have simply muddied the waters further.

That solution comes in the form of the "portable dumper" patch from Daniel Colascione. This patch is not small; it adds over 4,500 lines of code to Emacs and it is not yet complete. Rather than try to capture the state of the C library's memory-allocation subsystem, it simply marshals and saves the set of Elisp objects known to the editor. The file format is designed for performance and, in some settings at least, Emacs can start by simply mapping the file into memory and initializing a set of pointers.
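To make the idea concrete, here is a minimal sketch of dumping a set of linked objects to a flat file and reading them back through a memory mapping. It is not the format used by the patch; the layout, the `NIL` sentinel, and the function names `dump_list`, `load_list`, and `round_trip` are all assumptions made for illustration. The point is only that links are stored as offsets within the image rather than as addresses, so nothing about the allocator's state needs to be saved.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical on-disk layout (not the Emacs format): a header holding the
# cell count, followed by fixed-size cells of (value, offset of next cell).
HEADER = struct.Struct("<I")
CELL = struct.Struct("<iI")
NIL = 0xFFFFFFFF  # sentinel offset standing in for the end of the list


def dump_list(values, path):
    """Marshal a list of integers into a flat image of linked cells."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(values)))
        for i, value in enumerate(values):
            # Links are offsets from the start of the image, never raw
            # addresses, so the file does not care where it is mapped.
            is_last = i == len(values) - 1
            nxt = NIL if is_last else HEADER.size + (i + 1) * CELL.size
            f.write(CELL.pack(value, nxt))


def load_list(path):
    """Map the image and follow the links in place."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as image:
            (count,) = HEADER.unpack_from(image, 0)
            out = []
            offset = HEADER.size if count else NIL
            while offset != NIL:
                value, offset = CELL.unpack_from(image, offset)
                out.append(value)
            return out


def round_trip(values):
    """Dump to a temporary file, load it back, and clean up."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        dump_list(values, path)
        return load_list(path)
    finally:
        os.remove(path)
```

Calling `round_trip([1, 2, 3])` writes the image and walks it back from the mapping. A real dumper has to handle many object types and cross-references, which is where most of the patch's size presumably comes from.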

Colascione describes the result this way:

The point of this gargantuan patch is that we can rip out our unexec implementations and replace them with loading a data file that contains an Emacs heap image. There are no dependencies on executable rewriting, disabling ASLR, or saving and restoring internal malloc state. This system works with fully position-independent executables and with any malloc implementation.

It also, he says, matches the startup performance of the current "unexec" system to within 100ms, and he has not yet had the time to collect a bunch of low-hanging optimization fruit. In other words, it seems like an interesting solution to the problem, but a patch of this size is always going to generate some discussion.

Some of that discussion focused on how this dumper works when address-space layout randomization (ASLR) is in use. Current Emacs binaries must disable ASLR entirely, thus losing the security benefits that ASLR is meant to provide. The new dumper does not require disabling ASLR, but it does contain an optimization that can be applied if the dump file can be successfully mapped at a specific address: most of the data therein can be used directly from the mapped image, without the need to allocate storage for and copy it. That should speed the startup process considerably, at the cost of always mapping the dump image at the same location.
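The tradeoff can be sketched with a toy image whose words hold absolute addresses computed for a preferred base. This is an illustration under assumptions, not the patch's code: `PREFERRED_BASE`, the relocation list, and the functions `build_image` and `load_image` are hypothetical. If the image lands at the preferred base, the bytes are usable as they are; otherwise every stored pointer has to be adjusted in a private copy.

```python
import struct

PREFERRED_BASE = 0x40000000  # hypothetical fixed mapping address
WORD = struct.Struct("<Q")   # one 8-byte pointer per cell


def build_image(n_cells):
    """Build cells that each hold the absolute address of the next cell.

    Returns the image and a relocation list: the offsets of every word
    that contains a pointer. The last cell stays zero (a null pointer).
    """
    image = bytearray(n_cells * WORD.size)
    relocs = []
    for i in range(n_cells - 1):
        target = PREFERRED_BASE + (i + 1) * WORD.size
        WORD.pack_into(image, i * WORD.size, target)
        relocs.append(i * WORD.size)
    return image, relocs


def load_image(image, relocs, actual_base):
    """Return the usable image and the number of pointers that were fixed up."""
    if actual_base == PREFERRED_BASE:
        # Fast path: the stored addresses are already correct, so the
        # mapped bytes can be used directly with no copy.
        return image, 0
    delta = actual_base - PREFERRED_BASE
    patched = bytearray(image)  # copy, since the pointers must be rewritten
    for offset in relocs:
        (pointer,) = WORD.unpack_from(patched, offset)
        WORD.pack_into(patched, offset, pointer + delta)
    return patched, len(relocs)
```

Loading at `PREFERRED_BASE` returns the original buffer untouched, while loading anywhere else costs a copy plus one fixup per pointer. That saved work is what the fixed-address optimization buys, and the predictable address is what it costs.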

Paul Eggert worried about the potential security implications of losing ASLR protection for the bulk of the editor's data. Colascione responded that, since no part of the data image is marked executable, there is little risk of attackers running code from there. But, as Eggert pointed out, that view overlooks an important detail: that memory is full of Elisp bytecode that is executed in the editor itself, and which can do just about anything an attacker might want. So, if this approach is adopted, the fixed-location mapping might have to be turned off, at least by default.
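Eggert's point can be illustrated with a toy interpreter. The three-instruction bytecode below is invented for this sketch and has nothing to do with real Elisp bytecode; the `delete-file` entry is a harmless stand-in that only records its argument. The buffers passed to `run` are plain data that the processor never executes directly, yet their contents decide what the interpreter does.

```python
# Hypothetical three-instruction bytecode, unrelated to real Elisp bytecode.
PUSH, ADD, CALL = 0, 1, 2


def run(bytecode, builtins):
    """Interpret (opcode, operand) pairs read from an ordinary data buffer."""
    stack = []
    for op, arg in bytecode:
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == CALL:
            # The interpreter, not the CPU, gives this data its power.
            stack.append(builtins[arg](stack.pop()))
    return stack


# Stand-in for a dangerous primitive: it only logs what it was asked to do.
log = []
builtins = {"delete-file": lambda name: log.append(name) or "deleted"}

benign = [(PUSH, 1), (PUSH, 2), (ADD, None)]
tampered = [(PUSH, "/home/user/notes.txt"), (CALL, "delete-file")]
```

Running `benign` just adds two numbers, while running `tampered` reaches the file-deletion stand-in. An attacker who can overwrite bytecode at a known address therefore does not need the page to be executable.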

There is, however, a bigger disagreement involving Zaretskii, who described this work as "a wrong direction." His objection, in short, is that this patch adds a lot of low-level complexity, implemented in C, that will be a maintenance burden going forward. That is, he said, a threat to the future of the project:

The number of people aboard who can matter-of-factly hack the Emacs internals on the C level is consistently going down, and is already so small they can be counted on one hand. We must make Emacs depend less on people from this small and diminishing group, if we want the development pace increased or even kept at its current level. To me, that means keep as many features out of low-level C, and limit futzing with C-level internals of Lisp objects and the Lisp interpreter to the absolute minimum.

It makes sense to put thought into the maintainability of the code base and how it can be evolved to attract more developers. It is not entirely clear, though, that C programmers are actually a dying breed — or that the long-term supply of Elisp developers is more certain. In any case, the Emacs community needs to fix the startup problem; those who oppose the portable dumper solution presumably have something else in mind.

Zaretskii's preferred solution would be to make the Elisp loader faster, to the point that it can be used to read Elisp code directly at startup time. That is a solution that others might like to see as well, but it has one significant shortcoming: no code toward that goal exists, and there are no signs that anybody is working in that area. Colascione's solution, instead, does exist and has an interested developer behind it. In almost any development project, working code and ongoing maintenance carry a lot of weight.

Zaretskii feels strongly enough about this issue that he has threatened to resign as co-maintainer if the portable dumper is adopted. He appears to be nearly alone in this stance, though. Colascione has said repeatedly that he sees no other way to get the required performance. Richard Stallman is guardedly favorable to this solution, noting that it will be far easier to maintain than the current unexec code. John Wiegley, the other Emacs co-maintainer, also favors going with the portable dumper code.

The wind thus appears to be blowing in the direction of adopting the portable dumper patch. Nobody seems to want to see Zaretskii relinquish the co-maintainer role (a role he only accepted last July), so, if the portable dumper is merged, the community can only hope that he will change his mind. Any large development project will occasionally make decisions that are opposed by some of its developers, even when those developers are maintainers. But the venerable Emacs editor will still be there, and will still have no end of other problems to solve.

