JavaScript Memory Usage

Web pages and applications these days are complex and memory-hungry things. A JavaScript VM has to handle megabytes and often gigabytes of objects being created and, sometime later, forgotten. It is critically important for performance to manage memory effectively, by reusing space that is no longer needed rather than requesting more from the OS, and avoiding wasted gaps between the objects that are in use.

This article describes the internal details of what the SpiderMonkey engine is doing in order to move from a conservative mark-and-sweep collector to a compacting generational collector. By doing so, we hope to improve performance by reducing paging and fragmentation, improving cache locality, and shaving off some allocation overhead.

Conservative Garbage Collection

Since June of 2010, SpiderMonkey has used a conservative garbage collector for memory management. When memory is getting tight, or any of various other heuristics say it’s time, we stop and run a garbage collection (GC) pass to find any unneeded objects so that we can free them and release their space for other uses. To do this, we need to identify all of the live objects (so that everything left over is garbage). The JSAPI (SpiderMonkey’s JavaScript engine API) provides a mechanism for declaring that a limited set of objects is live. Those objects, and anything reachable from them, must not be discarded during a GC. However, it is common for other objects to be live yet not reachable from the JSAPI-specified set — for example, a newly-created object that will be inserted into another object as a property, but is still in a local variable when a GC is triggered. The conservative collector scans the CPU registers and stack for anything that looks like a pointer into the GC-managed heap. Anything found is marked as live and added to the set of known-live pointers, after which collection proceeds as a fairly standard incremental mark-and-sweep.

Conservative garbage collection is great. It costs nothing to the main program — the “mutator” [1] in GC-speak — and makes for very clean and efficient pointer handling in the embedding API. An embedder can store pointers to objects managed by the SpiderMonkey garbage collector (called “GC things”) in local variables or pass them through parameters and everything will be just fine.

Consider, for example, the following code using SpiderMonkey’s C++ API:

    JSObject *obj1 = JS_NewObject(cx);
    JSObject *obj2 = JS_NewObject(cx);
    JS_DefineProperty(obj2, "myprop", obj1);

In the absence of conservative scanning, this would be a serious error: if the second call to JS_NewObject() happens to trigger a collection, then how is the collector to know that obj1 is still needed? It only appears on the stack (or perhaps in a register), and is not reachable from anywhere else. Thus, obj1 would appear to be dead and would get recycled. The JS_DefineProperty() call would then operate on an invalid object, and the bad guys would use this to take over your computer and force it to work as a slave in a global password-breaking collective. With conservative scanning, the garbage collector will see obj1 sitting on the stack (or in a register), and prevent it from being swept away with the real trash.

Exact Rooting

The alternative to conservative scanning is called “exact rooting”, where the coder is responsible for registering all live pointers with the garbage collector. An example would look like this:

    JSObject *obj1 = JS_NewObject(cx);
    JS_AddObjectRoot(cx, obj1);
    JSObject *obj2 = JS_NewObject(cx);
    JS_DefineProperty(obj2, "myprop", obj1);
    JS_RemoveObjectRoot(cx, obj1);

This, to put it mildly, is a big pain. It’s annoying and error-prone — notice that forgetting to remove a root (e.g. during error handling) is a memory leak. Forgetting to add a root may lead to using invalid memory, and is often exploitable. Language implementations with automatic memory management often start out using exact rooting. At some point, all the careful rooting code gets to be so painful to maintain that a conservative scanner is introduced. The embedding API gets dramatically simpler, everyone cheers and breathes a sigh of relief, and work moves on (at a quicker pace!)

Sadly, conservative collection has a fatal flaw: GC things cannot move. If something moved, you would need to update all pointers to that thing — but the conservative scanner can only tell you what might be a pointer, not what definitely is a pointer. Your program might be using an integer whose value happens to look like a pointer to a GC thing that is being moved, and updating that “pointer” will corrupt the value. Or you might have a string whose ASCII or UTF8 encoded values happen to look like a pointer, and updating that pointer will rewrite the string into a devastating personal insult.

So why do we care? Well, being able to move objects is a prerequisite for many advanced memory management facilities such as compacting and generational garbage collection. These techniques can produce dramatic improvements in collection pause times, allocation performance, and memory usage. SpiderMonkey would really like to be able to make use of these techniques. Unfortunately, after years of working with a conservative collector, we have accumulated a large body of code both inside SpiderMonkey and in the Gecko embedding that assumes a conservative scanner: pointers to GC things are littered all over the stack, tucked away in foreign data structures (i.e., data structures not managed by the GC), used as keys in hashtables, tagged with metadata in their lower bits, etc.

Handle API for Exact Rooting

Over a year ago, the SpiderMonkey team began an effort to claw our way back to an exact rooting scheme. The basic approach is to add a layer of indirection to all GC thing pointer uses: rather than passing around direct pointers to movable GC things, we pass around pointers to GC thing pointers (“double pointers”, as in JSObject**). These Handles, implemented with a C++ Handle<T> template type, introduce a small cost to accesses since we now have to dereference twice to get to the actual data rather than once. But it also means that we can update the GC thing pointers at will, and all subsequent accesses will go to the right place automatically.

In addition, we added a “Rooted<T>” template that uses C++ RAII to automatically register GC thing pointers on the stack with the garbage collector upon creation. By depending on LIFO (stack-like) ordering, we can implement this with only a few store instructions per local variable definition. A Rooted pointer is really just a regular pointer sitting on the stack, except it will be registered with the garbage collector when its scope is entered and unregistered when its scope terminates. Using Rooted, the equivalent of the code at the top would look like:

    Rooted<JSObject*> obj1(cx, JS_NewObject(cx));
    Rooted<JSObject*> obj2(cx, JS_NewObject(cx));
    JS_DefineProperty(obj2, "myprop", obj1);

Is it as clean as the original? No. But that’s the cost of doing exact rooting in a language like C++. And it really isn’t bad, at least most of the time. To complete the example, note that a Rooted<T> can be automatically converted to a Handle<T>. So now let’s consider

    bool somefunc(JSContext *cx, JSObject *obj)
    {
        JSObject *obj2 = JS_NewObject(cx);
        return JS_GetFlavor(cx, obj) == JS_GetFlavor(cx, obj2);
    }

(The JS_GetFlavor function is fictional.) This function is completely unsafe with a moving GC and would need to be changed to

    bool somefunc(JSContext *cx, Handle<JSObject*> obj)
    {
        Rooted<JSObject*> obj2(cx, JS_NewObject(cx));
        return JS_GetFlavor(cx, obj) == JS_GetFlavor(cx, obj2);
    }

Now ‘obj’ here is a double pointer, pointing to some Rooted<JSObject*> value. If JS_NewObject happens to GC and change the address of *obj, then the final line will still work fine because JS_GetFlavor takes the Handle<JSObject*> and passes it through, and within JS_GetFlavor it will do a double-dereference to retrieve the object’s “flavor” (whatever that might mean) from its new location.

One added wrinkle to keep in mind with moving GC is that you must register every GC thing pointer with the garbage collector subsystem. With a non-moving collector, as long as you have at least one pointer to every live object, you’re good — the garbage collector won’t treat the object as garbage, so all of those pointers will remain valid. But with a moving collector, that’s not enough: if you have two pointers to a GC thing and only register one with the GC, then the GC will only update one of those pointers for you. The other pointer will continue to point to the old location, and Bad Things will happen.

For details of the stack rooting API, see js/public/RootingAPI.h. Also note that the JIT needs to handle moving GC pointers as well, and obviously cannot depend on the static type system. It manually keeps track of movable, collectible pointers and has a mechanism for either updating or recompiling code with “burned in” pointers.

Analyses and Progress

Static types

We initially started by setting up a C++ type hierarchy for GC pointers, so that any failure to root would be caught by the compiler. For example, changing functions to take Handle<JSObject*> will cause compilation to fail if any caller tries to pass in a raw (unrooted) JSObject*. Sadly, the type system is simply not expressive enough for this to be a complete solution, so we require additional tools.

Dynamic stack rooting analysis

One such tool is a dynamic analysis: at any point that might possibly GC, we can use the conservative stack scanning machinery to walk up the stack, find all pointers into the JS heap, and check whether each such pointer is properly rooted. If it is not, the conservative scanner mangles the pointer so that it points to an invalid memory location. It can’t just report a failure, because many stack locations contain stale data that is already dead at the time the GC is called (or was never live in the first place, perhaps because it was padding or only used in a different branch). If anything attempts to use the poisoned pointer, it will most likely crash.

The dynamic analysis has false positives — for example, when you store an integer on the stack that, when interpreted as a pointer, just happens to point into the JS heap — and false negatives, where incorrect code is never executed during our test suite so the analysis never finds it. It also isn’t very good for measuring progress towards the full exact rooting goal: each test crashes or doesn’t, and a crash could be from any number of flaws. It’s also a slow cycle for going through hazards one by one.

Static stack rooting analysis

For these reasons, we also have a static analysis [2] to report all occurrences of unrooted pointers across function calls that could trigger a collection. These can be counted, grouped by file or type, and graphed over time. The static analysis still finds some false positives, e.g. when a function pointer is called (the analysis assumes that any such call may GC unless told otherwise). It can also miss some hazards, resulting in false negatives, for example when a pointer to a GC pointer is passed around and accessed after a GC-ing function call. For that specific class of errors, the static analysis produces a secondary report of code where the address of an unrooted GC pointer flows into something that could GC. This analysis is more conservative than the other (i.e., it has more false positives: cases where it reports a problem that would not affect anything in practice) because it is easily possible for the pointed-to pointer to not really be live across a GC. For example, a simple GC pointer outparam where the initial value is never used could fit in this category:

    Value v;
    JS_GetSomething(cx, &v);
    if (...some use of v...) {
        ...
    }

The address of an unrooted GC thing v is being taken, and if JS_GetSomething triggers a GC, it appears that it could be modified and used after the JS_GetSomething call returns. Except that on entry to JS_GetSomething, v is uninitialized, so we’re probably safe. But not necessarily — if JS_GetSomething invokes GC twice, once after setting v through its pointer, then this really is a hazard. The poor static analysis, which takes into account very limited data flow, cannot know.

The static analysis gathers a variety of information that is independently useful to implementers, which we have exposed both as static files and through an IRC bot [3]. The bot watches for changes in the number of hazards to detect regressions, and also so we can cheer on improvements. You can ask it whether a given function might trigger a garbage collection, and if so, it will respond with a potential call chain from that function to GC. The bot also gives links to the detailed list of rooting hazards and descriptions of each.

Currently, only a small handful of reported hazards remain that are not known to be false positives. In fact, as of June 25, 2013 an exact-rooted build of the full desktop browser (with the conservative scanner disabled) passes all tests!

It is very rare for a language implementation to return to exact rooting after enjoying the convenience of a conservative garbage collector. It’s a lot of work, a lot of tricky code, and many infuriatingly difficult bugs to track down. (GC bugs are inordinately sensitive to their environment, timing, and how long it has been since you’ve washed your socks. They also manifest themselves as crashes far removed from the actual bug.)

Generational Garbage Collection

In parallel with the exact rooting effort, we [4] began implementing a generational garbage collector (“GGC”) to take advantage of our new ability to move GC things around in memory. (GGC depends on exact rooting, but the work can proceed independently.) At a basic level, GGC divides the JS heap into separate regions. Most new objects are allocated into the nursery, and the majority of these objects will be garbage by the time the garbage collector is invoked [5]. Nursery objects that survive long enough are transferred into the tenured storage area. Garbage collections are categorized as “minor GCs”, where garbage is only swept out of the nursery, and “major GCs”, where you take a longer pause to scan for garbage throughout the heap. Most of the time, it is expected that a minor GC will free up enough memory for the mutator to continue on doing its work. The result is faster allocation, since most allocations just append to the nursery; less fragmentation, because most fragmentation is due to short-lived objects forming gaps between their longer-lived brethren; and reduced pause times and frequency, since the minor GCs are quicker and the major GCs are less common.

Current Status

GGC is currently implemented and capable of running the JS shell. Although we have just begun tuning it, it is visible in our collection of benchmarks on the GGC line at www.arewefastyet.com. At the present time, GGC gives a nice speedup on some individual benchmarks, and a substantial slowdown on others.

Write Barriers

The whole generational collection scheme depends on minor GCs being fast, which is accomplished by keeping track of all external pointers into the nursery. A minor GC only needs to scan through those pointers, plus the pointers within the nursery itself, to figure out what to keep; but for this to work, the external pointer set must be accurately maintained at all times.

The mutator maintains the set of external pointers by calling write barriers on any modification of a GC thing pointer in the heap. If there is an object o somewhere in the heap and the mutator sets a property on o to point to another GC thing, then the write barrier will check if that new pointer is a pointer into the nursery. If so, it records it in a store buffer. A minor GC scans through the store buffer to update its collection of pointers from outside the nursery to within it.

Write barriers are an additional implementation task required for GGC. We again use the C++ type system to enforce the barriers in most cases — see HeapPtr<T> for inside SpiderMonkey, Heap<T> for outside. Any writes to a Heap<T> will invoke the appropriate write barrier code. Unsurprisingly, many other cases are not so simply solved.

Barrier Verifier

We have a separate dynamic analysis to support the write barriers. It empties out the nursery with a minor GC to provide a clean slate, then runs the mutator for a while to populate the store buffers. Finally, it scans the entire tenured heap to find all pointers into the nursery, verifying that all such pointers are captured in the store buffer.

Remaining Tasks

While GGC already works in the shell, work remains before it can be used in the full browser. There are many places where a post write barrier is not straightforward, and we are manually fixing those. Fuzz tests are still finding bugs, so those need to be tracked down. Finally, we won’t turn this stuff on until it’s a net performance win, and we have just started to work through the various performance faults. The early results look promising.

Credits

Much of the foundational work was done by Brian Hackett, with Bill McCloskey helping with the integration with the existing GC code. The heavy lifting for the exact stack rooting was done by Terrence Cole and Jon Coppeard, assisted by Steve Fink and Nicholas Nethercote and a large group of JS and Gecko developers. Boris Zbarsky and David Zbarsky in particular did the bulk of the Gecko rooting work, with Tom Schuster and man of mystery Ms2ger pitching in for the SpiderMonkey and XPConnect rooting. Terrence did the core work on the generational collector, with Jon implementing much of the Gecko write barriers. Brian implemented the static rooting analysis, with Steve automating it and exposing its results. A special thanks to the official project masseuse, who… wait a minute. We don’t have a project masseuse! We’ve been cheated!

Footnotes