Globals and singletons are already well-known as a design antipattern, but they have an interesting additional cost. Consider a global (I include file-level static in this category) value that has initialization code. That code must be run at startup (which leads to the static initialization order fiasco, though that is not the point of this post).

Because this initialization code is run at startup, before even main() is entered, it is in the critical path for startup. It turns out that even simple code must be paged in off disk, which can lead to disk seeks, and disk seeks murder your startup performance.

This is not hypothetical: with ChromeOS we found that innocuous-seeming static initializers in Chrome were actually affecting the bottom line of startup performance. (Note: that observation comes from a coworker; I'm not sure whether he was using a non-SSD machine at the time or if it also happens on SSDs. Just guessing, but paging in more code, especially code that is non-contiguous, must have some non-zero cost even on the SSDs that ChromeOS relies upon.)

Because of this cost we attempt to track static initialization on our performance bots and prevent new checkins from adding more. (Ideally we'd remove them all but progress is slow.) I recently looked into how this works and I thought it'd be useful to write it down before I forget.

How constructors are implemented

The compiler creates, for each object file, a function that contains the constructors for the file. Pointers to these functions are collected in a table at link time. At startup, __do_global_ctors_aux iterates through the table and calls each function. (Here's a nice page that walks through the disassembly.) Conceptually, to judge the cost of all static constructors you might want to sum the sizes of all of these functions. For our purposes, though, we care about disk seeks: doing more work in a single static constructor is fine if it reduces the total number of functions paged in. That makes the size of the constructor table the statistic of interest.
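To make the table idea concrete, here is a toy model of it in plain C++. It is only a sketch: the names (InitFn, kCtorTable, RunGlobalCtors, InitFileA and friends) are made up for illustration, and the real routine's iteration order and sentinel handling depend on the toolchain. This version just walks forward until it hits a null terminator.

```cpp
// Toy model of a constructor table. All names here are hypothetical;
// a real toolchain builds the table at link time.
using InitFn = void (*)();

int g_calls = 0;   // how many per-file init functions have run
int g_order[8];    // the order in which they ran

// Stand-ins for the per-object-file functions the compiler generates.
void InitFileA() { g_order[g_calls++] = 1; }
void InitFileB() { g_order[g_calls++] = 2; }
void InitFileC() { g_order[g_calls++] = 3; }

// The "table": one pointer per object file, null-terminated in this sketch.
const InitFn kCtorTable[] = {InitFileA, InitFileB, InitFileC, nullptr};

// Walks the table and calls each entry, returning how many were called.
// Each call may touch a different page of the binary, which is why the
// number of entries matters more than the work done inside any one of them.
int RunGlobalCtors(const InitFn* table) {
  int count = 0;
  for (; *table != nullptr; ++table) {
    (*table)();
    ++count;
  }
  return count;
}
```

Calling RunGlobalCtors(kCtorTable) runs the three stand-in functions in table order and returns the entry count, which is the number this post cares about.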

The table of functions shows up as the .ctors section of the executable. You can dump the table with commands like the following (note that the first entry is -1, the rest are addresses):

$ objdump --full-contents --section=.ctors path/to/binary

or in gdb,

(gdb) x/1000xg &__CTOR_LIST__

The gdb output is perhaps more useful since it decodes little-endian for you. (N.b.: the "g" trailing the "x" command prints 64-bit pointers; adjust as necessary locally.)

For a Chrome binary I glanced at, the ctor list appears to be in pointer order, which means you can see how much of the resulting binary the constructors span by subtracting the first address from the last. From my random debugging build: 30 MB, not good.
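If you have pasted the dumped entries into a program, the span estimate can be sketched like this. CtorSpanBytes is a hypothetical helper, not part of any tool. It assumes 64-bit entries, skips the leading -1 entry mentioned above, and also skips a 0 entry on the assumption that the list ends with a zero terminator. It takes the minimum and maximum rather than trusting the list to be sorted.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the distance in bytes between the lowest
// and highest constructor address in a dumped ctor list.
std::uint64_t CtorSpanBytes(const std::vector<std::uint64_t>& entries) {
  const std::uint64_t kLeadingSentinel = ~std::uint64_t{0};  // the -1 entry
  bool seen = false;
  std::uint64_t lo = 0;
  std::uint64_t hi = 0;
  for (std::uint64_t addr : entries) {
    // Skip the -1 marker and (by assumption) a 0 terminator.
    if (addr == kLeadingSentinel || addr == 0) continue;
    if (!seen) {
      lo = hi = addr;
      seen = true;
    } else {
      lo = std::min(lo, addr);
      hi = std::max(hi, addr);
    }
  }
  return seen ? hi - lo : 0;
}
```

This only bounds how widely the constructors are scattered; it says nothing about how many distinct pages actually get touched in between.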

Constructors versus static initialization

Note that data that is initialized to a constant is implemented in a different way: the constant value can just be placed in the right place at compile time, so there is no cost. In contrast, C++ objects that have constructors involve code and must be computed at runtime. You'll also sometimes encounter code that initializes variables with function calls (like we did with the mysterious IcedTea crash).
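Here is a small illustration of the difference. All of the names are invented for the example and none come from Chrome. Whether a particular compiler manages to fold a given initializer into constant data can vary with optimization settings, so treat the comments as the typical case rather than a guarantee.

```cpp
#include <string>

// Constant initialization: the value is placed in the binary at compile
// time, so no code runs at startup for it.
constexpr int kMaxRetries = 3;

// Zero-initialized, which is also free at startup. It counts how many
// dynamic initializers below have run.
int g_dynamic_init_count = 0;

int ComputeDefaultTimeoutMs() {
  ++g_dynamic_init_count;
  return 30 * 1000;
}

// Initialized with a function call: the compiler typically emits startup
// code for this file so the call happens before main().
int g_timeout_ms = ComputeDefaultTimeoutMs();

// An object with a constructor: std::string typically needs startup code
// to construct it and to arrange for its destruction at exit.
std::string g_user_agent = "example-agent";
```

By the time main() starts, ComputeDefaultTimeoutMs has already been called once and g_user_agent has already been constructed, whether or not the program ever uses them.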

You might also notice that constant-initialized data can be shared between multiple instances of the same executable, while memory initialized at runtime is private to each process; see my post about how memory works for more on that.

I noticed with some interest that the Go programming language, designed in part by compiler hackers, neatly sidesteps some of the above problems: by defining initialization order carefully ("The importing of packages, by construction, guarantees that there can be no cyclic dependencies in initialization.") and by only allowing simple values as constant initializers. See their manual for more.

What to do about it

Mozilla hackers have found that Linux is pathologically bad in how it runs the resulting ctor list, and it looks like they have at least considered fixing that manually. We have chatted about doing the same, but fundamentally I believe the way to keep startup fast is to do less. See also my earlier post about performance.

It appears that the generated functions that run these constructors get names starting with _GLOBAL__I_. This means a call like

$ nm out/Debug/chrome | grep _GLOBAL__I

will dump a list of all files that have a global constructor. Go delete some code!
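When the global can't simply be deleted, one common C++ idiom (not something prescribed above, just a widely used option) is to move it into a function-local static so it is built on first use instead of at startup. The sketch below uses made-up names (DefaultPorts, BuildPortTable, g_port_table_builds). It deliberately leaks the object so that no destructor has to be registered either, and it assumes C++11 or later, where initialization of function-local statics is thread-safe.

```cpp
#include <map>
#include <string>

// Before (hypothetical): a file-level global, constructed at startup
// whether or not anything ever reads it.
//   static const std::map<std::string, int> g_ports = {{"http", 80}, ...};

// Counts how many times the table is built, to show that it is lazy.
int g_port_table_builds = 0;

const std::map<std::string, int>* BuildPortTable() {
  ++g_port_table_builds;
  return new std::map<std::string, int>{{"http", 80}, {"https", 443}};
}

// After: the table is built the first time this function is called.
// The pointer is never deleted, so nothing needs to run at exit.
const std::map<std::string, int>& DefaultPorts() {
  static const std::map<std::string, int>* const ports = BuildPortTable();
  return *ports;
}
```

Callers write DefaultPorts().at("https") instead of touching a global. The cost moves from startup to the first call, which is only a win if that first call is off the startup path.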