In short: more adaptation to real-world use cases, and more unwinding speedups.

Benoit reworked the profiler's time-unit measurements to support sampling intervals shorter than one millisecond. This will help with profiling graphics and animation code (820048).
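
The post doesn't say how the time units were changed. As a hypothetical illustration of why whole-millisecond units get in the way, this Python sketch (all function names invented for the example) compares an interval truncated to integer milliseconds with one stored in microseconds:

```python
# Hypothetical illustration only; this is not the profiler's actual timer code.

def interval_in_whole_ms(interval_ms: float) -> int:
    """Assumed old-style representation: truncate to whole milliseconds."""
    return int(interval_ms)


def interval_in_us(interval_ms: float) -> int:
    """Assumed finer-grained representation: store the interval in microseconds."""
    return round(interval_ms * 1000)


def samples_per_second(interval_us: int) -> float:
    """Sampling rate implied by an interval given in microseconds."""
    return 1_000_000 / interval_us
```

A requested 0.25 ms interval collapses to 0 under the first representation, which is meaningless as a sampling period, whereas the second keeps it as 250 µs, i.e. 4000 samples per second.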

Gijs Kruitbosch made it possible to control the size of the profiler’s circular buffer using the MOZ_PROFILER_ENTRIES environment variable (901481).
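
A minimal sketch of the idea, assuming the variable holds an entry count and that missing or invalid values fall back to a default. The default value, the fallback behaviour and the `CircularBuffer` class are all hypothetical, chosen only to show how an environment variable can size a fixed-capacity ring buffer that overwrites its oldest entries:

```python
import os
from collections import deque

DEFAULT_ENTRIES = 1000  # hypothetical default; the real one isn't stated here


def buffer_entries_from_env(env=os.environ, default=DEFAULT_ENTRIES) -> int:
    """Read MOZ_PROFILER_ENTRIES; fall back to the default if unset or invalid (assumed)."""
    raw = env.get("MOZ_PROFILER_ENTRIES")
    if raw is None:
        return default
    try:
        n = int(raw)
    except ValueError:
        return default
    return n if n > 0 else default


class CircularBuffer:
    """Fixed-capacity buffer: once full, each new entry evicts the oldest one."""

    def __init__(self, capacity: int):
        self._entries = deque(maxlen=capacity)

    def add(self, entry):
        self._entries.append(entry)  # deque drops the oldest item when at maxlen

    def snapshot(self):
        return list(self._entries)
```

A larger buffer retains a longer window of samples before old ones are overwritten, at the cost of more memory.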

:roc and :avih taught the compositor to notify the profiler of missed graphics frames, to help monitor and debug smoothness problems (900785).

Unwinding with glibc's backtrace() on Linux was removed, since it risked deadlock and the Breakpad unwinder had made it redundant (880158).

There was work to speed up CFI and EXIDX native unwinding by making the data structures accessed by the CFI/EXIDX unwind algorithm more efficient:

- Speedups for GetModuleForAddress: a linear search was replaced with a binary search (892774).

- Stop generating useless frames via stack scanning, which turned out to be happening even though we thought stack scanning was disabled (894264).

- Cache unwind rules: the unwinder was spending considerable effort repeatedly regenerating unwind rules for the same, relatively small set of code addresses. Caching them makes a big difference (893542).
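
Two of the speedups in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the profiler's C++ code: `ModuleTable` is a hypothetical stand-in for the lookup behind GetModuleForAddress and assumes non-overlapping module address ranges, and `UnwindRuleCache` is a hypothetical memo table keyed by code address, where `compute_rules` represents the expensive CFI/EXIDX rule generation:

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    start: int  # first code address (inclusive)
    end: int    # end of code range (exclusive)


class ModuleTable:
    """Address-to-module lookup using binary search instead of a linear scan."""

    def __init__(self, modules):
        # Assumes module ranges do not overlap.
        self._modules = sorted(modules, key=lambda m: m.start)
        self._starts = [m.start for m in self._modules]

    def module_for_address(self, addr):
        # Index of the last module whose start address is <= addr: O(log n).
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and addr < self._modules[i].end:
            return self._modules[i]
        return None  # addr lies in a gap or outside every module


class UnwindRuleCache:
    """Memoizes unwind rules per code address so each is generated only once."""

    def __init__(self, compute_rules):
        self._compute_rules = compute_rules  # expensive rule generation for one pc
        self._rules = {}
        self.misses = 0

    def rules_for(self, pc):
        if pc not in self._rules:
            self.misses += 1
            self._rules[pc] = self._compute_rules(pc)
        return self._rules[pc]
```

Because unwinding keeps revisiting a relatively small set of return addresses, most calls to `rules_for` become dictionary hits. This sketch's cache is unbounded; a real profiler would more likely cap its size.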

With the above fixes in place, native unwinding on x86_64-linux costs about 8100 instructions per frame, or around 145k instructions per unwind for an average 18-frame stack.
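
The per-unwind figure follows directly from the per-frame cost and the average stack depth:

$$
8100\ \frac{\text{instructions}}{\text{frame}} \times 18\ \text{frames} = 145{,}800\ \text{instructions per unwind}
$$

which is the "around 145k" quoted, given that the per-frame cost is itself approximate.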

There are still opportunities to remove inefficiency, but they are becoming scarcer. It might be possible to reach about 5000 instructions per frame without much extra effort, but going below that could be difficult.