June 13, 2019

Building Sentry: Symbolicator

Welcome to our series of blog posts about all the nitty-gritty details that go into building a great debug experience at scale. Today, we’re looking at Symbolicator, the service that processes all native crash reports and minidumps at Sentry. At Sentry, we live to provide the best user experience possible. Over the years, this has led us to optimize Sentry for various platforms and frameworks, such as .NET, Java, Python, JavaScript, and React Native. In addition to the SDKs you use every day to collect events and context from your apps, we also built a fair amount of server-side processing to provide you with high-quality reports. Over two years ago, Sentry started supporting its first native platform: iOS. Since then, we’ve added support for many other platforms via minidumps and recently introduced our own SDK for native applications to make capturing all that precious information more accessible. Now, the time has come to lift the curtain and show you how we handle native crashes in Sentry. Join us on a multi-year journey from our first baby-steps at native crash analysis to Symbolicator, the reusable open-source service that we’ve built to make native crash reporting easier than ever. Native code is different Building a solution for native apps has been a particularly interesting task for us. Due to the lack of a runtime, extracting stack traces or other useful context information is much harder. Without a safety net, native applications can easily crash to a point beyond recovery, with little possibility of sending a crash report to Sentry. Build systems are also vastly different from other ecosystems, as there are many more choices and configuration options, making it harder to provide a streamlined solution. The common ground for all native applications is that they compile to machine code. At this level, variable and type names are typically gone. Function names only exist in a mangled form, or even optimized away completely. Call stacks of every running thread exist as binary memory regions, with a format defined by the ABI of the CPU or even left up to the compiler implementation. When native applications crash, for example, due to invalid memory access or illegal instruction, the operating system kernel stops their execution. Depending on the system, a signal is sent to the application, which can be used to react to the crash and run crucial operations before terminating completely. However, since signal handlers aren’t allowed to run any unsafe code, their capabilities are quite limited. Debuggers — and also Sentry — are presented with the challenge of reading a stack trace from the threads’ call stack memory region, symbolicating them into human-readable function names, and, finally, enriching them with additional information.

Stacking your cards frames In our initial implementation for iOS, we relied on an open-source framework called KSCrash to create a signal handler that catches crashes and computes stack traces. Since iOS is particularly restrictive, we dumped this information into a temporary location and let the application terminate. On the next application launch, our iOS SDK created an event and sent it off to Sentry. To understand how KSCrash works, we have to take a look at stack memory: the call stack is a continuous list of variable-length frame records. Such a record is pushed when a subroutine is invoked and contains its parameters, local variables, and temporaries. More importantly, however, it includes a special address — the so-called return address, which is an instruction pointer telling the CPU where to continue execution once the subroutine completes.

At any given time, the CPU keeps track of two special pointers in its registers: the current code instruction that is being executed, and the top of the stack. By looking at the top stack frame, one can obtain the return address, i.e., the instruction pointer of the parent frame once it returns. Repeat that for all frames, and you’re left with your stack trace, right? Well… almost. What we actually have is just the list of return addresses. As a developer, however, you’re interested in where your function was called, and, luckily, there are heuristics to get the actual caller address from the return address. The simplest would be to simply subtract one instruction, as in many cases the call and the return point are subsequent.

Walking with dinosaurs Life is not that simple. In order to walk through frame records in the call stack memory, you need to know the size of each individual frame. To make this easier, compilers emit a frame pointer into every frame pointing to the parent frame. Since the frame pointer is not needed during actual execution, it is usually omitted in release builds to save a few bytes. Fortunately, debuggers aren’t the only tools interested in being able to walk up the stack. Welcome to the stage: exception handlers. That’s right; every time you throw an exception, some built-in routine needs to unwind the call stack until it hits a frame with an exception handler. For this purpose, compilers are emitting so-called unwind information, or call frame information. Unwind information indicates the size and contents of all function frame records so that the application or a debugger can walk the stack and extract values like the return address. The information is stored in a condensed binary form in a separate section of the executable so that it can quickly be processed. Convenient, right? Well… almost. There are still vast differences between various operating systems and CPU architectures. The effective strategy for stackwalking depends on a combination of both and requires reading unwind information in various formats. Unfortunately, the information emitted by compilers is not always accurate or complete due to specific optimizations. Also, certain programs manually manipulate the stack, which can lead to entirely different effects. At Sentry, we’ve incorporated unwind information handling into our symbolic library. It is built on top of amazing open-source Rust libraries goblin and pdb , which provide the lower-level parsing of the different file formats and binary representation. Over time, we also had the pleasure to contribute to those libraries and fix certain edge cases or implement recent additions to the file standards. For platforms other than iOS, we use Google’s Breakpad library to generate minidump crash reports and then process them on our servers. This library contains stack walkers for the most prevalent CPU architectures, which we feed with the unwind information they require to do their job. (Debug) information is gold So far on our journey to native crash analysis, we have obtained a list of instruction pointer addresses. That’s barely a stack trace you could use to debug your applications. You need function names and line numbers. The final executable no longer needs to know the names of variables or the files that your code was declared in. Sometimes, not even function names play a role anymore. To ensure that developers can still inspect their applications, compilers, therefore, output debug information containing data to connect the optimized instructions with their source code. However, this debug information can get large. It’s not uncommon to encounter debug information 10 times the size of the executable. For this reason, debug information is often moved (or stripped) to separate companion files. They are commonly referred to as Debug Information Files, or Debug Symbols. On Windows, they carry a .pdb extension, on macOS, they are .dSYM folder structures, and, on Linux, there is a convention to put them in .debug files.

The internal format of these files also varies — while macOS and Linux generally use the open-source DWARF standard, Microsoft implemented their proprietary CodeView that was eventually open-sourced at the request of the LLVM project. At the heart of each debug information file are tree-like structures explaining the contents of every compilation unit. They contain all types, functions, parameters as well as variables, scopes, and more. Additionally, there are mappings of these structures to instruction pointer addresses in the source code as well as the file and line number where they are declared. This is precisely the information needed to turn instruction addresses into human-readable stack traces. And, this is essentially what debuggers do; they look up the respective instruction address in the debug file and then display all sorts of stored information. For example, inline functions that no longer have their own stack record as their code has been moved into another function.