After many years programming solely on Windows I have recently started working on Linux. This is the third post in a series of tutorials that will share what I have learned about handling Linux symbols. This third post explains how to get symbols more easily, and compares Linux to the situation on Windows.

One advantage of working on multiple platforms is the opportunity to realize what each one is missing. There are parts of Linux that I miss when I am working on Windows and (heresy!) there are things about Windows that I miss when I am working on Linux. Symbol handling is one of those.


Windows

After working on Windows for many years I have grown accustomed to having symbols show up automatically. Debuggers and profilers on Windows use symbol servers to automatically retrieve symbols. Typically they are configured to retrieve symbols for Windows and other Microsoft products, and perhaps other external products such as Mozilla or SourceMod. Symbol servers should also be used to store and retrieve your company’s own private symbols – if you are a Windows developer and you aren’t doing this then you should fix that right now.

Symbol files are retrieved from the configured symbol stores based on unique identifiers embedded in DLLs and EXEs. When doing post-mortem debugging even the DLLs and EXEs can be automatically retrieved from the symbol store based on a few identifiers recorded in the crash dump. The mechanism is described in more detail here.
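The lookup key can be made concrete with a short example. Below is a minimal Python sketch of how a relative symbol-store path is commonly derived. It assumes the standard SymStore layout:

- A PDB is keyed by its GUID (32 hex digits, without braces or dashes) followed by its age in hex.
- A DLL or EXE is keyed by its PE header TimeDateStamp (8 uppercase hex digits) followed by its SizeOfImage (hex).

The function names and sample values are hypothetical. Real tools read these fields from the PDB and PE headers, or from the module list in a crash dump.

```python
def pdb_store_path(pdb_name: str, guid: str, age: int) -> str:
    """Relative symbol-store path for a PDB, keyed by GUID + age (assumed layout)."""
    # Accept "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}"; the key drops braces and dashes.
    key = guid.strip("{}").replace("-", "").upper() + format(age, "X")
    return f"{pdb_name}/{key}/{pdb_name}"


def binary_store_path(file_name: str, time_date_stamp: int, size_of_image: int) -> str:
    """Relative symbol-store path for a DLL/EXE, keyed by two PE header fields."""
    key = format(time_date_stamp, "08X") + format(size_of_image, "x")
    return f"{file_name}/{key}/{file_name}"


def store_url(server: str, relative_path: str) -> str:
    """Join a symbol server base URL and a relative symbol-store path."""
    return server.rstrip("/") + "/" + relative_path


# Hypothetical example values, not taken from a real binary.
print(store_url("https://msdl.microsoft.com/download/symbols",
                pdb_store_path("example.pdb", "{12345678-9ABC-DEF0-1234-56789ABCDEF0}", 2)))
```

Because the key depends only on values recorded in the binary (and in the dump's module list), no package names or version numbers are ever needed.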

This combination is powerful. I can take a crash dump file from a customer running an arbitrary version of our games, running on an arbitrary version of Windows, load it into the debugger and have all of the symbols automatically show up. I don’t need to know or care what version of the game or what service pack of Windows the customer was running, and I don’t need to think about package names or new repositories. The symbols and binaries Just. Show. Up.

The convenience of this is hard to overstate. No special knowledge is required, and no time is wasted tracking down symbols. They just appear, like magic. Anything that requires more knowledge or more time is, by definition, less efficient.

It is certainly true that the symbols you get, from Microsoft anyway, are missing source information and some other details. However, the depth of the symbols is a separate issue from how you get the symbols.

Automatic symbol finding is critical for batch processing. When I have a collection of Windows crashes I can run a batch file (batch files are like shell scripts, only much less powerful) to do bulk analysis of the crashes, including generating call stacks for all of them. The value of this fully automated system should be obvious.
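A batch run of that sort can be sketched in Python rather than a batch file. This assumes the console debugger cdb from Debugging Tools for Windows is on the PATH.

- The `-z`, `-y` and `-c` options open a dump, set the symbol path, and run initial commands.
- The `srv*` symbol path tells cdb to fetch symbols from Microsoft's public symbol server into a local cache.
- The cache directory `C:\symbols` and the `.txt` output naming are arbitrary choices for this sketch.

```python
import glob
import subprocess
from typing import List

# Local cache directory (arbitrary) plus Microsoft's public symbol server.
SYMBOL_PATH = r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols"


def cdb_command(dump_path: str, symbol_path: str = SYMBOL_PATH) -> List[str]:
    """Build a cdb command line that prints the crashing call stack and quits."""
    # .ecxr switches to the exception context saved in the dump,
    # kb prints the call stack, and q exits the debugger.
    return ["cdb", "-z", dump_path, "-y", symbol_path, "-c", ".ecxr;kb;q"]


def analyze_all(pattern: str = "*.dmp") -> None:
    """Run cdb over every matching dump and save each stack next to its dump."""
    for dump in sorted(glob.glob(pattern)):
        result = subprocess.run(cdb_command(dump), capture_output=True, text=True)
        with open(dump + ".txt", "w") as out:
            out.write(result.stdout)
```

Calling `analyze_all()` from a directory of crash dumps writes one stack file per dump. The symbols are found without the script knowing anything about the versions involved.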

Occasionally symbol resolution on Windows goes awry – I’ve seen it be unusably slow, and it’s frustrating that proprietary graphics driver symbols are never available – but normally it works perfectly.

Linux as it is

Acquiring symbols on Linux could be equally easy. But it isn’t. It’s taken me months of learning and two blog posts in order to document some ad-hoc methods that will usually allow you to track down the symbols that you need. It’s time-consuming and inconvenient, it raises the barrier to entry, and I know that it can be easier.

In my previous posts I described a series of ad-hoc methods for getting symbols. To get symbols for crashes on your machine or your distribution of Linux, this will usually work:

Add the ddebs repository if you haven’t already (Ubuntu specific)

For each shared object find out what package it comes from

Add -dbgsym to that package name and install that, if it exists
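The middle two steps can be scripted. Here is a minimal Python sketch, assuming a Debian/Ubuntu system where `dpkg -S` reports the package that owns a file. The function names are hypothetical. The result is only a guess at the -dbgsym package name: it may not exist, and it still has to be installed, for example with apt-get install.

```python
import subprocess
from typing import Optional


def owning_package(dpkg_line: str) -> str:
    """Extract the package name from one line of `dpkg -S` output.

    Lines look like "libc6:i386: /lib/i386-linux-gnu/libc.so.6" on multiarch
    systems, or "libc6: /lib/libc.so.6" on older ones.
    """
    packages = dpkg_line.split(": ", 1)[0]   # e.g. "libc6:i386", maybe "a, b"
    first = packages.split(", ")[0]          # keep the first owner only
    return first.split(":", 1)[0]            # drop any ":arch" qualifier


def dbgsym_name(shared_object: str) -> Optional[str]:
    """Guess the -dbgsym package for a shared object, or None if dpkg doesn't know it."""
    result = subprocess.run(["dpkg", "-S", shared_object],
                            capture_output=True, text=True)
    if result.returncode != 0 or not result.stdout:
        return None
    return owning_package(result.stdout.splitlines()[0]) + "-dbgsym"
```

For example, `dbgsym_name("/usr/lib/i386-linux-gnu/libX11.so.6")` would return the name to try installing.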

If the crash occurred on a different Linux distribution then the steps are different:

Install the distribution where the crash occurred. Note that all the following steps are implemented differently for Debian-style versus Red Hat-style package systems.

For each shared object find out what package it comes from

For each package find out what URL its debug version can be downloaded from, if it exists

Download the debug package

Extract the contents

Put the symbols where you can use them
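For Debian-style packages the "extract the contents" step doesn't actually need dpkg. A .deb or .ddeb file is a Unix ar archive, and its data.tar.* member holds the installed files.

Here is a minimal Python sketch with some assumptions:

- The data member is compressed with gzip, bzip2 or xz. Python's tarfile cannot read zstd-compressed members before 3.14.
- File contents are returned in memory rather than written anywhere. Putting the symbols where you can use them is left to the caller.
- Red Hat-style .rpm packages use a different format and are not handled.

```python
import io
import tarfile
from typing import Dict

AR_MAGIC = b"!<arch>\n"
AR_HEADER_SIZE = 60


def read_ar_members(archive: bytes) -> Dict[str, bytes]:
    """Split a Unix ar archive (the container format of .deb/.ddeb) into members."""
    if not archive.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = {}
    offset = len(AR_MAGIC)
    while offset + AR_HEADER_SIZE <= len(archive):
        header = archive[offset:offset + AR_HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # The name is bytes 0-16 (GNU ar may append "/"); the size is bytes 48-58.
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        offset += AR_HEADER_SIZE
        members[name] = archive[offset:offset + size]
        offset += size + (size % 2)  # member data is padded to an even length
    return members


def read_package_files(package: bytes) -> Dict[str, bytes]:
    """Return {path: contents} for every regular file in a package's data.tar.*."""
    members = read_ar_members(package)
    data_name = next((n for n in members if n.startswith("data.tar")), None)
    if data_name is None:
        raise ValueError("no data.tar member found")
    files = {}
    # "r:*" lets tarfile detect gzip, bzip2 or xz compression itself.
    with tarfile.open(fileobj=io.BytesIO(members[data_name]), mode="r:*") as tar:
        for info in tar:
            if info.isfile():
                extracted = tar.extractfile(info)
                if extracted is not None:
                    files[info.name] = extracted.read()
    return files
```

Typical use is `read_package_files(open(path, "rb").read())` on a downloaded .ddeb. The debug files are then written wherever your debugger looks for them.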

In short, tracking down symbols on Linux is time-consuming, requires significant expert knowledge, and sometimes doesn’t work. However it doesn’t have to be this way.

Linux as it could be

In 2007 a proposal was made to embed build IDs in all Linux binaries. These build IDs – typically 40-character hex numbers – were designed to be used as a search key to find a particular version of a particular binary, and its symbols. Since that initial proposal most of the work to implement it has been done. The gcc toolchain defaults to putting build IDs in ELF files, most Linux distributions put build IDs in core files (except Ubuntu, but I’ve filed a bug to get that fixed), and gdb supports finding symbols on the local file system using build IDs.
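The local lookup that gdb performs is simple:

1. Start from its debug-file directory, commonly /usr/lib/debug.
2. Look in a .build-id subdirectory named after the first two hex digits of the build ID.
3. Open the file named after the remaining digits, with a .debug suffix.

The Fedora yum command later in this post relies on this same layout. Here is a minimal Python sketch of the mapping; the function name is hypothetical.

```python
import posixpath


def build_id_debug_path(build_id: str, debug_dir: str = "/usr/lib/debug") -> str:
    """Path where gdb looks for separate debug info for a given build ID.

    The first two hex digits become a subdirectory; the remaining digits,
    plus ".debug", become the file name.
    """
    build_id = build_id.lower()
    if len(build_id) < 3 or any(c not in "0123456789abcdef" for c in build_id):
        raise ValueError(f"not a hex build ID: {build_id!r}")
    return posixpath.join(debug_dir, ".build-id", build_id[:2], build_id[2:] + ".debug")
```

Because the path depends only on the build ID, a symbol downloader that knows the build ID also knows exactly where to put the file.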

However, what is still missing is a way to download symbols to your local system, based purely on a build ID. And that is the part that could really make build IDs worthwhile.

Fedora has started trying to solve this with their darkserver project. The idea is that you just append the build ID that you are interested in to a URL and you can then download full package details. This works for their canonical sample.

However, it failed when I used it to find symbols that I actually wanted, such as the symbols for libc-2.16.so for Fedora 18 (as discussed in the previous installment of this series). The build ID for libc-2.16.so is cf7bdd994de74c7d4a0cff6a0293d96b64681e06, but as of today nothing is to be found at the corresponding darkserver URL.

Update: darkserver has been fixed and is now (as of March 2013) being kept up to date with Fedora releases, with plans to support finding Ubuntu symbols also. See Fedora Fixes for details.

Another Fedora option is this command that asks yum to install the file that will be referenced through the standard build ID derived symlink (note the slash after ‘8b’):

yum --enablerepo=fedora-debuginfo install /usr/lib/debug/.build-id/8b/d8064a80f57906f7e21504f13a86110cdb4535.debug

This only works for the latest/updated release and only for the primary architecture, and I have not tested it. A final Fedora option, which doesn’t require root access, is this command:

echo 8bd8064a80f57906f7e21504f13a86110cdb4535 \
    | abrt-action-install-debuginfo --ids=- --cache=/tmp

I assume that the latest/updated release and primary architecture restrictions still apply, making these techniques good for local debugging, but not for analysis of crashes from remote machines, especially if the developer is running a different distribution.

A solution for Ubuntu symbols

Anybody can come along and say that something is broken, but that’s no way to make friends. So, I spent some extra time to create a solution, for Ubuntu anyway.

The Ubuntu symbol packages can be found at http://ddebs.ubuntu.com. Within the various sections of this web site there are Packages files that list all of the packages for various versions of Ubuntu – precise, precise-updates, quantal-security, etc. Those Packages files are almost what I need, but they lack build IDs. All that was needed was a Python script to parse the Packages files, download each .ddeb file listed, unpack it, get build IDs for the installed files, and append the data to an enhanced Packages file. Something like this pseudo-code:

for packagesURL in packagesURLs:
    wget packagesURL
    for ddebName in Packages:
        wget ddebName
        ar -x ddebName
        tar -xf data.tar.*z
        for file in unpackedFiles:
            buildID = GetBuildID(file)
            AppendToPackagesFile(buildID, file, ddebName)

After running this over precise and precise-updates I have an enhanced Packages file that I can query using grep to trivially find what file and package are associated with any build ID. Just run “readelf -n” on the file of interest – or look at the debug identifier in a breakpad crash report – and then run grep on the enhanced Packages file.

To get the download URL for the C++ runtime symbols (libstdc++.so.6), just do this (output word wrapped for readability):

$ grep 96b9cb6b542dac65f995ff4b2a68213653b86f02 Packages
BuildID: 96b9cb6b542dac65f995ff4b2a68213653b86f02
/usr/lib/debug/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16
http://ddebs.ubuntu.com/pool/main/g/gcc-4.6/libstdc++6-dbgsym_4.6.3-1ubuntu5_i386.ddeb
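The same lookup can be done without grep, which is handy if the search is part of a larger script.

This minimal Python sketch assumes a particular record format, which is a guess based on the unwrapped grep output above. Each record is taken to be one line of four whitespace-separated fields:

1. BuildID:
2. the build ID
3. the installed debug-file path
4. the .ddeb URL

Matching on a prefix also handles truncated identifiers like breakpad's.

```python
from typing import Iterable, List, NamedTuple


class SymbolRecord(NamedTuple):
    build_id: str
    debug_path: str
    url: str


def find_symbols(lines: Iterable[str], build_id_prefix: str) -> List[SymbolRecord]:
    """Return records whose build ID starts with build_id_prefix.

    Assumes the hypothetical one-line format:
    "BuildID: <build id> <debug file path> <.ddeb URL>"
    """
    prefix = build_id_prefix.lower()
    matches = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[0] != "BuildID:":
            continue  # skip ordinary Packages-file lines
        if fields[1].lower().startswith(prefix):
            matches.append(SymbolRecord(fields[1], fields[2], fields[3]))
    return matches
```

Passing an open file works directly, since file objects iterate over lines: `find_symbols(open("Packages"), "96b9cb6b542dac65f995ff4b2a68213653b86f02")`.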

In my last episode I needed symbols for libX11 for a customer crash. I only had a 32-character breakpad debug identifier but that is plenty for searching the enhanced Packages file. Just remember to fix the endianness of the breakpad debug identifier before grepping:

$ grep 2aba5398f777cdafba76dfaf3900bd79 Packages
BuildID: 2aba5398f777cdafba76dfaf3900bd794b39bf34
/usr/lib/debug/usr/lib/i386-linux-gnu/libX11.so.6.3.0
http://ddebs.ubuntu.com/pool/main/libx/libx11/libx11-6-dbgsym_1.4.99.1-0ubuntu2_i386.ddeb
BuildID: 2aba5398f777cdafba76dfaf3900bd794b39bf34
/usr/lib/debug/usr/lib/libX11.so.6.3.0
http://ddebs.ubuntu.com/pool/main/libx/libx11/libx11-6-udeb-dbgsym_1.4.99.1-0ubuntu2_i386.ddeb
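The endianness fix is mechanical. This assumes breakpad's usual handling of ELF files:

- It takes the first 16 bytes of the build ID and formats them like a Windows GUID.
- As a result, the first three fields (4, 2 and 2 bytes) appear byte-swapped.
- It then appends an age digit.

Here is a minimal Python sketch of undoing that. The sample identifier below was derived by hand from the libX11 build ID above, not copied from a real crash report.

```python
def breakpad_id_to_build_id_prefix(debug_id: str) -> str:
    """Turn a breakpad debug identifier into the leading 32 hex digits of a build ID.

    Only the first 32 hex digits are used; a trailing age digit is ignored.
    """
    raw = bytes.fromhex(debug_id[:32])
    if len(raw) != 16:
        raise ValueError("expected at least 32 hex digits")
    # Reverse the GUID-style Data1 (4 bytes), Data2 (2) and Data3 (2) fields;
    # the remaining 8 bytes are already in file order.
    fixed = raw[3::-1] + raw[5:3:-1] + raw[7:5:-1] + raw[8:]
    return fixed.hex()


print(breakpad_id_to_build_id_prefix("9853BA2A77F7AFCDBA76DFAF3900BD790"))
# prints 2aba5398f777cdafba76dfaf3900bd79
```

The output is lowercase, ready to pass to grep against the enhanced Packages file.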

Curiously, this same technique fails when used to find symbols for the latest libc6 on Ubuntu 12.04. Apparently those symbols are not available in the ddebs repository, and I didn’t index the main repository, so this query returns nothing:

$ grep bc99bb8745130f34a31106951ceccfd9dc3295b4 Packages

It would be quite trivial to automate the process of downloading, extracting, and using the symbols that are found using this build ID technique. If the major Linux distributions made build ID information available in this form then the symbol problem would be solved – developers would just need to download Packages files from the distributions that they cared about. Until then, in order to save every Linux developer from using my script to download every package from Canonical’s site, I’m making my enhanced Packages file available here. It’s probably already out of date, but I will try to update it occasionally.

Any other thoughts or ideas on this topic are appreciated – put your suggestions in the comments.

Conclusion

There is no fundamental reason why symbols can’t be downloaded either automatically or effortlessly on Linux. There is no fundamental reason why this can’t work even when you’re analyzing a core file that was recorded on a different distribution. Linux is gradually moving in this direction, but right now Microsoft wins the “convenience” prize hands down.

I hope that my demonstration of how Linux symbol finding could be enhanced is useful as a proof of concept, or as an immediately useful way of locating symbol files on Ubuntu.