Today I am releasing Bloaty McBloatface 1.0. Bloaty is a size profiler for binaries. It helps you peek into ELF/Mach-O binaries to see what is taking up space inside.

Bloaty has gotten lots of new features, bugfixes, and overall improvements since I announced it in 2016. I listed these changes briefly on the release page, but I wanted to go into a bit more detail here.

Improving Data Quality

Perhaps the biggest overall improvement to Bloaty is its data quality. When I first announced Bloaty, I got very understandable complaints like this one:

I ran it and it gives an awful lot of “[None]”:

$ ~/d/bloaty/bloaty builder/virt-builder -d compileunits
     VM SIZE                             FILE SIZE
 --------------                      --------------
  75.5%  1.96Mi [None]                3.67Mi  85.2%
   8.7%   232Ki guestfs-c-actions.c    232Ki   5.3%
   8.2%   219Ki guestfs.ml             219Ki   5.0%
   2.0%  52.4Ki [Other]               52.4Ki   1.2%
   1.3%  33.7Ki _none_                33.7Ki   0.8%
   0.7%  17.5Ki customize_cmdline.ml  17.5Ki   0.4%
   0.6%  17.3Ki builder.ml            17.3Ki   0.4%
   0.4%  11.8Ki customize_run.ml      11.8Ki   0.3%
   0.4%  10.4Ki cmdline.ml            10.4Ki   0.2%
   0.3%  7.08Ki firstboot.ml          7.08Ki   0.2%
   0.2%  6.21Ki index-scan.c          6.21Ki   0.1%
   0.2%  5.90Ki index_parser.ml       5.90Ki   0.1%
   0.2%  5.15Ki sigchecker.ml         5.15Ki   0.1%
   0.2%  4.87Ki getopt-c.c            4.87Ki   0.1%
[...]

It’s a mixed OCaml/C executable, but I ran it on a build from the local directory and all debug symbols are still available.

Indeed, a profiler tool that has no idea what to say about 85.2% of the binary is not going to be very useful. This was Bloaty’s biggest weakness when I first released it.

At first I misunderstood the nature of this problem. Bloaty’s design at the time was simple: it was reading .debug_aranges to assign ranges of the binary to compilation units. DWARF’s .debug_aranges section is an {address range -> compileunit} map that debuggers use to decide what compile unit a given function or data variable is from, given its address.

The output above indicates that .debug_aranges was only covering about 15% of the binary. What gives? My theory at the time was that .debug_aranges should theoretically be covering the whole binary, but was pretty incomplete for some reason. It seemed like a compiler problem that I was going to have to work around somehow.
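To make the lookup and the coverage gap concrete, here is a minimal Python sketch. It is not Bloaty's code (Bloaty is written in C++), and the table entries, addresses, and function names are all made up for illustration. It treats .debug_aranges as a sorted list of address ranges and shows how any address that falls outside every range comes back with no compile unit.

```python
import bisect

# Hypothetical (start, end, compile_unit) entries, sorted by start address.
# They stand in for what a .debug_aranges parser might produce.
ARANGES = [
    (0x1000, 0x1400, "guestfs-c-actions.c"),
    (0x1400, 0x1500, "index-scan.c"),
    (0x2000, 0x2100, "getopt-c.c"),
]
STARTS = [start for start, _, _ in ARANGES]


def compile_unit_for(addr):
    """Return the compile unit whose range contains addr, or None."""
    # Find the last range that starts at or before addr.
    i = bisect.bisect_right(STARTS, addr) - 1
    if i >= 0:
        start, end, unit = ARANGES[i]
        if start <= addr < end:
            return unit
    return None  # this is what ends up reported as [None]


def coverage(total_size):
    """Fraction of total_size bytes that some range accounts for."""
    covered = sum(end - start for start, end, _ in ARANGES)
    return covered / total_size
```

With a table like this, every byte that is not inside a function or data range (symbol tables, unwind info, and so on) has nowhere to go, no matter how complete the table is.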

Later I realized that .debug_aranges is only meant for identifying addresses of functions or data. Large portions of the binary are not functions or program data! For example, ELF/Mach-O binaries have all sorts of stuff in them like:

symbol tables

relocations

debug information

unwind information

To get better results, I needed to find a way to break down these sections. I needed to determine which parts of the unwind information, for example, I could attribute to each function.

To achieve this, I had to parse the binary more thoroughly than I had before. I had to learn to parse unwind information (the .eh_frame and .eh_frame_hdr sections), which is a really esoteric and low-level thing to be doing. I’ll quote my comment in the code about how tricky this is to do correctly:

// Code to read the .eh_frame section. This is not technically DWARF, but it
// is similar to .debug_frame (which is DWARF) so it's convenient to put it
// here.
//
// The best documentation I can find for this format comes from:
//
// * http://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html
// * https://www.airs.com/blog/archives/460
//
// However these are both under-specified. Some details are not mentioned in
// either of these (for example, the fact that the function length uses the FDE
// encoding, but always absolute). libdwarf's implementation contains a comment
// saying "It is not clear if this is entirely correct". Basically the only
// thing you can trust for some of these details is the code that actually
// implements unwinding in production:
//
// * libunwind http://www.nongnu.org/libunwind/
//   https://github.com/pathscale/libunwind/blob/master/src/dwarf/Gfde.c
// * LLVM libunwind (a different project!!)
//   https://github.com/llvm-mirror/libunwind/blob/master/src/DwarfParser.hpp
// * libgcc
//   https://github.com/gcc-mirror/gcc/blob/master/libgcc/unwind-dw2-fde.c

Once I implemented this parser, I could attribute the .eh_frame and .eh_frame_hdr sections properly, and they would no longer show up as [None].

I did the same thing for all the different kinds of DWARF debug info, as well as for the symbol/string tables and relocations. All of these are somewhat easier since they at least have clear standards that describe them.

But even that wasn’t enough. After implementing all of the above, I still found that some parts of the data section don’t have symbol table entries or debug info at all. Data like string constants or other anonymous data can resist being properly analyzed and attributed. To combat this, Bloaty will actually disassemble the binary looking for references to the data section. If a function references part of .data or .rodata, then we can attribute that part of the binary to the function that references it.
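The idea can be sketched in a few lines of Python. This is only an illustration, not Bloaty's implementation: the function name attribute_data is hypothetical, the references are assumed to have already been extracted by a disassembler as (function, target address) pairs, and the sketch assumes that a referenced blob of anonymous data runs until the next referenced address or the end of the section.

```python
def attribute_data(section_start, section_end, references):
    """Attribute anonymous data ranges to the functions that reference them.

    references: iterable of (function_name, target_address) pairs, assumed
    to come from disassembling the code (the disassembly is not shown here).
    Returns a list of (start, end, function_name) tuples.
    """
    # Keep the first function seen for each referenced address, and ignore
    # references that point outside this section.
    owners = {}
    for func, addr in references:
        if section_start <= addr < section_end:
            owners.setdefault(addr, func)

    # Assumption of this sketch: each referenced blob extends to the next
    # referenced address, or to the end of the section.
    starts = sorted(owners)
    result = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else section_end
        result.append((start, end, owners[start]))
    return result
```

Bytes before the first referenced address stay unattributed in this sketch, which is why a further fallback is still needed for whatever remains.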

This was hard and detailed work, but it paid off. We can see the fruits of this labor if we do a hierarchical profile:

$ ./bloaty bloaty -d compileunits,sections
     VM SIZE                                                     FILE SIZE
 --------------                                              --------------
  44.9%  2.07Mi [136 Others]                                  8.72Mi  33.7%
   6.0%   281Ki protobuf/src/google/protobuf/descriptor.cc    4.07Mi  15.7%
       0.0%       0 .debug_str                                1.16Mi  28.4%
       0.0%       0 .debug_info                               1.01Mi  24.8%
       0.0%       0 .debug_loc                                 766Ki  18.4%
       0.0%       0 .debug_pubnames                            383Ki   9.2%
      69.6%   195Ki .text                                      195Ki   4.7%
       0.0%       0 .debug_line                                177Ki   4.3%
       0.0%       0 .debug_pubtypes                            158Ki   3.8%
       0.0%       0 .debug_ranges                              131Ki   3.2%
       0.0%       0 .strtab                                   44.6Ki   1.1%
      14.6%  41.2Ki .dynstr                                   41.2Ki   1.0%
       7.1%  19.8Ki .eh_frame                                 19.8Ki   0.5%
       4.6%  12.8Ki .rodata                                   12.8Ki   0.3%
       0.0%       0 .symtab                                   9.45Ki   0.2%
       3.1%  8.62Ki .dynsym                                   8.62Ki   0.2%
       1.0%  2.79Ki .eh_frame_hdr                             2.79Ki   0.1%
       0.0%      88 .bss                                           0   0.0%
   6.5%   306Ki protobuf/src/google/protobuf/descriptor.pb.cc 2.38Mi   9.2%
       0.0%       0 .debug_info                                660Ki  27.1%
       0.0%       0 .debug_loc                                 620Ki  25.4%
       0.0%       0 .debug_str                                 256Ki  10.5%
       0.0%       0 .debug_pubnames                            166Ki   6.8%
       0.0%       0 .debug_line                                163Ki   6.7%
      53.2%   163Ki .text                                      163Ki   6.7%
       0.0%       0 .debug_ranges                              154Ki   6.3%
       0.0%       0 .strtab                                   71.1Ki   2.9%
      22.3%  68.3Ki .dynstr                                   68.3Ki   2.8%
      10.0%  30.8Ki .eh_frame                                 30.8Ki   1.3%
       0.0%       0 .symtab                                   27.2Ki   1.1%
       8.6%  26.4Ki .dynsym                                   26.4Ki   1.1%
       0.0%       0 .debug_pubtypes                           17.6Ki   0.7%
       2.5%  7.63Ki .eh_frame_hdr                             7.63Ki   0.3%
       2.3%  6.91Ki .rodata                                   6.91Ki   0.3%
       1.0%  3.13Ki .bss
[...]

Here we can see that Bloaty has figured out what part of each section (.debug_*, .text, .eh_frame, etc.) it can attribute to each source file. Bloaty has constructed a very granular look into this binary, where each part of the file is attributed to the code that produced it.

I generally see 2% or less of the binary attributed to [None] now. Actually Bloaty never spits out a literal [None] anymore, because if we can’t figure out what function/compileunit/etc. some part of the binary comes from, we at least report its section. So if we’re stumped by some file range, we’ll report something like [section .rodata] instead of the very unhelpful [None].
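A rough Python sketch of that fallback, with hypothetical names and range tables (this is not how Bloaty's C++ code is organized): try the precise attribution first, and only then fall back to the name of the enclosing section.

```python
def label_for(offset, attributed, sections):
    """Pick a label for one file offset.

    attributed: list of (start, end, name) ranges we could attribute
                precisely (for example to a compile unit).
    sections:   list of (start, end, section_name) ranges.
    Both tables are hypothetical inputs for this sketch.
    """
    # First choice: a precise attribution.
    for start, end, name in attributed:
        if start <= offset < end:
            return name
    # Fallback: at least say which section the bytes live in.
    for start, end, section in sections:
        if start <= offset < end:
            return f"[section {section}]"
    # Bytes outside every section are not modeled in this sketch.
    return None
```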

Debugging Stripped Binaries

People often want to profile stripped binaries. Very often the binaries you ship to customers don’t have full debug info in them, and you want to profile what you are shipping. But some of Bloaty’s more useful data sources (compileunits especially) require debug information. What to do?

Bloaty now supports reading symbols and debug info from separate files. That way you can profile the thing you’re actually trying to shrink, instead of having your results skewed with the overhead of debugging information.

Bloaty uses build IDs to make sure that the debug information always exactly matches the file you are profiling.
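For ELF files, the build ID lives in a note (typically in the .note.gnu.build-id section) with the name "GNU" and the note type NT_GNU_BUILD_ID. The Python sketch below shows the kind of check involved. It is not Bloaty's code: the function names are hypothetical, it assumes the raw bytes of the note section have already been extracted from each file, and it assumes little-endian encoding.

```python
import struct

NT_GNU_BUILD_ID = 3  # ELF note type that carries the GNU build ID


def build_id_from_notes(data):
    """Scan raw ELF note bytes (little-endian assumed) for a GNU build ID.

    Returns the build ID as bytes, or None if no such note is present.
    """
    pos = 0
    while pos + 12 <= len(data):
        # Note header: name size, descriptor size, note type.
        namesz, descsz, ntype = struct.unpack_from("<III", data, pos)
        pos += 12
        name = data[pos:pos + namesz]
        pos += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = data[pos:pos + descsz]
        pos += (descsz + 3) & ~3  # so is the descriptor
        if len(desc) != descsz:
            return None  # truncated note, give up
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc
    return None


def debug_file_matches(binary_notes, debug_notes):
    """True only if both files carry the same GNU build ID."""
    binary_id = build_id_from_notes(binary_notes)
    return binary_id is not None and binary_id == build_id_from_notes(debug_notes)
```

Refusing to proceed when the IDs differ (or are missing) is what keeps a stale debug file from silently producing wrong attributions.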

First-class Mach-O Support

When Bloaty was first released, it parsed ELF and DWARF directly, but shelled out to command-line programs to parse Mach-O. This was slow and didn’t give us as much info as we would have liked. As of Bloaty 1.0, we now have first-class Mach-O support. Both fat and single-arch binaries are supported.

DWARF is fortunately a cross-platform standard, which means that Mach-O and ELF can share all of the code that parses DWARF. The code to parse DWARF is about the same size as the ELF and Mach-O parsers combined, so it’s great that so much of this code can be shared.

Experimental WebAssembly Support

I am really excited about WebAssembly. I wanted to learn more about it, so I wrote a basic parser for Bloaty. It can handle sections and functions so far.

I am excited to see that this has been getting some use already!

Using Bloaty as a Presubmit

Some people might wonder how to integrate Bloaty into their workflow. One thing I’ve seen that’s very cool is the way some projects like grpc integrate Bloaty with their pull requests. Here is an example.

This gives quick and useful feedback about how a given PR will affect the binary size of your artifacts. For size-sensitive projects, this is a nice way of keeping tabs and making sure PRs don’t cause unexpected or disproportionate growth.
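The core of such a check is just a comparison of two size profiles. Here is a minimal Python sketch of that comparison. It is not the gRPC setup: the function name, the input dictionaries, and the 1 KiB threshold are all assumptions for illustration, and the sizes are assumed to have been collected beforehand from a profile of each build.

```python
def size_regressions(old_sizes, new_sizes, max_growth_bytes=1024):
    """Return {name: growth_in_bytes} for entries that grew too much.

    old_sizes / new_sizes map a name (a section, a compile unit, ...) to
    its size in bytes before and after the change. The default threshold
    is an arbitrary value chosen for this sketch.
    """
    report = {}
    for name in sorted(set(old_sizes) | set(new_sizes)):
        # Entries missing on one side count as size 0 there.
        growth = new_sizes.get(name, 0) - old_sizes.get(name, 0)
        if growth > max_growth_bytes:
            report[name] = growth
    return report
```

A presubmit could fail, or just post a comment on the PR, whenever the returned report is non-empty.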

Post 1.0

Bloaty has become quite capable, but there is always more to do. Maybe the biggest thing on my wishlist is PE/COFF support so people on Windows can benefit.

I would also like to make Bloaty understand references between symbols. This would make it easier to answer questions like “could I shrink the binary a lot by avoiding calls to this one particular function?” It could also show you the benefit you could get by compiling with -ffunction-sections and -fdata-sections if you’re not doing that already. These are options that let the linker strip individual functions if they are unreachable.
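Since this feature does not exist yet, the following Python sketch is purely speculative: it shows one way the question could be answered once a reference graph is available. The graph, the sizes, and both function names are hypothetical. The idea is to compute which symbols are reachable from the entry points, then recompute with one symbol removed, and add up the sizes of everything that dropped out.

```python
def reachable(roots, refs, skip=None):
    """Symbols reachable from roots by following references, ignoring skip."""
    seen = set()
    stack = [r for r in roots if r != skip]
    while stack:
        sym = stack.pop()
        if sym in seen:
            continue
        seen.add(sym)
        # Follow outgoing references, never walking through `skip`.
        stack.extend(t for t in refs.get(sym, ()) if t != skip and t not in seen)
    return seen


def savings_if_removed(symbol, roots, refs, sizes):
    """Bytes that become unreachable if nothing referenced `symbol` anymore."""
    before = reachable(roots, refs)
    after = reachable(roots, refs, skip=symbol)
    return sum(sizes[s] for s in before - after)
```

For example, if a small function is the only path to a large table, removing the call frees the table too, which is exactly the kind of saving a flat per-symbol listing cannot show.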

I’d also like to do a better job of mapping inlines. The idea of the “inlines” data source is to know if the inlining of a particular function is bloating your binary a lot. If it is, maybe it would be helpful to un-inline it. Right now the “inlines” data source uses the .debug_line section, which is what a debugger uses to decide what source file:line to place the cursor on when your program is stopped at a given address. It would be more convenient to report inlines by function name, but .debug_line doesn’t know anything about functions. If I get my inlining info from .debug_info instead, I should be able to report inlines by function.

I’m happy with Bloaty 1.0 and look forward to improving it further!