All the tools presented in this blog post have been tested in accordance with the knowledge we had of them. We do not claim at all that our results are an accurate view of the state of the tools, and we probably missed features we did not know about. The figures should be seen as indicators and not as ground truth.

For the rest of this article, an export of a binary is defined as a file that stores various information about the program. This data ranges from metadata (format, architecture, compiler identification) to more specific elements of the disassembled code itself (instructions, mnemonics) and information gathered by the disassembler (cross-references, symbols).

Analyzing binary programs often requires disassembling them. The two best-known tools for this task are IDA Pro and the more recent Ghidra, released by the NSA. Although very powerful, these tools are poorly suited to running custom analyses on a disassembled binary, or on several binaries at the same time. Once the disassembler is no longer needed, why bother keeping it open and running in the background? Doing so is costly, as each instance may use up to a few hundred megabytes of RAM. The only necessary element is an export of the disassembled binary, and this blog post presents an overview of the different exporters and disassemblers available.

All tests were run on a Dell XPS 15 with an Intel® Core™ i7-6700HQ CPU @ 2.60GHz, an SSD and 16 GB of RAM, running Debian 10 (Buster).

These programs were selected at random from programs available on our computers at the time of the tests. They are not supposed to have any outstanding features, just regular programs coming from widely used open-source projects.

To test the performance of the disassemblers, three programs were used, one for each of three size categories: small, medium and large. The selected binaries are:

Note: Although these tools were left aside because they did not seem to fit our needs, they are nice pieces of engineering. We still encourage everyone to have a look at them.

As seen in the table above, where only active projects are listed, there is a broad range of tools available. It was not possible to compare all the binary disassembly tools, as our time was limited. We thus elected not to include the following:

The first step in exporting a disassembled binary is to disassemble it. Numerous tools exist for this task, the best known being IDA, a commercial tool by HexRays. In recent years, other tools have been released (Binary Ninja, radare, Ghidra) with different ranges of features and prices. This blog post does not conduct a complete review of all the existing tools, nor does it claim to, as that would be a risky exercise. We still wanted to form a more informed opinion on the different options.

Two main strategies exist for exporters. The first one is to export disassembled instructions with information on their content (mnemonic, operands, expressions inside the operands). Using this strategy, the export is self-contained and no other tool is required to analyze it. The second strategy is to export only the raw bytes of the instructions and leave the remaining disassembly work to another disassembler (e.g. Capstone). An export using this strategy is more compact, at the price of needing a helper tool to understand its content. The choice of strategy depends on the final objective of the tool. It makes sense for Ghidra not to export disassembled instructions because it has its own disassembler, and for BinExport to export everything because BinDiff should be autonomous (and as fast as possible).
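To make the two strategies more concrete, here is a minimal sketch of what a single exported instruction could look like under each of them. The record types `FullInstruction` and `RawInstruction` and the use of JSON are purely illustrative assumptions; they do not reflect the actual BinExport or Ghidra formats.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FullInstruction:
    """Strategy 1 (hypothetical layout): the export carries the decoded instruction."""
    address: int
    mnemonic: str
    operands: list


@dataclass
class RawInstruction:
    """Strategy 2 (hypothetical layout): only the address and raw bytes (hex) are kept."""
    address: int
    raw: str


# The x86-64 encoding of `mov rax, rbx` is 48 89 d8.
full = FullInstruction(0x401000, "mov", ["rax", "rbx"])
raw = RawInstruction(0x401000, "4889d8")

# Serialize both records to compare their footprint.
full_size = len(json.dumps(asdict(full)))
raw_size = len(json.dumps(asdict(raw)))
print(f"self-contained: {full_size} bytes, raw bytes only: {raw_size} bytes")
```

The raw record is smaller, but a consumer has to run the bytes through a disassembler before it can reason about mnemonics or operands, whereas the self-contained record can be analyzed directly.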

The table below details the various information exported by the different exporters selected. The results were gathered by analyzing the description of the protocol and actual exported files.

The list of exporters available for the tools tested in the first section is shown below.

The following step is to export the disassembled program into a standalone file. The goal is to close the disassembler after the initial disassembly step, as its features are not needed anymore.

Full benchmark

This section compares in more detail the exporters found for IDA and Ghidra. The results of the first section of this article confirmed our choice to consider only those two disassemblers, as they were more accurate.

We are also interested in comparing the performance of Ghidra's built-in exporter against the plugin the Ghidra project offers for IDA. However, we chose not to include the experimental port of BinExport for Ghidra, because it is still a work in progress and its performance is below that of the IDA version while exporting the same features.

Dataset

For the rest of the benchmarks, we gathered a dataset of various binaries coming from different sources. While our dataset is not exhaustive, it tries to mimic the diversity of programs a reverser could encounter. It gathers binaries of various architectures, file formats, sizes and bitness. The sources used are listed below:

binary-samples: A test suite for binary analysis tools made by Jonathan Salwan

AOSP (Android Open Source Project): An open source operating system for mobile devices

LLVM: The compiler infrastructure project

| Binary Name | md5sum | Architecture | Format | Binary size |
|---|---|---|---|---|
| x64_delta_generator | 8ad5f84d44b73289aa863c44aa7619e9 | x86_64 | ELF | 15.28 |
| elf-Linux-x64-bash | 9a99d4a76f3f773f7ab5e9e3e482c213 | x86_64 | ELF | 904.82 KB |
| pe-Windows-x64-cmd | 5746bd7e255dd6a8afa06f7c42c1ba41 | x86_64 | PE | 337.00 KB |
| elf-Linux-lib-x64.so | 89a9ff6d56c3ad2ef9a185a17ef9f658 | x86_64 | ELF | 1.09 MB |
| busybox-mips | b55e00aa275948e6aea776028088c746 | MIPS-32 | ELF | 352.48 KB |
| clang-check | 4a3aec55b02c6b3fec39d0cdaaca483e | x86_64 | ELF | 46.83 MB |
| elf-Linux-ARMv7-ls | de9f91f9cd038989fec8abf25031b42b | armv7 | ELF | 88.68 KB |
| MachO-OSX-x86-ls | df2580eaf51e15e23de3db979992af1e | x86 | MachO | 34.86 KB |
| ts3server | 3c5c3e83dca78b4602148ce8643521e2 | x86_64 | ELF | 7.73 MB |
| busybox-powerpc | bcfd1ebe98bf3519c3f2c9c14e0f9cf9 | PPC-32 | ELF | 1.10 MB |
| dex38.dex | 0acbdd5244d0726d0cbfb2d45d2f95a8 | - | DEX | 11.48 KB |
| MachO-OSX-x64-ls | d174dcfb35c14d5fcaa086d2c864ae61 | x86_64 | MachO | 38.66 KB |
| pe-Windows-x86-cmd | e52110456ec302786585656f220405eb | x86 | PE | 294.50 KB |
| classes.dex | e62eaf49283093501e7c7cbe9743a0f7 | - | DEX | 3.53 MB |
| wpa_supplicant | aa782fa15d1265b0d8cfc00b6f883187 | x86 | ELF | 21.64 MB |
| ctags | 48644ed9bbb64c22ee538cbe99481f21 | x86_64 | ELF | 4.59 MB |
| crackmips | 9416c32035cf2f2da41876e1c9411850 | MIPS-32 | ELF | 25.54 KB |
| llvm-opt | f0d325ba8ebbe72aad180c8cab6de09c | x86_64 | ELF | 33.83 MB |
| elf-Linux-x86-bash | b5bfc5bc405340bcc5050756ac92cf45 | x86 | ELF | 792.14 KB |
| delta_generator | c2bd1c45f4647932e85561a42e0cbbb4 | x86 | ELF | 16.49 MB |
| mdbook | 9c405c56cf9c05e0a25766f6639cd5ca | x86_64 | ELF | 10.67 MB |
| elf-Linux-ARM64-bash | 086f3ad932f5b1bcf631b17b33b0bb0a | armv8 | ELF | 827.54 KB |
| elf-Linux-lib-x86.so | df9fd3ec63ac207b9fa193b8dcea7eb7 | x86 | ELF | 1.08 MB |
| elf-Linux-Mips4-bash | 628f094cff8ec9d9e36c5b94460c7454 | MIPS-32 | ELF | 882.38 KB |
| MachO-iOS-armv7-armv7s-arm64-Helloworld | 750338e86da4e5c8c318b885ba341d82 | armv7, armv8 | MachO | 299.06 KB |
| MachO-iOS-armv7s-Helloworld | 5ae2549bda51d826a51e97c03fb06f73 | armv7 | MachO | 89.64 KB |

The graph above shows the number of instructions per program in the dataset.
While most of our test suite is made of programs with fewer than a million instructions, a few large binaries were also included to better understand how the exporters and disassemblers scale. As we need to plot large ranges of values in the same graph, most of the curves look flat for the first points.

Disassembly time

The first metric we were interested in is the disassembly time, defined as the duration of the automatic analysis. We knew that IDA was faster than Ghidra, but we wanted to measure to what extent. The results are striking: Ghidra is much slower than IDA (up to 13 times slower for large binaries). Even if disassembly is a one-time process, the performance of Ghidra is problematic for scalability. Nevertheless, it should be noted that the results are biased, because Ghidra performs an additional decompilation step.
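The post does not describe how the durations were recorded, so the following is only one possible way to take such a measurement: a small harness that times an arbitrary command and computes a slowdown factor. The helper names `time_command` and `slowdown` are hypothetical, and the command being timed is a placeholder; the actual headless invocation of each disassembler is deliberately left out.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(
        cmd,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def slowdown(slow_seconds, fast_seconds):
    """Return how many times slower the first duration is than the second."""
    return slow_seconds / fast_seconds


# Placeholder command: replace it with the headless analysis command of the
# disassembler under test (not shown here, as it depends on the installation).
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"analysis took {elapsed:.3f} s")
```

Wall-clock time is used here because the automatic analysis may spawn threads or child processes, which CPU-time counters of the parent process would not capture.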

Export time and size

The first section helped us to draw an overview of the available exporters. Another interesting metric is the export time for the following disassembler/exporter pairs:

IDA + BinExport

IDA + Ghidra XML

Ghidra + XML

We chose to keep only these exporters because they run on the disassemblers we selected and export an interesting set of features. They are also well supported: the XML exporter is provided by the Ghidra project, and BinDiff has been used for years in the community without issues. Note that they use different exporting strategies: Ghidra does not export any information on instructions, while BinExport decomposes every operand of each instruction and exports them.

For both tools, the export is far larger than the program itself. While BinExport produces a single Protobuf file, Ghidra generates two files: one XML file with all the information and a raw byte file containing all the code of the exported binary. The figures on the graph represent the sum of the sizes of these two files.

| Program | Size | i64 | BinExport | IDA-XML | Ghidra-XML |
|---|---|---|---|---|---|
| elf-Linux-x64-bash | 908 KB | 11 MB | 4.2 MB | 4.9 MB | 7.1 MB |
| ts3server | 7.8 MB | 58 MB | 20 MB | 19 MB | 64.8 MB |
| llvm-opt | 34 MB | 300 MB | 144 MB | 127 MB | 202 MB |

We observe that the export sizes for BinExport and XML are roughly the same. However, BinExport exports a lot more information on the binary than Ghidra. Remember that Ghidra does not export any information on the instructions themselves, nor on the basic blocks besides their contents (i.e. raw bytes). The sizes of the exported files remain equivalent because of optimizations made by BinExport: the format is specifically designed for compactness (e.g. it makes extensive use of deduplication tables) and the export file uses a binary serialization protocol, namely Protobuf. This will be further discussed in the next section. The table above also includes the size of the database generated by IDA, the i64 file, which is much larger than any of the exported files considered in this study.
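The deduplication idea mentioned above can be illustrated with a short sketch: each distinct value is stored once in a table, and every occurrence is replaced by a small integer index. The function `deduplicate` and the sample data are illustrative assumptions and do not reproduce the actual BinExport schema.

```python
def deduplicate(values):
    """Return (table, indices).

    table holds each distinct value once, in first-seen order;
    indices holds, for every input value, its position in table.
    """
    table = []
    position = {}
    indices = []
    for value in values:
        if value not in position:
            position[value] = len(table)
            table.append(value)
        indices.append(position[value])
    return table, indices


# A made-up stream of mnemonics, as they could appear in a disassembled function.
mnemonics = ["mov", "push", "mov", "call", "mov", "push"]
table, indices = deduplicate(mnemonics)

# The original stream can be rebuilt from the table and the indices.
restored = [table[i] for i in indices]
print(table, indices)
```

On real binaries the same mnemonics and operand expressions recur a very large number of times, so storing an index per occurrence rather than the full value is one plausible reason for the compact files observed.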