Comparison of C/POSIX standard library implementations for Linux

A project of Eta Labs.

The table below and notes which follow are a comparison of some of the different standard library implementations available for Linux, with a particular focus on the balance between feature-richness and bloat. I have tried to be fair and objective, but as I am the author of musl, that may have influenced my choice of which aspects to compare.

Future directions for this comparison include detailed performance benchmarking and inclusion of additional library implementations, especially Google's Bionic and other BSD libc ports.

Bloat comparison | musl | uClibc | dietlibc | glibc
Complete .a set | 426k | 500k | 120k | 2.0M †
Complete .so set | 527k | 560k | 185k | 7.9M †
Smallest static C program | 1.8k | 5k | 0.2k | 662k
Static hello (using printf) | 13k | 70k | 6k | 662k
Dynamic overhead (min. dirty) | 20k | 40k | 40k | 48k
Static overhead (min. dirty) | 8k | 12k | 8k | 28k
Static stdio overhead (min. dirty) | 8k | 24k | 16k | 36k
Configurable featureset | no | yes | minimal | minimal

Behavior on resource exhaustion | musl | uClibc | dietlibc | glibc
Thread-local storage | reports failure | aborts | n/a | aborts
SIGEV_THREAD timers | no failure | n/a | n/a | lost overruns
pthread_cancel | no failure | aborts | n/a | aborts
regcomp and regexec | reports failure | crashes | reports failure | crashes
fnmatch | no failure | unknown | no failure | reports failure
printf family | no failure | no failure | no failure | reports failure
strtol family | no failure | no failure | no failure | no failure

Performance comparison | musl | uClibc | dietlibc | glibc
Tiny allocation & free | 0.005 | 0.004 | 0.013 | 0.002
Big allocation & free | 0.027 | 0.018 | 0.023 | 0.016
Allocation contention, local | 0.048 | 0.134 | 0.393 | 0.041
Allocation contention, shared | 0.050 | 0.132 | 0.394 | 0.062
Zero-fill (memset) | 0.023 | 0.048 | 0.055 | 0.012
String length (strlen) | 0.081 | 0.098 | 0.161 | 0.048
Byte search (strchr) | 0.142 | 0.243 | 0.198 | 0.028
Substring (strstr) | 0.057 | 1.273 | 1.030 | 0.088
Thread creation/joining | 0.248 | 0.126 | 45.761 | 0.142
Mutex lock/unlock | 0.042 | 0.055 | 0.785 | 0.046
UTF-8 decode buffered | 0.073 | 0.140 | 0.257 | 0.351
UTF-8 decode byte-by-byte | 0.153 | 0.395 | 0.236 | 0.563
Stdio putc/getc | 0.270 | 0.808 | 7.791 | 0.497
Stdio putc/getc unlocked | 0.200 | 0.282 | 0.269 | 0.144
Regex compile | 0.058 | 0.041 | 0.014 | 0.039
Regex search (a{25}b) | 0.188 | 0.188 | 0.967 | 0.137
Self-exec (static linked) | 234µs | 245µs | 272µs | 457µs
Self-exec (dynamic linked) | 446µs | 590µs | 675µs | 864µs

ABI and versioning comparison | musl | uClibc | dietlibc | glibc
Stable ABI | yes | no | unofficially | yes
LSB-compatible ABI | incomplete | no | no | yes
Backwards compatibility | yes | no | unofficially | yes
Forwards compatibility | yes | no | unofficially | no
Atomic upgrades | yes | no | no | no
Symbol versioning | no | no | no | yes

Algorithms comparison | musl | uClibc | dietlibc | glibc
Substring search (strstr) | twoway | naive | naive | twoway
Regular expressions | dfa | dfa | backtracking | dfa
Sorting (qsort) | smoothsort | shellsort | naive quicksort | introsort
Allocator (malloc) | musl-native | dlmalloc | diet-native | ptmalloc

Features comparison | musl | uClibc | dietlibc | glibc
Conformant printf | yes | yes | no | yes
Exact floating point printing | yes | no | no | yes
C99 math library | yes | partial | no | yes
C11 threads API | yes | no | no | no
C11 thread-local storage | yes | yes | no | yes
GCC libstdc++ compatibility | yes | yes | no | yes
POSIX threads | yes | yes, on most archs | broken | yes
POSIX process scheduling | stub | incorrect | no | incorrect
POSIX thread priority scheduling | yes | yes | no | yes
POSIX localedef | no | no | no | yes
Wide character interfaces | yes | yes | minimal | yes
Legacy 8-bit codepages | no | yes | minimal | slow, via gconv
Legacy CJK encodings | no | no | no | slow, via gconv
UTF-8 multibyte | native; 100% conformant | native; nonconformant | dangerously nonconformant | slow, via gconv; nonconformant
Iconv character conversions | most major encodings | mainly UTFs | no | the kitchen sink
Iconv transliteration extension | no | no | no | yes
Openwall-style TCB shadow | yes | no | no | no
Sun RPC, NIS | no | yes | yes | yes
Zoneinfo (advanced timezones) | yes | no | yes | yes
Gmon profiling | no | no | yes | yes
Debugging features | no | no | no | yes
Various Linux extensions | yes | yes | partial | yes

Target architectures comparison | musl | uClibc | dietlibc | glibc
i386 | yes | yes | yes | yes
x86_64 | yes | yes | yes | yes
x86_64 x32 ABI (ILP32) | experimental | no | no | non-conforming
ARM | yes | yes | yes | yes
Aarch64 (64-bit ARM) | yes | no | no | yes
MIPS | yes | yes | yes | yes
SuperH | yes | yes | no | yes
Microblaze | yes | partial | no | yes
PowerPC (32- and 64-bit) | yes | yes | yes | yes
Sparc | no | yes | yes | yes
Alpha | no | yes | yes | yes
S/390 (32-bit) | no | no | yes | yes
S/390x (64-bit) | yes | no | yes | yes
OpenRISC 1000 (or1k) | yes | no | no | not upstream
Motorola 680x0 (m68k) | yes | yes | no | yes
MMU-less microcontrollers | yes, elf/fdpic | yes, bflt | no | no

Build environment comparison | musl | uClibc | dietlibc | glibc
Legacy-code-friendly headers | partial | yes | no | yes
Lightweight headers | yes | no | yes | no
Usable without native toolchain | yes | no | yes | no
Respect for C namespace | yes | LFS64 problems | no | LFS64 problems
Respect for POSIX namespace | yes | LFS64 problems | no | LFS64 problems

Security/hardening comparison | musl | uClibc | dietlibc | glibc
Attention to corner cases | yes | yes | no | too much malloc
Safe UTF-8 decoder | yes | yes | no | yes
Avoids superlinear big-O's | yes | sometimes | no | yes
Stack smashing protection | yes | yes | no | yes
Heap corruption detection | yes | no | no | yes

Misc. comparisons | musl | uClibc | dietlibc | glibc
License | MIT | LGPL 2.1 | GPL 2 | LGPL 2.1+ w/exceptions

Notes

In general

For each comparison in the table, each library is marked in red, yellow, or green. Red or yellow indicates that the library fails to support a feature or satisfy an optimality condition that may be desirable to some users.

For comparisons involving testing and measurement, the particular library versions compared are:

musl 1.1.5

uClibc 0.9.33.2 (Buildroot 2015.02)

dietlibc 0.32

glibc 2.19

Note that previous versions of this comparison included eglibc rather than glibc, mainly since Debian-based distributions were using the eglibc fork during the time in which glibc was essentially unmaintained. Since most of eglibc has been merged back into glibc and eglibc is being discontinued, the comparison has been updated based on glibc.

Bloat comparison

Roughly speaking, “bloat” is used to refer to overhead cost that does not contribute to the functioning of an application.

All figures are approximate, based on tests of the versions of these libraries available on systems I use. I've used size(1) rather than file size, since static library files are roughly 80% ELF header overhead for the contained object files. Part of what makes the shared libraries larger than their static equivalents is that they include parts of libgcc for long division and other math functions.

The size totals for glibc include the size of iconv modules, roughly 5M, in the “Complete .so set” figure. These are essential to providing certain functionality, and should be installed whether static or dynamic linking is being used.

The smallest C program is:

int main() {}

And the "hello" program I used is:

#include <stdio.h>

int main(int argc, char **argv)
{
	printf("hello %d\n", argc);
}

I've written it this way to ensure that the compiler cannot optimize the string printed to a constant and replace the call to printf with a call to puts.

Overhead is measured in dirty pages, i.e. the amount of swap-backed physical memory each process requires. These are a mix of private copy-on-write maps of the program image on disk, the heap, the stack, and anonymous maps. The /proc/$pid/smaps file was used to obtain the numbers for a program spinning in an infinite loop.

Dynamic linking overhead is largely dependent on the dynamic linker. A good 12-16k of the dynamic overhead is due to inefficiency in the standard dynamic linker. Ideally, replacing it could drop the overhead difference between static- and dynamic-linked programs to a single page.

It should be noted that uClibc was tested with many optional features enabled, particularly locale. Due to a bug (design flaw) in uClibc's locale support, locale loading code and malloc get linked even in programs which never use setlocale.

Behavior on resource exhaustion

These comparisons deal with the robustness of various interfaces when the amount of free memory or other system resources is extremely low. Reporting failure is shaded green when it is the theoretically optimal behavior; it is shaded yellow when an alternate implementation could successfully perform the operation with no resource usage.

Thread-local storage covers both the case of attempting to create a new thread when there is insufficient memory available to satisfy the thread-local storage requirements of all loaded modules, and the case of attempting to load a new module with thread-local storage via dlopen when there is insufficient memory available to satisfy the storage requirements of all extant threads.

In the case of pthread_cancel, NPTL dynamically loads libgcc_s.so.1 at runtime upon the first cancellation request, and aborts the program if loading fails for any reason, including but not limited to resource exhaustion.

Performance comparison

All of these figures were obtained using my libc-bench suite, in UTF-8 locales, on one particular Intel Atom N280-based machine. They are not intended to be rigorous, only to give a rough idea of relative order-of-magnitude performance.

The tiny and big allocation figures are from b_malloc_tiny1 and b_malloc_big1. The allocation contention tests measure malloc performance when two threads are simultaneously performing allocation and free operations. In the first test (local), each thread frees its own allocations. In the second (shared), the allocating and freeing threads are often different, breaking thread-local arena/cache optimizations.

The strstr figure is the max time taken by any of the strstr tests, in the interest of measuring worst-case time; which case is worst varies by implementation. glibc's bad performance could be fixed trivially by removing the code that disables the best optimization for needles shorter than 32 bytes; with this change it should match or slightly outperform musl.

The thread create and join figure is from b_pthread_createjoin_serial1.

ABI and versioning comparison

Backwards compatibility means the usual thing, that new versions of the library are compatible with programs compiled against an older version. "Forwards compatibility" is a term I may have invented, but the idea it's intended to convey is that old versions of the library are compatible with programs compiled against a newer version, as long as the program does not depend on features that were missing from the older library version. In the latter case, the program would simply fail at (static or dynamic) link time with missing symbols.

Perhaps the simplest way to think of "forwards compatibility" is that it means you're not required to upgrade the library unless a program actually needs functionality that's missing in your version.

Symbol versioning and forwards compatibility both have merits, but they're essentially mutually exclusive.

"Atomic upgrades" means that a single atomic filesystem operation upgrades the library, with no race condition window during which dynamic-linked programs might fail to run. The canonical way to ensure atomic upgrades is having the whole library in a single .so file.

Algorithms comparison

When comparing substring search algorithms, m typically refers to the length of the needle (substring) and n typically refers to the length of the haystack (string to be searched). The two-way algorithm is O(n), and with the Boyer-Moore-like improvements musl uses (and which glibc uses, but only for extremely long needles), typical runtime is proportional to n/m. The naive algorithm is O(nm).

Backtracking regular expression implementations are simple to write, but have pathologically bad performance on many simple real-world expressions, and fail to take advantage of the regularity of the language.

The naive quicksort dietlibc uses has an O(n) stack space requirement, meaning it can and will lead to stack-overflow crashes in real-world usage. This can be fixed by choosing the optimal order of recursion and performing tail-call optimizations. Quicksort is also O(n²) in time, and while typical performance is much better, worst-case performance is very bad. Shell sort is typically O(n^α) where 1 < α < 2, though it can be optimized to O(n(log n)²). Determining the characteristics of uClibc's version would require some analysis. Smoothsort is O(n log n) and approaches O(n) smoothly, roughly in proportion to how much of the input is already sorted. Introsort is a variant of quicksort which detects worst-case recursion and switches to heapsort to maintain O(n log n) bounds.

Features comparison

Exact floating point printing refers to the ability to print the exact value of floating point numbers with printf when the specified precision is high enough. For instance, as a double-precision value, 0.1 is 0.1000000000000000055511151231257827021181583404541015625, which is the dyadic rational 115292150460684704/2^60. Perhaps more usefully, the (exactly representable) number 2^-60 should print as 0.000000000000000000867361737988403547205962240695953369140625 rather than some inexact approximation.

A complete C99 math library consists of the new single-precision and extended-precision versions of all the previously existing math functions, as well as their complex versions and tgmath.h.

POSIX threads refers to threads with real POSIX semantics, not the historical broken LinuxThreads (where each thread behaves like a distinct process) or similar implementations.

POSIX localedef refers to the ability to define custom locales, including charsets, etc.

TCB passwords are a feature from Openwall which moves the password hashes from /etc/shadow to /etc/tcb/username/shadow. This allows users to change passwords, and allows programs running as the user (for example, screen lockers) to authenticate the user's password, without special suid or sgid privileges.

Linux extensions refer to kernel interfaces provided by Linux outside the scope of POSIX and historical behavior: epoll, signalfd, extended attributes, capabilities, module loading, and so on.

Target architectures comparison

There are a number of conformance issues in glibc's x32 support, the most notable being that it defines the tv_nsec member of struct timespec as long long despite both POSIX and C11 requiring it to have type long. This discrepancy affects use with formatted printing functions and use of pointers to the member, among other things. A number of other interfaces have also been changed to use long long instead of long in structures; in many cases there is no standard governing the affected interface, but the changes break the interface contract published in other documentation such as the Linux man pages.

uClibc's microblaze port is marked partial because it lacks support for threads and possibly other core features.

Ports marked "experimental" are those documented as such; this may mean some functionality is broken and/or ABI is not stable.

Build environment comparison

"Legacy-code-friendly headers" means that the system C header files evolved out of historical practice, and by default define/declare many things they shouldn't but which some legacy code might expect. They typically rely on deep levels of nested inclusion and complex conditional compilation.

"Lightweight headers" are roughly the opposite, written from scratch to match the C and POSIX standards, with minimal nested inclusion and preprocessor conditionals. This leads to an enormous performance advantage compiling large numbers of small files, but it also means poorly-written programs that relied on certain implementation-specific legacy characteristics might need minor fixes to compile.

Some of the libraries reviewed are virtually impossible to use without having built GNU binutils and gcc specifically targeting them (i.e. a native toolchain). Others make it easy to use an existing toolchain originally targeting a different library, overriding certain compiler and linker options to use the alternate library implementation.

Respect for the C and POSIX namespaces means that the namespace used by the standard C and standard POSIX functions and headers conforms to what these standards say about which names are reserved for the implementation versus reserved for the application. One common area of non-conformance is remapping functions like open, lseek, etc. to open64, lseek64, etc., names which are reserved for the application. This is flagged as "LFS64 problems" in the table.

Security/hardening comparison

"Attention to corner cases" means that the library follows a general philosophy of being careful to support all possible inputs that don't explicitly invoke undefined behavior, especially when the input may come from a source external to the program. Over-use of malloc is flagged in the comparison when some interfaces that should not have any failure cases have created artificial ones due to the possibility of memory exhaustion.

An unsafe UTF-8 decoder is one which fails to detect invalid sequences and happens to decode them as aliases for valid characters.

Heap corruption detection means malloc makes an effort to detect, report, and abort when it detects double-free, attempts to free a pointer not obtained via malloc, etc.

Misc. comparisons

The choice of license affects the usability of a standard library implementation. GPL v2-only is shaded as the "worst" choice, in that it is incompatible with a large volume of Open Source/Free Software, namely anything using GPL v3-only. LGPL v2.1-only is much less problematic; it does not allow creation of a new LGPL-licensed library by merging with LGPL v3-only code, but it allows the merged program to be released under version 3 or later of the GPL. LGPL v2.1-or-later is very flexible, and MIT or BSD even more so.