Closing the gap: cross-language LTO between Rust and C/C++



Michael Woerister


Cross-language LTO is a feature of the Rust compiler that allows LLVM's link time optimization (LTO) to be performed across a mixed C/C++/Rust codebase. It combines two strengths of the Rust programming language and the LLVM compiler platform:

- Rust, with its lack of a language runtime and its low-level reach, has an almost unique ability to seamlessly integrate with an existing C/C++ codebase, and
- LLVM, as a language-agnostic foundation, provides a common ground where the source language a particular piece of code was written in no longer matters.

What cross-language LTO does can be seen from two angles:

- From a technical perspective, it allows codebases to be optimized without regard for implementation-language boundaries, making it possible for important optimizations, such as function inlining, to be performed across individual compilation units even if, for example, one of the compilation units is written in Rust while the other is written in C++.
- From a psychological perspective, which is arguably just as important, it helps alleviate the nagging feeling of inefficiency that many performance-conscious developers might have when working on software that jumps back and forth between functions implemented in different source languages.

Background - A bird's eye view of the LLVM compilation pipeline

The compiler front-end generates an LLVM bitcode module (.bc) for each compilation unit. In C and C++ each source file results in a single compilation unit. In Rust each crate is translated into at least one compilation unit.

    .c --clang--> .bc

    .c --clang--> .bc

    .rs --+
          |
    .rs --+--rustc--> .bc
          |
    .rs --+



In the next step, LLVM's optimization pipeline will optimize each LLVM module in isolation:

    .c --clang--> .bc --LLVM--> .bc (opt)

    .c --clang--> .bc --LLVM--> .bc (opt)

    .rs --+
          |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt)
          |
    .rs --+



LLVM then lowers each module into machine code so that we get one object file per module:

    .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o

    .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o

    .rs --+
          |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o
          |
    .rs --+



Finally, the linker will take the set of object files and link them together into a binary:

    .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o ------+
                                                             |
    .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o ------+
                                                             |
                                                             +--ld--> bin
    .rs --+                                                  |
          |                                                  |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o --+
          |
    .rs --+
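The stages in these diagrams map onto standalone LLVM tools. As a sketch, assuming clang, opt, and llc from the same LLVM release are on PATH (the helper name pipeline_cmds is made up for illustration), the following prints, without running them, the commands that would take each C file through the front-end, optimization, and code generation stages separately, followed by the final link:

```shell
#!/bin/sh
# Hypothetical helper: print (without running) the commands that take
# each C file through the stages of the non-LTO pipeline shown above.
pipeline_cmds() {
    objs=""
    for src in "$@"; do
        base="${src%.c}"
        # Front-end: C source -> unoptimized bitcode module (.bc).
        # -disable-O0-optnone keeps the functions optimizable by opt later.
        echo "clang -c -emit-llvm -Xclang -disable-O0-optnone $src -o $base.bc"
        # Optimize the module in isolation: .bc -> .bc (opt)
        echo "opt -O2 $base.bc -o $base.opt.bc"
        # Lower the optimized module to machine code: .bc (opt) -> .o
        echo "llc -filetype=obj -relocation-model=pic $base.opt.bc -o $base.o"
        objs="$objs $base.o"
    done
    # Finally, the linker combines all object files into one binary.
    echo "clang$objs -o bin"
}

# Example: pipeline_cmds a.c b.c
```

Each module is optimized on its own here; nothing in opt or llc ever sees more than one compilation unit at a time, which is exactly the limitation LTO addresses.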





Link time optimization in LLVM

With link time optimization enabled, the pipeline changes as follows:

- the compiler translates each compilation unit into LLVM bitcode (i.e. it skips lowering to machine code),
- the linker, via the LLVM linker plugin, knows how to read LLVM bitcode modules like regular object files, and
- the linker, again via the LLVM linker plugin, merges all bitcode modules it encounters and then runs LLVM optimization passes before doing the actual linking.

With only the C code compiled for LTO, the example pipeline from above becomes:



    .c --clang--> .bc --LLVM--> .bc (opt) ------------------+ - +
                                                            |   |
    .c --clang--> .bc --LLVM--> .bc (opt) ------------------+ - +
                                                            |   |
                                                            +-ld+LLVM--> bin
    .rs --+                                                 |
          |                                                 |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o -+
          |
    .rs --+
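Whether an input is bitcode or a regular object file can be seen from its first bytes: raw LLVM bitcode starts with the magic number "BC" followed by 0xC0 0xDE. Linkers such as lld rely on this kind of magic-number check to hand bitcode inputs to LLVM. The sketch below uses the hypothetical helper name is_llvm_bitcode and only recognizes raw bitcode, not the bitcode wrapper header used on some Apple platforms:

```shell
# Hypothetical helper: succeed if file $1 starts with the raw LLVM
# bitcode magic "BC" 0xC0 0xDE (hex 42 43 c0 de).
# The bitcode wrapper header is deliberately not handled here.
is_llvm_bitcode() {
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "4243c0de" ]
}

# Example (after "clang -flto -c -O2 main.c"):
#   is_llvm_bitcode main.o && echo "main.o contains LLVM bitcode"
```

On typical Linux toolchains, an object file produced with clang -flto should pass this check, while one produced without -flto is an ordinary ELF file and should not.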





Cross-language link time optimization

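Cross-language LTO extends this to Rust: rustc also emits LLVM bitcode instead of object files, so that the linker plugin can merge and optimize the Rust and the C/C++ modules together: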



    .c --clang--> .bc --LLVM--> .bc (opt) ---------+
                                                   |
    .c --clang--> .bc --LLVM--> .bc (opt) ---------+
                                                   |
                                                   +-ld+LLVM--> bin
    .rs --+                                        |
          |                                        |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt) -----+
          |
    .rs --+





In practice, several problems had to be solved first. The Rust compiler and Clang are both based on LLVM, but they might be using different versions of LLVM. This is further complicated by the fact that Rust's LLVM version often does not match a specific LLVM release but can be an arbitrary revision from LLVM's repository. We learned that all LLVM versions involved really have to be a close match for things to work out. The Rust compiler's documentation now offers a compatibility table for the various versions of Rust and Clang.
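A quick first check before trying cross-language LTO is to compare the LLVM versions behind both compilers. rustc -vV reports the LLVM version it was built with on a line of the form "LLVM version: X.Y.Z", and clang --version reports clang's own version. The sketch below (all helper names are hypothetical) compares only major versions, which is a necessary but not necessarily sufficient condition; the compatibility table remains the reference. It is also meaningless for Apple's clang, whose version numbers do not follow LLVM releases:

```shell
# Hypothetical helpers for comparing the LLVM versions behind rustc and clang.

# Read "rustc -vV" output on stdin; print the LLVM major version.
rustc_llvm_major() {
    sed -n 's/^LLVM version: \([0-9][0-9]*\).*/\1/p'
}

# Read "clang --version" output on stdin; print clang's major version.
# Note: Apple clang uses its own numbering, so this does not apply there.
clang_llvm_major() {
    sed -n 's/.*clang version \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if the installed rustc and clang share an LLVM major version.
llvm_majors_match() {
    r=$(rustc -vV | rustc_llvm_major)
    c=$(clang --version | clang_llvm_major)
    echo "rustc: LLVM ${r:-?}, clang: LLVM ${c:-?}"
    [ -n "$r" ] && [ "$r" = "$c" ]
}
```

The parsing is split into stdin filters so it can be checked against saved tool output without having both compilers installed.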



The Rust compiler by default performs a special form of LTO, called ThinLTO, on all compilation units of the same crate before passing them on to the linker. We quickly learned, however, that the LLVM linker plugin crashed with a segmentation fault when trying to perform another round of ThinLTO on a module that had already gone through the process. No problem, we thought, and instructed the Rust compiler to disable its own ThinLTO pass when compiling for the cross-language case. Indeed, everything was fine, until the segmentation faults mysteriously returned a few weeks later even though ThinLTO was still disabled.



We noticed that the problem only occurred in a specific, presumably innocent setting: again two passes of LTO needed to happen, this time the first was a regular LTO pass within rustc and the output of that would then be fed into ThinLTO within the linker plugin. This setup, although computationally expensive, was desirable because it produced faster code and allowed for better dead-code elimination on the Rust side. And in theory it should have worked just fine. Yet somehow rustc produced symbol names that had apparently gone through ThinLTO's mangling even though we checked time and again that ThinLTO was disabled for Rust. We were beginning to seriously question our understanding of LLVM's inner workings as the problem persisted while we slowly ran out of ideas on how to debug this further.



You can picture the proverbial lightbulb appearing over our heads when we figured out that Rust's pre-compiled standard library would still have ThinLTO enabled, no matter which compiler settings we were using for our tests. The standard library, including its LLVM bitcode representation, is compiled as part of Rust's binary distribution, so it is always compiled with the settings from Rust's build servers. Our local full LTO pass within rustc would then pull this troublesome bitcode into the output module, which in turn would make the linker plugin crash again. Since then, ThinLTO has been turned off for libstd by default.



After the above fixes, we succeeded in compiling the entirety of Firefox with cross-language LTO enabled. Unfortunately, we discovered that no actual cross-language optimizations were happening. Both Clang and rustc were producing LLVM bitcode and LLD produced functioning Firefox binaries, but when looking at the machine code, not even trivial functions were being inlined across language boundaries.
After days of debugging (and, unfortunately, without being aware of LLVM's optimization remarks at the time), it turned out that Clang was emitting a target-cpu attribute on all functions while rustc didn't, which made LLVM reject inlining opportunities.
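A mismatch like this can be spotted by comparing the function attributes in the textual LLVM IR of both sides. The sketch below assumes the IR was emitted with rustc --emit=llvm-ir and clang -S -emit-llvm; the file names and the helper name target_cpus are hypothetical. It lists the distinct target-cpu values in a file, so a value on one side and an empty result on the other points to the kind of problem described above:

```shell
# Hypothetical helper: list the distinct "target-cpu" values that appear
# in the attribute groups of a textual LLVM IR (.ll) file.
target_cpus() {
    grep -o '"target-cpu"="[^"]*"' "$1" | sed 's/.*="\(.*\)"$/\1/' | sort -u
}

# Example usage (one codegen unit keeps rustc's IR in a single file):
#   rustc --crate-type=staticlib -O -C codegen-units=1 --emit=llvm-ir -o xyz.ll lib.rs
#   clang -S -emit-llvm -O2 main.c -o main.ll
#   target_cpus xyz.ll
#   target_cpus main.ll
```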



In order to prevent the feature from silently regressing for similar reasons in the future we put quite a bit of effort into extending the Rust compiler's testing framework and CI. It is now able to compile and run a compatible version of Clang and uses that to perform end-to-end tests of cross-language LTO, making sure that small functions will indeed get inlined across language boundaries.
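The core of such an end-to-end check can be approximated by hand: disassemble the final binary and see whether the C function still contains a call to the Rust function. A rough sketch, assuming GNU objdump output; the helper name calls_symbol and the symbol names are hypothetical, and only direct calls to the exact symbol are matched (not calls through the PLT):

```shell
# Hypothetical helper: read "objdump -d" output on stdin and succeed if
# the body of function $1 contains a call instruction targeting <$2>.
calls_symbol() {
    awk -v fname="<$1>:" -v callee="<$2>" '
        $2 == fname       { inside = 1; next }  # header line of the function
        inside && NF == 0 { exit }              # blank line ends its body
        inside && /call/ && index($0, callee) { found = 1 }
        END { exit !found }
    '
}

# Example: flag a missed cross-language inlining opportunity.
#   objdump -d --no-show-raw-insn a.out | calls_symbol main xyz_add \
#       && echo "xyz_add was not inlined into main"
```

Reading the disassembly from stdin keeps the matching logic separate from the tool that produces it, so the same check works on saved objdump output.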

Using cross-language LTO: a minimal example

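The following commands build a Rust static library with rustc and link it into a C program compiled with Clang, with both compilers emitting LLVM bitcode for the linker plugin: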



    # Compile the Rust static library, called "xyz"
    rustc --crate-type=staticlib -O -C linker-plugin-lto -o libxyz.a lib.rs

    # Compile the C code with "-flto"
    clang -flto -c -O2 main.c

    # Link everything
    clang -flto -O2 main.o -L . -lxyz
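These commands assume a lib.rs and a main.c that are not shown. A minimal pair could look like the following; the script just writes both files, and the function name xyz_add is made up for illustration. Depending on the platform, the link step may additionally need an LTO-capable linker (for example lld via clang's -fuse-ld=lld) and the native libraries that rustc --print native-static-libs reports for the static library:

```shell
#!/bin/sh
# Write a hypothetical Rust library (lib.rs) and C program (main.c)
# for the minimal example into directory $1 (default: current dir).
write_example_sources() {
    dir="${1:-.}"
    mkdir -p "$dir"

    # Rust side: a tiny function exported with the C ABI. Small functions
    # like this are what cross-language LTO should be able to inline.
    cat > "$dir/lib.rs" <<'EOF'
#[no_mangle]
pub extern "C" fn xyz_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}
EOF

    # C side: declare the Rust function and call it.
    cat > "$dir/main.c" <<'EOF'
#include <stdio.h>

int xyz_add(int a, int b);

int main(void) {
    printf("%d\n", xyz_add(2, 3));
    return 0;
}
EOF
}
```

After running write_example_sources . the three commands should produce an a.out that, if the link succeeds, prints 5. Plain rustc defaults to the 2015 edition, where #[no_mangle] is valid as written; under the 2024 edition it has to be spelled #[unsafe(no_mangle)].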





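The -C linker-plugin-lto flag tells rustc to defer LTO to the linker: instead of regular object code, it emits LLVM bitcode that the linker plugin can merge with the bitcode produced by Clang.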

Conclusion

Acknowledgments