[llvm-dev] A libc in LLVM

On Tue, Jun 25, 2019 at 03:24:04AM +0000, Siva Chandra via llvm-dev wrote:
> Hello LLVM Developers,
>
> Within Google, we have a growing range of needs that existing libc
> implementations don't quite address. This is pushing us to start
> working on a new libc implementation.
>
> Informal conversations with others within the LLVM community has
> told us that a libc in LLVM is actually a broader need, and we are
> increasingly consolidating our toolchains around LLVM. Hence, we
> wanted to see if the LLVM project would be interested in us
> developing this upstream as part of the project.
>
> To be very clear: we don't expect our needs to exactly match
> everyone else's -- part of our impetus is to simplify things
> wherever we can, and that may not quite match what others want in a
> libc. That said, we do believe that the effort will still be
> directly beneficial and usable for the broader LLVM community, and
> may serve as a starting point for others in the community to flesh
> out an increasingly complete set of libc functionality.
>
> We are still in the early stages, but we do have some high-level
> goals and guiding principles of the initial scope we are interested
> in pursuing:
>
> The project should mesh with the "as a library" philosophy of the
> LLVM project: even though "the C Standard Library" is nominally "a
> library," most implementations are, in practice, quite monolithic.
>
> The libc should support static non-PIE and static-PIE linking. This
> means, providing the CRT (the C runtime) and a PIE loader for static
> non-PIE and static-PIE linked executables.
>
> If there is a specification, we should follow it. The scope that we
> need includes most of the C Standard Library; POSIX additions; and
> some necessary, system-specific extensions.
> This does not mean we
> should (or can) follow the entire specification -- there will be
> some parts which simply aren't worth implementing, and some parts
> which cannot be safely used in modern coding practice.
>
> Vendor extensions must be considered very carefully, and only
> admitted when necessary. Similar to Clang and libc++, it does seem
> inevitable that we will need to provide some level of compatibility
> with other vendors' extensions.
>
> The project should be an exemplar of developing with LLVM tooling.
> Two examples are fuzz testing from the start, and
> sanitizer-supported testing.
>
> There are also few areas which we do not intend to invest in at this point:
>
> Implement dynamic loading and linking support.
> Support for more architectures (we'll start with just x86-64 for simplicity).
>
> For these areas, the community is of course free to contribute. Our
> hope is that, preserving the "as a library" design philosophy will
> make such extensions easy, and allow retaining the simplicity when
> these features aren't needed.
>
> We intend to build the new libc in a gradual manner. To begin with,
> the new libc will be a layer sitting between the application and the
> system libc. Eventually, when the implementation is sufficiently
> complete, it will be able to replace the system libc at least for
> some use cases and contexts.
>
> So, what do you think about incorporating this new libc under the
> LLVM project?

Since I have a little experience in this area, I'd like to chime in on
it. :-)

TL;DR: I think it's a really, REALLY bad idea.

First, writing and maintaining a correct, compatible, high-quality libc
is a monumental task.
The amount of code needed is not all that large, but the subtleties of
how it behaves and the difficulties of implementing various interfaces
that have no capacity to fail or report failure, and the astronomical
"compatibility surface" of interfacing with all C and C++ software
ever written as well as a large amount of software written in other
languages whose runtimes "pass through" the behavior of libc to the
applications they host, all contribute to the scale of work, and of
knowledge/expertise, involved in making something of even decent
quality. (As an aside, note that I love to see hobby libc projects
even if they have major problems, but that's totally different from
proposing something that lots of people will end up stuck using.)

Second, corporate development teams are uniquely qualified to utterly
botch a libc, yet still push it into widespread use, and the cost is
painful compatibility hacks in all applications. Apple did this with
their fork of BSD libc code. Google has done it once already with
their fork of musl in Fuchsia -- a project which I contributed
significant amounts of free labor to in terms of tracking down folks
for license clarification their lawyers wanted, only to have them
never bother to ask me why technical things were done the way they
were before making random useless and broken changes in their fork. A
corporate-led project does not have to answer to the community, and
will leave whatever bugs they introduce in place for the sake of
bug-compatibility with their own software rather than fixing them.

Third, there is tremendous value in non-monoculture of libc
implementations, or implementations of any important library
interfaces or language runtimes. Likewise there's tremendous value in
non-monoculture of tooling (compilers, linkers, etc.).
Avoiding monoculture preserves the motivation for consensus-based
standards processes rather than single-party control (see also: Chrome
and what it's done to the web) and the motivation for people writing
software to write to the standards rather than to a particular
implementation. A big part of making that possible is clear
delineation of roles between parts of the toolchain and runtime, with
well-defined interface boundaries.

Some folks have told me that I should press LLVM to make musl the
"LLVM libc" instead of whatever Google wants to do, but that misses
the point: there *shouldn't be* a "LLVM libc", or any one library
implementation that's "first class" for use with LLVM while others are
only "second class".

So, in summary:

Point 1 is why making a libc for real-world use is not to be taken
lightly.

Point 2 is why, if it is done, it shouldn't be a Google project.

Point 3 is why there should not be an "LLVM libc".

Hope this is all helpful.

Regards,

Rich