Improving Link Time on Windows with clang-cl and lld




#clang-cl, #lld, #Windows

18 minute read

For example, if two object files both include <string>, then both of those object files will contain hundreds of duplicate type records that need to be de-duplicated during the link step. With M object files that each contain roughly N type records, the linker has to compute O(M x N) hash values, even though only a small fraction of those records ultimately contribute to the final PDB.





Several strategies have been invented over the years to deal with this and make linking faster. Many years ago, Microsoft introduced the notion of a Type Server (enabled via the /Zi compiler option in MSVC), which moves some of the work into the compiler to take advantage of parallelism. More recently, Microsoft added the /DEBUG:FASTLINK linker option, which attempts to solve the problem by not merging types in the linker at all. However, each of these strategies has its own disadvantages, and neither is ideal for all use cases.





In this blog post, we'll first go over some technical background about CodeView so that we can understand the problem, followed by a summary of existing attempts to speed up type merging. Then, we'll describe a novel extension to the PE/COFF file format that speeds up linking in two ways: it offloads part of the work required to de-duplicate types to the compiler, and it uses a new algorithm that uniquely identifies type records even across input files. Along the way, we'll discuss the tradeoffs of each approach. Finally, we'll present some benchmarks and discuss how you can try this out in clang-cl and lld today.





Background

Consider a simple structure in C++, defined like this in a header file:





struct Node {
  Node *Next = nullptr;
  Node *Prev = nullptr;
  int Value = 0;
};





Since each compilation happens independently of every other compilation, the compiler cannot assume that any other translation unit will ever emit the records necessary to describe this type. As a result, to guarantee that the type makes it into the final PDB, every compiler instance that encounters this definition must emit type information for it. The compiler therefore serializes the type into a series of records that looks roughly like this:





0x1004 | LF_STRUCTURE [size = 40] `Node`
         unique name: `.?AUNode@@`
         vtable: <none>
         base list: <none>
         field list: <none>
         options: forward ref | has unique name
0x1005 | LF_POINTER [size = 12]
         referent = 0x1004
         mode = pointer
         opts = None
         kind = ptr32
0x1006 | LF_FIELDLIST [size = 52]
         - LF_MEMBER name = `Next`, Type = 0x1005, Offset = 0, attrs = public
         - LF_MEMBER name = `Prev`, Type = 0x1005, Offset = 4, attrs = public
         - LF_MEMBER name = `Value`, Type = 0x0074 (int), Offset = 8, attrs = public
0x1007 | LF_STRUCTURE [size = 40] `Node`
         unique name: `.?AUNode@@`
         vtable: <none>
         base list: <none>
         field list: 0x1006
         options: has unique name

The values on the left are each record's type index within the type sequence, and they depend on which types the compiler has already encountered. Other records refer to a type by its index: for example, referent = 0x1004 means that this record is a pointer to whatever type is at index 0x1004.
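To make the indexing scheme concrete, here is a minimal C++ sketch of a type sequence. It is a hypothetical, heavily simplified model: `TypeRecord`, `lookup`, and `kFirstNonSimpleIndex` are illustrative names, and real CodeView records are variable-length binary blobs rather than structs with strings. The one property it does reflect is that indices below 0x1000 denote built-in types such as 0x0074 (int), while the first record in a sequence gets index 0x1000, the next 0x1001, and so on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified type record: only a kind, an optional name, and the
// type indices the record refers to (sizes, offsets and options are omitted).
struct TypeRecord {
  std::string kind;            // e.g. "LF_POINTER"
  std::string name;            // e.g. "Node" (empty if unnamed)
  std::vector<uint32_t> refs;  // type indices this record refers to
};

// Indices below this value are simple (built-in) types with no record.
constexpr uint32_t kFirstNonSimpleIndex = 0x1000;

// Returns the record at `index` in `seq`, or nullptr for simple types and
// out-of-range indices. An index is only meaningful for the sequence it came from.
const TypeRecord *lookup(const std::vector<TypeRecord> &seq, uint32_t index) {
  if (index < kFirstNonSimpleIndex)
    return nullptr;  // e.g. 0x0074 (int) has no record in the sequence
  std::size_t pos = index - kFirstNonSimpleIndex;
  return pos < seq.size() ? &seq[pos] : nullptr;
}
```

For example, after loading the four records from the dump above into `seq` (preceded by four placeholder records standing in for indices 0x1000 to 0x1003, which the dump does not show), following the pointer at 0x1005 through its `refs[0]` lands on the forward-declared `Node` at 0x1004.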





As a result of this design, another compilation unit that includes the same header file must emit exactly the same records, differing only in their indices (since the other compilation may encounter other types before this one, causing the ordering to be different).





In short, type indices only make sense within the context of a single type sequence (i.e., a single compiland). Since the linker needs to work across all object files, it must have some way of deciding whether a type from object file A is isomorphic to a type from object file B, even when their type indices differ numerically.
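To illustrate what "isomorphic" means here, the following sketch compares two types structurally, using the same kind of hypothetical simplified record model (`TypeRecord` and `isomorphic` are illustrative names, not LLVM or Microsoft APIs). Built-in indices must match exactly; any other index is resolved within its own sequence and compared by recursing through the indices it refers to. It assumes each record refers only to records that precede it in its sequence, which guarantees the recursion terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified type record (sizes, offsets and option flags omitted).
struct TypeRecord {
  std::string kind;            // e.g. "LF_STRUCTURE"
  std::string name;            // e.g. "Node" (empty if unnamed)
  std::vector<uint32_t> refs;  // referenced type indices, in field order
};

constexpr uint32_t kFirstNonSimpleIndex = 0x1000;

// Returns true if index `a` in `seqA` and index `b` in `seqB` describe the same
// type. Assumes every record refers only to earlier records in its sequence.
bool isomorphic(const std::vector<TypeRecord> &seqA, uint32_t a,
                const std::vector<TypeRecord> &seqB, uint32_t b) {
  if (a < kFirstNonSimpleIndex || b < kFirstNonSimpleIndex)
    return a == b;  // a simple type only matches the identical simple type
  const TypeRecord &ra = seqA.at(a - kFirstNonSimpleIndex);
  const TypeRecord &rb = seqB.at(b - kFirstNonSimpleIndex);
  if (ra.kind != rb.kind || ra.name != rb.name ||
      ra.refs.size() != rb.refs.size())
    return false;
  // The indices themselves may differ; what they point to must match.
  for (std::size_t i = 0; i < ra.refs.size(); ++i)
    if (!isomorphic(seqA, ra.refs[i], seqB, rb.refs[i]))
      return false;
  return true;
}
```

With the `Node` records from the dump in one sequence and the same records starting at 0x1000 in another, the two full `LF_STRUCTURE` definitions compare equal even though their field-list indices (0x1006 vs. 0x1002) differ. Because this recursion can revisit shared sub-types many times, it only illustrates the question a linker must answer, not an efficient way to answer it.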

This process of identifying and de-duplicating equivalent types across object files, henceforth referred to as type merging, is the primary consumer of CPU cycles during linking (measured in LLD, and estimated in the MSVC linker by comparing /DEBUG:FULL and /DEBUG:FASTLINK link times). As such, it is the part of the linking process for which this blog post presents a new solution.
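One straightforward way to merge types, sketched here under the same hypothetical simplified record model, is to walk each input sequence in order, rewrite every reference into a destination index, and only then hash the record and look it up in a table of records seen so far. `TypeMerger`, `merge`, and `hashLookups` are illustrative names; this is not the actual code of LLD or the MSVC linker. It assumes each record refers only to earlier records, so every reference has already been mapped when a record is processed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical simplified type record (sizes, offsets and option flags omitted).
struct TypeRecord {
  std::string kind;
  std::string name;
  std::vector<uint32_t> refs;  // referenced type indices
};

constexpr uint32_t kFirstNonSimpleIndex = 0x1000;

// Illustrative index-remapping type merger.
class TypeMerger {
public:
  // Merges one object file's sequence. Entry i of the result is the
  // destination index assigned to local index 0x1000 + i.
  std::vector<uint32_t> merge(const std::vector<TypeRecord> &seq) {
    std::vector<uint32_t> localToDest;
    localToDest.reserve(seq.size());
    for (const TypeRecord &rec : seq) {
      // Rewrite local indices into destination indices; simple types stay as-is.
      TypeRecord remapped = rec;
      for (uint32_t &ref : remapped.refs)
        if (ref >= kFirstNonSimpleIndex)
          ref = localToDest.at(ref - kFirstNonSimpleIndex);

      // Every input record is hashed and looked up, duplicate or not.
      ++hashLookups;
      std::string key = serialize(remapped);
      auto it = seen.find(key);
      if (it != seen.end()) {
        localToDest.push_back(it->second);  // duplicate: reuse existing index
        continue;
      }
      uint32_t dest = kFirstNonSimpleIndex + static_cast<uint32_t>(merged.size());
      merged.push_back(std::move(remapped));
      seen.emplace(std::move(key), dest);
      localToDest.push_back(dest);
    }
    return localToDest;
  }

  std::vector<TypeRecord> merged;  // de-duplicated output sequence
  std::size_t hashLookups = 0;

private:
  // Once references are destination indices, equal keys mean equal types.
  static std::string serialize(const TypeRecord &r) {
    std::string key = r.kind + '\0' + r.name;
    for (uint32_t ref : r.refs)
      key += '\0' + std::to_string(ref);
    return key;
  }

  std::unordered_map<std::string, uint32_t> seen;
};
```

Merging the `Node` records from two object files leaves a single copy in `merged`, but every record of every input still goes through the hash table, which is where the O(M x N) hashing cost described earlier comes from. Note also that a record cannot be hashed until the records it refers to have been merged, because its key depends on their destination indices.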

Existing Solutions

It’s worthwhile to discuss some of the existing attempts to reduce the cost associated with type merging so that we can compare and contrast their various pros and cons.

Type Servers (/Zi)