I’ve been looking at C++ compilation times lately (e.g. here or there), and through correspondence with Microsoft C++ compiler folks learned about a fairly awesome, but undocumented, cl.exe flag.

It’s /d2cgsummary .

How to investigate slow compile times?

Given that compile/build times are a massive problem in large C++ codebases, it’s amazing how little information we can get about why they are slow. For example when using Visual Studio C++ compiler, about the only options seem to be:

1) Use /Bt flag, to get time spent in frontend ( c1xx.dll ) vs backend ( c2.dll ) compiler parts. This is good to narrow down whether slow compiles are due to too many things being parsed (e.g. a problem of including too many header files), which is typically a frontend problem, or is the problem some sort of language constructs that cause the backend (typically the optimizer part of the compiler) take “a lot of time”.

The “common knowledge” is that long C++ compile times are caused by too many header files being included, and in general by C/C++ #include model. That is often the case, but sometimes it’s the backend part of the compiler that takes “forever” to compile something (e.g. the __forceinline compile time issue I found in my previous post). The backend part typically only takes “long” when compile optimizations are on.

2) Take a profiler capture of cl.exe , using any of existing native code profiling tools (e.g. xperf), load symbols, and try to guess what is causing the long compile time based on profiling results.

3) Just do guesses and kinda randomly change code structure and see whether it helped.

Well, turns out there’s another option, the /d2cgsummary flag.

MSVC /d2cgsummary flag

I can’t find anything about this flag on the internet, so I’d assume it’s something “for internal use only”, and may or might not exist or be changed in any MSVC compiler version. It does seem to exist & work in VS2015 Update 3 and in VS2017. Passing it to cl.exe will give an output of “Anomalistic Compile Times” and some sort of “Caching Stats”, like this:

Code Generation Summary Total Function Count: 742 Elapsed Time: 13.756 sec Total Compilation Time: 13.751 sec Efficiency: 100.0% Average time per function: 0.019 sec Anomalistic Compile Times: 1 ?EvaluateRootMotion@animation@mecanim@@YAXAEBUClipMuscleConstant@12@AEBUClipMuscleInput@12@PEBM2AEAUMotionOutput@12@_N@Z: 11.667 sec, 0 instrs Serialized Initializer Count: 4 Serialized Initializer Time: 0.006 sec RdrReadProc Caching Stats Functions Cached: 61 Retrieved Count: 20162 Abandoned Retrieval Count: 0 Abandoned Caching Count: 14 Wasted Caching Attempts: 0 Functions Retrieved at Least Once: 61 Functions Cached and Never Retrieved: 0 Most Hits: ?f@?$load@$00@?$SWIZ@$0DCB@@v4f@meta@math@@SA?AT__m128@@T6@@Z: 2775 ?f@?$SWIZ@$0DCB@@v4f@meta@math@@SA?AT__m128@@T5@@Z: 2774 ?f@?$load@$00@?$SWIZ@$0EDCB@@v4f@meta@math@@SA?AT__m128@@T6@@Z: 1924 ?f@?$SWIZ@$0EDCB@@v4f@meta@math@@SA?AT__m128@@T5@@Z: 1920 ?pack@?$v@Uv4f@meta@math@@U?$sp@Uv4f@meta@math@@$0DCB@$02@23@$02@meta@math@@SA?AU123@AEBT__m128@@@Z: 1296 ??B_float4@math@@QEBA?AT__m128@@XZ: 1165

Anomalistic Compile Times seems to output “longest to compile” functions, where I guess their times are much longer than the average. Here, it immediately says that among all 700+ functions that were compiled, there is one that takes almost all the time! The names are mangled, but using either the undname.exe MSVC utility or an online demangler, it says the slow-to-compile function in this particular file was called mecanim::animation::EvaluateRootMotion .

This means that if I want to speed up compilation of this file, I need to only look at that one function, and somehow figure out why it’s slow to compile. The “why” is still an exercise for the reader, but at least I know I don’t have to investigate 700 other functions. Great!

RdrReadProc Caching Stats section I don’t quite understand, but it seems to be listing functions that were used most often in the whole compilation. Again after demangling, seems that math::meta::v4f::SWIZ<801>::load got used almost 3000 times, etc.

On some other files I was looking into, the /d2cgsummary flag did not point to one slow function, but at least pointed me to a subsystem that might be contributing to slow compile times. For example here:

Code Generation Summary Total Function Count: 4260 Elapsed Time: 20.710 sec Total Compilation Time: 20.685 sec Efficiency: 99.9% Average time per function: 0.005 sec Anomalistic Compile Times: 32 ?VirtualRedirectTransfer@Shader@@UEAAXAEAVRemapPPtrTransfer@@@Z: 0.741 sec, 0 instrs ??$TransferBase@VShader@@@RemapPPtrTransfer@@QEAAXAEAVShader@@W4TransferMetaFlags@@@Z: 0.734 sec, 0 instrs ??$Transfer@VShader@@@RemapPPtrTransfer@@QEAAXAEAVShader@@PEBDW4TransferMetaFlags@@@Z: 0.726 sec, 0 instrs ??$Transfer@VRemapPPtrTransfer@@@Shader@@QEAAXAEAVRemapPPtrTransfer@@@Z: 0.711 sec, 0 instrs ??$Transfer@VRemapPPtrTransfer@@@?$SerializeTraits@VShader@@@@SAXAEAVShader@@AEAVRemapPPtrTransfer@@@Z: 0.708 sec, 0 instrs ??$Transfer@USerializedShader@ShaderLab@@@RemapPPtrTransfer@@QEAAXAEAUSerializedShader@ShaderLab@@PEBDW4TransferMetaFlags@@@Z: 0.632 sec, 0 instrs ??$Transfer@VRemapPPtrTransfer@@@?$SerializeTraits@USerializedShader@ShaderLab@@@@SAXAEAUSerializedShader@ShaderLab@@AEAVRemapPPtrTransfer@@@Z: 0.625 sec, 0 instrs ??$Transfer@VRemapPPtrTransfer@@@SerializedShader@ShaderLab@@QEAAXAEAVRemapPPtrTransfer@@@Z: 0.621 sec, 0 instrs <...> RdrReadProc Caching Stats Functions Cached: 227 Retrieved Count: 74070 Abandoned Retrieval Count: 0 Abandoned Caching Count: 0 Wasted Caching Attempts: 0 Functions Retrieved at Least Once: 221 Functions Cached and Never Retrieved: 6 Most Hits: ?GetNewInstanceIDforOldInstanceID@RemapPPtrTransfer@@QEAAHH@Z: 3065 ?MightContainPPtr@?$SerializeTraits@D@@SA_NXZ: 2537 ?Align@TransferBase@@QEAAXXZ: 1965 <...>

Out of total 20 second compile time, there’s a bunch of functions that take almost 1 second to compile each (after demangling: Shader::VirtualRedirectTransfer(class RemapPPtrTransfer) , RemapPPtrTransfer::TransferBase<class Shader>(class Shader&, enum TransferMetaFlags) ) etc. So hey, out of everything that Shader.cpp does, I know that in our codebase it’s the “data serialization” part that is slow to compile.

What would be super useful to have from compiler vendors: way more diagnostics on what & why something is slow. Detailed reports like this flag, and/or custom ETW providers for xperf, or anything like that. These would not be things used by most people, but when someone has to use them, they would be invaluable. Saving a minute or two of build time for several hundred programmers is an enormous amount of productivity gain!

Anyway, now you know. If you use Microsoft’s C++ compiler, try out /d2cgsummary flag to cl.exe . You might find something interesting! Next time, how this flag pointed me towards a particular programming pattern that is slow to compile on MSVC, and allowed saving 80 seconds of compile time of one file.

Update: An item on Visual Studio feedback site to document the flag and perhaps some others that might be useful.