For many game studios, fast builds are essential to their development process and a lot of time and effort can be spent keeping builds fast. To support this, the PlayStation®4 (PS4) CPU Toolchain will shortly support a new Clang feature: time traces. This tells you where the compiler is spending its time, giving you an insight into what the compiler is doing and an insight into your own code. Here it is in action building a single source file:

Figure 1 - Single file time trace shown using chome://tracing via the Chrome browser

In Figure 1 you can see the compiler's time split between "Frontend" and "Backend" (see https://en.wikipedia.org/wiki/Compiler). In the Frontend, the compiler traverses through several levels of C++ include files; labelled as "Source" blocks. In a large project it is easy to lose track of how many nested header files a single "#include" will bring in and not appreciate how long that adds to the compile time. Time trace makes this clearly visible. It even shows the time taken to parse an individual class or template. Note that to avoid causing too much overhead, and therefore impacting compile time, shorter events are excluded from a trace.

In the Backend, the compiler is optimising your code and generating machine code. In LLVM this is done in a series of passes, each performing a particular transformation. Time traces show how long the compiler takes optimising functions ("OptFunction" in the above animation) and running passes ("RunPass" in the above animation) such as "Dead Store Elimination", though many are too fast to be recorded.

At the bottom of this view are totals for each category of event giving an overall picture of where time is spent. Note, where events are nested inside each other (e.g. "Source" events), the totals only count time once for multiple levels, also note that some totals overlap others, so don't expect all the individual totals to add up to the total compile time. For example, "PerformTemplateInstantiation" is included as part of the "Frontend" total as well as by itself.

How to use time traces

Generating traces

To generate compile time traces you just need to add -ftime-trace to your compiler arguments:

clang -c foo.cpp -ftime-trace

This outputs a JSON time-trace file named according to the output object file, with the file extension (e.g. .o ) replaced with .json .

Viewing in Chrome

The JSON file can be opened in the Chrome browser. Enter the URL chrome://tracing and select the option to "Load" a file.

Figure 2 - Opening a trace in Chrome

From here you can use the controls to explore the profile. Selecting an event will show more details in the bottom pane. The search box in the top right can be used to find specific events such as header files or optimisation passes.

Analysing a whole build

It is possible to use the information from time traces along with data from your build tool to analyse a whole build. This will depend on what you use as a build tool. We have produced an example of how build data from the Ninja build tool can be integrated with Clang time traces (based on https://github.com/nico/ninjatracing):

Figure 3 - Analysing a whole ninja build

Figure 3 shows a parallel build (14 jobs running at once) and embeds the detailed time trace data from the compiler. This lets you view your whole build (to spot long files or limited parallelism), right down to why individual files are taking a long time to build. Using the search field, you can highlight header files to see everywhere that they are used, and how long your build is taking to parse them.

Speedscope

The above approach works well for small local builds. For very large builds, or very wide builds, as with SN-DBS, this view can get unmanageable. An alternative way of analysing a large build is to generate a flat profile, ignoring parallel jobs, and analyse it with the Speedscope tool (https://github.com/jlfwong/speedscope). A flat profile can be generated with a few lines of python code:

#!/usr/bin/env python3 """Combine JSON from multiple -ftime-traces into one. Run with (e.g.): python combine_traces.py foo.json bar.json.""" import json import sys if __name__ == '__main__': start_time = 0 combined_data = [] for filename in sys.argv[1:]: with open(filename, 'r') as f: file_time = None for event in json.load(f)['traceEvents']: # Skip metadata events # Skip total events # Filter out shorter events to reduce data size if event['ph'] == 'M' or \ event['name'].startswith('Total') or \ event['dur'] < 5000: continue if event['name'] == 'ExecuteCompiler': # Find how long this compilation takes file_time = event['dur'] # Set the file name in ExecuteCompiler event['args']['detail'] = filename # Offset start time to make compiles sequential event['ts'] += start_time # Add data to combined combined_data.append(event) # Increase the start time for the next file # Add 1 to avoid issues with simultaneous events start_time += file_time + 1 with open('combined.json', 'w') as f: json.dump({'traceEvents': sorted(combined_data, key=lambda k: k['ts'])}, f)

Python code to combine time trace files

Speedscope is a more advanced viewer that can handle larger time trace files and allows more analysis of where time is spent.

Figure 4 - Speedscope analysing a build time trace

Figure 4 shows an unsorted time ordered view, a view ordered by longest event first, followed by a "Sandwich" view (similar to the “Butterfly” view in Razor). Sorting columns of the "Sandwich" view allows you to compare how long your build takes in the compiler Frontend and Backend and allows you to find header files that take a long time across your build (either because they are large or because they are included in many places). The right-hand side of thie view shows you the breakdown of that into smaller events (such as parsing other source files, classes or templates).

Example build time problems

Single very long file

Distributed builds can be dominated by a single very large file taking a long time to build. With a full build time trace, this is easy to identify. Once you have identified a single long file, it should be possible to see whether it is long due to:

Parsing long header files.

Long template instantiation.

Slow optimisation of a function.

With this identified, you may be able to improve build times by restructuring your code.

Header file

Analysing a whole build profile can highlight problems such as time spent re-parsing the same header files in multiple compile units. If this is the case, then consider whether to:

Break up large headers into smaller headers

Try a tool such as Include What You Use (IWYU)

Try using precompiled headers

Template instantiation

Build time traces can help highlight very deep or slow template instantiations. For example, calculating a sum of 1 to 100 with template metaprogramming shows up as a very deeply nested trace (see Figure 5). Note that the detail at the bottom of the screen identifies the template and arguments taking time. Time traces make this kind of build problem easier to find and diagnose.

Figure 5 - A deeply nested template instantiation

Slow optimisation of a function

LLVM/Clang performs optimisations in a series of passes. Most of these will be too fast to show up in a time-trace. If an "OptFunction" event is taking a very long time this may indicate a compiler bug (see example Figure 6). If you see this, it should be reported back to the compiler provider. Time trace will report the function it is running on which may help you to identify characteristics of your source code that are causing the slowdown and fix or workaround the problem yourself.

Figure 6 - A slow optimisation bug

Summary

Build time traces offer a lot of detail both about what is happening in your source code and what the compiler is doing when building your source code. The trace is provided in a standard, machine-readable format, so it is easy to process and analyse. If you are facing build time problems, or just want to improve your build times, then I recommend trying the clang -ftime‑trace option.