As part of my 6-month research secondment to Microsoft Research in Cambridge, I am taking up the challenge of migrating the current GHC build system, based on standard make, to a new and (hopefully) better one based on Shake. If you are curious about the project, you can find more details on the wiki page.

This week I’ve been trying to wrap my head around the current build system, and so far I’ve felt rather intimidated by the scale of this undertaking. Yet I am optimistic because I’m helped by several much smarter people 🙂 If you are a Haskell fan and would like to help make GHC better, please join this team effort!

To give you an idea about typical dependency chains, below is a representative one:

File compiler/prelude/primops.txt.pp describes GHC’s primitive operations and types. It uses C preprocessor directives like #include and #if and therefore needs to be preprocessed. The result goes into compiler/stage1/build/primops.txt. If one of the included files changes (e.g., ghc_boot_platform.h) the result must be rebuilt.

File primops.txt is processed by genprimopcode --out-of-line to generate primop-out-of-line.hs-incl.

genprimopcode itself needs to be built. It lives in utils/genprimopcode and is a Haskell program, so it is first built with the bootstrap (stage 0) compiler. Read more about compiler stages here.

Finally, a bunch of .hs-incl files are included (by a C preprocessor) into compiler/prelude/PrimOps.lhs, which then participates in the rest of the build in a more or less standard way.
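To make the chain above a bit more concrete, here is a minimal sketch in Python that models it as a plain dependency table. It is only a toy illustration, not how make or Shake represent things: the file names come from the description above, while the node names for the built genprimopcode binary and for the compiled PrimOps module are placeholders I made up, and the helper functions are hypothetical.

```python
from graphlib import TopologicalSorter

# Placeholder node names (assumptions, not real paths in the GHC tree).
GENPRIMOP_BIN = "<genprimopcode built by stage 0>"
PRIMOPS_COMPILED = "<PrimOps compiled>"

# target -> direct dependencies, following the chain described in the text.
DEPS = {
    "compiler/stage1/build/primops.txt": {
        "compiler/prelude/primops.txt.pp",
        "ghc_boot_platform.h",  # one of the #included files
    },
    GENPRIMOP_BIN: {"utils/genprimopcode"},
    "primop-out-of-line.hs-incl": {
        "compiler/stage1/build/primops.txt",
        GENPRIMOP_BIN,
    },
    PRIMOPS_COMPILED: {
        "compiler/prelude/PrimOps.lhs",
        "primop-out-of-line.hs-incl",
    },
}


def build_order(deps):
    """List every node so that each dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


def stale_targets(deps, changed):
    """Targets that depend, directly or transitively, on the changed file."""
    stale = set()
    for node in build_order(deps):
        # A target is stale if any direct dependency changed or is itself stale.
        if deps.get(node, set()) & ({changed} | stale):
            stale.add(node)
    return stale


if __name__ == "__main__":
    for target in sorted(stale_targets(DEPS, "ghc_boot_platform.h")):
        print("rebuild:", target)
```

Touching ghc_boot_platform.h marks primops.txt, the generated .hs-incl file and the compiled module as stale, but leaves the genprimopcode binary alone, which is the behaviour the build system has to reproduce.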

A detour into graph families and parallelism

Since my recent research has been focused on graphs, I find build dependency ‘graphs’ quite interesting. I put graphs in quotes because they are actually graph families, where each constituent graph corresponds to a partial order on the set of build actions: this is what gets executed when you run a build tool once (partial orders give parallelism and termination). There is also some conditional structure, which enables or disables parts of the family depending on file freshness criteria, the contents of generated files, and the final build target. These conditions select a particular graph in the family, so each build run is unambiguous. I think that one can use the algebra of parameterised graphs to capture and reason about build systems, for example, to transform them in a provably correct way. The theory of parameterised graphs has been developed specifically to reason about graph families; you can have a look at my recent paper about this here.
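As a rough illustration of the idea of a graph family, the Python sketch below attaches a condition to every build action, selects one concrete graph from a given assignment of conditions, and then groups the selected actions into batches that could run in parallel. This is a deliberately loose toy model and not the algebra from the paper: the action names, the condition names and both helper functions are invented for the example, and real freshness conditions would not be independent of each other.

```python
from graphlib import TopologicalSorter

# action -> name of the condition that enables it (all names are illustrative).
ACTIONS = {
    "preprocess primops.txt.pp": "primops_stale",
    "build genprimopcode": "tool_stale",
    "run genprimopcode": "incl_stale",
    "compile PrimOps": "incl_stale",
}

# (before, after) pairs: 'before' must finish before 'after' may start.
ARCS = [
    ("preprocess primops.txt.pp", "run genprimopcode"),
    ("build genprimopcode", "run genprimopcode"),
    ("run genprimopcode", "compile PrimOps"),
]


def select_graph(env):
    """Pick one member of the family: keep enabled actions and arcs between them."""
    enabled = {action for action, cond in ACTIONS.items() if env[cond]}
    graph = {action: set() for action in enabled}
    for before, after in ARCS:
        if before in enabled and after in enabled:
            graph[after].add(before)
    return graph


def parallel_batches(graph):
    """Split a concrete graph into batches of mutually independent actions."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a partial order
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    env = {"primops_stale": True, "tool_stale": False, "incl_stale": True}
    for i, batch in enumerate(parallel_batches(select_graph(env)), start=1):
        print(f"batch {i}: {batch}")
```

Each assignment of conditions yields exactly one acyclic graph, so a single run is unambiguous, and the batches show where the partial order leaves room for parallelism: with every condition enabled, preprocessing primops.txt.pp and building genprimopcode land in the same batch.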