Reimplementing graphmod as a source plugin: graphmod-plugin

Posted on August 9, 2018

You may have heard about source plugins by now. They allow you to modify and inspect the compiler’s intermediate representation. This is useful for extending GHC and performing static analysis of Haskell programs.

In order to test them out, I reimplemented the graphmod tool as a source plugin. graphmod generates a graph of the module structure of your package. Reimplementing it as a source plugin makes the implementation more robust, because it can use GHC's own information about each module's imports instead of lexing the source text. I implemented it as a type checker plugin which runs after type checking has finished. The result: graphmod-plugin.

An example of the structure of the aeson package

Architecture

The plugin runs once for each module, at the end of type checking. Therefore, to collate information about multiple modules, we must first serialise the information we want from each module. Then, once all the modules have finished compiling, we collect all the serialised files and process the information together.

We will therefore first define a plugin which extracts all the import information from one module, and then define a suitable executable which collects all the import information and produces the final output graph.

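graphmod-plugin consists of a library, which exports the plugin, and an executable, which is invoked afterwards to render the information. To use the two directly in tandem, compile your package with the plugin enabled so that it writes its output to a directory, then run the executable on that directory to generate a dot file.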

Once the dot file has been generated, you can use the normal graphviz utilities to render the file.

The tred utility removes transitive edges from the graph, which is worth doing before rendering the graph as a PDF with dot.

The plugin

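A type checker plugin is a function of type [CommandLineOption] -> ModSummary -> TcGblEnv -> TcM TcGblEnv. It receives the options passed to the plugin, a summary of the module being compiled and the output of the type checker, and returns a (possibly modified) TcGblEnv.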

The TcGblEnv is the output of the type checker. It contains all the type-checked bindings in addition to lots of other useful information. We are interested only in the imports, which are located in the tcg_rn_imports field.
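As a minimal sketch, assuming the GHC 8.6 module layout (modules such as HscTypes, TcRnTypes and Plugins were renamed to GHC.* in GHC 9.0), a type checker action that reads tcg_rn_imports could look like the following. The names importedModuleNames and printImports are hypothetical; the action just prints the imported module names and hands the environment back unchanged.

```haskell
-- Sketch against the GHC 8.6 API; module names differ in GHC 9.x.
import Control.Monad.IO.Class (liftIO)
import HsImpExp (ImportDecl (..))
import HscTypes (ModSummary)
import Module (moduleNameString)
import Plugins (CommandLineOption)
import SrcLoc (unLoc)
import TcRnTypes (TcGblEnv (..), TcM)

-- | Names of the modules imported by the module being compiled.
-- Each element of tcg_rn_imports is a Located ImportDecl, so we strip
-- the location twice: once from the declaration, once from its name.
importedModuleNames :: TcGblEnv -> [String]
importedModuleNames env =
  [ moduleNameString (unLoc (ideclName (unLoc decl)))
  | decl <- tcg_rn_imports env
  ]

-- | A hypothetical type checker action: print the imports and return
-- the environment unchanged, so compilation proceeds as normal.
printImports :: [CommandLineOption] -> ModSummary -> TcGblEnv -> TcM TcGblEnv
printImports _opts _summary env = do
  liftIO (mapM_ putStrLn (importedModuleNames env))
  return env
```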

An LImportDecl GhcRn is a located ImportDecl after renaming: it describes a single import declaration, together with its source location.

Along with the module name, it records other aspects of the import, such as whether it was qualified, whether it has an as alias and which identifiers it imports or hides. Our plugin will take this information and convert it into the format expected by the existing graphmod library.

The graphmod Import data type is a simplified version of ImportDecl, so it is straightforward to extract the information we need. Notice how much simpler this approach is than the one taken by the original implementation, which uses a lexer to try to identify the imports textually.
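The conversion might be sketched as follows, again assuming the GHC 8.6 API, where ideclQualified and ideclSource are plain Bool fields (later GHC versions changed both). SimpleImport is a hypothetical stand-in for graphmod's Import type, not its actual definition, and this convertImport is an illustration rather than the plugin's own code.

```haskell
-- Sketch against the GHC 8.6 API.
import HsExtension (GhcRn)
import HsImpExp (ImportDecl (..), LImportDecl)
import Module (moduleNameString)
import SrcLoc (unLoc)

-- | Hypothetical stand-in for graphmod's Import type.
data SimpleImport = SimpleImport
  { impModule    :: String -- imported module, e.g. "Data.Map"
  , impQualified :: Bool   -- was the import qualified?
  , impSource    :: Bool   -- was it a {-# SOURCE #-} import?
  }
  deriving (Show, Eq)

-- | Keep only the fields the graph needs; everything else in the
-- ImportDecl (aliases, import lists, package qualifiers) is ignored.
convertImport :: LImportDecl GhcRn -> SimpleImport
convertImport lDecl =
  let decl = unLoc lDecl
   in SimpleImport
        { impModule    = moduleNameString (unLoc (ideclName decl))
        , impQualified = ideclQualified decl
        , impSource    = ideclSource decl
        }
```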

It is also easy to extend the GraphMod.Import data type to carry new information. In the previous implementation this would take much more effort, because the lexing approach is fragile.

Serialisation

Once we have gathered this information, we need to serialise it and write it to disk so that, once all the modules have been compiled, we can deserialise it and render the final graph.

As we are using GHC, we can use the same serialisation machinery that GHC uses to write interface files. Of course, you are free to use whatever serialisation library you like, but GHC's machinery already has instances defined for GHC-specific types. We won't need any of them in this example, but they can be useful. The writeBinary function takes a value serialisable by the GHC.Binary class and writes it to a file.

We also needed to write some simple Binary instances by hand in order to do the serialisation.
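The sketch below shows what hand-written instances look like. Since the text notes that any serialisation library will do, it swaps in the binary package's Data.Binary class instead of GHC's own Binary class; Import and ImportKind are hypothetical stand-ins for graphmod's types.

```haskell
import Data.Binary (Binary (..), decode, encode, getWord8, putWord8)

-- Hypothetical stand-ins for graphmod's import types.
data ImportKind = NormalImport | SourceImport
  deriving (Show, Eq)

data Import = Import
  { impModule :: String     -- name of the imported module
  , impKind   :: ImportKind -- ordinary or {-# SOURCE #-} import
  }
  deriving (Show, Eq)

-- Hand-written instance: a single tag byte identifies the constructor.
instance Binary ImportKind where
  put NormalImport = putWord8 0
  put SourceImport = putWord8 1
  get = do
    tag <- getWord8
    case tag of
      0 -> return NormalImport
      1 -> return SourceImport
      _ -> fail ("ImportKind: unexpected tag " ++ show tag)

-- Fields are written in order and must be read back in the same order.
instance Binary Import where
  put (Import m k) = put m >> put k
  get = Import <$> get <*> get

-- | Serialise to a lazy ByteString and decode it again.
roundTrip :: [Import] -> [Import]
roundTrip = decode . encode
```

With these instances, encodeFile from the same package writes a list of Import values to disk and decodeFile reads it back, which is the role writeBinary plays in the plugin.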

Plugin Description

Once we have these parts, we can assemble them into the final plugin. We first get the imports out of tcg_rn_imports and then convert them using convertImport. We then write this information to a uniquely named file in the output directory, which is passed to the plugin as an argument.

mkPath tries to come up with a unique file name for a module by using its moduleUnitId. The exact name doesn't matter as long as it is unique. We could instead write this information to a database or to a file handle; writing to disk is just a convenient method of serialisation.
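A mkPath-style helper could be as simple as the following hypothetical sketch. It assumes the unit id and module name have already been converted to strings; with the GHC 8.6 API that might be done via unitIdString (moduleUnitId m) and moduleNameString (moduleName m), where m is the ModSummary's ms_mod. The .graphmod extension is an arbitrary choice.

```haskell
import System.FilePath ((<.>), (</>))

-- | Hypothetical mkPath-style helper: combine the unit id and the module
-- name so that same-named modules from different units get different files.
-- (<.>) binds tighter than (</>), so the extension is added to the file name.
mkPath :: FilePath -> String -> String -> FilePath
mkPath outDir unitId modName =
  outDir </> (unitId ++ "-" ++ modName) <.> "graphmod"
```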

Then we define the plugin by creating a definition called plugin which overrides the typeCheckResultAction and pluginRecompile fields of defaultPlugin. Setting pluginRecompile to purePlugin declares that the result of our plugin depends only on the contents of the source file rather than on any external information. This means that GHC doesn't need to recompile a module every time just because it is compiled with a plugin.
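A minimal sketch of such a definition, assuming the GHC 8.6 Plugins API, is shown below. recordImports is a hypothetical placeholder action that only reports how many imports it saw and which options it received; in graphmod-plugin this is where the imports would be converted and written to disk. The definitions would live in the plugin's own module, which must export plugin.

```haskell
-- Sketch against the GHC 8.6 API.
import Control.Monad.IO.Class (liftIO)
import HscTypes (ModSummary)
import Plugins (CommandLineOption, Plugin (..), defaultPlugin, purePlugin)
import TcRnTypes (TcGblEnv (..), TcM)

-- | Start from defaultPlugin and override only the two fields we need.
plugin :: Plugin
plugin =
  defaultPlugin
    { typeCheckResultAction = recordImports
    , pluginRecompile = purePlugin -- output depends only on the source file
    }

-- | Hypothetical placeholder action; returns the environment unchanged.
recordImports :: [CommandLineOption] -> ModSummary -> TcGblEnv -> TcM TcGblEnv
recordImports opts _summary env = do
  liftIO . putStrLn $
    "imports: " ++ show (length (tcg_rn_imports env)) ++ ", options: " ++ show opts
  return env
```

GHC loads such a module with -fplugin=ModuleName, and arguments such as the output directory are passed with -fplugin-opt=ModuleName:argument.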

Now that our module exports an identifier of type Plugin called plugin we are finished defining the plugin part of the project.

The finaliser

Once all the modules have finished compiling, each of them will have written its information to a file in the output directory, which we can now inspect to create the dot graph.

We define an executable to do this. The executable takes the directory of the files as an argument, reads all the files and then processes them to produce the graph.

In the collectImports function, we first read the directory from a command line argument. Then we find all the files in this directory and read their contents into memory. We use the helper function readImports which uses functions from the Binary module to read the serialised files. Finally, we build the graph using all the import information and then pass the graph we have built to the existing graphmod backend.
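The directory-reading part of the finaliser might be sketched as follows. It again uses the binary package rather than GHC's Binary module, and assumes a hypothetical file format in which each file holds a module name paired with the names of the modules it imports; writeModuleImports shows the matching plugin-side write. readAllImports, writeModuleImports and collectFromArgs are illustrative names, not the functions from graphmod-plugin.

```haskell
import Data.Binary (decodeFile, encodeFile)
import System.Directory (createDirectoryIfMissing, listDirectory)
import System.Environment (getArgs)
import System.FilePath ((</>))

-- Hypothetical file format: a module name paired with the modules it imports.
type ModuleImports = (String, [String])

-- | Plugin side (sketch): write one module's imports to a file named
-- after the module, creating the output directory if necessary.
writeModuleImports :: FilePath -> ModuleImports -> IO ()
writeModuleImports dir entry@(modName, _) = do
  createDirectoryIfMissing True dir
  encodeFile (dir </> modName) entry

-- | Finaliser side: decode every file in the directory.
readAllImports :: FilePath -> IO [ModuleImports]
readAllImports dir = do
  files <- listDirectory dir
  mapM (\f -> decodeFile (dir </> f)) files

-- | Take the input directory from the command line, as the finaliser does.
collectFromArgs :: IO [ModuleImports]
collectFromArgs = do
  args <- getArgs
  case args of
    [dir] -> readAllImports dir
    _ -> ioError (userError "expected exactly one argument: the input directory")
```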

The buildGraph function builds an in-memory representation of the module graph, with a node for each module and an edge between two modules if one imports the other. Finally, we mimic the original graphmod tool and print the graph on stdout, so that it can be piped to dot in order to render it.
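Here is a hypothetical buildGraph and dot renderer over the same (module, imports) pairs. It is one possible design rather than the plugin's code, which reuses graphmod's existing backend: an adjacency map with a node for every compiled module, keeping only edges whose target was also compiled, so imports from other packages are dropped.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical format: a module name paired with the modules it imports.
type ModuleImports = (String, [String])

-- | One node per compiled module, and an edge from a module to each
-- compiled module it imports. Imports of modules we never saw compile
-- (for example from other packages) are dropped.
buildGraph :: [ModuleImports] -> Map.Map String (Set.Set String)
buildGraph mods =
  Map.fromListWith Set.union
    [ (m, Set.fromList (filter (`Set.member` known) is)) | (m, is) <- mods ]
  where
    known = Set.fromList (map fst mods)

-- | Render the graph in Graphviz dot syntax. 'show' wraps each name in
-- double quotes, which gives a valid dot identifier for module names.
toDot :: Map.Map String (Set.Set String) -> String
toDot g =
  unlines $
    ["digraph modules {"]
      ++ ["  " ++ show m ++ ";" | m <- Map.keys g]
      ++ ["  " ++ show m ++ " -> " ++ show i ++ ";" | (m, is) <- Map.toList g, i <- Set.toList is]
      ++ ["}"]
```

An executable would then print toDot (buildGraph imports) with putStr, so that its output can be piped into dot.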

Running the plugin with nix

By far the most convenient way to run the plugin is with nix, which gets around the problem of having to run the finaliser by hand after compiling the package with the plugin. We use the haskell-nix-plugin infrastructure to do this.

The description needed to run the plugin consists of information about the plugin package, together with an additional, optional final phase which runs once compilation has finished.

I will add this definition to the plugins.nix file in haskell-nix-plugin once ghc-8.6.1 is released.

We would then use the addPlugin function to run the plugin on a package, and inspect its GraphMod output to obtain the module graph.

Running this script on aeson produces a rather large image showing the whole module graph.

A complete example default.nix can be found in the repo.

Conclusion

We have described one way in which one can structure a plugin. There are probably other ways, but this one seems ergonomic and convenient. Hopefully others will find this detailed summary and the reference code useful to build upon.