An introduction to creating GCC plugins


The free and open source community is largely based on extensibility, sharing, and modularity, so projects will often provide an API that allows users to add functionality. One way to accomplish that is via plugins to the base project, which allows a user to customize the project to their needs and easily share the result with others who might also see utility in such a modification. Such a framework was recently implemented in GCC.

As of version 4.5 of GCC, a user can create a plugin that provides extra features to the compiler, which they can further pass around as a shareable module. GCC plugins provide developers with a rich subset of the GCC API to allow them to extend GCC as they see fit. Whether it is writing an additional optimization pass, transforming code, or analyzing information, plugins can be quite useful.

In this article we will learn how to develop a GCC plugin by creating one that can be used for spell checking read-only strings in an executable. What is the reason for doing so, one might ask? Well, it's a lesson on plugins, and this example provides a nice way to investigate this aspect of GCC.

Creating a plugin

A plugin is just a shared object file that follows an API defined by GCC. To get started cooking up a plugin, we will need to obtain GCC version 4.5 or later. After your GCC is built, you are ready to get your hack on. Create a development directory with " mkdir myplugin " and enter it with " cd myplugin ". From there we will create the source file that will be our plugin: myplugin.c .

Plugin preparation

The first step in creating a plugin is to fill out some of the structures that the plugin API provides. These allow GCC to properly call the plugin, provide the GCC user some "help" information, and verify that the plugin is able to run on the version of the compiler that is trying to use the plugin. Before I introduce the plugin API, take a peek at it for yourself. The easiest way to figure out where the API headers are located is to run the GCC that you are targeting with your plugin:

gcc -print-file-name=plugin

The files in there are the GCC API accessible to your plugin. If, however, the command simply prints the word " plugin ", that means those files are not installed on your system. Of interest right now is the boiler-plate stuff, located in gcc-plugin.h . The following structures are provided by the API, and their fields are documented in gcc-plugin.h .

struct plugin_name_args : Contains information GCC deduces about the plugin that is being called. We do not fill this structure out; rather, GCC passes us an instance of it when our plugin's initialization routine is called. Through it we can also get the arguments for the plugin that were passed via the command line, such as -fplugin-arg-myplugin-foo=bar . All plugin arguments are prefixed with the "basename" of the plugin, which is the name of the plugin shared object file that results after compilation, but without the .so extension.

struct plugin_info : Version and help string for your plugin, which is what gcc -v will display. Must be registered upon initialization of your plugin. Optional.

struct plugin_gcc_version : Ensures that your plugin will operate on the appropriate version of GCC. An instance describing the running compiler gets passed as input to the plugin at initialization, and we can use it to verify version information. Optional.
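To make the argument handling concrete, here is a minimal stand-alone sketch of looking up a plugin argument by key. It assumes GCC has already stripped the -fplugin-arg-myplugin- prefix and split foo=bar into a key and a value. The demo_argument structure and demo_find_arg() are hypothetical stand-ins written for this illustration; a real plugin would walk the array found in the struct plugin_name_args instance it receives, using the definitions in gcc-plugin.h .

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the key/value pair GCC hands to a plugin.
 * A real plugin should use the definition from gcc-plugin.h instead. */
struct demo_argument {
    const char *key;
    const char *value; /* NULL when the option carried no "=value" part */
};

/* Return the value stored under 'key', or NULL if the key is absent
 * (or present without a value). */
const char *demo_find_arg(const struct demo_argument *argv, int argc,
                          const char *key)
{
    for (int i = 0; i < argc; ++i)
        if (strcmp(argv[i].key, key) == 0)
            return argv[i].value;
    return NULL;
}
```

With -fplugin-arg-myplugin-foo=bar on the command line, looking up "foo" in this model would yield "bar".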

Now to get started. Using your favorite text editor, create a file called myplugin.c and include the headers for the basic GCC API.

#include <gcc-plugin.h>

The next thing we must define is a variable called plugin_is_GPL_compatible . When plugins are being loaded, the symbols in the resulting object file are examined, and before the plugin will actually be usable, GCC makes sure that there is a symbol called plugin_is_GPL_compatible . If this symbol exists, then your plugin is deemed to have the proper license.

int plugin_is_GPL_compatible = 1;

Next, let's define some metadata about the plugin. That is done by creating an instance of struct plugin_info , which GCC uses when the user asks for help (via gcc -v and/or gcc -v --help ) with the plugin loaded. This is not required but can be helpful. For it to take effect, GCC must be handed this structure during initialization:

static struct plugin_info myplugin_info =
{
    .version = "42",
    .help = "Not yet...",
};

Next, we will define an instance of plugin_gcc_version so that, during initialization, we can verify that our plugin and GCC can dance together nicely.

static struct plugin_gcc_version myplugin_ver =
{
    .basever = "4.6",
};

Note that every field of this structure ( basever , datestamp , devphase , revision , and configuration_arguments ), all of which are strings, must match that of the version of GCC you are using when you let GCC determine if the version matches, e.g. via a call at initialization to plugin_default_version_check() . You can use your own version check method instead, which is ideal, as I doubt that your configuration options would exactly match those of your plugin's users. In short, do your own version checking unless you are bundling your plugin as part of a GCC release where the configuration options and datestamps, etc. will match.

If you are curious, the plugin-version.h header has the basever , datestamp , devphase , and revision values that were used when you built GCC (or when your distribution package maintainer built it). For our example, we will just validate against the base version string of GCC and will handle the version check ourselves. For our purposes we accept any GCC 4.6 release and ignore the micro version number, e.g. the '0' in 4.6.0.
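As a rough illustration of a do-it-yourself check, the stand-alone sketch below compares only the leading "major.minor" part of two base version strings. The function name demo_basever_matches() is made up for this example and is not part of the GCC API. It also verifies that the prefix ends on a component boundary, which is a small addition of mine and not something the plugin in this article does.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 when 'gcc_basever' (e.g. "4.6.0") starts with the wanted
 * "major.minor" prefix (e.g. "4.6") and the prefix ends on a component
 * boundary, so that "4.60.1" is not mistaken for a 4.6 release. */
int demo_basever_matches(const char *gcc_basever, const char *wanted)
{
    size_t n = strlen(wanted);

    if (strncmp(gcc_basever, wanted, n) != 0)
        return 0;
    return gcc_basever[n] == '\0' || gcc_basever[n] == '.';
}
```

In a plugin, the first argument would come from the basever field of the struct plugin_gcc_version that GCC passes to the initialization routine.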

Plugin initialization

By now we have provided a bunch of info about the plugin, but there is one more piece of information that is required by GCC: the routine that actually initializes the plugin and registers callbacks to occur during the compilation of a program. This is another symbol the plugin framework will pluck out of the file, so it must be named plugin_init() . Our plugin is going to operate as a GIMPLE pass, allowing us to analyze (and transform if we wish) GCC's middle-end representation of the program being compiled. There are a couple of other options available. An IPA_PASS is an interprocedural pass, where the developer works with the call graph of the program and optimizes code that depends on more than one function, as function inlining does. An RTL_PASS runs late in compilation, after the GIMPLE passes. This type of pass operates on the RTL (register transfer language) representation of the program. RTL is the lower-level form that GCC's back end matches against the machine description to produce instructions for the target machine.

In order to do that, we need to define the proper struct and pass that information to the plugin framework, which will then call our callback handler once for each function in the source file. There are a number of other things that can be accomplished via plugins. These things are called "events" and are enumerated values in the plugin.def file. The .def files in GCC represent enumerations and are translated to source code when GCC is being compiled. They are then included into GCC's source, as in plugin.c:

#include "plugin.def"

This makes using ctags kind of tricky, because it does not inspect .def files, nor does it know how to parse them. On the other hand, cscope works well if you tell it to look for .def files.
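The general technique behind such .def files is often called an "X-macro": one list of names is expanded several times under different macro definitions. The sketch below shows the idea in a single file. The DEMO_* names are invented for this illustration and are not GCC's actual events, and GCC keeps its list in a separate file that is pulled in with #include , so treat this as an approximation of the pattern, not a copy of GCC's code.

```c
#include <stddef.h>

/* Hypothetical event list. In GCC the equivalent list lives in its own
 * .def file and is brought in with #include. */
#define DEMO_EVENTS(X)  \
    X(DEMO_PASS_SETUP)  \
    X(DEMO_INFO)        \
    X(DEMO_FINISH)

/* First expansion: turn every entry into an enumerator. */
#define DEMO_AS_ENUM(name) name,
enum demo_event { DEMO_EVENTS(DEMO_AS_ENUM) DEMO_EVENT_COUNT };
#undef DEMO_AS_ENUM

/* Second expansion: turn every entry into its printable name, so the
 * enum and the name table cannot drift out of sync. */
#define DEMO_AS_STRING(name) #name,
const char *const demo_event_names[] = { DEMO_EVENTS(DEMO_AS_STRING) };
#undef DEMO_AS_STRING
```

Because the enumerators only come into existence after preprocessing, tools that index the raw source text have a hard time finding them.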

Below is the data structure we use when our plugin is asked to initialize itself.

/* See tree-pass.h for a list and descriptions for the fields of this struct */
static struct gimple_opt_pass myplugin_pass =
{
    .pass.type = GIMPLE_PASS,
    .pass.name = "myplugin", /* For use in the dump file */
    /* Predicate (boolean) function that gets executed before your pass. If the
     * return value is 'true' your pass gets executed, otherwise, the pass is
     * skipped.
     */
    .pass.gate = myplugin_gate, /* always returns true, see full code */
    .pass.execute = myplugin_exec, /* Your pass handler/callback */
};

Please note that there are many more options we can specify for our pass, but to keep things simple, we are just going to have a basic pass with a gate and execution callback. myplugin_exec() is the function that is called back given the conditions specified in the optimization pass we define below. Here are the pieces for all three parts of the optimization pass (the source code for the full plugin can be found here):

Now that we have a structure defining how we want our pass to act, let's finally write the required plugin_init() function:

/* Return 0 on success or error code on failure */
int plugin_init(struct plugin_name_args   *info, /* Argument information */
                struct plugin_gcc_version *ver)  /* Version info of GCC */
{
    /* Used to tell the plugin-framework about where we want to be called in the
     * set of all passes. This is located in tree-pass.h
     */
    struct register_pass_info pass;
    printf("Plugin initialized...\n");
    /* We could call: plugin_default_version_check() to validate our plugin, but
     * we will skip that. Instead, as mentioned it can be more useful if we
     * validate the version information ourselves
     */
    if (strncmp(ver->basever, myplugin_ver.basever, strlen("4.6")))
        return -1; /* Incorrect version of GCC */
    /* Setup the info to register with GCC telling when we want to be called and
     * to what GCC should call, when it's time to be called.
     */
    pass.pass = &myplugin_pass.pass;
    /* Get called after GCC has produced the SSA representation of the program.
     * After the first SSA pass.
     */
    pass.reference_pass_name = "ssa";
    pass.ref_pass_instance_number = 1;
    pass.pos_op = PASS_POS_INSERT_AFTER;
    /* Tell GCC we want to be called after the first SSA pass */
    register_callback("myplugin", PLUGIN_PASS_MANAGER_SETUP, NULL, &pass);
    /* Tell GCC some information about us... just for use in --help and
     * --version
     */
    register_callback("myplugin", PLUGIN_INFO, NULL, &myplugin_info);
    /* Successful initialization */
    return 0;
}

The above ties everything together. The first thing we need our plugin to do is to visit each statement in the source code that is being compiled and identify the read-only data. Because our pass runs after the SSA pass, the compiler will hand our plugin GIMPLE, GCC's intermediate language, as a set of basic blocks per function. Since all frontends (C, C++, Go, Fortran, etc.) produce GIMPLE, GCC can effectively optimize just the GIMPLE code, making every optimizer capable of working on any language that GCC can parse. Likewise, when a developer writes a pass for GIMPLE, it becomes language-agnostic and can be applied to any language GCC parses. Using a three-address code, GIMPLE represents each statement in the input language as one or more statements consisting of at most two operands and a result value. In the case of an assignment statement with an addition operator, we have something like:

lhs = op1 + op2

Where lhs , op1 , and op2 are the three addresses that make up the assignment statement.
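To see what that restriction means for a larger expression, the sketch below writes the same computation twice in plain C: once as a programmer might, and once by hand in three-address style with an explicit temporary. The function and variable names are mine, and the temporaries GCC actually generates will be named differently; this only shows the shape of the lowered code.

```c
/* Source form: a single expression containing two operators. */
int demo_source_form(int b, int c, int d)
{
    return b + c * d;
}

/* Three-address form: every statement has one operator, at most two
 * operands, and one result, so the multiplication gets its own temporary. */
int demo_three_address_form(int b, int c, int d)
{
    int t1 = c * d; /* t1 = op1 * op2 */
    int a = b + t1; /* a  = op1 + op2 */
    return a;
}
```

Both functions compute the same value; only the second has the one-operation-per-statement shape that a pass over GIMPLE gets to walk.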

Read-only identification

There might be an easier way of finding read-only data in a program, however our approach here allows one to understand how to traverse basic blocks and statements in the program. We are more concerned with learning than functionality for this tutorial.

When a function gets passed to our plugin we need to do something with it, mainly analyze it and identify read-only data. The following is the callback we registered previously, myplugin_exec() :

static unsigned myplugin_exec(void)
{
    unsigned i;
    const_tree str, op;
    basic_block bb;
    gimple stmt;
    gimple_stmt_iterator gsi;
    FOR_EACH_BB(bb)
      for (gsi=gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
      {
          stmt = gsi_stmt(gsi);
          for (i=0; i<gimple_num_ops(stmt); ++i)
            if ((op = gimple_op(stmt, i)) && (str = is_str_cst(op)))
              spell_check(stmt, str);
      }
    return 0;
}

FOR_EACH_BB operates on a global variable in GCC which represents the current function being processed, cfun . The gsi is a GIMPLE statement iterator. In short, this loop traverses each basic block in cfun and visits each statement in that function. We look at each statement in the basic block via the gsi interface. Finally we look at each operand that makes up the GIMPLE statement. The is_str_cst() is a predicate function that we will define below. This function determines if the operand we plucked from the statement represents a string constant. If it does we will spell check that string.
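If the GCC macros obscure the shape of that loop, the following toy model may help. It is plain C with no GCC headers: the demo_block and demo_stmt structures, the string operands, and the demo_is_quoted() predicate are all invented for this sketch and are far simpler than GCC's real data structures. What it preserves is the nesting: blocks, then statements, then operands, with a predicate applied to each non-NULL operand.

```c
#include <stddef.h>

/* Toy stand-ins for a statement and a basic block (not GCC types). */
struct demo_stmt {
    size_t num_ops;
    const char *const *ops; /* an operand slot may be NULL */
};

struct demo_block {
    size_t num_stmts;
    const struct demo_stmt *stmts;
};

/* Hypothetical predicate: treat an operand as a "string constant" when
 * its text begins with a double quote. */
int demo_is_quoted(const char *op)
{
    return op[0] == '"';
}

/* Visit every operand of every statement of every block and count the
 * operands accepted by 'pred', skipping empty operand slots. */
size_t demo_count_operands(const struct demo_block *blocks, size_t num_blocks,
                           int (*pred)(const char *))
{
    size_t hits = 0;

    for (size_t b = 0; b < num_blocks; ++b)
        for (size_t s = 0; s < blocks[b].num_stmts; ++s) {
            const struct demo_stmt *stmt = &blocks[b].stmts[s];

            for (size_t i = 0; i < stmt->num_ops; ++i)
                if (stmt->ops[i] && pred(stmt->ops[i]))
                    ++hits;
        }
    return hits;
}
```

In the real callback, the counting would be replaced by the call to spell_check() and the predicate by is_str_cst() .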

In GCC, every construct in the source code is essentially a tree. A tree node could represent constants, variables, functions, etc. Often these tree instances are wrapped multiple times. For instance the tree node might be encapsulated in an SSA_NAME instance for the SSA representation. Under that might be a POINTER_TYPE and under that an INTEGER_CST for a constant integer. Note that these node types are defined in tree.def . Helper functions in tree.h can help to identify and peel off these layers, such as with TREE_CODE() and TREE_OPERAND() respectively. Also, TREE_TYPE() is useful for peeling off the layers. The debug_* functions can help better understand this layering, particularly debug_tree() . The following routine is what we will start off with for identifying STRING_CST :

static const_tree is_str_cst(const_tree node)
{
    const_tree str = node;
    /* Filter out types we are ignoring */
    if (TREE_CODE(str) == VAR_DECL)
    {
        if (!(str = DECL_INITIAL(node))) /* nop expr */
          return NULL_TREE;
        else if (TREE_CODE(str) == INTEGER_CST) /* Ignore single chars */
          return NULL_TREE;
        str = TREE_OPERAND(str, 0); /* addr expr */
    }
    else if (TREE_CODE(str) == ADDR_EXPR)
      str = TREE_OPERAND(str, 0);
    /* We only deal with readonly stuff */
    if (!TYPE_READONLY(str) && (TREE_CODE(str) != ARRAY_REF))
      return NULL_TREE;
    if (TREE_CODE(str) != STRING_CST)
      str = TREE_OPERAND(str, 0);
    if (TREE_CODE(str) != STRING_CST)
      return NULL_TREE;
    else
      return str;
}

As you can see, we detect the type of node to be processed. If we are initially presented with a declaration node (which one might get if the node were a parameter instance) we look at the declaration of it via DECL_INITIAL() . If we get an address, we peel off that variant and look at what it is an address expression of. If the node isn't read-only we assume that it's probably not a hard-coded string so we just ignore it. There is probably a better way of handling this, such as just placing the node in a loop around operand checks, but I'll leave other methods of peeling the types as an exercise for the reader.

Conclusion

Plugin passes are quite powerful, as can be seen above. I hope this brief jaunt down "pass" lane has been helpful. But before I leave you to your plugin hacking, I want to give a few more tips that can aid in learning the GCC internals. Use a debugger. If you built GCC and your plugin with debugging symbols, you can learn a lot by stepping through the code. When debugging via gdb , make sure you use the actual binary for compiling the language you want; for Go the binary is gocc and for C you probably want to use xgcc instead of GCC. xgcc is located in the object directory where you built GCC, in the gcc subdirectory. Use that for debugging C-related code. One other useful tip in learning the GCC internals is to actually trace the code that is used for printing and dumping data from the passes, such as debug_gimple_stmt() , debug_generic_stmt() , and debug_tree() . These will help you see how GCC structures the objects, because everything is a tree.

Index entries for this article GuestArticles Davis, Matt