Middle End Lisp Translator (MELT)

MELT site

These wikipages are not well maintained. See the GCC MELT site http://gcc-melt.org for more up to date informations

OBSOLETE CONTENT

This page documents the MELT branch (which I previously called Basilys) - see the paper in the 2007 GCC Summit Proceedings, Multi-stage construction of a global static analyser by Basile Starynkevitch, pages 143 - 152 and the MELT paper in the GROW09 workshop (inside Hipeac09).

Recent slides are gcc-melt-tutorial-oct2010.pdf (MELT tutorial GCC Summit 2010), here (MELT slides november 2009) and RMLL 2010 slides (july 2010) here.

This work is motived by static code analysis, and more generally any other middle-end specific processing (which would go inside a plugin). The basic insight of MELT is that C is a too low level language to code compilers (and their plugins), in particular their middle end passes, notably those passes which are specific and should be prototyped quickly! This view is in contrast with the popular (and still wise) opinion that a compiler should be coded in the language it compiles (but MELT is coded in MELT).

MELT is a domain specific language to develop GCC extensions (it is actually tailored to fit strongly into GCC). You hopefully will increase your productivity by coding your GCC extensions in MELT, however you need to learn some basics of GCC internals (major internal representations like Gimple and Tree, overview of GCC passes), and also to learn the MELT language.

Discussions in english about GCC MELT should go on the gcc@gcc.gnu.org mailing list. A discussion group in French on GCC MELT is available on gcc-melt-french googlegroup.

availability and download

The MELT subversion repository is the GCC MELT branch on svn://gcc.gnu.org/svn/gcc/branches/melt-branch - a tarball snapshot for a GCC 4.5 MELT plugin is given in the MELT tutorial page (but retrieving thru svn is much preferable). The MELT branch has an incomplete but existing gcc-internals documentation (run make pdf to get it). An older version of this documentation is attached to the MELT tutorial wiki page.

Why MELT can be useful to you?

MELT is useful for the same reasons that (big) GCC plugins (or extensions) will be useful, and MELT can be viewed as a tool to easier code GCC extensions (like those provided by plugins) in. MELT permit (perhaps more easily than coding big GCC plugins in C) to tailor the GCC to suit it to your own specific need, e.g.

coding specific passes adding more diagnostics for a given software. For instance, once could code in MELT (easier than in C) a GCC pass which checks that every call to fopen is tested against NULL in the same routine. Such a pass is specific to the standard C library providing fopen . One could imagine than major free software (like the linux Kernel, the GTK or QT graphical toolkit, the Apache server, ...) might want to develop their specific GCC extensions e.g. in MELT to do some checks specific to that software. coding specific passes adding more optimizations for a particular software. For instance, one could code in MELT (easier than in C) a few GCC passes which optimize every call to fprintf(stdout, ...) into printf(....). Doing this inside the compiler (in particular, in a pass called after inlining) is easier than doing it outside. Very probably, current big free software offer analogue optimization possibilities. optimizing a given big software to some particular (hardware or operating) system. GCC offers a wide set of tuning parameters, and one could probably tune them to fit into some particular variant of a target (e.g. a system on chip with an ARM or PowerPC inside). refactoring, or aspect oriented programming, or software metrics or quality or navigation can also be improved by your GCC extensions. any other GCC extensions you want or need, in particular if it is big enough to justify learning a new specific language.MELT can probably be useful to any person wanting to develop specific GCC extensions thru plugins, provided these are big enough to justify learning a new domain specific language.

MELT is probably not interesting if you want to develop a tiny GCC plugin (but be aware that plugins tend to grow significantly), or if you want to develop a new feature for the core GCC (not thru a plugin), i.e. some pass that you want to push inside the GCC trunk (but you could prototype it in MELT).

Basic hints for coding GCC extensions in MELT

Some hints for newcomers; most of the hints here also apply to people considering coding a GCC plugin in C.

You first need to grasp GCC main internal representations (notably Tree & Gimple & Gimple/SSA). There are several documents about them (including the source files gcc/tree.def and gcc/gimple.def and their comments and the internal GCC documentation); read them first!

You then need to understand a little bit the organization of the GCC passes. Perhaps a good way to feel how GCC passes work is first to build a very small C (or Fortran or Ada) source code example yourexample.c illustrating your needs in a new (otherwise empty) directory, and to run there something like gcc -S -fverbose-asm -O2 -fdump-tree-all yourexample.c

This will fill your directory with a lot of dump files (about a hundred or two) like yourexample.c.004t.gimple or yourexample.c.080t.copyprop3 . These dump files are partial textual representations of the various intermediate representations (usually Gimple) of your code. By glancing inside them with a textual editor, you will understand what the compiler did, and perhaps be able to choose the pass after which your MELT extension (or GCC plugin in C) has to work. You will often have to add your own (one or few) passes, preferably coded in MELT, and you need to understand where (after which existing GCC pass) will you insert your own pass (or passes) coded in MELT.

You also have to look for examples of MELT processing, in files like melt/xtramelt*.melt . Feel free to ask for help.

Small MELT overview

The MELT branch contains several (related) stuff. It can also be compiled as a GCC plugin. See MELT tutorial for details. Everything can be enabled or disabled at GCC configure time or at GCC run time:

a Lisp dialect compiled into C code, with which one can code middle end passes a runtime which extends the GCC infrastructure to support the previous items, in particular a generational copying garbage collector well suited for the lisp dialect above, which is build above the existing GGC (which deals with old values).

All this work was and is (partly) funded by the french Ministery of Economy ... in the past thru ITEA within the GlobalGCC project and in 2010-2011 by the OpenGPU project.

The MELT branch has been created on february 19th 2008 and you can get it for readonly with svn checkout svn://gcc.gnu.org/svn/gcc/branches/melt-branch/ and for read-write (assuming you have an account) with svn checkout svn+ssh://yourusername@gcc.gnu.org/svn/gcc/branches/melt-branch/ (replacing yourusername with appropriate login). inkin My (Basile Starynkevitch's) contributions to GCC are covered by the copyright transfer signed by CEA to FSF, reference RT306238 which I have announced here

The MELT branch follows quite closely the GCC trunk (which is usually merged more than weekly into MELT).

If you want to try quickly and easily MELT, you should use the Plugin version of MELT. You can find the last build at the MELT page. You need at least GCC 4.6 with plugin support (compiled with --enable-plugin) and the related development package. Installation is detailled in the README.

In practical terms, MELT is quite resource consuming; more precisely, the compilation of the C generated code by a C compiler (usually GCC itself) is the major bottleneck. However, some recent progress has been made (by splitting some MELT files, and by improving the C code generator, notably by shrinking the huge initialization routine of each module by moving some code into several smaller static routines.). We suggest compiling MELT on a 4Gb RAM machine, or not passing any optimization flag when compiling the generated C code. (This is unrelated to the flags used to compile your own code in a MELT enabled GCC). More details here.

You can find some performance tests here.

MELT motivation

MELT is a high-level Lisp-like language designed to fit very closely in GCC internal representations, thru an automated translation of MELT code into GCC specific C code, compiled by a C compiler (usually some GCC) and then loaded as plugins. This enables easier development of high-level static analysis and transformation, working on GCC middle end representation (GIMPLE tuple).

summary of motivations

The motivations are detailed in the GCC Summit 2007 papers; in a few words

Coding passes in a LISP dialect is more fun and easier to the human developer. Some of these MELT passes (related to static analysis) are expected to run for a very long time. These peculiar passes are very rarely run (and only explicitly). It is worthwhile (and in the spirit of Lisp) to generate MELT/Lisp code during such very time consuming passes. So some passes might profit of dynamic code generation (at the meta-level) during them. Hence the MELT infrastructure should be able to generate some (specialized) code (as C files), to compile it into a dynamically loadable stuff (e.g. *.so shared objects on Linux/ELF; or *.la file with libtool), and to dynamically load it (all this during the same peculiar cc1 execution).

The MELT branch should generate C code during cc1 execution (the C code is translated from LISP internally) - it is important that it happens during execution of the cc1 process (because the whole idea is to be able to generate and then execute code during some very time consuming MELT passes). This C code is compiled into a dynamically loadable stuff (usually a *.so) and dynamically loaded by dlopen (see function compile_to_dyl in file gcc/melt-runtime.c in the MELT branch).

MELT Lisp dialect

The Lisp dialect is a Lisp1 (so more a Scheme than a Common Lisp w.r.t. names and bindings) dialect able to handle both boxed and some unboxed values. You can define primitives (which get translated to C), for example the (unboxed) integer addition is already defined as (defprimitive +i (:long a b) :long "((" a ") + (" b "))") the first :long occurrence describes the types of the formal arguments a and b, the second occurrence describes the result. There is an minimal object system (single-inheritance hierarchy, rooted at CLASS_ROOT . S-exprs are objects. Values can be MELT objects, closures, vectors, lists (a linked list of pairs, with pointers to first and last pair), pairs, etc... boxed integers or boxed GCC trees or hashtables (of objects, of trees, ..) etc... Every value has a discriminant (a MELT object) which is its class for objects. Adding support for other GCC datatypes is very easy. Tail-recursion is not handled (looping should use the forever keyword, and loops are can be exit ed). the let keyword is like let* in Scheme (binding sequentially).

MELT compiler implementation

It should be stressed that the MELT compiler (actually inside cc1) is translating MELT Lisp dialect S-exprs (either from a file or in memory) into a (huge) C file, which is usually compiled into a dynamically loadable stuff and then dynamically loaded (all in the same cc1 process). The MELT compiler is written in MELT (files warmelt*.melt), and a CommonLisp variant (file contrib/cold-basilys.lisp for Clisp) was coded to bootstrap it.

The MELT compiler (see function compile_list_sexpr in warmelt-outobj.melt) proceeds in several steps.

a list of S-exprs (these are MELT objects of CLASS_SEXPR ) is given, either by parsing a MELT file or because it is in memory. An initial environment is also given (could be empty for the particular case of warmelt-first.melt). the first step is macro-expansion. Every S-expr is macro expanded, usually into some subclass of CLASS_SRC (it is a source element); for example the if MELT keyword occurrence expands to an instance of CLASS_SRC_IF but most S-expr are just plain function or primitive applications. Then a normalization phase occur. Each source element is normalized (into a subclass of CLASS_NREP ) by adding additional let bindings. For example (F (G X) 2) becomes something equivalent to (LET ( (GG (G X) (FF (F GG 2)) ) FF) where FF and GG are fresh. Then a compilation phase is called. It transforms the normalized stuff into some abstract syntax tree (in subclasses of CLASS_OBJCODE ) which mimics a subset of C. At last, this forest of OBJCODE-s is pretty-printed as C code. The generated C code is compiled into a shared object which is dynamically loaded (as any plugin). The C code should be compilable without the GCC source or build directory (once this GCC has been installed) because included files for the C code plugin would be saved elsewhere (e.g. in some melt-private-include/ subdirectory). This is discussed here

current state

You need, in addition of all libraries used by GCC trunk (like mpfr and gmp):

the libtldl development (ie with headers) library from the libtool dynamic loader - libtldl is a dlopen replacement to dynamically load code at runtime. The Parma Polyhedra Library (PPL). It is also needed by Graphite, recently (end of August 2008) merged into trunk. You need at least PPL version 0.10.2 or a very recent GIT snapshot. PPL 0.10 won't work, so you probably would need to build PPL from its source. Essentially a fairly recent GNU/Linux system. I don't know anything else. Maybe it might later work on some other systems. I'm using Debian/Sid or Debian/Lenny on AMD64. Some nice guy (named Rob) was able to build MELT on a Solaris x86 system (but I needed to correct some linuxisms in the code). Some significant amount of RAM (because the generated warmelt*.c (or warmelt*-0.c ) are huge and contain a big, but simple, routine). I have a 4Gb RAM machine. A fairly recent version of GTK to compile the contrib/simple-probe.c

Current (may 2009) state (quite messy, notably for building):

* the configure.ac should be usable, but you have to rebuild all configure files (see the Regenerating_GCC_Configuration page for details) and re configure with --with-ppl . Currently I (Basile) use the following configure command: $GCCTOP/configure --enable-maintainer-mode --enable-checks=tree,gc --enable-languages=c --disable-bootstrap --disable-multilib \

--with-ppl=/usr/local/

Regenerating the configure file is necessary because the configure file is not svn commit -ed on the branch (to avoid conflicts when merging with the trunk).

* the runtime (including a copying generational garbage collector) should be usable (but was painful to debug) files gcc/melt-runtime.c and gcc/melt-runtime.h

and a melt/README-MELT

* Since rev136289 MELT has bootstrapped: the generated files warmelt*.c are compiled into warmelt*.so plugins which are able to generate the very same warmelt*-0.c from melt/warmelt*.melt . Hence the files warmelt-*-0.c are in the SVN repository but they are quite big. rev136409 corrected a bug (crashing on x86/32 bits).

The generated gcc/warmelt-*-0.c files are quite big, and generated by translating gcc/melt/warmelt-*.melt (Lisp MELT) files. You probably need at least 2Gb of RAM to have them compiled with gcc -fPIC -O2 into gcc/warmelt-*.so files in the build directory. If you have less memory, consider compiling them without optimisations (ie -O0 ), e.g. by editing some *melt-cc-script shell script, or by compiling or configuring gcc with CFLAGS=-O0 .

tiny tutorial about MELT

A very incomplete tutorial about MELT. Working knowledge of Scheme, CommonLisp (or at least EmacsLisp) is supposed.

See also preferably the MELT tutorial page which explains how to use and code in MELT.

MELT data

MELT data come in two flavors.

unboxed data - this data is not a MELT value; typical examples are: :long (the C long integer);

:tree (the naked GCC tree structure pointer, as defined in tree.h );

:cstring (constant C strings, outside of any heap or stack); etc. Adding new flavors of MELT unboxed data is quite simple but requires some C code. boxed data are MELT values allocated in the MELT heap. In particular: MELT objects

MELT integers are boxed integers.

MELT trees are boxed GCC trees.

MELT string maps (hash tables associating strings to MELT values)

MELT object maps (hash tables associating MELT objects to MELT values).

MELT tree maps (hash tables associating GCC tree-s to MELT values). This is the preferred way to associate information to GCC trees.

MELT tuples are sequence of MELT values.

MELT closures.

MELT lists are single-linked lists which keep the first & last node of the list,

MELT pairs are like cons-es in LISP. MELT lists' nodes are MELT pairs. etc. See file melt-runtime.h for details.

Each MELT value (i.e. boxed data) is allocated by the MELT generational garbage collector. The MELT object system has single inheritance, a tree (not a forest) of classes, rooted at CLASS_ROOT . Notice that many MELT values (e.g. boxed integers or trees, tuples, closures, object maps ...) are not MELT objects. MELT classes and fields and method selectors are themselves MELT objects, so the object system is somehow reflexive. See CLASS_CLASS for the class of classes and CLASS_FIELD for the class of fields.

Every MELT values have a discriminant (the first field in their C struct). This discriminant is a MELT object; for MELT objects, it is their class. For MELT values which are not objects (like MELT trees, MELT object maps, etc..) it is a MELT object instance of CLASS_DISCR . Every discriminant (including MELT classes) have a dispatch table associating selectors to closures (the method implementation). Hence, method addition is very dynamic and can occur at runtime.

Every MELT object has not only a class, but also a fixed hash code and a magic number (unsigned short). For discriminants (& classes), the magic number is used (even by GGC & MELT garbage collectors) to discriminate the actual structure of the data. In particular, one can have several discriminants for MELT strings sharing the same magic, OBMAG_STRING. Hence, we have DISCR_STRING for most strings, and DISCR_VERBATIMSTRING for some strings handled specially.

The nil MELT value, represented by a NULL C pointer, has DISCR_NULLRECV as its discriminant. So it is possible to invoke (via a MELT selector) a MELT method on the null MELT value.

Be careful about the boxed vs unboxed data.

MELT functions

MELT functions take arguments and produce results. The first argument is a MELT value (it cannot be an unboxed data). Other arguments can be either MELT values or unboxed data. The first result of a MELT function is a MELT value, but additional results can be unboxed. It is expected that each MELT function (and selector) has an implicit, but well defined, signature describing its arguments and results. There is no variable argument facility in MELT.

Here is a sample MELT function repeat-times applying a function f to an argument x some specified n times. f and x are values, n is an unboxed long argument.

(defun repeat-times (f x :long n) ; n is an unboxed formal argument of type long (forever reploop ; infinite loop called reploop (if (<=i n 0) ; test if the unboxed n is negative (exit reploop)) ; exit the loop if yes (f x) ; call f (setq n (-i n 1)))) ; decrement n

The syntax of defun is (defun function-name formal-argument-list body... ) . The formal-argument-list contains either formal arguments names, or type keywords like :value :long :tree etc. A type keyword applies to all following formal arguments (till end of list or other type keyword). The body of a defun is a sequence of zero, one, or more expressions evaluated sequentially.

MELT functions can be anonymous. For instance, assuming (say s) has the side effect of displaying the value s and evaluates to nil, and (make_stringconst discr_string "hello") evaluates to a boxed string value, the following code would say hello ten times.

(repeat-times (lambda (msg) (say msg) (newline)) (make_stringconst discr_string "hello") 10)

MELT primitives

A primitive is something which expands into C code. Primitives are defined using the defprimitive construct. Their invocation expands to C code. For instance the long integer substraction -i is defined by

(defprimitive -i (:long x y) :long "((" x ") - (" y "))")

The syntax is (defprimitive primitive-name formal-argument-list result-type expansion ) without any restriction on the formal argument list (the first formal can be unboxed). Expansions in primitive are strings (verbatim C code fragment) or formals. The expansion can be given more simply as

(defprimitive -i (:long x y) :long #{($x)-($y)}#)

The characters sequence between #{ and }# (we call that a macro-string) is expanded by the MELT lexer into an s-expr of symbols (id prefixed with a dollars) and strings (all the rest).

Once defined, a primitive is invoked syntactically as a function.

As with C macros, and for similar reasons, it is advised to add parenthesis in primitive expansions.

MELT citerators

A C-iterator is something expanding into a C loop, (e.g. a for loop). This construct permits description of arbitrary C loop-like instructions. C-iterating expressions (whose operator is a C-iterator) are used for side effects. Look into melt.texi for more.

See also

The GCC_Plugins page.

The MELT tutorial page and then the writing a pass in MELT page, and also the memory management in MELT page.

The MELT branch is often merged with the GCC trunk (usually more than twice a week).

Some examples (I will copy them soon here) are given in this post on Lambda The Ultimate

A reference documentation has been written (in gcc/doc/melt.texi file). Please read it. Tutorial is still missing.

A graphical schema is attached here.

There is a vim plugin for melt, you can download it at : http://pvittet.com/melt/plugin_vim/melt_plugin.zip.

FAQ and bugs