Guide to compile-time meta-programming Compile-Time Meta-Programming (CTMP) can be thought of as being a more powerful cousin of macros; formally, it is said to allow the user of a programming language a mechanism to interact with the compiler. Most commonly CTMP is used to allow the construction of arbitrary program fragments by user code. In essence Converge provides a mechanism to allow its concrete syntax to describe abstract syntax trees - conventionally called ITree's in Converge - which can then be then spliced into a source file. First examples As a first example, consider this program which, at compile-time, computes the 30th entry in the Fibonacci sequence (computationally, a fairly expensive operation) and assigns it to the fib30 variable: func fib(n): if n == 0: return 0 elif n == 1: return 1 else: return fib(n - 1) + fib(n - 2) fib30 := $<CEI::lift(fib(30))> At run-time the variable fib30 will have the value of 514229 directly assigned to it, negating the need to repeatedly calculate this particular entry in the Fibonacci sequence. The key feature of CTMP are splice annotations $<...> . A splice evaluates the expression within at compile-time (and before VM instruction generation), replacing the splice annotation itself with the AST resulting from its evaluation. This is achieved by creating a temporary module containing the splice expression in a function, compiling the temporary module into bytecode, injecting it into the running VM, and then evaluating the function therein. Since a splice must return an ITree, we need to convert a normal integer object into its ITree integer equivalent: in this example we lift the integer with the CEI::lift function. The following program is a slightly more involved example of CTMP, trivially adopted from its Template Haskell cousin. expand_power recursively creates an expression that multiplies n x times; mk_power takes a parameter n and creates a function that takes a single argument x and calculates x ^ n; power3 is a specific power function which calculates n^3: func expand_power(n, x): if n == 0: return [| 1 |] else: return [| $c{x} * $c{expand_power(n - 1, x)} |] func mk_power(n): return [| func (&x): return $c{expand_power(n, [| &x |])} |] power3 := $<mk_power(3)> When the above example has been compiled into VM instructions, power3 essentially looks as follows: power3 := func (x): return x * x * x * 1 Quasi-quoted expressions [| ... |] build ITrees that represent the program code contained within them whilst ensuring that variable references respect Converge's lexical scoping rules. Insertions ${...} or $c{...} are used within quasi-quotes; they evaluate the expression within and copy the resulting ITree into the ITree being generated by the quasi-quote. Differences from other approaches As the above examples show, by using the splicing and quasi-quoting mechanisms, users can generate code at compile-time. Many readers will be familiar with other macro and macro-esque approaches, and a brief comparison can be enlightening. Lexical macros such as the C preprocessor are deal in string replacement, with little or no regard for the programming language they are dealing with. In such systems it is possible to create syntactically nonsensical programs, and to encounter difficult problems such as variable capture. Converge instead works at the abstract syntax tree level; it has little in common with such approaches. Traditional macro systems are best exemplified by LISP. Converge ultimately traces its CTMP abilities back to LISP. As seen above, Converge uses normal functions for CTMP, and does not explicitly identify macros; instead macro calls (a.k.a splices) are explicitly identified. A number of other differences result from the fact that Converge has a modern, relatively rich, syntax compared to LISPs minimal syntax. Major features The three major features of compile-time meta-programming are splicing, quasi-quoting, and insertion. Splicing The key part of the powers program is the splice annotation in the line power3 := $<mk_power(3)> . The splice tells the compiler to evaluate the expression between the chevrons at compile-time, and to include the result of that evaluation in the module for ultimate bytecode generation. In order to perform this evaluation, the compiler creates a temporary or dummy module which contains all the necessary definitions up to, but excluding, the definition the splice annotation is a part of; to this temporary module a new splice function (conventionally called $$splice$$ ) is added which contains a single expression return splice expr . This temporary module is compiled to bytecode and injected into the running VM, whereupon the splice function is called. Thus the splice function `sees' all the necessary definitions prior to it in the module, and can call them freely -- there are no other limits on the splice expression. The splice function must return a valid ITree which the compiler uses in place of the splice annotation. Splice annotations within a file are executed strictly in order from top to bottom. There are three forms of splice: Default splices $ automatically rename any visible variables in the spliced in ITree to ensure that no variable capture is possible. Capturing splices $c do not rename variables, and thus potentially allow variable capture to occur. Pragma splices $p are different from either previous form in that they do not expect (and, in fact, ignore) any return result from the splice expression. Syntax Type Description $ Default Default splicing evaluates an expression at compile-time and overwrites itself with the resulting ITree. Before it overwrites itself it renames all free and bound variables to fresh names meaning that variable capture is impossible with this type of splicing. $c Capturing Capturing splicing evaluates an expression at compile-time and overwrites itself with the resulting ITree. Unlike default splicing it does not rename free or bound variables, and allows variable capture to occur. $p Pragma Pragma splices evaluates an expression and discards the result (if there is one - the expression may fail). All splice variants have a DSL block equivalent $<<...>> , $c<<...>> , and $p<<...>> . Quasi-quoting Quasi-quotes allow ITree's to be built using Converge's normal concrete syntax. Essentially a quasi-quoted expression evaluates to the ITree which represents the expression inside it. For example, whilst the raw Converge expression 4 + 2 prints 6 when evaluated, [| 4 + 2 |] evaluates to an ITree which prints out as 4 + 2 . Thus the quasi-quote mechanism constructs an ITree directly from the users' input - the exact nature of the ITree is of immaterial to the casual ITree user, who need not know that the resulting ITree is structured along the lines of add(int(4), int(2)). The quasi-quotes mechanism can be used to express ITree's of single expressions - in which case an individual ITree is returned - or a multi-line sequence of expressions - in which case a list of ITree's is returned. Note that Converge's splicing and quasi-quote mechanisms cancel each other out: $<[| x |]> is equivalent to x (though not necessarily vice versa if x does not contain a valid ITree). Insertion Insertions within quasi-quotes work very differently to splices (which occur outside quasi-quotes): the insertion expression itself does not force compile-time evaluation. Instead the insertion expression is essentially copied as-is into the code that the quasi-quotes transforms to. For example, the quasi-quoted expression [| ${x} + 2 |] leads to an ITree along the lines of add(x, int(2)) -- the variable x in this case would need to contain a valid ITree. As this example shows, since splice annotations within quasi-quotes do not cause a change of meta-level, variable references do not cause staging concerns. There are three two forms of insertion. Default insertions ${...} automatically rename any visible variables in the inserted ITree to ensure that no variable capture is possible. Capturing insertions $c{...} do not perform such renaming, and thus potentially allow variable capture to occur. This feature completes the cancelling out relationship between splicing and quasi-quoting: [| ${x} |] is equivalent to x (though not necessarily vice versa if x does not contain a valid ITree). Scoping rules Compile-time meta-programming requires additions to Converge's scoping rules. Scoping and splicing Evaluating a splice expression leads to a new stage in the compiler being executed. Converge's rules about which references can cross the staging boundary are simple: only references to top-level module definitions can be carried across the staging boundary. For example the following code is invalid since the variable x will only have a value at run-time, and hence is unavailable to the splice expression which is evaluated at compile-time: func g(...): return ... func f(x): $<g(x)> Note the resulting error message: Error: Line 21, column 3: Unknown variable 'x' might initially seem somewhat confusing, since the user can clearly see x . However it makes rather more sense when one realises that the message is the result of compiling the following temporary module: func g(...): return ... func $$splice$$(...): return g(x) Clearly when compiling this, there is no x variable in scope. Scoping and quasi-quotes The quasi-quote mechanism can be used to surround any Converge expression to allow the easy construction of ITree's. Quasi-quoting an expression has two important properties: it introduces a new scope (in identical fashion that a function defines a new scope), and it fully respects lexical scoping. Thus quasi-quotes allow users to easily avoid unintended variable capture. Lexical scoping Consider the following contrived example of module A : func x(): return 4 func y(): return [| x() * 2 |] and module B : import A, Sys func x(): return 2 func main(): Sys::println($<A::y()>) The quasi-quotes mechanisms ensures that since the reference to x in the quasi-quoted expression in A::y refers lexically to A::x , that running module B prints out 8 . This example shows one of the reasons why Converge needs to be able to statically determine namespaces: since the reference of x in A.y is lexically resolved to the function A::x , the quasi-quotes mechanism can replace the simple reference with an original name that always evaluates to the slot x within the specific module A wherever it is spliced into, even if A is not in scope (or a different A is in scope) in the splice location. Hygiene Variable capture is when splicing in an ITree could cause variables with the same name to inadvertently overwrite (or capture ) each other. The default splicing mechanism in Converge prevents any variable capture occurring. However often one wishes certain variables to be captured. Consider the following contrived example which uses a capturing splice: func f(): return [| x := 4 |] func g(): x := 10 $c<f()> y := x What might one expect the value of y in function g to be after the value of x is assigned to it? A naive splicing of f() into g would mean that the x within [| x := 4 |] would be captured by the x already in g -- y would end with the value 4 . If this was the case, using the quasi-quote mechanism could potentially cause all sorts of unexpected interactions and problems. In order to solve this problem, not only is Converge able to statically determine namespaces, but variables can be alpha-renamed (essentially meaning that x can be changed consistently to y in a given scope) without affecting the programs semantics. This is a significant deviation from the Python heritage. The quasi-quotes mechanism determines all bound variables in a quasi-quoted expression, and preemptively alpha-renames each bound variable to a name which is invalid in the normal concrete syntax. In so doing, Converge guarantees that the user can not inadvertently cause variable clashes. All references to the variable within the quasi-quotes are updated similarly. Thus the x within [| x := 4 |] will not cause variable capture to occur, and the variable y in function g will be set to 10 . There is one exception: top-level definitions (all of which are assignments to a variable, although syntactic sugar generally obscures this fact) can not be alpha-renamed since this could lead to run-time slot missing exceptions being raised. Converge thus does not permit top-level definitions to be alpha-renamed. Dynamic scoping Sometimes the quasi-quote mechanisms automatic alpha-renaming of variables is not what is needed. For example consider a function swap(x, y) which should swap the values of two variables. In such a case, we want the result of the splice to capture the variables in the spliced environment. The following definition of swap expects to be passed two ITree's representing variables: func swap(x, y): return [| temp := $c{x} $c{x} := $c{y} $c{y} := temp |] It is initially tempting to try and use this function as follows: a := 10 b := 20 $c<swap([| a |], [| b |])> However this causes a staging error since the a and b in the quasi-quotes do not refer to a bound variable either inside or outside the quasi-quotes when the splice expression is evaluated (the latter would be invalid anyway, due to Converge's lifting rules; see later}). swap needs to be passed ITree's representing variable names, not references to variables. ITree's representing variable names can be constructed by the idiom CEI::ivar(x) where x is a string representing a variable name. When such a variable name is spliced into a quasi-quotes it will not be renamed, thereby allowing dynamic scoping. A correct call to swap thus looks as follows: $c<swap(CEI::ivar(a), CEI::ivar("b"))> In this case, the variable names constructed by the CEI interface are first spliced into the quasi-quotes in the swap function. The resulting ITree from the quasi-quotes is then spliced in place of the swap call, and the variable names dynamically capture the a and b variables. Dynamic scoping also tends to be useful when a quasi-quoted function is created piecemeal with many separate quasi-quote expressions. In such a case, variable references can only be resolved successfully when all the resulting ITree's are spliced together since references to the function's parameters and so on will not be determined until that point. Since it is highly tedious to continually write CEI::ivar("foo") , Converge provides the special syntax &foo which is equivalent. Notice that this notation prefixes a variable name --- it has nothing to do with the value the variable contains. Using this syntax also allows swap to be called in the following less cumbersome fashion: $c<swap([| &a |], [| &b |])> Forward references and splicing We saw earlier that when a splice annotation outside quasi-quotes is encountered, a temporary module is created which contains all the definitions up to, but excluding, the definition holding the splice annotation. This is a very useful feature since compile-time functions used only in one module can be kept in that module. However this introduces a real problem involving forward references (i.e. a reference to a definition defined later in the file). If a splice annotation is encountered and compiles a subset of the module, then some definitions involved in forward references may not be included: thus the temporary module will fail to compile, leading to the entire module not compiling. Worse still, the user is likely to be presented with a highly confusing error telling them that a particular reference is undefined when, as far as they are concerned, the definition is staring at them within their text editor. Consider the following contrived example: func f1(): return [| 7 |] func f2(): x := f4() func f3(): return $<f1()> func f4(): pass If f2 is included in the temporary module created when evaluating the splice annotation in f3 , then the forward reference to f4 will be unresolvable. Converge's solution to this problem is to include only the minimal needed subset of definitions in the temporary module; thus most forward references do not raise a compile-time error. When a splice annotation is encountered, the Converge compiler does not immediately create a temporary module. First it calculates the splice expressions' free variables; any previously encountered definition which has a name in the set of free variables is added to a set of definitions to include. These definitions themselves then have their free variables calculated, and again any previously encountered definition which has a name in the set of free variables is added to the set of definitions to include. This last step is repeated until an iteration adds no new definitions to the set. At this point, Converge then goes back in order over all previously encountered definitions, and if the definition is in the list of definitions to include, it is added to the temporary module (this last stage ensures that definitions are not reordered in the temporary module). Note also that free variables which genuinely do not refer to any definitions (i.e. a mistake on the part of the programmer) will pass through this scheme unmolested and will raise an appropriate error when the temporary module is compiled. Using this method, the temporary module that is created and evaluated for the example looks as follows: func f1(): return [| 7 |] func $$splice$$(): return f1() There are thus no unresolvable forward references in this example. Notice that Converge's approach to the forward reference problem is not a completely general solution since some forward references (particularly those to definitions beyond a splice site) are inherently unresolvable. Converge's approach is intended to significantly reduce the problem to the point that any unresolvable references are the result of programmer error. Variable renaming When generating code, it is often the case that references to variables outside generated are made. In certain situations, intentional dynamic scoping can interfere with this. Converge thus allows variables to be declared as being renamed in a scope. When a variable is renamed, the previous name is removed from the scope, and the renamed name is implicitly nonlocal to the scope. For example in the following code, calling f will result in the outer x being incremented by 2; the x defined in f is local to f itself. x := 2 func f(): rename x as y x := 3 y += 2 The rename declaration is typically only used in quasi-quoted code. Note that the rename and nonlocal declarations must be declared before any expressions in a function body. The CEI interface At various points when compile-time meta-programming, one needs to interact with the compiler. The CEI package is the officially sanctioned interface to the Compiler, and can be imported with import CEI . Accessing the compiler internals in any other way may lead to future incompatibilities or undefined behaviour. The complete CEI module documentation is available online. When quasi-quotes are not enough Some desired ITree's can not be created using quasi-quotes. For example an if statement with multiple elif clauses has no obvious concrete syntax equivalent. In such cases the CEI interface presents a more traditional meta-programming interface to the user that allows ITree's that are not expressible via quasi-quotes to be built. Each ITree element has a corresponding function with a lower case name and a prepended `i' in the CEI interface e.g. ivar . Names We saw earlier that the Converge compiler sometimes uses names for variables that the user can not specify using concrete syntax. Users can generate their own fresh names - that is names that are guaranteed to clash with other normal or fresh names - using the fresh_name function. This takes an optional argument x which, if present, is incorporated into the generated name whilst still guaranteeing the uniqueness of the resulting name; this feature aids debugging by allowing the user to trace the origins of a fresh name. All fresh names begin with $$ , and subverting the fresh name interface by manually creating variables beginning with the same prefix leads to undefined behaviour. Lifting values When meta-programming, one often needs to take a normal Converge value (e.g. a string) and obtain its ITree equivalent: this is known as lifting a value. The lift function in the CEI module lifts built-in datatypes such as strings into their ITree equivalents. Container types such as lists are recursively converted into their ITree equivalent. Example: A compile-time version of printf This section uses the example of generating a function for a simplified printf style function. This function takes a format string such as "%s has %d %s" and returns a quasi-quoted function which takes an argument per % specifier and intermingles that argument with the main text string. For the purposes of this section, we deal only with decimal numbers %d and strings %s . This example assumes the existence of a function split_format which given a string such as "%s has %d %s" returns a list of the form [PRINTF_STRING, " has ", PRINTF_INT, " ", PRINTF_STRING] where PRINTF_STRING and PRINTF_INT are constants. The full code can be found for this example, including for split_format can be found in the examples/compile_time directory in the Converge distribution. First we define the main printf function which creates the appropriate number of parameters for the format string (of the form p0 , p1 etc.). Parameters must be created by the CEI interface. An iparam has two components: a variable, and a default value (the latter can be set to null to signify the parameter is mandatory and has no default value). printf then returns an anonymous quasi-quoted function which contains the parameters, and a spliced-in expression returned by simple_printf_expr : func simple_printf(format): split := split_format(format) params := [] i := 0 for part := split.iter(): if part == PRINTF_INT | part == PRINTF_STRING: params.append(CEI::iparam(CEI::ivar("p" + i.to_str()), null)) i += 1 return [| func ($c{params}): return $c{simple_printf_expr(split, 0)} |] Note the use of an insertion (${params}) of a list of parameters into the anonymous function. simple_printf_expr is a recursive function which turns a list such as [PRINTF_INT, ": ", PRINTF_STRING] into an expression along the lines of p0.to_str + ": " + p1 : func simple_printf_expr(split, param_i): if split.len() == 0: return [| "" |] param := CEI::ivar("p" + param_i.to_str()) if Builtins::String.conformed_by(split[0]): return [| $c{CEI::lift(split[0])} + $c{simple_printf_expr(split[1 : ], param_i)} |] elif split[0] == PRINTF_INT: return [| $c{param}.to_str() + $c{simple_printf_expr(split[1 : ], param_i + 1)} |] elif split[0] == PRINTF_STRING: return [| $c{param} + $c{simple_printf_expr(split[1 : ], param_i + 1)} |] For the input %s has %d %s the following ITree is created: func (p0, p1, p2): return p0 + " has " + p1.to_str() + " " + p2 + "" Of course, what we see here is not the ITree itself; rather we see a pretty printed concrete syntax version of the ITree. ITree's can be converted into a string representation with the CEI::itree_format function. Valid splice and insertion locations Splices and inserts can appear at various locations, with different restrictions on what they are expected to return. The valid locations, and expected return types of a splice or insert, are as follows: Location Example(s) Description Block-level expression x := ... $ f(x) A single IExpr, or list of IExpr's, can be spliced or inserted when an insert appears as a block-level expression (i.e. it is not a sub-expression). Embedded expression x := $ + 2 An IExpr can be spliced or inserted anywhere a normal expression would be expected. Top-level definitions import X $ A single IClass_Def, IFunc_Def, IAssignment, or a list containing objects of these types can be spliced at the top-level of a file (since quasi-quotes can not appear at the top-level of a file, this particular type of splicing does not have an insertion equivalent). Class name class $ (...): ... An IVar can be spliced or inserted for a class name. Class fields class C: $ A single IAssignment or IFunc_Def, or a list containing objects of these types, can be spliced or inserted when an insert appears as a class field. Function name func $ (...): ... An IVar can be spliced or inserted for a function name. Function parameter(s) func f($ ): ... A single IParam, or list of IParams, can be spliced or inserted into a functions parameters list. Rename(s) rename $ A single IRename, or list of IRenames, can be spliced or inserted into a list of renames. Module lookup Module::$ An IVar can be spliced or inserted for a definition to be looked-up in a module. Slot lookup o.$ A string can be spliced or inserted for a slot name to be looked-up in an object. Assignment target(s) $ := ... An IVar, IGet, ISlot_Lookup, or a list containing objects of these types, can be spliced or inserted for the targets of an assignment. Example: Generating an arbitrary number of functions Many libraries provide large numbers of near-identical functions to users. For example an HTML library may provide a function for each HTML element, whose job is to create a new object of the matching name. Typically creating each of these functions is an error-prone cut and paste task. With CTMP we can easily generate an arbitrary number of such functions. Using an insertion of a function name we can create quasi-quoted functions whose names are not known statically: name := "..." [| func $c{CEI::ivar(name)}(): .. |] Note that the resulting function's name will be dynamically scoped. Since we can splice lists of ITrees in at various locations, it then becomes trivial to generate multiple such functions. Let's take a simple example of a function which should generate a list of ITree functions which should print the name of the function when executed: func pfuncs(names): funcs := [] for name := names.iter(): funcs.append([| func $c{CEI::ivar(name)}(): Sys::println(${CEI::istring(name)}) |]) return funcs We can then use this function in a module as follows: $c<pfuncs(["dog", "cat", "mouse"])> func main(): dog() cat() Running this module will print out dog and cat . Src infos Src infos are Converge's way of passing around and storing information relating bytecode instructions to the source code location that generated that bytecode. The src info concept is used uniformly throughout the Converge parser, compiler, and VM. A src info is a tuple [<file name>, <char offset>] ; src infos are lists of these tuples. The special variable __SRC_INFO__ returns src infos that pinpoint the first character of the variable name. When an ITree is built with quasi-quotes, it automatically picks up the src infos of the file and location of the quasi-quote. Extra src infos can be added using the following syntax: [<expr>| ... |] Calling ITree creation functions in the CEI module automatically adds src infos relating the caller to the ITree. i.e. CEI::ix(...) is roughly equivalent to ITree::Ix.new(..., __SRC_INFOS__) . DSL blocks A DSL can be embedded into a Converge source file via a DSL block. Such a block is introduced by a variant on the splice syntax $<<...>> where expr should evaluate to a function (the DSL implementation function). Note that, as with normal splices, the default action of a DSL splice is to ensure that no variable capture can occur; both capturing DSL splices $c<<...>> and pragma DSL splices $p<<...>> are available. The DSL implementation function is called at compile-time with a string representing the DSL block, and is expected to return an AST which will replace the DSL block in the same way as a normal splice. Compile-time meta-programming is thus the mechanism which facilitates embedding DSLs. Colloquially one uses the DSL implementation function to talk about the DSL block as being `an expr block'. An example of a DSL block and DSL implementation function is as follows: func dslim(dsl_block, src_infos): return ... $<<dslim>>: ... Typically the DSL implementation uses the CEI::dsl_parse convenience function to parse the DSL block. Inserting ITrees which may capture A tricky problem is embedding an ITree which may wish to intentionally capture variables in the scope it is spliced into, not the ITree it is inserted into. If dynamically scoped variables are used within the ITree being embedded into, problems can occur - this happens frequently with DSLs. The CEI::embeddable_itree function can be used to solve this problem. It takes in an ITree, and returns a copy with all free and bound variables to fresh names; usefully it also returns a list of IRenames which can be spliced in at the outermost-embedding level to ensure that the capturing. A typical use of this is as follows: renames, embeddable_cit := CEI::embeddable_itree(cit) return [| func () { rename $c{renames} $c{embeddable_cit} }() |] Further reading Examples of compile-time meta-programming with accompanying discussion can be found in the Converge distribution.