Descent into L

elle

Open source implementation

L is a multi-paradigm (functional/imperative) programming language which emphasizes readability, simplicity, conciseness and performance. The L compiler is being actively implemented and most of the following features are already working; an initial release is planned in the next few months. This document describes the features of the L language and thebinary; for a shorter introduction to the main features of the language, click here . Note that the language is not yet frozen, so minor elements (especially syntax) in the following are subject to change; this document will be updated together with the code.

The L compiler is licensed under the Cecill-C license, a copyleft and LGPL-compatible license that allows linking without restriction.

The source code of the compiler is has been specially written to make it as readable as a book and welcome new contributors, or anyone that would like to study its internals.

A simple, familiar syntax

Great languages have not only a nice (simple and powerful) semantic, but also a good (simple and readable) syntax. However it is easy to just accumulate syntax extensions as new features are added. In contrast, there has been many iterations that led to the current syntax, leading to a more consistent (i.e. simpler and less surprising) language. Actually, the syntax is not frozen yet, and minor changes can still happen before 1.0.

Everything is an expression

The syntax looks similar than C's, except that there is no distinction between statement and expressions: everything is an expression. This makes the language simpler and more uniform. However, sequences and introduction of variables is done using brace ({}) characters, which provide the nice alignment property and view of sequential execution found in imperative languages.

{ let a = if( x == 0) 3 else { let u = 5; u*(u-1) } let b = { let y = 5 let z = 6 y*y + z*z } print( a) print( b) a + b }

Optional type annotations

In L, type annotations are entirely optional, which contributes to the lightness of the syntax. let introduces read-only bindings (preferred whenever possible), and var mutable ones. L preferred style is to use let bindings whenever possible (in particular to hold intermediary results), and use var only when mutation is necessary. The use of mutable variables makes code more difficult to understand, in particular when reusing a variable for different purposes; limiting mutable variables and marking them explicitly encourages a better style.

{ var x := 0 var y:Int := 5 // Explicit declaration: no runtime difference if (a == 3) { x := x + y } x } // Equivalent program without mutable variables { let x = 0 let y:Int = 5 // Explicit declaration: no runtime difference let x_possibly_incremented = if (a == 3) x else x+y x_possibly_incremented }

Statement separation

Delimiting statements in C-like languages is necessary to disambiguate the syntax; for instance it allows to differentiate f(x) - f(y) from f(x); -f(y) , or f(x+3) from f; (x+3) . Often, ';' is used as a statement separator or terminator.

Instead, L relies on the idea of token separation to disambiguate such cases. There are four degrees of separation, that depend on the characters between two tokens; namely (from stronger to weaker):

Explicit , when the characters between two tokens contain a ';'

, when the characters between two tokens contain a ';' Newline , when the characters between two tokens contain a '

' and no ';'

, when the characters between two tokens contain a '

' and no ';' Whitespace , when there is a non-empty number of characters between two tokens, that contain neither ';' nor '

'

, when there is a non-empty number of characters between two tokens, that contain neither ';' nor '

' Stuck, when there are no characters between the tokens

Statements should be separated by newlines, unless everything fits on one line, in which case they should be separated by ';'.

When in need to split a statement on several lines, you should split after infix operators. I.e. a+

b contains one statement, while a

+b contains two.

contains one statement, while contains two. One exception is function calls, which are denoted only by a stuck open '(' or '{'. f(x) is f calling x ; f (x) does not mean anything; f

(x) and f;(x) is f followed by x . Forcing stuck calls enforce a single indentation style for all L programs, and help identifying calls visually.

is calling ; does not mean anything; and is followed by . Forcing stuck calls enforce a single indentation style for all L programs, and help identifying calls visually. The other exception is the . operator for reverse function application (and advanced namespace resolution), which can be separated by a newline (but not by a ';'). However, it must be stuck to the following token. The reason for this is that it allows chaining calls: f(a, computation) .another_one(a_param) .final_computation

Vertical alignment and indentation

The fact that a token is part of a statement or of the following statement depends on this separation level. Generally, tokens separated by '

' or ';' are part of the next statement, while other tokens are part of the current statement. Sometimes, the same token with different separation levels before and/or after can even be viewed as if they were two different tokens (they have different syntax, meaning, and priorities). While these rules may seem complex when enunciated, they are actually quite natural and generic. Generally, it boils down to:

L syntax contributes to maintaining a correct alignment and indentation. When performing pattern matching, the start of a match is identified using indentation. Nested function calls are aligned, and provide visually a tree structure. Blocks, introduced with {}, allows to maintain the same level of indentation when writing a sequential computation.

{ let f = { 0 -> let a = 3 let b = 5 a + b 1 -> function1( a_function( 4, 5) another_one( an_argument, another_arg)) n -> 8 + n } f(4) }

Algebraic datatypes and pattern matching

Pattern matching

Pattern matching is a construct that allows to extract several components of an object at once. For instance:

{ let x = (1,2,3) let (a,_,c) = x a + c // 4 }

x

Thevariable contains a tuple with three elements. Later, we use pattern matching to extract the first and third component out of the tuple. This code would be even more useful if the tuple construction and pattern matching were more separated, e.g. were in different functions.

Patterns appear in many different places in the code. As we saw, The let construct introduces a pattern (note: the var construct does not). But the arguments to a function are also a tuple pattern:

def f(a,(b,c),d) = a + b + c + d { let u = (1,2) f(1,u,3) }

Algebraic datatypes

You can define your own datatypes (record and sum types), and match them in patterns. Here is an example of a simple record datatype. Note: the x: and y: parts are optional, but they allow the use of the p.x notation to extract an individual component.

data Point(x:Int, y:Int) { let p = Point(1,2) let Point(a,b) = p a+b // 3 }

A more complex form allows to define sum types, a generalized form of enum that can carry data. Here is a simple example of the use of sum types to define the type of mathematical expressions, and a small calculator written in L:

data Expression { Number(Int) Add(Expression,Expression) Sub(Expression,Expression) Mul(Expression,Expression) Div(Expression,Expression) } def eval(e) = e.{ Number(n) -> n Add(e1,e2) -> eval(e1) + eval(e2) Sub(e1,e2) -> eval(e1) - eval(e2) Mul(e1,e2) -> eval(e1) * eval(e2) Div(e1,e2) -> eval(e1) / eval(e2) } eval( Add( 3, Mul( 2, 4))) // 11

The expression.{ pattern_1 -> body_1 ... pattern_n -> body_n } construct matches each pattern against the given expression in turn, selects the first that matches, assigns the variables to the corresponding components in the expression, and executes the corresponding body. This construct allows to eliminate a lot of boilerplate code.

The form data Point(Int,Int) is actually a shorthand for the case where the sum type has only one element: data Point { Point(Int,Int) }

Sum types are ubiquitous when defining complex data structures; but it is often implicit. Its most common form in many other languages is the use of the special "null" pointer (this is Tony Hoare's billion dollar mistake, avoided by the L language.

Parametric polymorphism

With L one can write data structures and algorithms which are parametrically polymorphic, i.e. that work for any type. Note that the length and append functions below are polymorphic (they work on both lists of Int and Bool), without requiring any annotation to make them so.

data List<t> = { Cons(head:t, tail:List<t>) Nil } def length = { Cons( _, tail) -> 1+length(tail) Nil -> 0 } def list_int = Cons(11, Cons(22, Cons(33, Nil))) list_int.length // => 3 def list_bool = Cons(true, Cons(false, Nil)) lits_bool.length // => 2 def append(l1,l2) = l1.{ Nil -> l2 Cons(x,tail) = Cons(x, tail.append( l2)) } append( list_int, Cons(44, Nil)) // Cons(11, Cons(22, Cons(33, Cons( 44, Nil)))) append( Cons( true, Cons( true, Nil)), list_bool) // Cons( true, Cons( true, Cons( true, Cons( false, Nil))))

First-class functions

Syntax for creating functions

In L, functions are first-class citizens, meaning that they can be passed around at will. Functions can be created simply using the { arg1,arg2,...,argn -> body} syntax.

{ let f = { a,b -> { x -> a * x + b }} let g = f(2,3) g(1) // 5 }

Observe that multiple arguments are passed as tuples, instead of being curried as in most functional languages. This allows to determine more easily when closures are created.

L features a special syntax for creating simple functions. Blocks that contain an '_' character are transformed into functions, each _ being replaced with an argument. This way of performing partial application avoids hard-to-find type errors arising when forgetting some of the arguments of a curried function.

def plus = {_+_} // equivalent to { a,b -> a + b } def three_times = { 3 * _ } // equivalent to { x -> 3 * x }

Functions argument are actually patterns and can be matched directly. Actually, the arg.{ pattern1 -> body1; ... patternn -> bodyn syntax presented above is a special case of function pattern matching, where arg is matched by applying it to a function.

def fact = { 0 -> 1 n -> n * fact(n -1) } fact(5) // 120

Examples of downward and upward passing of functions

Passing functions downward is perfect for parameterizing an algorithm, like the traversal of a data structure or a sorting function.

// Splits l according to the predicate p. Stable. def partition(l:List<t>, p:t->Bool) = l.{ Nil -> Nil Cons(head,tail) -> let (part_true, part_false) = partition(tail,p) if p(head) (Cons(head,part_true), part_false) else (part_true, Cons(head, part_false)) } // Sorts l. def quicksort(l, greater?:t->t->Bool) = l.{ Nil -> NIl Cons(x,tail) -> let (great_ones,small_ones) = tail.partition{ _.greater?(x) } small_ones.quicksort(greater?).append( Cons( x, great_ones.quicksort(greater?))) } // Sort a list of integers using reversed order Cons( 2, Cons( 3, Cons( 1, Nil))).quicksort{ x,y -> x <= y } // => Cons( 3, Cons( 2, Cons( 1, Nil))) // Sort a list of pairs of integers using lexical order Cons( (2,2), Cons( (3,1), Cons( (2,1), Nil))).quicksort{ (x1,x2),(y1,y2) -> x1 > x2 || x1 = x2 && y1 >= y2 } // => Cons( (2,1), Cons( (2,2), Cons( (3,1), Nil)))

def id? = Regexp.compile("[A-Za-z_]+") // => id?: String -> Bool "hello".id? // => true "23".id? // => false

Static typing with complete type inference

Passing functions upward also has many uses. One is dynamic compilation, e.g. dynamically creating a function that matches a regular expression.

L is statically typed, which means that many simple errors are detected at compile time.

def applyn( f, n, x) = if( n == 0) x else f( applyn( f,n-1, x)) applyn( 3, { _ + 2 }, 10) // => Type error: expected t -> t, got Int applyn( { _ + 2 }, 3, 10) // => 16 applyn( { _ * 2 }, 10., 3) // => Type error: exected Int, got Float applyn( { _ * 2 }, 3, 10.) // => 80. append( list_int, list_bool) // => Type error: expected List<Int>, got List<Bool>

In addition, static typing guarantees that the program executes safely, e.g. it will not cause an unexpected segmentation fault; without causing the overhead of the runtime checks done in a dynamically typed language.

L features complete type inference, which means that the whole program can be written without a single type annotation. However, you can write type annotations if you want to.

def applyn( f:t -> t, n:Int, x:t) -> t = if( n == 0) x else f( applyn( f,n-1, x)) applyn( { _ + 2 }, 3, 10:Int) // => 16 applyn( { _ * 2 }:Float -> Float, 3, 10.) // => 80

Macros and syntax extensions

This combination combines the benefits of a lightweight syntax, as found in dynamic languages, with the strong error detection and efficiency of statically typed languages.

L provides several extension points. First, L offers a way to dynamically extend the concrete syntax of the language. Second, Lisp-like macros can perform changes in the parse tree. These extension points allow to add domain specific languages to L, for instance to mix XML and L, SQL and L, adding a nice syntax for the active record pattern...

One of the examples of the standard library where these macro extensions are used is the print macro. print is followed by a list of expressions between braces. This makes the printing code much more readable (as compared to, for instance, C-like format strings).

print{ "Hello, " user.name "!

" "Today you are " user.age + 1 ".

" }

Interpretation and native compilation to LLVM

L features an interactive toplevel (or REPL) that you can use to try the language or debug a program.

$ elle interactive Welcome to L! > 4+5 9:Int

L can be used as an interpreter:

#!/usr/bin/env elle run print{ "Hello, world!

"}

For more substantive development comprising several files, each file has to be compiled individually. The current supported workflow is to compile to a LLVM bitcode file, and then link the different modules with the LLVM linker. In the future L will support several compilation options, which allows choosing between faster compile time (during development) or faster resulting program (for production).

$ elle compile -to-llvm program.l -o program.bc $ file program.bc program.bc: LLVM bitcode $ clang -O2 -c -emit-llvm other.c -o other.bc $ llvm-ld program.bc other.bc -native program.run $ elle compile program.l -o program.run $ file program.run program.run: ELF 32-bit LSB executable, Intel 80386, ...

Mutable variables and imperative features

L is a functional programming language, but does not discourage the use of imperative features: in-place update is sometimes the way to write the most efficient algorithm.

L features mutable variables, introduced by the var keyword. Assignment is done using the := assignment operator. The only restriction regarding mutable variables is that they cannot hold a polymorphic value (this would be unsound). L preferred style is to avoid mutable variables when they can be replaced by immutable ones (introduced by let ), because they are harder to optimize (and can lead to less efficient code), and their use can make code harder to understand.

def fact(n) = { var i := n var result := 1 while(i < n){ result := result * i; i := i+1 } result }

Note that variables must be given an initial value; in the future we may assign variables to the undef special value, provided we can prove that the variable is not read before being assigned.

To accommodate mix functional/imperative programming, L is a strict language with a completely defined evaluation order; i.e. arguments are evaluated from left to right before calling the function.

(f(3))(g(4),h(5)) // equivalent to: { let a = f(3); let b = g(4); let c = h(5); a(b,c) }

Delayed and lazy executions are possible, by making them explicit.

{ let delayed = { () -> print{ "Delayed hello world!"}} print{ "Before delayed "} delayed() } // Displays: Before delayed Delayed hello world!

Garbage collection

Maximizing stack allocations, and guaranteeing that some allocations are made on the stack

Using programming techniques that avoid creating intermediary objects, such as using streams (lazy lists) instead of lists.

Using analyses and transformations to avoid creating intermediary objects, such as linear typing, deforestation and storage reuse

Allowing the programmer to use explicit memory or region management where it makes sense

Guaranteed optimizations

L can free the programmer of the burden of explicitly allocating and deallocating objects. When doing general-purpose programming, this results in a productivity boost. The L garbage collector is not generational, so as not to slow down in-place updates. L will encourage a programming style that avoids creating too many short-lived objects, by:When memory management is important for the application, such as in system, real-time, or high-performance applications, L allows fine control over memory allocation, by guaranteeing that some allocations are done on the stack, using explicit region management, or using manual (unsafe) memory allocation.

What makes an optimization guaranteed is just the fact that it appears in the L standard, not just in some L compiler (The L compiler will also perform optimizations that will not be guaranteed). This may seem to be a tiny difference. In fact, being in the standard means that the code can rely on the optimization being performed for whatever version of the compiler being used, or even for a different implementation of L.

One use is to write modular code that may look inefficient as is, but that you know the compiler will make efficient by the sole use of the guaranteed optimizations. For instance, you can write code such as:

def an_option = true { var res := 0 [0..1000].each{ i -> [0..800].each{ j -> if( an_option) res := res + i*j else res := res + i+j } } }

if

This code is logically clean, because the choice introduced byis close to the different alternatives. But as-is, it is horribly inefficient because the test is done inside the loop. But a guaranteed optimization says that conditionals to constants are removed away. Thanks to guaranteed optimization, you know that every compiler will remove the test, so you can leave the code as it is knowing that this will never cause a performance issue. Actually, this guaranteed optimization remove the need for preprocessing conditional in the language.

Another use of guaranteed optimization allows to control memory allocation. By guaranteeing that some allocations are always made on the stack, and by writing the functions carefully (and checking that all allocation is indeed done on the stack), you can writing real-time code, system code such as a garbage collector, or performance-critical code, using L's high-level language and constructs; instead of rewriting code in C. A guaranteed optimization that falls into this category is the classical tail-call optimization, that allows to bound stack allocation when functions are carefully written.

The list of guaranteed optimizations is not definite yet, and will grew as the need arise and the optimizations themselves are implemented in the prototype compiler. But they will comprehend at least:

Inter-function tail-call optimization (tail calls are transformed into jumps, so that mutually recursive functions do not consume all the stack

That it will apply iteratively apply a set of syntactic rewriting rules (until no rewriting rule has been applied). These rules include contification ((transformation of inner functions into simple goto s), inlining of functions used once, simplification of if( constant)

s), inlining of functions used once, simplification of That some allocations are performed or the stack (more precisely: not in the heap)

And more

Module and typeclass system, type-directed name resolution, and many other features are planned!