Coke programming guide $Id: coke.html.in 545 2006-11-06 18:04:41Z piumarta $

corresponds to examples/jolt in the idst-5.4 release

1 Introduction

2 Compiling and installing

3 Syntax

3.1 Numbers

3.2 Strings

3.3 Identifiers

3.4 Lists

3.5 Comments

3.6 Artificial sweeteners

4 Semantics

4.1 Evaluation rules

4.2 Intrinsic operators

4.3 Special forms

5 Primitives

6 Abstract syntax definitions

6.1 Compiler back-end interfaces

7 Resources



1 Introduction

examples/jolt

2 Compiling and installing

make

examples/jolt

make run

./main

3 Syntax

3.1 Numbers

Floating-point numbers consist of an optional sign, one or more digits, a decimal point, zero or more digits, and an optional exponent consisting of a single letter ' e ' and an integer. Only decimal floating-point numbers are supported.

Note: Jolt does not currently support floating-point numbers.

3.2 Strings

"

\a 'bel' (ASCII code 7) \b backspace (ASCII code 8) \e escape (ASCII code 27) \f form feed (ASCII code 12)

newline (ASCII code 10) \r carriage return (ASCII code 13) \t horizontal tab (ASCII code 9) \v vertical tab (ASCII code 11)

nn octal character nnn (ASCII code 0nnn)

3.3 Identifiers

!#$%&*./:<=>?@\^_|~+-

3.4 Lists

(

)

3.5 Comments

3.6 Artificial sweeteners

'element (single quote) is equivalent to (quote element) `element (backquote) is equivalent to (quasiquote element) ,element (comma) is equivalent to (unquote element) ,@element (comma followed by commercial at) is equivalent to (unquote-splicing element)

[ receiver selector [arguments...] ]

( send (quote selector ) receiver [arguments...] )

send

send

[...]

For keyword selectors one more feature is provided:

[ receiver keyword1: argument1 keyword2: argument2 ... ]

( send (quote keyword1:keyword2:... ) receiver argument1 argument2 ... )

keyword

4 Semantics

4.1 Evaluation rules

Strings evaluate to a primitive (non-object) pointer to an array of characters terminated with a nul character.

Identifiers always represent an r-value (some physical location associated with the identifier during the execution of the code in which the identifier appears). When they appear outside special syntactic forms they evaluate to the associated l-value -- the value stored in the physical location associated with the identifier.

Lists (that are not special syntactic or semantic forms, as described below) represent the application of a function value to zero or more argument values. The form

( function [arguments...] )

4.2 Intrinsic operators

Certain built-in operators behave like functions but are not evaluated as identifiers. (Their arguments are evaluated exactly as descried above.)

The intrinsic arithmetic and relational operators are as follows:

( + args... ) addition of two or more arguments ( - arg ) negation of a single argument ( - args... ) subtraction of two or more arguments ( * args... ) multiplication of two or more arguments ( / args... ) division of two or more arguments ( % args... ) remainder after division of two or more arguments ( < args... ) non-zero iff first argument less than second argument <= >= > ... idem for other (in)equalities ( == args... ) non-zero iff arguments identical ( != args... ) non-zero iff arguments not identical

Pointers behave like (long) integers. Pointer dereferencing is provided for the traditional machine types:

( char@ address [index] )

( short@ address [index] )

( int@ address [index] )

( long@ address [index] )

( set-char@ address [index] value )

( set-short@ address [index] value )

( set-int@ address [index] value )

( set-long@ address [index] value )

set

4.3 Special forms

( and args... ) evaluates each of the args from left to right until one of them yields zero; args to the right of the first zero value are not evaluated. The result of the overall form is non-zero iff all args yield non-zero. ( or args... ) evaluates each of the args from left to right until one of them yields non-zero; args to the right of the first non-zero value are not evaluated. The result of the overall form is zero iff all args yield zero. ( if condition consequent [alternate] ) evaluates the condition and then either the consequent (if the condition was non-zero) or the alternate (if present). The result of the overall form is the result of whichever of consquent and alternate is evaluated. If alternate is not present and condition is zero then the overall result is zero. ( while condition [args...] ) while the condition evaluates to non-zero, all the args are evaluated in order, left to right. This process continues until condition evaluates to zero, causing the overall form to yield zero. ( let ( [bindings...] ) [args...] ) evaluates each of the args in order, left to right, in an environment augmented by the result of evaluating the bindings. Each binding has the form ( identifier value ) where value is evaluated according to the usual rules before being bound to identifier during the evaluation of the args. The identifiers are added to the environment before evaluating and binding their values (which is a bug and will be fixed sometime in the near future). The overall result is the value of the last of the args; if no args are present then the overall value is zero. ( set identifier value ) evaluates value and then assigns it to the location associated with the r-value identifier. ( set ( identifier args... ) value ) does not cause any evaluation. It simply rewrites itself in the form ( set-identifier args... value ) and re-evaluates the resulting form. This permits any field or slot 'getter' to be converted into a corresponding 'setter' by enclosing the getter expression in a set form. For example, the expression ( set ( long@ address ) 42 ) is rewritten as ( set-long@ address 42 ) by the compiler and then immediately re-evaluated to effect the assignment. ( lambda ( [formal...] ) [expr...] ) evaluates to the address of a function accepting zero or more formal arguments, where each formal must be an identifier. When applied, the function value of a lambda expression evaluates each of its exprs in order, from left to right, in an environment in which each formal names a variable bound to a corresponding actual argument supplied in the function application. The value of the applied function is the value of the final expr; if no expr is present then the function returns zero. Note: The evaluation of a lambda yields a function that can be applied; the exprs are not evaluated until the function result of lambda is applied to zero or more actual arguments as explained for the evaluation of lists above. A given function value can be applied any number of times and obeys the local platform's C ABI conventions; it can therefore be passed as a callback 'function pointer' to statically-compiled library and OS routines. ( return value ) appearing in the body lambda function evaluates value and then immediately returns control to the function's caller yielding value as the result of the function application. ( define identifier value ) evaluates value, creates a variable binding for identifier in the current scope, and assigns the value to the variable. If the form appears as a top-level expression then the binding is created in the current global environment. If the form appears within a let or lambda expression then the binding is created as a local (automatic) variable within that expression. ( syntax identifier function ) is similar to define except that the function value must evaluate to an applicable function address. The identifier names a new abstract syntax tree node type with the function providing the compilation rules for any node of the newly-created type. (The scoping rules for the binding are identical to those of define ). See the section below on abstract syntax definitions for further explanation of the function argument. ( quote element ) evaluates to an object corresponding to the compiler's internal representation for element. (This object can be incorporated into abstract syntax trees passed to the compiler for evaluation and compilation.) ( send message receiver [args...] ) evaluates message, receiver and args (if present). The message must evaluate to an identifier and receiver to a compiler object. The message identifier is then sent as a message to the receiver object along with any args supplied. The overall value of the form is the (object) answered by the message send.

5 Primitives

( _dlsym handle symbol )

Which namespaces are searched depends on the handle parameter. If handle is zero then every namespace associated with the process and its dynamic libraries is searched in the order they were loaded. If handle is associated with a library opened with the library function dlopen then only that library (and any libraries it depends on) are searched for the symbol.

The _dlsym primitive returns zero if symbol cannot be found, and sets an error condition which may be queried with the dlerror library function.

Note: the library functions dlopen and dlerror must be imported explicitly. For example: (define _dlopen (_dlsym 0 "dlopen")) (define _dlerror (_dlsym 0 "dlerror")) (define printf (_dlsym 0 "printf")) (define exit (_dlsym 0 "exit")) (syntax begin (lambda (form) `(let () ,@[form copyFrom: '1]))) (define dlopen (lambda (name) (let ((handle (_dlopen name 0))) (if handle handle (begin (printf "Error: dlopen: %s

" (dlerror)) (exit 1))))))

6 Abstract syntax definitions

syntax

(syntax type (lambda (node compiler) action...))

The actions are responsible for compiling the node. Three mechanisms are available to the actions:

They can return an object (or more usually an object structure). This object is recompiled in place of the original node. For example: (syntax begin (lambda (node compiler) `(let () ,@[node copyFrom: '1]))) defines a begin node type providing 'compound expressions' that evaluate several expressions in order and yield the value of their final expression. The node is rewritten as a let and returned to the compiler for re-evaluation. The node is sent copyFrom: to extract all but its first element (removing the begin from the start of the node) and the ,@ splices the result into a quasiquoted let expression. The argument to copyFrom: must be an object (not a primitive value) hence the quoted integer.

defines a node type providing 'compound expressions' that evaluate several expressions in order and yield the value of their final expression. The node is rewritten as a and returned to the compiler for re-evaluation. They can ask any part of the node's substructure to compile itself normally, by sending it the message [ anObject compile: compiler] (where compiler is the second argument passed to the syntax function).

(where is the second argument passed to the syntax function). They can emit low-level machine instructions via the compiler's back-end. The interfaces to the back-ends are described in the next section.

6.1 Compiler back-end interfaces

compiler

vpu

(syntax foo (lambda (node compiler) (let ((vpu [compiler vpu])) ...)))

An arithmetic stack holds intermediate results of computations and arguments for function calls. All operators pop their argument(s) from this stack and push their result onto this stack.

holds intermediate results of computations and arguments for function calls. All operators pop their argument(s) from this stack and push their result onto this stack. A completely seperate variable stack holds the state of local variables. Local variables are created and destroyed on the top of this stack, and are referred to by integer index relative to the topmost variable.

holds the state of local variables. Local variables are created and destroyed on the top of this stack, and are referred to by integer index relative to the topmost variable. Similarly, a completely seperate label stack holds local labels that can be used as targets for branch instructions. Labels are created and destroyed on the top of this stack, and are referred to by integer index relative to the topmost label.

Management [ vpu new ] creates a new instance of a vpu. (This is normally not of interest, since the vpu passed to the syntax function indirectly through the compiler argument will be the correct instance to use for code generation.) [ vpu iTmp: numTemps ] [ vpu iTmp ] pushes numTemps integer-valued temporary variables onto the variable stack. In the second form, the missing numTemps is assumed to be one. [ vpu dropTmp: numTemps ] removes the topmost numTemps variables from the variable stack. [ vpu begin: numLabels ] pushes numLabels local labels onto the label stack. [ vpu end: numLabels ] removes the topmost numLabels local labels from the label stack. [ vpu define: n ] defines the nth local label (relative to the topmost) in the label stack to refer to the current 'program counter' location. A local label must not be defined more than once. A local label can be left undefined provided it is never used as the target of a branch instruction. Function prologue and epilogue [ vpu iEnter ] declares the prologue for an integer-valued function. This must be the first (non label defining) instruction in a function. [ vpu iArg: numArgs ] [ vpu iArg ] declares numArgs integer-valued arguments. Arguments must be declared immediately after the iEnter prologue and before any other instructions in the function body. In the second form, the missing numArgs is assumed to be one. [ vpu ret ] returns the topmost item on the stack as the result of the function being compiled. A ret always pops one item from the stack (the value returned), can occur any number of times in the body of a function, but must also be the last instruction in the function. Data movement [ vpu ldInt: integer ] pushes integer (which must be an integer object value) onto the arithmetic stack. [ vpu ldPtr: object ] pushes the address of the given object onto the arithmetic stack. [ vpu ldArg: n ] pushes the nth argument of the function being compiled onto the arithmetic stack. Arguments are numbered from zero. [ vpu stArg: n ] stores the topmost item in the arithmetic stack into the nth argument of the function being compiled. The item is not removed from the stack. [ vpu ldTmp: n ] pushes the contents of the nth local variable (relative to the top of the variable stack) onto the arithmetic stack. [ vpu stTmp: n ] stores the topmost item in the arithmetic stack into the nth local variable. The item is not removed from the stack. [ vpu dup: n ] [ vpu dup ] pushes a copy of the nth item in the arithmetic stack (relative to the top of the stack before the value is pushed) onto the top of the arithmetic stack. In the second form, n is implicitly zero (i.e., the topmost item in the stack is duplicated). [ vpu drop: n ] [ vpu drop ] pops the topmost n items off the arithmetic stack. In the second form, n is implicitly one. Arithmetic and relational operators [ vpu add ] [ vpu sub ] [ vpu mul ] [ vpu div ] [ vpu mod ] replaces the topmost two items on the arithmetic stack with their sum, difference, product, quotient or remainder (respectively). [ vpu lt ] [ vpu le ] [ vpu eq ] [ vpu ne ] [ vpu ge ] [ vpu gt ] replaces the topmost two items on the arithmetic stack with zero or one according to whether the topmost item is less than, less or equal to, equal to, not equal to, greater or equal to, or greater than (respectively) the second item. Memory access [ vpu rdb ] [ vpu rdh ] [ vpu rdw ] replaces the topmost item on the arithmetic stack with the byte, half word or word (respectively) at that address in memory. [ vpu wrb ] [ vpu wrh ] [ vpu wrw ] writes the second item on the arithmetic stack into the memory byte, half word or word (respectively) addressed by the topmost item in the stack. The topmost item is then popped from the arithmetic stack. Control flow [ vpu br: n ] transfers program control to the instruction immediately following the point of definition of the nth local label (relative to the top of the local label stack). [ vpu bf: n ] [ vpu bt: n ] pops the topmost item off the arithmetic stack and conditionally transfers program control to the instruction immediately following the point of definition of the nth local label (relative to the top of the local label stack). The bf instruction transfers control to the label's address only if the topmost item popped from the stack was zero. The bt instruction transfers control to the label's address only if the topmost item popped from the stack was non-zero. [ vpu iCall: n label: label] [ vpu iCalli: n ] pops n actual arguments from the arithmetic stack and calls an integer-valued function, leaving the result of the function call on the top of the arithmetic stack. In the first form the address to call is retrieved from a global label (see the section on global labels below). In the second form the address to call is popped off the arithmetic stack before the arguments (i.e., the arguments are pushed followed by the address to call in preparation for an iCalli instruction). Arguments are pushed in 'reverse' order, starting with the last and working towards the first. Compilation and miscellaneous information [ vpu compile ] compiles an entire function, as described by the sequence of message sends previously sent to the receiver, into native code. This message must never be sent more than once to a given instance of the vpu. [ vpu codeSize ] answers an integer object representing the total number of native bytes compiled by any vpu in the current process since the process began executing. [ vpu versionString ] answers a string object containing a human-readable version string for the vpu library linked into the running process.

6.1.1 Global labels

compile

global labels implement a family of value methods that call their associated native code address with zero or more arguments; and

methods that call their associated native code address with zero or more arguments; and global labels can be passed as the label argument in an iCall:label: instruction to provide the destination address for the function call; and

argument in an instruction to provide the destination address for the function call; and global labels can be defined by looking up an external native code address in the running process, giving access to arbitray library and operating system functions.

VPULabel

[ VPULabel new ] answers a new, uninitialised global label. [ vpuLabel lookup: symbol] defines the receiver as the address associated with the code or data object named by symbol (a string object) looked up in the process global namespace via _dlsym (see the section on primitives). [ vpu defineLabel: label] defines the given global label to refer to the address of the next instruction generated in the receiver.

iCall:label:

[ vpuLabel _labelAddress ] answers a primitive pointer equal to the defined address of the receiver. [ vpuLabel value ] answers the result of calling the defined address of the receiver as a function with no arguments. [ vpuLabel value: argument ] answers the result of calling the defined address of the receiver as a function with a single argument.

7 Resources