🔗 1 . Getting started

Welcome to Koka. This manual provides an overview and formal specification of the language. For more background information, see:

The library documentation.

The Koka research page and the slides of a talk presented at Lang.Next (April 2012).

The source code of the Koka compiler.

The article Algebraic Effects for Functional Programming [2].

An article about the type system and semantics of Koka [1].

🔗 1.1 . Installing the compiler

At this point there are no binary releases of Koka and you need to build the compiler yourself. Fortunately, Koka has few dependencies and should build without problems on most common platforms, e.g. Windows (including WSL), macOS X, and Unix.

The following programs are required to build Koka:

Stack to run the Haskell compiler.
(use > curl -sSL https://get.haskellstack.org/ | sh on Unix and macOS X)

CMake to compile the generated C files.
(use > sudo apt-get install cmake on Ubuntu, > brew install cmake on macOS X)

Optional: The Ninja build system for faster build times.
(required on Windows, use > sudo apt-get install ninja-build on Ubuntu, > brew install ninja on macOS X)

Optional: the NodeJS runtime if using the Javascript backend.

Building Koka (note the --recursive flag):

> git clone --recursive https://github.com/koka-lang/koka
> cd koka
> stack build

You can also use stack build --fast to build a debug version of the compiler. You can now invoke the compiler as follows (this takes a while, as it needs to build the core libraries as well):

> stack exec koka -- -c test/algeff/common
compile: test/algeff/common.kk
loading: std/core
loading: std/core/types
loading: std/core/hnd
check  : test/algeff/common
cmake --build "out\Debug\cbuild" --target test_algeff_common
...
[5/5] Linking C executable test_algeff_common.exe
compiled: out\Debug\test_algeff_common.exe

and run the resulting executable:

> out\Debug\test_algeff_common.exe
42
Hello there, there
hi
hi
1
2
[False,True,True,False]
([False,False,True,True,False],2)

If you leave out the -c flag, Koka will execute the compiled program automatically. The -O2 flag builds an optimized program. Let's try it on a functional implementation of balanced insertion in a red-black tree ( rbtree.kk ):

> stack exec koka -- -O2 -c test/bench/koka/rbtree32.kk
...
cmake --build "out/RelWithDebInfo/cbuild" --target test_bench_koka_rbtree32
[15/15] Linking C executable test_bench_koka_rbtree32
compiled: out/RelWithDebInfo/test_bench_koka_rbtree32

> time out/RelWithDebInfo/test_bench_koka_rbtree32
420000

real 0m1.132s

We can compare this against an in-place updating C++ implementation using std::map ( rbtree.cpp ) (which uses the GNU RBTree implementation internally):

> g++ --std=c++17 -o cpp_rbtree -O3 test/bench/cpp/rbtree.cpp
> time ./cpp_rbtree
420000

real 0m1.096s
...

The performance close to C++ here is a result of Perceus automatically transforming the fast path of the pure functional rebalancing to use mostly in-place updates, closely mimicking the imperative rebalancing code of the hand-optimized C++ library.

Without giving any input files, the interpreter runs by default:

> stack exec koka

The Atom text editor is recommended to edit Koka programs. You can install support for Koka programs using

> jake atom

(or use jake sublime for the Sublime editor). If node is not installed, you can also copy the grammar files manually from the support/atom directory to ~/.atom/packages/language-koka .

🔗 1.2 . Running the interactive compiler

After running the plain stack exec koka command, the Koka interactive environment will start:

 _          _           ____
| |        | |         |__  \
| | __ ___ | | __ __ _  __) |
| |/ // _ \| |/ // _` || ___/    welcome to the koka interpreter
|   <| (_) |   <| (_| ||____|    version 2.0.0-alpha, Aug 23 2020, libc 64-bit
|_|\_\\___/|_|\_\\__,_|          type :? for help

loading: std/core
loading: std/core/types
loading: std/core/hnd
>

Now you can test some expressions:

> println("hi koka")
loading: std/core
loading: std/core/types
loading: std/core/hnd
check  : interactive
cmake --build "out\Debug\cbuild" --target interactive
[2/2] Linking C executable interactive.exe
compiled: out\Debug\interactive.exe
hi koka

> :t "hi"
string

> :t println("hi")
console ()

Or load a demo:

> :l test/medium/fibonacci
compile: test/medium/fibonacci.kk
loading: std/core
loading: std/core/types
loading: std/core/hnd
check  : test/medium/fibonacci
modules: test/medium/fibonacci

> main()
cmake --build "out/Debug/cbuild" --target interactive
[2/2] Linking C executable interactive
compiled: out/Debug/interactive
The 10000th fibonacci number is 33644764876431783266621612005107543310302148460680063906564769974680081442166662368155595513633734025582065332680836159373734790483865268263040892463056431887354544369559827491606602099884183933864652731300088830269235673613135117579297437854413752130520504347701602264758318906527890855154366159582987279682987510631200575428783453215515103870818298969791613125856265033195487140214287532698187962046936097879900350962302291026368131493195275630227837628441540360584402572114334961180023091208287046088923962328835461505776583271252546093591128203925285393434620904245248929403901706233888991085841065183173360430467471365597298727968298751063120057542878345321551510387081829896979161312785626503319548714021428753269818796204693609787990035096230229102636813149319527563022783762844154036058440257211433496118002309120828704608892396232883546150577658327125254609359112820392528539343462090424524892940390170623388899108584106518317336043747073790855263176432573399371287193758774689747992630583706574283016163740896917842637862421283525811282051637029808933209990570792006436742620238978311147005407499845925036063356093388383192338678305613643535189213327973290813373264265263398976392272340788292817795358057099369104917547080893184105614632233821746563732124822638309210329770164805472624384237486241145309381220656491403275108664339451751216152654536133311131404243685480510676584349352383695965342807176877532834823434555736671973139274627362910821067928078471803532913117677892465908993863545932789452377767440619224033763867400402133034329749690202832814593341882681768389307200363479562311710310129195316979460763273758925353077255237594378843450406771555577905645044301664011946258097221672975861502696844314695203461493229110597067624326851599283470989128470674086200858713501626031207190317208609408129832158107728207635318662461127824553720853236530577595643007251774431505153960090516860322034916322264088524885243315805153484962243484829938090507048348244932745373262456775587908918719080366205800959474315005240253270974699531877072437682590741993963226598414749819360928522394503970716544315642132815768890805878318340491743455627052022356484649519611246026831397097506938264870661326450766507461151267752274862159864253071129844118262266105716351506926029861704945425047491378115154139941550671256271197133252763631939606902895650288268608362241082050562430701794976171121233066073310059947366875

And quit the interpreter:

> :q

Before the effect one believes in different causes than one does after the effect.
 -- Friedrich Nietzsche

🔗 1.3 . Algebraic effect handlers

A novel feature of Koka is a compiled and typed implementation of algebraic effect handlers (described in detail in [3]). In the interactive environment, you can load various demo files with algebraic effects which are located in the test/algeff directory.

> :f test/algeff/common

where :f forces a recompile (versus :l which avoids a recompile if possible). Use the :? command to get an overview of all commands. After loading the common demo, we can run it directly from the interpreter:

> :f test/algeff/common
loading: test/algeff/common
loading: std/core
loading: std/core/types
loading: std/core/hnd
modules: test/algeff/common

> :t test2
() -> console ()

> test2()
loading: std/core
loading: std/core/types
loading: std/core/hnd
loading: test/algeff/common
check  : interactive
cmake --build "out/Debug/cbuild" --target interactive
[2/2] Linking C executable interactive
compiled: out/Debug/interactive
Hello there, there

Some interesting demos are:

common.kk : Various examples from the paper “Algebraic Effects for Functional Programming” [3]. Shows how to implement common control-flow abstractions like exceptions, state, iterators, ambiguity, and asynchronous programming.

nim.kk : Various examples from the paper “Liberating effects with rows and handlers” [1].

🔗 2 . An overview of Koka

This is a short introduction to the Koka programming language meant for programmers familiar with languages like C++, C#, or JavaScript.

Koka is a function-oriented language that separates pure values from side-effecting computations (the word ‘koka’ (or 効果) means “effect” or “effective” in Japanese). Koka is also flexible and fun: Koka has many features that help programmers easily change their data types and code organization correctly, even in large-scale programs, while having a small strongly-typed language core with a familiar JavaScript-like syntax.

🔗 2.1 . Hello world

As usual, we start with the familiar Hello world program:
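A minimal sketch of such a program (the exact listing in the distribution may differ slightly):

```koka
// Hello world in Koka: a main function that prints a message.
fun main() {
  println("Hello world!") // println output
}
```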

Koka uses a familiar curly-braces syntax, where // starts a line comment. Functions are declared using the fun keyword.

If you are reading this on Rise4Fun, you can click the load in editor button in the upper right corner of the example to load it into the editor and run the program.

Here is another short example program that encodes a string using the Caesar cipher, where each lower-case letter in a string is replaced by the letter three places up in the alphabet:
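A sketch of such an encoder, matching the description that follows (the helper name caesar and the exact body are reconstructions and may differ from the distributed example):

```koka
// Caesar cipher: shift each lower-case letter up by 'shift' places.
fun encode( s : string, shift : int )
{
  fun encode-char(c) {
    if (c < 'a' || c > 'z') return c
    val base = (c - 'a').int
    val rot  = (base + shift) % 26
    (rot.char + 'a')
  }
  s.map(encode-char)
}

// Classic Caesar encoding shifts by 3.
fun caesar( s : string ) : string {
  s.encode(3)
}
```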

In this example, we declare a local function encode-char which encodes a single character c . The final statement s.map(encode-char) applies the encode-char function to each character in the string s , returning a new string where each character is Caesar encoded. The result of the final statement in a function is also the return value of that function, and you can generally leave out an explicit return keyword. Similarly, Koka's grammar is constructed in such a way that no semi-colons are needed to separate statements.

🔗 2.2 . Dot selection

Koka is a function-oriented language where functions and data form the core of the language (in contrast to objects, for example). In particular, the expression s.encode(3) does not select the encode method from the string object, but is simply syntactic sugar for the function call encode(s,3) where s becomes the first argument. Similarly, c.int converts a character to an integer by calling int(c) (and both expressions are equivalent). The dot notation is intuitive and quite convenient to chain multiple calls together, as in:
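A sketch of such a chained call, assuming the encode function defined earlier in this section (the name show-it is hypothetical):

```koka
fun show-it( s : string ) {
  s.encode(3).length.println
}
```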

for example (where the body desugars as println(length(encode(s,3))) ). An advantage of the dot notation as syntactic sugar for function calls is that it is easy to extend the ‘primitive’ methods of any data type: just write a new function that takes that type as its first argument. In most object-oriented languages one would need to add that method to the class definition itself, which is not always possible if such a class came as a library, for example.

Koka is also strongly typed. It uses a powerful type inference engine to infer most types, and types generally do not get in the way. In particular, you can always leave out the types of local variables. This is the case, for example, for the base and rot values in the previous example; hover with the mouse over the example to see the types that were inferred by Koka. Generally though, it is good practice to write type annotations for function parameters and the function result, since it both helps type inference and provides useful documentation with better feedback from the compiler.

For the encode function it is actually essential to give the type of the s parameter, since the map function is defined for both the list and string types and the program is ambiguous without an annotation. Try to load the example in the editor and remove the annotation to see what error Koka produces.

🔗 2.4 . Anonymous functions

Koka also allows for anonymous function expressions. For example, instead of declaring the encode-char function, we could just have passed it directly to the map function as a function expression:
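A sketch of that version, using an anonymous function expression (the name encode2 is hypothetical):

```koka
fun encode2( s : string, shift : int ) {
  s.map( fun(c) {
    if (c < 'a' || c > 'z') return c
    val base = (c - 'a').int
    val rot  = (base + shift) % 26
    (rot.char + 'a')
  })
}
```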

It is a bit annoying we had to put the final right-parenthesis after the last brace. As a convenience, Koka allows anonymous functions to follow the function call instead. For example, here is how we can print the numbers 1 to 10 :
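A sketch using the trailing anonymous-function convention:

```koka
fun main() {
  for(1,10) fun(i) {
    println(i)
  }
}
```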

which is desugared to for( 1, 10, fun(i){ println(i) } ). In fact, since we pass the i argument directly to println, we can also pass the function itself directly, and write for(1,10,println).

Anonymous functions without any arguments can be shortened further by leaving out the fun keyword and just using braces directly. Here is an example using the repeat function:
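A sketch of such a call, assuming the repeat function from std/core:

```koka
fun main() {
  repeat(10) {
    println("hi")
  }
}
```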

where the body desugars to repeat( 10, fun(){ println("hi") } ). This is especially convenient for the while loop, since while is not a built-in operator in Koka but just a regular function:
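A sketch of a counting loop written with while (the name print11 is hypothetical):

```koka
fun print11() {
  var i := 10
  while { i >= 0 } {
    println(i)
    i := i - 1
  }
}
```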

Note how the first argument to while is in braces instead of the usual parentheses: Koka always makes it explicit whether code is evaluated before a function is called (in between parentheses), or whether code is evaluated (potentially multiple times) by the called function instead (in between braces).

🔗 2.5 . Effect types

A novel part about Koka is that it automatically infers all the side effects that occur in a function. The absence of any effect is denoted as total (or <>) and corresponds to pure mathematical functions. If a function can raise an exception the effect is exn, and if a function may not terminate the effect is div (for divergence). The combination of exn and div is pure and corresponds directly to Haskell's notion of purity. Non-deterministic functions get the ndet effect. The ‘worst’ effect is io and means that a program can raise exceptions, not terminate, be non-deterministic, read and write to the heap, and do any input/output operations. Here are some examples of effectful functions:
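Sketches of such functions (the names square1 through square4 are illustrative; the distribution's examples may differ):

```koka
fun square1( x : int ) : total int   { x*x }
fun square2( x : int ) : console int { println("a not so secret side-effect"); x*x }
fun square3( x : int ) : div int     { x * square3(x) }
fun square4( x : int ) : exn int     { throw("oops"); x*x }
```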

When the effect is total we usually leave it out in the type annotation. For example, when we write:
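A sketch of such a definition (the name square5 is hypothetical):

```koka
fun square5( x : int ) : int { x*x }
```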

Then the assumed effect is total. Sometimes, we write an effectful function but are not interested in explicitly writing down its effect type. In that case, we can use a wildcard type, which stands for some inferred type. A wildcard type is denoted by writing an identifier prefixed with an underscore, or even just an underscore by itself:
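A sketch with a wildcard effect (the name square6 is hypothetical; the println call makes the inferred effect console):

```koka
fun square6( x : int ) : _e int {
  println("I did not want to write down the effect")
  x*x
}
```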

Hover over square6 to see the inferred effect for _e.

🔗 2.6 . Semantics of effects

The inferred effects are not just considered as some extra type information on functions. On the contrary, through the inference of effects, Koka has a very strong connection to its denotational semantics. In particular, the full type of a Koka function corresponds directly to the type signature of the mathematical function that describes its denotational semantics. For example, using 〚 t 〛 to translate a type t into its corresponding mathematical type signature, we have:
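A sketch of that translation, using the conventions explained in the next paragraph (1 + t for adding an exception outcome, Heap × t for pairing with a heap, and ⊥ for possible divergence):

```
〚 () -> total t 〛        =  1 → 〚t〛
〚 () -> exn t 〛          =  1 → (〚t〛 + 1)
〚 () -> div t 〛          =  1 → 〚t〛⊥
〚 () -> pure t 〛         =  1 → (〚t〛 + 1)⊥
〚 () -> <st<h>,pure> t 〛 =  (1 × Heap) → (Heap × (〚t〛 + 1))⊥
```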

In the above translation, we use 1 + t as a sum where we have either a unit 1 (i.e. exception) or a type t, and we use Heap × t for a product consisting of a pair of a heap and a type t. From the above correspondence, we can immediately see that a total function is truly total in the mathematical sense, while a stateful function (st<h>) that can raise exceptions or not terminate (pure) takes an implicit heap parameter, and either does not terminate (⊥) or returns an updated heap together with either a value or an exception (1).

We believe that this semantic correspondence is the true power of full effect types and it enables effective equational reasoning about the code by a programmer. For almost all other existing programming languages, even the most basic semantics immediately include complex effects like heap manipulation and divergence. In contrast, Koka allows a layered semantics where we can easily separate out nicely behaved parts, which is essential for many domains, like safe LINQ queries, parallel tasks, tier-splitting, sand-boxed mobile code, etc.

🔗 2.7 . Combining effects

Often, a function contains multiple effects, for example:

fun combine-effects() {
  val i = srandom-int() // non-deterministic
  error("hi")           // exception raising
  combine-effects()     // and non-terminating
}

The effects assigned to combine-effects are ndet, div, and exn. We can write such a combination as a row of effects: <div,exn,ndet>. When you hover over the combine-effects identifier, you will see that the type inferred is really <pure,ndet> where pure is a type alias defined as:
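That alias is defined (approximately) as:

```koka
alias pure = <exn,div>
```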

🔗 2.8 . Polymorphic effects

Many functions are polymorphic in their effect. For example, the map function applies a function f to each element of a (finite) list. As such, the effect depends on the effect of f, and the type of map becomes: map : forall<a,b,e> (xs : list<a>, f : (a) -> e b) -> e list<b>

We use single letters (possibly followed by digits) for polymorphic types. Here, the map function takes a list with elements of some type a, and a function f that takes an element of type a and returns a new element of type b. The final result is a list with elements of type b. Moreover, the effect of the applied function e is also the effect of the map function itself; indeed, this function has no other effect by itself since it does not diverge nor raise exceptions.

We can use the notation <l|e> to extend an effect e with another effect l. This is used for example in the while function, which has type: while : ( pred : () -> <div|e> bool, action : () -> <div|e> () ) -> <div|e> (). The while function takes a predicate function and an action to perform, both with effect <div|e>. Indeed, since while may diverge depending on the predicate, its effect must include divergence.

The reader may be worried that the type of while forces the predicate and action to have exactly the same effect <div|e>, which even includes divergence. However, when effects are inferred at the call-site, both the effects of predicate and action are extended automatically until they match. This ensures we take the union of the effects of the predicate and action. Take for example the following loop:

import std/num/random

fun main() {
  looptest()
}

fun looptest() {
  while { odd?(srandom-int()) } {
    throw("odd")
  }
}

In the above program, Koka infers that the predicate odd?(srandom-int()) has effect <ndet|e1> while the action has effect <exn|e2> for some e1 and e2. When applying while, those effects are unified to the type <exn,ndet,div|e3> for some e3 (which can be seen by hovering over the looptest identifier).

🔗 2.9 . Isolated state

The Fibonacci numbers are a sequence where each subsequent Fibonacci number is the sum of the previous two, with fib(0) == 0 and fib(1) == 1. We can easily calculate Fibonacci numbers using a recursive function:
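A sketch of that recursive function:

```koka
fun fib(n : int) : div int {
  if (n <= 0)   then 0
  elif (n == 1) then 1
  else fib(n - 1) + fib(n - 2)
}
```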

Note that the type inference engine is currently not powerful enough to prove that this recursive function always terminates, which leads to inclusion of the divergence effect div in the result type.

Here is another version of the Fibonacci function but this time implemented using local state. We use the repeat function to iterate n times:
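A sketch of such a stateful version (the name fib2 is hypothetical):

```koka
fun fib2(n) {
  var x := 0
  var y := 1
  repeat(n) {
    val y0 = y
    y := x + y
    x := y0
  }
  x
}
```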

The var declaration declares a variable that can be assigned to using the (:=) operator. In contrast, a regular equality sign, as in y0 = y, introduces an immutable value y0. For clarity, one can actually write val y0 = y for such a declaration too, but we usually leave out the val keyword.

Local variables declared using var are actually syntactic sugar for allocating explicit references to mutable cells. A reference to a mutable integer is allocated using r = ref(0) (since the reference itself is actually a value!), and can be dereferenced using the bang operator, as !r. The desugared version of our previous Fibonacci function can be written using explicit references as:
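A sketch of that desugared version (the name fib3 is hypothetical):

```koka
fun fib3(n) {
  val x = ref(0)
  val y = ref(1)
  repeat(n) {
    val y0 = !y
    y := !x + !y
    x := y0
  }
  !x
}
```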

As we can see, using var declarations is quite convenient, since such a declaration automatically adds a dereferencing operator to all occurrences except on the left-hand side of an assignment.

When we look at the types inferred for the references, we see that x and y have type ref<h,int>, which stands for a reference to a mutable value of type int in some heap h. The effects on heaps are allocation as alloc<h>, reading from a heap as read<h>, and writing to a heap as write<h>. The combination of these effects is called stateful and denoted with the alias st<h>.

Clearly, the effect of the body of fib3 is st<h>; but when we hover over fib3, we see the type inferred is actually the total effect: (n:int) -> int. Indeed, even though fib3 is stateful inside, its side-effects can never be observed. It turns out that we can safely discard the st<h> effect whenever the heap type h cannot be referenced outside this function, i.e. it is not part of an argument or return type. More formally, the Koka compiler proves this by showing that a function is fully polymorphic in the heap type h and applies the run function (corresponding to runST in Haskell) to discard the st<h> effect.

The Garsia-Wachs algorithm is a nice example where side-effects are used internally across function definitions and data structures, but where the final algorithm itself behaves like a pure function; see the lib/demo/garsiaWachs.kk example in the distribution.

🔗 2.10 . A larger example: cracking Caesar encoding

Enough about effects and imperative updates. Let's look at some more functional examples :-) For example, cracking Caesar encoded strings:

The val keyword declares a static value. In the example, the value english is a list of floating point numbers (of type double) denoting the average frequency for each letter. The function freqs builds a frequency table for a specific string, while the function chisqr calculates how well two frequency tables match. In the function crack these functions are used to find a shift value that results in a string whose frequency table matches the english one the closest – and we use that to decode the string. Let's try it out in the editor!

🔗 2.11 . Optional and named parameters

Being a function-oriented language, Koka has powerful support for function calls where it supports both optional and named parameters. For example, the function replace-all takes a string, a pattern pattern, and a replacement string repl:
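A sketch of a positional call:

```koka
fun world() {
  replace-all("hi there", "there", "world")   // returns "hi world"
}
```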

Using named parameters, we can also write the function call as:
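A sketch of the same call with named arguments (the name world2 is hypothetical):

```koka
fun world2() {
  "hi there".replace-all( pattern = "there", repl = "world" )
}
```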

Optional parameters let you specify default values for parameters that do not need to be provided at a call-site. As an example, let's define a function sublist that takes a list, a start position, and the length len of the desired sublist. We can make the len parameter optional, and by default return all elements following the start position by picking the length of the input list as the default:
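A sketch of such a function:

```koka
fun sublist( xs : list<a>, start : int, len : int = xs.length ) : list<a> {
  if (start <= 0) return xs.take(len)
  match(xs) {
    Nil        -> Nil
    Cons(_,xx) -> xx.sublist(start - 1, len)
  }
}
```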

Hover over the sublist identifier to see its full type, where the len parameter has gotten an optional int type, signified by the question mark: :?int.

An important aspect of a function-oriented language is to be able to define rich data types over which the functions work. A common data type is that of a struct or record. Here is an example of a struct that contains information about a person:
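A sketch of such a struct, with an instance as used in the surrounding text (the default for realname is an assumption):

```koka
struct person {
  age      : int
  name     : string
  realname : string = name
}

val gaga = Person( 25, "Lady Gaga" )
```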

Every struct (and other data types) comes with constructor functions to create instances, as in Person(25,"Lady Gaga"). Moreover, these constructors can use named arguments, so we can also call the constructor as Person( name = "Lady Gaga", age = 25, realname = "Stefani Joanne Angelina Germanotta" ), which is quite close to regular record syntax but without any special rules; it is just functions all the way down!

Also, Koka automatically generates accessor functions for each field in a struct (or other data type), and we can access the age of a person as gaga.age (which is of course just syntactic sugar for age(gaga)).

By default, all structs (and other data types) are immutable. Instead of directly mutating a field in a struct, we usually return a new struct where the fields are updated. For example, here is a birthday function that increments the age field:
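A sketch of that function:

```koka
fun birthday( p : person ) : person {
  p( age = p.age + 1 )
}
```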

Here, birthday returns a fresh person which is equal to p but with the age incremented. The syntax p(...) is sugar for calling the copy constructor of a person. This constructor is also automatically generated for each data type, and is basically defined as:
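Roughly as follows (a sketch; the optional parameters default to the current field values):

```koka
fun copy( p : person, age : int = p.age, name : string = p.name,
          realname : string = p.realname ) : person {
  Person(age, name, realname)
}
```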

When arguments follow a data value, as in p( age = age + 1 ), it is desugared to a call to this copy function, as in p.copy( age = p.age + 1 ). Again, there are no special rules for record updates and everything is just function calls with optional and named parameters.

🔗 2.14 . More data types

Koka also supports algebraic data types where there are multiple alternatives. For example, here is an enumeration:

type colors {
  Red
  Green
  Blue
}

Special cases of these enumerated types are the void type, which has no alternatives (and therefore there exists no value with this type); the unit type (), which has just one constructor, also written as () (and therefore there exists only one value with the type (), namely ()); and finally the boolean type bool with two constructors True and False.

Constructors can have parameters. For example, here is how to create a number type which is either an integer or the infinity value:
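A sketch of such a type:

```koka
type number {
  Infinity
  Integer( i : int )
}
```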

We can create such a number by writing integer(1) or infinity. Moreover, data types can be polymorphic and recursive. Here is the definition of the list type, which is either empty (Nil) or is a head element followed by a tail list (Cons):
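A sketch of that definition:

```koka
type list<a> {
  Nil
  Cons( head : a, tail : list<a> )
}
```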

Koka automatically generates accessor functions for each named parameter. For lists, for example, we can access the head of a list as Cons(1,Nil).head.

We can now also see that struct types are just syntactic sugar for a regular type with a single constructor of the same name as the type. For example, our earlier person struct, defined as
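(a sketch of that struct declaration; the default for realname is an assumption)

```koka
struct person {
  age      : int
  name     : string
  realname : string = name
}
```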

desugars to:
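roughly (a sketch):

```koka
type person {
  Person( age : int, name : string, realname : string = name )
}
```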

Todo

🔗 2.16 . Inductive, co-inductive, and recursive types

For the purposes of equational reasoning and termination checking, a type declaration is limited to finite inductive types. There are two more declarations, namely cotype and rectype that allow for co-inductive types, and arbitrary recursive types respectively.


🔗 3 . Koka language specification

This is the draft language specification of the Koka language, version 0.7.

Currently only the lexical and context-free grammar are specified. The standard libraries are documented separately.

🔗 3.1 . Lexical syntax

We define the grammar and lexical syntax of the language using standard BNF notation where non-terminals are generated by alternative patterns:

nonterm ::= pattern1 | pattern2

In the patterns, we use the following notations:

terminal              A terminal symbol
x0A                   A character with hexadecimal code 0A
a..f                  The characters from a to f
( pattern )           Grouping
[ pattern ]           Optional occurrence of pattern
{ pattern }           Zero or more occurrences of pattern
pattern1 | pattern2   Choice: either pattern1 or pattern2
pattern <!diff>       Difference: elements generated by pattern except those in diff
nonterm [lex]         Generate nonterm by drawing lexemes from lex

Care must be taken to distinguish meta-syntax such as | and ) from concrete terminal symbols as | and ) . In the specification the order of the productions is not important and at each point the longest matching lexeme is preferred. For example, even though function is a reserved word, the word functions is considered a single identifier. A prefix or postfix pattern is included when considering a longest match.

🔗 3.1.1 . Source code

Source code consists of a sequence of 8-bit characters. Valid characters in actual program code consist strictly of ASCII characters, which range from 0 to 127 and can be encoded in 7 bits. Only comments, string literals, and character literals are allowed to contain extended 8-bit characters.

A program source is assumed to be UTF-8 encoded which allows comments, string literals, and character literals to contain (encoded) unicode characters. Moreover, the grammar is designed such that a lexical analyzer and parser can directly work on source files without doing UTF-8 decoding or unicode category identification. To further facilitate the processing of UTF-8 encoded files the lexical analyzer ignores an initial byte-order mark that some UTF-8 encoders insert. In particular, any program source is allowed to start with three byte-order mark bytes 0xEF , 0xBB , and 0xBF , which are ignored.
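As a concrete illustration of the byte-order-mark rule, a lexer front end could strip the three BOM bytes before tokenizing. The following is a minimal sketch in Python; the function name is ours and not part of any Koka tooling:

```python
def strip_bom(source: bytes) -> bytes:
    """Drop an initial UTF-8 byte-order mark (0xEF 0xBB 0xBF) if present.

    Per the rule above, these three bytes may prefix any program source
    and are simply ignored by the lexical analyzer.
    """
    bom = b"\xEF\xBB\xBF"
    return source[len(bom):] if source.startswith(bom) else source
```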

🔗 3.2 . Lexical grammar

In the specification of the lexical grammar all white space is explicit and there is no implicit white space between juxtaposed symbols. The lexical token stream is generated by the non-terminal lex which consists of lexemes and whitespace.

Before doing lexical analysis, there is a linefeed character inserted at the start and end of the input, which makes it easier to specify line comments and directives.

🔗 3.2.1 . Lexical tokens

lex ::= lexeme | whitespace
lexeme ::= conid | qconid | varid | qvarid
         | op | opid | qopid
         | wildcard | natural | float | string | char
         | reserved | opreserved | special

The main program consists of whitespace and lexemes. The context-free grammar draws its lexemes from the lex production.

🔗 3.2.2 . Identifiers

anyid ::= varid | qvarid | opid | qopid | conid | qconid
qconid ::= modulepath conid
qvarid ::= modulepath lowerid
modulepath ::= lowerid / { lowerid / }
conid ::= upperid
varid ::= lowerid<!reserved>
lowerid ::= lower idtail
upperid ::= upper idtail
wildcard ::= _ idtail
typevarid ::= letter { digit }
idtail ::= { idchar } [ idfinal ]
idchar ::= letter | digit | _ | -
idfinal ::= ? | { ' }
reserved ::= infix | infixr | infixl | prefix
           | type | cotype | struct | alias | con
           | rec | forall | exists | some
           | fun | fn | val | var | extern
           | if | then | else | elif | match | return
           | with | in | handle | handler | mask
           | override | control | rcontrol | effect | context | instance
           | module | import | as | public | private | abstract
           | interface | yield | qualified | hiding | unsafe     (future reserved words)
specialid ::= open | extend | behind | linear
            | value | reference | inline | noinline
            | include | import | js | c | file

Identifiers always start with a letter, may contain underscores and dashes, and can end with a question mark or primes. Like in Haskell, constructors always begin with an uppercase letter while regular identifiers are lowercase. The rationale is to visibly distinguish constants from variables in pattern matches. Examples of valid identifiers include x, fold-right, x1, x', and is-nil?, and the constructors Cons and True.

To avoid confusion with the subtraction operator, the occurrence of dashes in identifiers is restricted. After lexical analysis, only identifiers where each dash is surrounded on both sides by a letter are accepted:

fold-right   // identifier "fold-right"
n-1          // illegal, a digit cannot follow a dash
n - 1        // n minus 1
n-x-1        // illegal, a digit cannot follow a dash
n-x - 1      // identifier "n-x" minus 1
n - x - 1    // n minus x minus 1
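The dash restriction can be captured by a small predicate. The following is a sketch in Python, with letters taken to be the ASCII letters of the lexical grammar; the helper name is ours:

```python
ASCII_LETTERS = set("abcdefghijklmnopqrstuvwxyz"
                    "ABCDEFGHIJKLMNOPQRSTUVWXYZ")

def dashes_ok(ident: str) -> bool:
    """Accept an identifier only when every dash in it is surrounded
    on both sides by a letter, mirroring the restriction above."""
    for i, ch in enumerate(ident):
        if ch == "-":
            # a dash may not start or end the identifier ...
            if i == 0 or i + 1 == len(ident):
                return False
            # ... and both neighbours must be letters (not digits or '_')
            if ident[i - 1] not in ASCII_LETTERS or ident[i + 1] not in ASCII_LETTERS:
                return False
    return True
```

Under this check dashes_ok("fold-right") holds while dashes_ok("n-1") does not, matching the examples above.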

Qualified identifiers are prefixed with a module path. Module paths can be partial as long as they are unambiguous.

core/map
std/core/(&)

🔗 3.2.3 . Operators and symbols

qopid ::= modulepath opid
opid ::= ( symbols )
op ::= symbols<!opreserved | optype> | ||
symbols ::= symbol { symbol } | /
symbol ::= $ | % | & | * | + | ~ | ! | \ | ^ | # | = | . | : | - | ? | anglebar
anglebar ::= < | > | |
opreserved ::= = | . | : | ->
optype ::= anglebar anglebar { anglebar }
special ::= { | } | ( | ) | [ | ] | | | ; | ,

🔗 3.2.4 . Literals

string ::= @" { graphic<!"> | utf8 | space | tab | newline | "" } "    (raw string)
         | " { graphic<!"|\> | utf8 | space | escape } "
char ::= ' ( graphic<!'|\> | utf8 | space | escape ) '
escape ::= \ ( charesc | hexesc )
charesc ::= n | r | t | \ | " | '
hexesc ::= x hexdigit₂ | u hexdigit₄ | U hexdigit₄ hexdigit₂
hexdigit₄ ::= hexdigit hexdigit hexdigit hexdigit
hexdigit₂ ::= hexdigit hexdigit
float ::= decimal . decimal [ exponent ]
exponent ::= ( e | E ) [ - | + ] decimal
natural ::= decimal | 0 ( x | X ) hexadecimal
decimal ::= digit { digit }
hexadecimal ::= hexdigit { hexdigit }
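To make the escape productions concrete, here is a sketch in Python of a decoder for a single escape sequence (the part after the backslash). The function is illustrative only and not part of the Koka implementation:

```python
# single-character escapes from the charesc production
CHARESC = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"', "'": "'"}

def decode_escape(esc: str) -> str:
    """Decode one escape sequence (without its leading backslash)
    following the charesc and hexesc productions."""
    if esc in CHARESC:
        return CHARESC[esc]
    kind, digits = esc[:1], esc[1:]
    # hexesc: x takes 2 hex digits, u takes 4, and U takes 6 (4 + 2)
    widths = {"x": 2, "u": 4, "U": 6}
    if kind in widths and len(digits) == widths[kind]:
        return chr(int(digits, 16))
    raise ValueError("invalid escape: \\" + esc)
```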

🔗 3.2.5 . White space

whitespace ::= white { white } | newline
white ::= space | linecomment | blockcomment | linedirective
linecomment ::= // { linechar }
linedirective ::= newline # { linechar }
linechar ::= graphic | utf8 | space | tab
blockcomment ::= /* blockpart { blockcomment blockpart } */    (allows nested comments)
blockpart ::= blockchars<!blockchars ( /* | */ ) blockchars>
blockchars ::= { blockchar }
blockchar ::= graphic | utf8 | space | tab | newline

🔗 3.2.6 . Character classes

letter ::= upper | lower
upper ::= A..Z
lower ::= a..z
digit ::= 0..9
hexdigit ::= a..f | A..F | digit
newline ::= [ return ] linefeed    (windows or unix style end of line)
space ::= x20                      (a space)
tab ::= x09                        (a tab (\t))
linefeed ::= x0A                   (a line feed (\n))
return ::= x0D                     (a carriage return (\r))
graphic ::= x21..x7E               (a visible character)
utf8 ::= xC0 x80                   (encoded 0 character)
       | ( xC2..xDF ) cont
       | xE0 ( xA0..xBF ) cont
       | ( xE1..xEC ) cont cont
       | xED ( x80..x9F ) cont
       | ( xEE..xEF ) cont cont
       | xF0 ( x90..xBF ) cont cont
       | ( xF1..xF3 ) cont cont cont
       | xF4 ( x80..x8F ) cont cont
cont ::= x80..xBF
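The utf8 production can be transcribed directly into a recognizer. Below is a sketch in Python that returns the length of the alternative matching at a given offset, or 0 when none does; the function is ours, for illustration:

```python
def utf8_len(bs: bytes, i: int) -> int:
    """Length of the sequence matching the `utf8` production at offset i
    of byte string bs, or 0 if no alternative matches."""
    def cont(c: int) -> bool:
        return 0x80 <= c <= 0xBF

    b = bs[i:i + 4]  # at most four bytes are ever inspected
    if len(b) >= 2 and b[0] == 0xC0 and b[1] == 0x80:
        return 2     # the encoded 0 character
    if len(b) >= 2 and 0xC2 <= b[0] <= 0xDF and cont(b[1]):
        return 2
    if len(b) >= 3 and b[0] == 0xE0 and 0xA0 <= b[1] <= 0xBF and cont(b[2]):
        return 3
    if len(b) >= 3 and 0xE1 <= b[0] <= 0xEC and cont(b[1]) and cont(b[2]):
        return 3
    if len(b) >= 3 and b[0] == 0xED and 0x80 <= b[1] <= 0x9F and cont(b[2]):
        return 3
    if len(b) >= 3 and 0xEE <= b[0] <= 0xEF and cont(b[1]) and cont(b[2]):
        return 3
    if len(b) >= 4 and b[0] == 0xF0 and 0x90 <= b[1] <= 0xBF and cont(b[2]) and cont(b[3]):
        return 4
    if len(b) >= 4 and 0xF1 <= b[0] <= 0xF3 and cont(b[1]) and cont(b[2]) and cont(b[3]):
        return 4
    if len(b) >= 4 and b[0] == 0xF4 and 0x80 <= b[1] <= 0x8F and cont(b[2]) and cont(b[3]):
        return 4
    return 0
```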

🔗 3.3 . Semicolon insertion

Like languages such as Haskell, Python, JavaScript, Scala, and Go, Koka has a layout rule which automatically adds semicolons at appropriate places. This enables the programmer to leave out most semicolons:

Koka inserts semicolons automatically for any statements and declarations that are aligned between curly braces ( { and } ).

For example, in a program where statements are aligned between braces, a semicolon is inserted before each of those aligned statements.

Since semicolons are only inserted for aligned statements, a long statement can be written across multiple lines by simply indenting the continuation lines further.

In contrast to token-based layout rules (as in Scala or Go for example), this allows you to put line breaks at any point in a statement by just indenting more. Moreover, it means that the visual indentation of a program corresponds directly to how the compiler interprets the statements. Many tricky layout examples in other programming languages are often based on a mismatch between the visual representation and how a compiler interprets the tokens – with Koka's layout rule such issues are largely avoided.

To still allow for “block-style” layout, the layout rule does not insert a semicolon for an aligned statement if it starts with then , else , elif , or one of { , , , ) , or ] .

Of course, it is still allowed to use explicit semicolons, for example to put multiple statements on a single line.

The layout algorithm also checks for invalid layouts where the layout would not visually correspond to how the compiler interprets the tokens. In particular, it is illegal to indent less than the layout context or to put comments into the indentation (because of tabs or potential unicode characters); such programs are rejected. In order to facilitate code generation or source code compression, compilers are also required to support a mode where the layout rule is not applied and where no semicolons are inserted. A recognized command line flag for that mode should be --nosemi.

🔗 3.3.1 . The layout algorithm

Here we define the layout algorithm formally. A nice property of the layout algorithm is that it is performed on the token stream in between lexing and parsing, and is independent of both. In particular, there are no intricate dependencies with the parser that lead to bizarrely complex layout rules, as is the case in languages like Haskell or JavaScript.

To define the layout algorithm formally, we first establish some terminology:

A new line is started after every linefeed character.

Any non-white token is called a lexeme; a line without lexemes is called blank.

The indentation of a lexeme is the column number of its first character on that line (starting at 1), and the indentation of a line is the indentation of the first lexeme on the line.

Because braces can be nested, we use a layout stack of strictly increasing indentations. The top indentation on the layout stack holds the layout indentation. The initial layout stack contains the single value 0 (which is never popped). We now proceed through the token stream where we perform the operations on the layout stack before the semicolon insertion:

Layout stack operations: If the previous lexeme was an open brace { or the start of the lexical token sequence, we push the indentation of the current lexeme on the layout stack. The pushed indentation must be larger than the previous layout indentation (unless the current lexeme is a closing brace). When a closing brace } is encountered the top indentation is popped from the layout stack.

Semicolon insertion: For each non-blank line, the indentation must be equal to or larger than the layout indentation. A semicolon is inserted before the line whenever its indentation is equal, unless the first lexeme on the line is one of then , else , elif , or one of { , , , ) , or ] . Also, a semicolon is always inserted before a closing brace } and before the end of the token sequence.

As defined, semicolons are inserted whenever statements or declarations are aligned, unless the lexeme happens to be a clear statement continuation. To simplify the grammar specification, a semicolon is also always inserted before a closing brace and the end of the source. This allows us to specify many grammar elements as ended by semicolons instead of separated by semicolons which is more difficult to specify for a LALR(1) grammar.
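The two steps above can be sketched as a single pass over the lexeme stream. The following Python sketch uses a simplified (line, column, text) token representation; the representation and all names are ours, and error checking for invalid layouts is omitted:

```python
# lexemes that continue a statement: no semicolon is inserted before them
CONTINUATION = {"then", "else", "elif", "{", ",", ")", "]"}

def insert_semis(tokens):
    """Run the layout algorithm over a list of (line, column, text)
    triples (one per non-white lexeme) and return the token texts
    with semicolons inserted.  Layout-error checks are omitted."""
    layout = [0]                 # layout stack; the initial 0 is never popped
    out, prev, prev_line = [], None, 0
    for line, col, text in tokens:
        if prev in (None, "{"):
            # layout-stack operation: push the indentation of the lexeme
            # after an open brace (or at the start of the token sequence)
            layout.append(col)
        elif line != prev_line and col == layout[-1] and text not in CONTINUATION:
            # semicolon insertion: a non-blank line starting exactly at
            # the layout indentation begins a new statement
            out.append(";")
        if text == "}":
            layout.pop()         # pop the layout indentation ...
            out.append(";")      # ... and always insert ';' before '}'
        out.append(text)
        prev, prev_line = text, line
    out.append(";")              # always insert ';' before the stream end too
    return out
```

For instance, for a function body whose statements are aligned at column 3 between braces, the pass inserts a semicolon before each aligned statement, before the closing brace, and at the end of the stream.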

Semicolon insertion can be easily implemented as part of the lexer, but could also be implemented as a straightforward transformation on the lexical token stream.

There is a full Flex (Lex) implementation of lexical analysis and the layout algorithm. Ultimately, the Flex implementation serves as the specification, and this document and the Flex implementation should always be in agreement.

🔗 3.4 . Context-free grammar

The grammar specification starts with the non-terminal module which draws its lexical tokens from lex where all whitespace tokens are implicitly ignored.

🔗 3.4.1 . Modules

module[lex] ::= [ moduledecl ] modulebody
moduledecl ::= semis [ visibility ] moduleid
moduleid ::= qvarid | varid
modulebody ::= { semis declarations } semis | semis declarations
visibility ::= public | private
semis ::= { ; }
semi ::= ; semis

🔗 3.4.2 . Top level declarations

declarations ::= { import } { fixitydecl } topdecls
import ::= [ visibility ] import [ moduleid = ] moduleid semi
fixitydecl ::= [ visibility ] fixity natural identifier { , identifier } semi
fixity ::= infixl | infixr | infix
topdecls ::= { topdecl semi }
topdecl ::= [ visibility ] puredecl
          | [ visibility ] aliasdecl
          | [ visibility ] typedecl
          | [ visibility ] externdecl
          | abstract typedecl

🔗 3.4.3 . Type declarations

aliasdecl ::= alias typeid [ typeparams ] [ kannot ] = type
typedecl ::= typesort typeid [ typeparams ] [ kannot ] [ typebody ]
           | structmod struct typeid [ typeparams ] [ kannot ] [ conparams ]
           | effectmod effect typeid [ typeparams ] [ kannot ] [ opdecls ]
           | effectmod effect [ typeparams ] [ kannot ] opdecl
           | effectmod effect instance typeid [ typeparams ] [ kannot ] [ in type ] [ opdecls ]
typesort ::= [ typemod ] type | cotype
typemod ::= rec | open | extend | structmod
structmod ::= value | reference
effectmod ::= [ linear ] [ rec ]
typeid ::= varid | [] | ( { , } ) | < > | < | >
typeparams ::= < [ tbinders ] >
tbinders ::= tbinder { , tbinder }
tbinder ::= varid [ kannot ]
typebody ::= { semis { constructor semi } }
constructor ::= [ con ] conid [ typeparams ] [ conparams ]
conparams ::= { semis { conparam semi } }
conparam ::= ( paramid | wildcard ) : paramtype [ = expr ]
opdecls ::= { semis { opdecl semi } }
opdecl ::= [ visibility ] val identifier [ typeparams ] : tatom
         | [ visibility ] ( fun | control ) identifier [ typeparams ] opparams : tatom
opparams ::= ( [ opparam { , opparam } ] )
opparam ::= [ paramid ] : paramtype

🔗 3.4.4 . Value and function declarations

puredecl ::= inlinemod val valdecl | inlinemod fun fundecl
inlinemod ::= inline | noinline
valdecl ::= binder = expr
binder ::= identifier [ : type ]
fundecl ::= funid funparam bodyexpr
funparam ::= [ typeparams ] parameters [ : tresult ] [ qualifier ]
funid ::= identifier | [ { , } ]    (indexing operator)
parameters ::= ( [ parameter { , parameter } ] )
parameter ::= paramid [ : paramtype ] [ = expr ]
paramid ::= identifier | wildcard
paramtype ::= type | ? type         (optional parameter)
qidentifier ::= qvarid | qidop | identifier
identifier ::= varid | idop
qoperator ::= op
qconstructor ::= conid | qconid

🔗 3.4.5 . Statements

block ::= { semis { statement semi } }
statement ::= decl | withstat | returnexpr | basicexpr
decl ::= fun fundecl
       | val apattern = valexpr     (local values can use a pattern binding)
       | var binder := valexpr

🔗 3.4.6 . Expressions

bodyexpr ::= -> blockexpr | block
blockexpr ::= expr                  (block is interpreted as statements)
expr ::= withexpr
       | block                      (interpreted as fn(){ ... })
       | returnexpr
       | basicexpr
basicexpr ::= ifexpr | fnexpr | matchexpr | handlerexpr | opexpr
ifexpr ::= if atom then { elif } [ else expr<!ifexpr> ]
then ::= [ then ] expr<!ifexpr>
elif ::= elif atom then
matchexpr ::= match atom { semis { matchrule semi } }
returnexpr ::= return opexpr
fnexpr ::= fn funparam block
handlerexpr ::= handler [ override ] heff opclauses
              | handle [ override ] heff ( expr ) opclauses
              | handler instance heff opclauses
              | handle instance heff ( expr ) opclauses
heff ::= [ < tbasic > ]
withexpr ::= withstat in expr
withstat ::= with basicexpr
           | with binder = basicexpr
           | with [ override ] heff opclauses          (with a handler)
           | with binder = instance heff opclauses     (with an instance)

🔗 3.4.7 . Operator expressions

For simplicity, we parse all operators as if they are left associative with the same precedence. We assume that a separate pass in the compiler will use the fixity declarations that are in scope to properly associate all operators in an expression.

opexpr ::= prefixexpr { qoperator prefixexpr }
prefixexpr ::= { ! | ~ } appexpr
appexpr ::= appexpr ( [ arguments ] )    (regular application)
          | appexpr [ [ arguments ] ]    (index operation)
          | appexpr { fnexpr | block }   (apply function expressions)
          | appexpr . atom
          | atom
arguments ::= argument { , argument }
argument ::= [ identifier = ] expr
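The separate association pass mentioned above can be sketched with precedence climbing over the flat operator sequence. The fixity table and names below are illustrative assumptions, not the actual declarations of the standard library:

```python
# hypothetical fixity table, as it might be collected from
# infixl/infixr fixity declarations in scope
FIXITY = {"+": (6, "infixl"), "-": (6, "infixl"),
          "*": (7, "infixl"), "^": (8, "infixr")}

def associate(operands, operators):
    """Re-associate a flat left-to-right parse `e0 op0 e1 op1 e2 ...`
    into a tree according to the fixity table (precedence climbing)."""
    def climb(i, min_prec):
        lhs = operands[i]
        while i < len(operators):
            op = operators[i]
            prec, fixity = FIXITY[op]
            if prec < min_prec:
                break
            # a left-associative operator must not pull an operator of
            # equal precedence into its right-hand side
            next_min = prec + 1 if fixity == "infixl" else prec
            rhs, i = climb(i + 1, next_min)
            lhs = (op, lhs, rhs)
        return lhs, i
    tree, _ = climb(0, 0)
    return tree
```

For example, `a + b * c` re-associates to `("+", "a", ("*", "b", "c"))` while `a - b - c` stays left-nested.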

🔗 3.4.8 . Atomic expressions

atom ::= qidentifier
       | qconstructor
       | literal
       | mask
       | ( )                                 (unit)
       | ( annexpr )                         (parenthesized expression)
       | ( annexprs )                        (tuple expression)
       | [ [ annexpr { , annexpr } [ , ] ] ] (list expression)
literal ::= natural | float | char | string
mask ::= mask [ behind ] < tbasic >
annexprs ::= annexpr { , annexpr }
annexpr ::= expr [ : typescheme ]

🔗 3.4.9 . Pattern matching

matchrule ::= patterns | expr -> blockexpr
            | patterns bodyexpr
apattern ::= pattern [ typescheme ]
pattern ::= identifier
          | wildcard
          | qconstructor [ ( [ patargs ] ) ]
          | ( [ apatterns ] )        (unit, parenthesized pattern, tuple pattern)
          | [ [ apatterns ] ]        (list pattern)
          | apattern as identifier   (named pattern)
          | literal
patterns ::= pattern { , pattern }
apatterns ::= apattern { , apattern }
patargs ::= patarg { , patarg }
patarg ::= [ identifier = ] apattern (possibly named parameter)

🔗 3.4.10 . Operation Clauses

opclauses ::= { semis { opclause semi } } | opclause semi
opclause ::= val qidentifier [ type ] = expr
           | fun qidentifier opargs bodyexpr
           | control qidentifier opargs bodyexpr
           | rcontrol qidentifier opargs bodyexpr
           | return ( ( oparg ) | paramid ) bodyexpr semi
opargs ::= ( [ oparg { , oparg } ] )
oparg ::= paramid [ : type ]

🔗 3.4.11 . Type schemes

typescheme ::= somes foralls tarrow [ qualifier ]
type ::= foralls tarrow [ qualifier ]
foralls ::= [ forall typeparams ]
somes ::= [ some typeparams ]
qualifier ::= with ( predicates )
predicates ::= predicate { , predicate }
predicate ::= typeapp    (interface)

tarrow ::= tatom [ -> tresult ]
tresult ::= tatom [ tbasic ]
tatom ::= tbasic
        | < anntype { , anntype } [ | tatom ] >
        | < >
tbasic ::= typeapp
         | ( )                       (unit type)
         | ( tparam )                (parenthesized type or type parameter)
         | ( tparam { , tparam } )   (tuple type or parameters)
         | [ anntype ]               (list type)
typeapp ::= typecon [ < anntype { , anntype } > ]
typecon ::= varid | qvarid | wildcard
          | ( , { , } )              (tuple constructor)
          | [ ]                      (list constructor)
          | ( -> )                   (function constructor)
tparam ::= [ varid : ] anntype
anntype ::= type [ kannot ]

kannot ::= :: kind
kind ::= ( kind { , kind } ) -> kind
       | katom -> kind
       | katom
katom ::= V    (value type)
        | X    (effect type)
        | E    (effect row)
        | H    (heap type)
        | P    (predicate type)

As a companion to the Flex lexical implementation, there is a full Bison (Yacc) LALR(1) implementation available. Again, the Bison parser functions as the specification of the grammar, and this document should always be in agreement with that implementation.

References

[1] Daan Leijen. “Koka: Programming with Row Polymorphic Effect Types.” In Mathematically Structured Functional Programming 2014. EPTCS. Mar. 2014. arXiv:1406.2061.

[2] Daan Leijen. Algebraic Effects for Functional Programming. MSR-TR-2016-29. Microsoft Research. Aug. 2016. https://www.microsoft.com/en-us/research/publication/algebraic-effects-for-functional-programming. Extended version of [3].

[3] Daan Leijen. “Type Directed Compilation of Row-Typed Algebraic Effects.” In Proceedings of Principles of Programming Languages (POPL’17). Paris, France. Jan. 2017.

Appendix

🔗 A . Full grammar specification