Implementation of the Parser in Elixir

The parser is made up of two parts: a tokenizer, which breaks the source code into tokens, and an AST maker, which turns those tokens into an abstract syntax tree.

Tokenizer

In the example given below, every Scheme expression, nested or not, is delimited by parentheses ( ).

(begin (if (> x y) (set! max x) (set! max y))

We have to split the source program into tokens around ( and ), so let’s do that.
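As a minimal sketch of this step, not a definitive implementation, the tokenizer can be written in a few lines. It assumes the function lives in a Parse module, pads each parenthesis with spaces using String.replace/3, and then splits on whitespace with String.split/1.

```elixir
defmodule Parse do
  # Splits Scheme source text into a flat list of string tokens.
  def tokenize(str) do
    str
    |> String.replace("(", " ( ")   # make sure "(" stands alone
    |> String.replace(")", " ) ")   # make sure ")" stands alone
    |> String.split()               # split on runs of whitespace
  end
end

IO.inspect(Parse.tokenize("(* r r)"))
# prints ["(", "*", "r", "r", ")"]
```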

There’s a lot going on in this step, so let’s unpack it. def is an Elixir macro that declares a named function. Macros are a very powerful metaprogramming construct in Elixir, but they are outside the scope of this article. Functions in Elixir are identified by their name and arity, where the arity of a function is the number of arguments it takes. The tokenize function takes a single argument, str, so it is identified as tokenize/1.

Elixir also provides helpful string-handling functions in the String module. A module in Elixir is a group of functions and is defined using the defmodule macro. We’ll add more functions in the parsing step later and group all of them in a Parse module.

For the tokenizing step we have to ensure there is at least one whitespace character around every parenthesis, so we pad the brackets with spaces. |> is called the pipe operator, and it works much like the Unix pipe: the result of the expression on its left is passed as the first argument to the function call on its right. So str is passed as the first argument to String.replace/3. Finally, String.split/1 receives the spaced-out string. String.split/1 splits on whitespace, treats a run of whitespace as a single separator, ignores leading and trailing whitespace, and returns a list of strings. List is one of the enumerable types in Elixir, others being Map and Range. So for the above example code, the output of tokenize/1 would be

[ "(", "begin", "(", "if", "(", ">", "x", "y", ")", "(", "set!", "max", "x", ")", "(", "set!", "max", "y", ")", ")" ]

AST Maker

Before we delve into this step we need to know more about Lists, pattern matching and recursion in Elixir.

Understanding Lists

Lists in Elixir can be defined by enclosing comma separated values in square brackets.

iex> list = [1, 2, 3]
[1, 2, 3]
iex> length(list)
3

Internally, a list in Elixir is represented as a linked list. This has some subtle effects: for example, finding the length of a list is a linear-time operation.

We can concatenate two lists using the ++ operator.

iex> [4] ++ list
[4, 1, 2, 3]
iex> list ++ [4]
[1, 2, 3, 4]

Prepending an element to a list takes constant time, while appending takes time linear in the length of the list, because ++ has to copy its left operand.

A list in Elixir can be split into a head and a tail, just like a linked list, using the | operator.

iex> [head | tail] = list
[1, 2, 3]
iex> head
1
iex> tail
[2, 3]

head is bound to 1, and tail is itself a list of two elements, [2, 3].

Elixir provides the helper functions hd/1 and tl/1, which return the head and the tail of the list passed as the argument.

iex> hd(list)
1
iex> tl(list)
[2, 3]

Elixir also provides a dedicated List module with plenty of functions for manipulating lists.

iex> List.first(list)
1
iex> List.last(list)
3

Atoms in Elixir

An atom is a constant whose name is its value. Atoms are written by prepending a colon : to a name. The boolean values true and false are in fact Elixir atoms. Elixir provides the helper function is_atom/1 to check whether a value is an atom.

iex> my_atom = :john
:john
iex> is_atom(my_atom)
true
iex> is_atom(false)
true

Pattern Matching Demystified

In Elixir, = is the match operator. It compares the left-hand side with the right-hand side, and if the two sides don’t match it raises a MatchError.

iex> [1, 2, 3] = list
[1, 2, 3]
iex> [1, 2, 5] = list
** (MatchError) no match of right hand side value: [1, 2, 3]

This might look similar to the == operator, but == only performs a comparison. The match operator = performs the comparison and also binds values to variables, although variables are only bound when they appear on the left-hand side of the match. This makes for a very interesting use case: destructuring complex types such as a List or a Tuple.

iex> [first, mid, last] = list
[1, 2, 3]
iex> [first, mid, last]
[1, 2, 3]
iex> [1, 2, _] = list
[1, 2, 3]

In the third example the last element of the pattern is the underscore _. The underscore is a special construct in Elixir that matches anything and does not bind the matched value to a variable. It is commonly used in pattern matching where an element has to be ignored.
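The same destructuring works for tuples and for nested structures. The snippet below is a small illustrative sketch; the student tuple and its fields are made-up values, not data from the parser.

```elixir
# A tuple is destructured positionally, just like a list.
# The leading underscore marks a value we deliberately ignore.
student = {"Alan", 21, :math}
{name, age, _subject} = student
IO.inspect({name, age})
# prints {"Alan", 21}

# Patterns can be nested: here a tuple holds a list,
# and the list is split into head and tail in the same match.
{:ok, [first | rest]} = {:ok, [1, 2, 3]}
IO.inspect({first, rest})
# prints {1, [2, 3]}
```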

Now, to understand pattern matching in function clauses, let’s take an example where we query an API to fetch a list of three students and store it in a variable called students. Based on the elements of the list we have to do specific tasks.

students = fetchCoolKids()
doCoolStuff(students)

# The third element is ignored, so _ must not be quoted.
def doCoolStuff(["Stu", "Alan", _]) do
  IO.puts "We don't need a third person."
end

def doCoolStuff(["Doug", _, _]) do
  IO.puts "I don't need nobody."
end

# Scenario 1
students = ["Doug", "Alan", "Mark"]
I don't need nobody.

# Scenario 2
students = ["Stu", "Alan", "Mark"]
We don't need a third person.

# Scenario 3
students = ["Mark", "Stu", "Alan"]
** (FunctionClauseError) no function clause matching in doCoolStuff/1
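Function-clause matching combines naturally with the [head | tail] split to walk a list recursively, which is the technique the AST maker relies on. As a hedged illustration, the hypothetical TokenStats module below (not part of the parser itself) counts the opening parentheses in a token list, carrying the running count in an accumulator.

```elixir
defmodule TokenStats do
  # Public entry point: start the recursion with a count of 0.
  def count_open(tokens), do: count_open(tokens, 0)

  # Empty list: nothing left to visit, so the accumulator is the answer.
  defp count_open([], acc), do: acc

  # The head is "(": bump the count and recurse on the tail.
  defp count_open(["(" | tail], acc), do: count_open(tail, acc + 1)

  # Any other head is ignored with _.
  defp count_open([_ | tail], acc), do: count_open(tail, acc)
end

tokens = ["(", "begin", "(", "define", "r", "10", ")", "(", "*", "r", "r", ")", ")"]
IO.inspect(TokenStats.count_open(tokens))
# prints 3
```

Clause order matters here: the clause for "(" has to come before the catch-all clause, because Elixir tries clauses from top to bottom and uses the first one that matches.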

Type Definitions

Scheme has only a few types of objects. This is how we are going to represent them in Elixir:

Symbol -> Implemented as an Atom. begin => :begin

Atom -> Implemented as an Atom or a Number.

Number -> Implemented as a Float or an Integer.

List -> Implemented as a List. (1 2 3) => [1, 2, 3]

Expression -> Implemented as either an Atom or a List.
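One way to apply this mapping to a single token is sketched below. The module name SchemeTypes and the function from_token/1 are assumptions made for illustration; the sketch tries an integer first, then a float, and otherwise falls back to an atom.

```elixir
defmodule SchemeTypes do
  # Converts one token string into its Elixir representation.
  def from_token(token) do
    case Integer.parse(token) do
      # The whole token was consumed as an integer.
      {int, ""} ->
        int

      # Not a clean integer: try a float, else treat it as a symbol.
      _ ->
        case Float.parse(token) do
          {float, ""} -> float
          _ -> String.to_atom(token)
        end
    end
  end
end

IO.inspect(Enum.map(["begin", "10", "3.14", "*"], &SchemeTypes.from_token/1))
# prints [:begin, 10, 3.14, :*]
```

String.to_atom/1 creates atoms that are never garbage collected, so this is fine for a toy interpreter but not for untrusted input.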

Creating the AST

Below is the code for creating the AST with some helpful comments.

I have used recursion to go through all the tokens, because Elixir has no while loop: data is immutable, so there is no loop counter or collection to update in place. parse/2 takes the list of tokens as the first argument and an accumulator acc as the second. The [head | tail] form is used to take the token list apart.
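A minimal sketch of parse/2 that is consistent with this description follows. The {expressions, rest} return shape and the private helper atom/1 are assumptions of this sketch, and it does not check that parentheses are balanced.

```elixir
defmodule Parse do
  # tokenize/1 from the tokenizer step is left out to keep the sketch short.

  # No tokens left: return the expressions collected so far.
  # The accumulator is built by prepending, so it is reversed here.
  def parse([], acc), do: {Enum.reverse(acc), []}

  # "(" starts a nested list: parse it with a fresh accumulator,
  # then continue with the tokens that follow its closing ")".
  def parse(["(" | tail], acc) do
    {sub_list, rest} = parse(tail, [])
    parse(rest, [sub_list | acc])
  end

  # ")" ends the current list: hand back the list and the unread tokens.
  def parse([")" | tail], acc), do: {Enum.reverse(acc), tail}

  # Any other token is a leaf: convert it and keep going.
  def parse([head | tail], acc), do: parse(tail, [atom(head) | acc])

  # Assumed helper: integers and floats become numbers, anything else an atom.
  defp atom(token) do
    case {Integer.parse(token), Float.parse(token)} do
      {{int, ""}, _} -> int
      {_, {float, ""}} -> float
      _ -> String.to_atom(token)
    end
  end
end

tokens = ["(", "*", "r", "2.5", ")"]
{[ast], []} = Parse.parse(tokens, [])
IO.inspect(ast)
# prints [:*, :r, 2.5]
```

Prepending to the accumulator and reversing once at the end keeps each step constant-time, which is why every clause that finishes a list calls Enum.reverse/1.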

To understand the above code better, let’s take an example of Lisp code

(begin (define r 10) (* r r))

The tokenized form of the above Lisp code would be this

["(", "begin", "(", "define", "r", "10", ")", "(", "*", "r", "r", ")", ")"]

We would pass the above list of tokens and an empty accumulator list to our parse/2 function. Pattern matching takes care of selecting the function clause to be called. What we are trying to achieve is to build a list of an operator and its arguments every time we encounter a matching pair of '(' and ')'. In AST form, a non-leaf node is always a function or operator application, and a leaf node is a symbol or a number.

Below is the output of the parsing stage

[ :begin, [ :define, :r, 10 ], [ :*, :r, :r ] ]