This is the third installment in creating a brainfuck (BF) compiler using F# and FParsec. Previous posts discussed the parsing and the IL generation. Now it it time to pull it all together into something useful.

Series:

Part 1 (Parsing)

Part 2 (IL Generation)

Part 3 (Compiler)

Part 4 (Optimization)

Using Paket, here is a sample paket.dependencies file.

1

2

3

source https:



nuget FParsec



Here is the standard boilerplate for loading assemblies and generally getting ready to go.

1

2

3

4

5

6

7

8

9

10

11

System.IO.Directory.SetCurrentDirectory(__SOURCE_DIRECTORY__)

#r "../packages/FParsec/lib/net40-client/FParsecCS.dll"

#r "../packages/FParsec/lib/net40-client/FParsec.dll"



open System

open System.IO

open System.Runtime

open System.Reflection

open System.Reflection.Emit

open System.Text.RegularExpressions

open FParsec



I’ll start with the parsing code. I include it here for completeness sake, but there isn’t much new here. From a logic perspective, I added Comment support into the parser (basically any char that isn’t a valid character). I also wrapped the parsing code in a module for organizational purposes.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61



type ActionOps = Gt | Lt | Plus | Minus | Read | Write

type AllOps =

| Op of ActionOps

| Loop of AllOps list

| Program of AllOps list

| Comment







module Parser =

type UserState = unit

type Parser < 't > = Parser< 't , UserState>



let pGreater:Parser<_> =

pchar '>' |>> fun _ -> Op Gt



let pLesser:Parser<_> =

pchar '<' |>> fun _ -> Op Lt



let pPlus:Parser<_> =

pchar '+' |>> fun _ -> Op Plus



let pMinus:Parser<_> =

pchar '-' |>> fun _ -> Op Minus



let pPeriod:Parser<_> =

pchar '.' |>> fun _ -> Op Write



let pComma:Parser<_> =

pchar ',' |>> fun _ -> Op Read



let pLBracket:Parser<_> =

pchar '[' |>> ignore



let pRBracket:Parser<_> =

pchar ']' |>> ignore



let pCommands = pGreater <|> pLesser <|> pPlus <|> pMinus <|> pPeriod <|> pComma



let (pLoop:Parser<AllOps>), (pLoopImpl:Parser<AllOps> ref) = createParserForwardedToRef()



do pLoopImpl := pLBracket >>. (many (pCommands <|> pLoop)) .>> pRBracket |>> fun x ->

Loop x



let commentChar c =

c <> '>' &&

c <> '<' &&

c <> '+' &&

c <> '-' &&

c <> '[' &&

c <> ']' &&

c <> '.' &&

c <> ','



let pComment:Parser<_> =

many1Satisfy commentChar |>> fun _ -> Comment





let buildAst code =

run (many (pCommands <|> pLoop <|> pComment)) code



This is the start of the new IL emitting code. Sticking with the standard definition, the memory array will be of size 30,000. LocalBuilder is an IL local variable object. To protect me from myself, I leverage F# types to ensure that I’m using the correct variable in the correct place. For those not familar with F#, this will become more obvious where the variables are used. Simply, the type safety this brings is immensely useful and offers me a piece of mind. As a sidenote, BF defines the use of a program counter. I don’t actually need a program counter to keep track of where I am, so it’s excluded from my implementation. It’s here, but commented out primarily for reference-sake.

1

2

3

4

5

6

7

8

9

module Emitter =





let memorySize = 30000





type MemArray = MEM of LocalBuilder

type DpPointer = DP of LocalBuilder





There will some variable creation along the way. To be consistent I create some functions to consistently implement that process. ilGenerator is basically the function the code is attached to. Creating a scalar is straight forward. Creating an array is a bit more involved. I need to create the variable, then call Newarr with the element type and array size to initialize the array. Once complete, I return the created variable

1

2

3

4

5

6

7

8

9

10

11

12

13

14



let createLocal (ilGenerator:ILGenerator) (t:Type) =

ilGenerator.DeclareLocal(t)





let createLocalArray (ilGenerator:ILGenerator) (t:Type) (size:int) =

let regex = new Regex( @"\[\]$" )

let tElement = Type.GetType(regex.Replace(t.FullName, "" ))



let var = createLocal ilGenerator t

ilGenerator.Emit(OpCodes.Ldc_I4, size)

ilGenerator.Emit(OpCodes.Newarr, tElement)

ilGenerator.Emit(OpCodes.Stloc, var)

var



Both the add and sub perform similar actions, add a value to a local variable. As a quick refresher, there will be lots of pushing onto the stack, performing actions, and popping results off a stack. As a result, these functions follow a pattern that will show up repeatably in this post. LdLoc pushes a local variable value onto the stack. Ldc_I4 pushes a 32bit int onto the stack; in this case the amount to be added/subtracted to/from the local variable value. Add and Sub pop the top 2 values off the top of the stack, perform their respective actions, and push the result onto the stack. StLoc pops a value off the stack and stores it in the local variable.

1

2

3

4

5

6

7

8

9

10

11

12

13



let addLocalInt (ilGenerator:ILGenerator) (var:LocalBuilder) (i:int) =

ilGenerator.Emit(OpCodes.Ldloc, var)

ilGenerator.Emit(OpCodes.Ldc_I4, i)

ilGenerator.Emit(OpCodes.Add)

ilGenerator.Emit(OpCodes.Stloc, var)





let subLocalInt (ilGenerator:ILGenerator) (var:LocalBuilder) (i:int) =

ilGenerator.Emit(OpCodes.Ldloc, var)

ilGenerator.Emit(OpCodes.Ldc_I4, i)

ilGenerator.Emit(OpCodes.Sub)

ilGenerator.Emit(OpCodes.Stloc, var)



The functions emitGt and emitLt increment and decrement the data pointer by leveraging the previously defined add/sub functions. This is also an important place to mention type safety (See, I said I’d get back to that). I can ensure that I don’t ever accidently send anything other than the data pointer into these functions. That can simply be done with a (dataPointer:DpPointer) , but I want to do a bit more; I want to use the LocalBuilder value within the type. If you’re not familar with the syntax, (DP(dataPointer):DpPointer) effectively ensures that dataPointer is of type DpPointer, and extracts the LocalBuilder instance into the dataPointer variable. This is a nice construct to not only ensure I send in a LocalBuilder, but that I send in the correct LocalBuilder. The additional level of security is one of the strengths type safety brings to the table.

1

2

3

4

5

6

7



let emitGt (ilGenerator:ILGenerator) (DP(dataPointer):DpPointer) =

addLocalInt ilGenerator dataPointer 1





let emitLt (ilGenerator:ILGenerator) (DP(dataPointer):DpPointer) =

subLocalInt ilGenerator dataPointer 1



emitPlus and emitMinus increment and decrement the value at data[dataPointer], respectively. This is similar to before, the primary difference is how to get/set an array value. Also note that again I use F# types to ensure I am only ever working with the data[] and dataPointer variables. To get a value from an array onto the stack I push the array variable reference on the stack Ldloc <array variable> , then the index Ldloc <index> , then a command to push the value at array[index] onto the stack Ldelem_I4 . To set a value of an array the process is similar. Push the array reference, index, and value on the stack, then call a command to pop the top value of the stack into array[index], Stelem_I4 . Beyond getting values in and out of arrays, I just have to do the appropriate add/subtract to the value before doing the set.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35



let emitPlus (ilGenerator:ILGenerator) (MEM(data):MemArray) (DP(dataPointer):DpPointer) =



ilGenerator.Emit(OpCodes.Ldloc, data)

ilGenerator.Emit(OpCodes.Ldloc, dataPointer)





ilGenerator.Emit(OpCodes.Ldloc, data)

ilGenerator.Emit(OpCodes.Ldloc, dataPointer)

ilGenerator.Emit(OpCodes.Ldelem_I4)





ilGenerator.Emit(OpCodes.Ldc_I4_1)

ilGenerator.Emit(OpCodes.Add)





ilGenerator.Emit(OpCodes.Stelem_I4)





let emitMinus (ilGenerator:ILGenerator) (MEM(data):MemArray) (DP(dataPointer):DpPointer) =



ilGenerator.Emit(OpCodes.Ldloc, data)

ilGenerator.Emit(OpCodes.Ldloc, dataPointer)





ilGenerator.Emit(OpCodes.Ldloc, data)

ilGenerator.Emit(OpCodes.Ldloc, dataPointer)

ilGenerator.Emit(OpCodes.Ldelem_I4)





ilGenerator.Emit(OpCodes.Ldc_I4_1)

ilGenerator.Emit(OpCodes.Sub)





ilGenerator.Emit(OpCodes.Stelem_I4)



Now it is time to tackle the IO portion of BF. emitComma reads a character from stdin and saves it into data[dataPointer]. To do this, I need to read the keypress from the console and convert it to a char. In F# I would do let (keypressChar:char) = Console.ReadKey().KeyChar . Using straight IL, I call the ReadKey and get_KeyChar functions. After that, it is a matter of saving the value into the data array as an int. emitPeriod displays the value at data[dataPointer] to stdout as a char. There isn’t a lot new here, push the value from the array onto the stack, then call Console.Write . Of possible interest here is the parameter type char. If I make the parameter type be typeof<int> , it would instead print the actual int value in the data array. Although not what I want as final output, its a useful note for debugging.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24



let emitComma (ilGenerator:ILGenerator) (MEM(data):MemArray) (DP(dataPointer):DpPointer) =



let consoleReadKey = createLocal ilGenerator typeof<System.ConsoleKeyInfo>

ilGenerator.EmitCall(OpCodes.Call, typeof<Console>.GetMethod( "ReadKey" , [| |]), [| typeof<ConsoleKeyInfo> |])

ilGenerator.Emit(OpCodes.Stloc, consoleReadKey)



let keypressChar = createLocal ilGenerator typeof<char>

ilGenerator.Emit(OpCodes.Ldloca_S, consoleReadKey)

ilGenerator.EmitCall(OpCodes.Call, typeof<ConsoleKeyInfo>.GetMethod( "get_KeyChar" , [| |]), [| typeof<char> |])

ilGenerator.Emit(OpCodes.Stloc, keypressChar)





ilGenerator.Emit(OpCodes.Ldloc, data)

ilGenerator.Emit(OpCodes.Ldloc, dataPointer)

ilGenerator.Emit(OpCodes.Ldloc, keypressChar)

ilGenerator.Emit(OpCodes.Stelem_I4)





let emitPeriod (ilGenerator:ILGenerator) (MEM(data):MemArray) (DP(dataPointer):DpPointer) =

ilGenerator.Emit(OpCodes.Ldloc, data)

ilGenerator.Emit(OpCodes.Ldloc, dataPointer)

ilGenerator.Emit(OpCodes.Ldelem_I4)

ilGenerator.EmitCall(OpCodes.Call,typeof<Console>.GetMethod( "Write" , [| typeof<char> |]), null );



Now that all the ActionOps are implemented, I make a generic emitOp function. There isn’t anything particularly exciting here.

1

2

3

4

5

6

7

8

9



let emitOp (ilGenerator:ILGenerator) (op:ActionOps) (data:MemArray) (dataPointer:DpPointer) =

match op with

| Gt -> emitGt ilGenerator dataPointer

| Lt -> emitLt ilGenerator dataPointer

| Plus -> emitPlus ilGenerator data dataPointer

| Minus -> emitMinus ilGenerator data dataPointer

| Read -> emitComma ilGenerator data dataPointer

| Write -> emitPeriod ilGenerator data dataPointer



Time to implement loop blocks, program, and ast processor. These are mutually recursive functions. For those new to F#, this is accomplished by let rec foo = <foo code> and bar <bar code> syntax. It is a necessary construct when foo needs to call bar, and vice versa.

The loop is a list of operators, with an associated predicate to determine if the list should continue to execute. To do this the loop leverages labels for jumping between sections of code (the predicate section and the end of loop). This is done in a two step process. One, call DefineLabel() to create the label, this allows it to be referenced. Two, MarkLabel says where in the code the label belongs. As you can imagine, I put the predicate label at the begining of the predicate, and the loop-end label after the loop’s block of code.

Tackling the predicate logic; the looping works as follows, “Continue loop as long as data[dataPointer] != 0”. To test this, I push the value from the array onto the stack, then a 0. Ceq does the compare, pushing the result onto the stack and Brtrue branches out of the loop if the value equals 0.

Once the branching logic is complete, all that’s left is processing the list of ops. This is a good segue to emitOpList. Both a loop and program are a list of operators. This is simply a matter of iterating over the list and feeding each one into processAst.

Even though processAst is the controlling function for the emitting of the AST -> IL, it looks pretty tame to everything else so far. But that’s because the hard stuff has already been done. processAst provides a match that handles the 3 types of operators; and their respective functions take care of the rest of the work.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42



let rec emitLoop (ilGenerator:ILGenerator) (MEM(data):MemArray) (DP(dataPointer):DpPointer) (block:AllOps list) =

let ilLoopPredicate = ilGenerator.DefineLabel()

let ilLoopStart = ilGenerator.DefineLabel()

let ilLoopEnd = ilGenerator.DefineLabel()











ilGenerator.MarkLabel(ilLoopPredicate)

ilGenerator.Emit(OpCodes.Ldloc, data)

ilGenerator.Emit(OpCodes.Ldloc, dataPointer)

ilGenerator.Emit(OpCodes.Ldelem_I4)



ilGenerator.Emit(OpCodes.Ldc_I4, 0 )





ilGenerator.Emit(OpCodes.Ceq)

ilGenerator.Emit(OpCodes.Brtrue, ilLoopEnd)







ilGenerator.MarkLabel(ilLoopStart)

emitOpList ilGenerator (MEM data) (DP dataPointer) block







ilGenerator.Emit(OpCodes.Br, ilLoopPredicate)

ilGenerator.MarkLabel(ilLoopEnd)





and emitOpList (ilGenerator:ILGenerator) (data:MemArray) (dataPointer:DpPointer) (ast:AllOps list) =

ast |> List.iter (processAst ilGenerator data dataPointer)





and processAst (ilGenerator:ILGenerator) (data:MemArray) (dataPointer:DpPointer) (ast:AllOps) =

match ast with

| Op op -> emitOp ilGenerator op data dataPointer

| Loop block -> emitLoop ilGenerator data dataPointer block

| Program block -> emitOpList ilGenerator data dataPointer block

| Comment -> ()



Taking a breather from all that new stuff, I head back into familiar territory. This is the code from my second post in the series; emitting an IL program. The only change of significance is that I now insert the BF logic. This consists of creating the necessary variables and generating IL based on the AST. First, the variables: data, dataPointer, and programCounter. As mentioned above I don’t need programCounter, so it’s here for reference only. I use my previously defined “create variable” functions for this. I also wrap each variable in it’s respective type (MEM, DP, PC). This ensures that as I use them, I don’t send the wrong variable to the wrong function. Second, the IL generation with processAst.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46



let cleanName name =

let regEx = new Regex( "[^A-Z0-9]" , RegexOptions.IgnoreCase)

regEx.Replace(name, "" )





let createProgram (appName:string) (ast:AllOps list) =



let appName' = cleanName appName

let exeName = sprintf "%s.exe" appName



let appDomain = AppDomain.CurrentDomain



let assemblyName = new AssemblyName()

assemblyName.Name <- appName'



let assembly = appDomain.DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Save)



let programModule = assembly.DefineDynamicModule(appName', exeName)



let appClass = sprintf "%s.Program" appName'

let programType = programModule.DefineType(appClass, TypeAttributes.Public)



let mainMethod = programType.DefineMethod( "Main" , MethodAttributes.Static)





assembly.SetEntryPoint(mainMethod)



let mainIl = mainMethod.GetILGenerator()









let data = MEM(createLocalArray mainIl typeof<int[]> memorySize)

let dataPointer = DP(createLocal mainIl typeof<int>)





processAst mainIl data dataPointer (Program ast)



mainIl.Emit(OpCodes.Ret)





programType.CreateType() |> ignore





assembly.Save(exeName)



Tying the parser and emitter is done in the compile function. FParsec provides a Success/Failure when parsing a string. In this case, if success I emit a compiled program, otherwise I display the parsing error. I also filter out the comments prior to sending it to emitter.

1

2

3

4

5

6

module Compiler =

let compile appName code =

let ast = Parser.buildAst code

match ast with

| Success(x, _, _) -> Emitter.createProgram appName (x |> List.filter ( fun x' -> x' <> Comment))

| Failure(e, _, _) -> printfn "Error: %A" e



Here is a compilation of a Hello World program.

1

2

3

4

let programName = "HelloWorld"

let code = "++++++++++++[>++++++>++++++++>+++<<<-]>.>+++++.+++++++..+++.>----.<<+++++++++++++++.>.+++.------.--------.>+."



Compiler.compile programName code



Quite exciting, it worked!

Taking it a bit further, I can compile bf files from the commandline. To support this I remove the programName and code variables, and replace them with the below block. This will compile all *.bf files provided as arguments.

1

2

3

4

5

Environment.GetCommandLineArgs()

|> Array.filter ( fun (x:string) -> x.EndsWith( ".bf" ))

|> Array.iter ( fun x ->

printfn "compiling %s ..." x

Compiler.compile x (File.ReadAllText x))



The primary goal is complete. I can now compile BF to run in the CLR. There are a couple directions I could follow at this point. Next time, going deeper down the rabbit hole: optimization.