Here at Stream, we use Go extensively, and it has drastically improved our productivity. We have also found its performance to be outstanding, and since adopting it we have implemented mission-critical portions of our stack in Go, such as our in-house storage engine powered by gRPC, Raft, and RocksDB. Today we are going to look at the Go 1.11 compiler and how it compiles your Go source code down to an executable, to gain an understanding of how the tools we use every day work. We will also see why Go code is so fast and how the compiler helps. We will take a look at three phases of the compiler:

The scanner, which converts the source code into a list of tokens, for use by the parser.

The parser, which converts the tokens into an Abstract Syntax Tree to be used by code generation.

The code generation, which converts the Abstract Syntax Tree to machine code.

Note: The packages we are going to be using (go/scanner, go/parser, go/token, go/ast, etc.) are not used by the Go compiler, but are mainly provided for use by tools to operate on Go source code. However, the actual Go compiler has very similar semantics. It does not use these packages because the compiler was once written in C and converted to Go code, so the actual Go compiler is still reminiscent of that structure.

Scanner

The first step of every compiler is to break up the raw source code text into tokens, which is done by the scanner (also known as a lexer). Tokens can be keywords, strings, variable names, function names, etc. Every valid program “word” is represented by a token. In concrete terms for Go, this might mean we have the tokens “package”, “main”, “func” and so forth. In Go, each token is represented by its position, type, and raw text.

Go even allows us to execute the scanner ourselves in a Go program by using the go/scanner and go/token packages. That means we can inspect what our program looks like to the Go compiler after it has been scanned. To do so, we are going to create a simple program that prints all tokens of a Hello World program. The program will look like this: https://gist.github.com/nparsons08/fb5d7350f2f052d8f50794c010285019

We create our source code string and initialize the scanner.Scanner struct, which will scan our source code. We call Scan() repeatedly and print each token’s position, type, and literal string until we reach the End of File (EOF) marker. When we run the program, it will print the following: https://gist.github.com/koesie10/e312024b5f52795756e81a95906bd8e1

Here we can see what the Go parser uses when it compiles a program. We can also see that the scanner adds semicolons where they would usually be placed in other programming languages such as C. This explains why Go source code does not need semicolons: the scanner inserts them automatically at the ends of lines where a statement can end.
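The scanning loop described above can be sketched as follows. This is a minimal sketch that may differ in details from the linked gist; the helper name tokenize and the tab-separated output format are our own choices for illustration. Automatically inserted semicolons show up as SEMICOLON tokens whose literal is a newline.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenize (a hypothetical helper) scans src and returns one formatted
// line per token: position, token type, and quoted literal text.
func tokenize(src string) []string {
	// The scanner reports positions relative to a file registered in a FileSet.
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(src))

	var s scanner.Scanner
	// nil error handler and mode 0: scan errors are not reported and comments are skipped.
	s.Init(file, []byte(src), nil, 0)

	var lines []string
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		lines = append(lines, fmt.Sprintf("%s\t%s\t%q", fset.Position(pos), tok, lit))
	}
	return lines
}

func main() {
	src := `package main

import "fmt"

func main() {
	fmt.Println("Hello, world!")
}
`
	for _, line := range tokenize(src) {
		fmt.Println(line)
	}
}
```

Passing scanner.ScanComments as the mode instead of 0 would make the scanner return comments as tokens as well.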

Parser

After the source code has been scanned, it will be passed to the parser. The parser is a phase of the compiler that converts the tokens into an Abstract Syntax Tree (AST). The AST is a structured representation of the source code. In the AST we will be able to see the program structure, such as functions and constant declarations. Go has again provided us with packages to parse the program and view the AST: go/parser and go/ast. We can use them like this to print the full AST: https://gist.github.com/nparsons08/234cc8ff0aa75067c22607d633d2e1f0 Output: https://gist.github.com/nparsons08/85f429cd024544f3b73dfa6c6d81c15d

In this output, you can see that there is quite a lot of information about the program. The Decls field holds a list of all declarations in the file, such as imports, constants, variables, and functions. In this case, we only have two: our import of the fmt package and the main function. To digest it further, we can look at this diagram, which represents the data above but shows only the node types, with the corresponding code in red:

The main function is composed of three parts: the name, the declaration, and the body. The name is represented as an identifier with the value main. The declaration, specified by the Type field, would contain a list of parameters and the return type if we had specified any. The body consists of a list of statements with all lines of our program, in this case only one.

Our single fmt.Println statement consists of quite a few parts in the AST. The statement is an ExprStmt, which wraps an expression. An expression can be a function call, as it is here, but also a literal, a binary operation (for example addition or subtraction), a unary operation (for instance negating a number) and many more. Anything that can be used in a function call’s arguments is an expression. Our ExprStmt contains a CallExpr, which is our actual function call. This again includes several parts, the most important of which are Fun and Args. Fun contains a reference to the function being called. In this case it is a SelectorExpr, because we select the Println identifier from the fmt package. However, at this stage the compiler does not yet know that fmt is a package; as far as the AST is concerned, it could just as well be a variable. Args contains a list of expressions which are the arguments to the function. In this case, we have passed a literal string to the function, so it is represented by a BasicLit with type STRING.

It is clear that we are able to deduce a lot from the AST. That means that we can also inspect the AST further and find, for example, all function calls in the file. To do so, we are going to use the Inspect function from the ast package. This function will recursively walk the tree and allow us to inspect the information from all nodes. To extract all function calls, we are going to use the following code: https://gist.github.com/koesie10/ba6af59e0dd8213260e5944c1464b0b1
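A call finder along these lines can be sketched as shown here. It is a minimal sketch and not necessarily identical to the linked gist; the helper name findCalls is hypothetical, and we collect the results in a slice rather than printing them directly so that they are easy to check.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
)

// findCalls (a hypothetical helper) parses src and returns the source
// text of the Fun field of every *ast.CallExpr found in the file.
func findCalls(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "", src, 0)
	if err != nil {
		return nil, err
	}

	var calls []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // not a call; keep walking into the children
		}
		// Render the callee expression (here a SelectorExpr) back to source text.
		var buf bytes.Buffer
		if err := printer.Fprint(&buf, fset, call.Fun); err == nil {
			calls = append(calls, buf.String())
		}
		return true // arguments may contain nested calls, so keep walking
	})
	return calls, nil
}

func main() {
	src := `package main

import "fmt"

func main() {
	fmt.Println("Hello, world!")
}
`
	calls, err := findCalls(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, c := range calls {
		fmt.Println(c)
	}
}
```

Returning true from the callback tells ast.Inspect to descend into the node's children; returning false would skip that subtree.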

What we are doing here is visiting every node and checking whether it is of type *ast.CallExpr, which, as we just saw, represents a function call. If it is, we print the name of the function, which is stored in the Fun member, using the printer package. The output for this code will be: fmt.Println This is the only function call in our simple program, so we did find all function calls.

After the AST has been constructed, all imports will be resolved using the GOPATH, or for Go 1.11 and up possibly modules. Then, types will be checked, and some preliminary optimizations are applied which make the execution of the program faster.
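To get a feel for the type-checking step, we can experiment with the go/types package. Like the other go/* packages in this article, it is provided for tools and is not what the Go 1.11 compiler itself runs, so treat the following as an illustrative sketch only. The helper name typeCheck and the sample source are hypothetical, and the sample deliberately contains no imports so that no importer has to be configured.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// typeCheck (a hypothetical helper) parses src and runs the go/types
// checker over it, returning every type error that was reported.
func typeCheck(src string) ([]error, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err // syntax error: the parser already rejected the file
	}

	var typeErrs []error
	conf := types.Config{
		// Collect all type errors instead of stopping at the first one.
		Error: func(err error) { typeErrs = append(typeErrs, err) },
	}
	// The returned error repeats the first collected one, so it is ignored here.
	conf.Check("demo", fset, []*ast.File{file}, nil)
	return typeErrs, nil
}

func main() {
	// Syntactically valid, but the second argument has the wrong type.
	src := `package demo

func add(a, b int) int { return a + b }

var total = add(1, "two")
`
	errs, err := typeCheck(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, e := range errs {
		fmt.Println(e)
	}
}
```

The sample parses without complaint, because the AST only captures structure; the mismatch between the string argument and the int parameter is only caught once types are checked.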