Illustration created for “A Journey With Go”, made from the original Go Gopher, created by Renee French.

ℹ️ This article is based on Go 1.13.

Inlining replaces a function call with the body of the called function. Although this optimization increases the binary size, it improves program performance. However, Go does not inline every function; it follows a set of rules.

Rules

Let’s start with an example to understand what exactly inlining is. The following program, split into two files, adds or subtracts each number of a list depending on the sign of the running total:

main.go

```go
package main

import "fmt"

func main() {
	n := []float32{120.4, -46.7, 32.50, 34.65, -67.45}
	fmt.Printf("The total is %.02f\n", sum(n))
}

func sum(s []float32) float32 {
	var t float32
	for _, v := range s {
		if t < 0 {
			t = add(t, v)
		} else {
			t = sub(t, v)
		}
	}

	return t
}
```

op.go

```go
package main

func add(a, b float32) float32 {
	return a + b
}

func sub(a, b float32) float32 {
	return a - b
}
```

Running the program with the flag -gcflags="-m" shows the inlined functions:

```
./op.go:3:6: can inline add
./op.go:7:6: can inline sub
./main.go:16:11: inlining call to sub
./main.go:14:11: inlining call to add
./main.go:7:12: inlining call to fmt.Printf
```

We can see that the function add is inlined. But what about the function sum? Running the program with more verbosity, using -m -m as the flag value, explains why:

```
./main.go:10:6: cannot inline sum: unhandled op RANGE
```

Go does not inline functions that use the range operation. More generally, some operations block inlining, such as closure calls, select, for, defer, and goroutine creation with go.

However, this is not the only rule. When parsing the AST, Go allocates an inlining budget of 80 nodes. Each node consumes one unit of the budget, and a call to a function that can itself be inlined consumes that function's inlining cost. As an example, the instruction a = a + 1 represents five nodes: AS, NAME, ADD, NAME, LITERAL. Here is the SSA dump:
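A minimal sketch to try the blocking rules yourself, assuming Go 1.13: the hypothetical functions below each use one of the constructs listed above, except double, which uses none of them. Building the file with go build -gcflags="-m -m" should report which functions can be inlined and why the others cannot. Later Go releases relaxed several of these restrictions, so the exact output depends on the compiler version.

```go
package main

import "fmt"

// double uses none of the blocking constructs, so it is a likely
// inlining candidate.
func double(x int) int { return x * 2 }

// total uses a range loop; like sum, Go 1.13 is expected to refuse it
// with "unhandled op RANGE".
func total(xs []int) int {
	t := 0
	for _, x := range xs {
		t += x
	}
	return t
}

// countdown uses a plain for loop, which Go 1.13 also refuses to inline.
func countdown(n int) int {
	steps := 0
	for n > 0 {
		n--
		steps++
	}
	return steps
}

// plusOneLater uses defer (and a closure), both of which block inlining
// in Go 1.13. The deferred closure increments the named result after return.
func plusOneLater(x int) (r int) {
	defer func() { r++ }()
	return x
}

func main() {
	fmt.Println(double(21), total([]int{1, 2, 3}), countdown(4), plusOneLater(1))
}
```

Comparing the -m -m reasons for these functions against the list above is a quick way to check which rules your compiler version still applies.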

When the cost of a function exceeds the budget, the compiler refuses to inline it. Here is an example with a bigger function add:

```
./op.go:3:6: cannot inline add: function too complex: cost 104 exceeds budget 80
```
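The node counting can be approximated outside the compiler with the standard go/parser and go/ast packages. The sketch below uses a hypothetical approxCost helper that charges one unit per go/ast node in a function body and compares the total with the 80-node budget. It is only an approximation: the compiler walks its own internal representation and charges calls differently, so its costs will not match exactly. Still, a = a + 1 also comes out as five nodes here (AssignStmt, Ident, BinaryExpr, Ident, BasicLit).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// inlineBudget is the 80-node budget described for Go 1.13.
const inlineBudget = 80

// approxCost parses a single function declaration and counts the go/ast
// nodes in its body, one unit per node. This is a rough stand-in for the
// compiler's real cost model, not a reimplementation of it.
func approxCost(funcSrc string) (int, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "f.go", "package p\n"+funcSrc, 0)
	if err != nil {
		return 0, err
	}
	if len(file.Decls) == 0 {
		return 0, fmt.Errorf("no declaration found")
	}
	fd, ok := file.Decls[0].(*ast.FuncDecl)
	if !ok || fd.Body == nil {
		return 0, fmt.Errorf("expected a function declaration with a body")
	}
	cost := 0
	for _, stmt := range fd.Body.List {
		ast.Inspect(stmt, func(n ast.Node) bool {
			if n != nil { // Inspect signals the end of each subtree with nil
				cost++
			}
			return true
		})
	}
	return cost, nil
}

func main() {
	src := "func inc(a int) int {\n\ta = a + 1\n\treturn a\n}"
	cost, err := approxCost(src)
	if err != nil {
		panic(err)
	}
	// a = a + 1 counts 5 nodes and return a counts 2 (ReturnStmt, Ident).
	fmt.Printf("inc: approximate cost %d, budget %d, fits: %v\n",
		cost, inlineBudget, cost <= inlineBudget)
}
```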

When a function follows all the rules, it can be inlined. However, that optimization comes with some issues regarding the developer experience.

Challenge

Inlining removes some function calls, which means the compiled program no longer matches the source call for call. However, when a panic occurs, developers need an exact stack trace to find the file and the line where it happened. Here is the same program with an inlined function add that contains a panic:

```go
func add(a, b float32) float32 {
	if b < 0 {
		panic(`Do not add negative number`)
	}

	return a + b
}
```

Running the program shows the panic at the correct line even though the code is inlined. The inlined frame main.add is printed with (...) in place of its argument values:

```
panic: Do not add negative number

goroutine 1 [running]:
main.add(...)
	op.go:5
main.sum(0xc00007cf2c, 0x5, 0x5, 0xc00007cf20)
	main.go:14 +0x80
main.main()
	main.go:7 +0x59
exit status 2
```

Go keeps an internal mapping of the inlined functions. It first generates an inline tree, which you can visualize with the flag -gcflags="-d pctab=pctoinline". Here is the tree for the function sum, built from the assembly code:

The value -1 represents the parent function sum, that is, code that was not inlined. Go maps the inlined functions in the generated code. It also maps the lines, which you can visualize with the flag -gcflags="-d pctab=pctoline". Here is the output for the function sum:

The files are mapped as well and can be displayed with the flag -gcflags="-d pctab=pctofile". Here is the output:

We now have a proper mapping of each of the generated instructions:

This table can then be embedded in the binary and read at runtime to generate accurate stack traces.
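To make the lookup concrete, here is a simplified model of how such tables could be read. It assumes already-decoded tables stored as plain slices, whereas real Go binaries use compact delta-encoded data, and every PC, line number, and tree entry in sumTables is invented for illustration rather than taken from the actual sum function. Given a PC, the model finds the inline tree node, line, and file covering it, then walks up the tree to rebuild the logical frames, innermost first.

```go
package main

import (
	"fmt"
	"sort"
)

// row is one range of a decoded PC-value table: every PC below endPC
// that is not covered by an earlier row maps to val.
type row struct {
	endPC int
	val   int
}

// lookup returns the value the table assigns to pc (rows sorted by endPC).
func lookup(table []row, pc int) (int, bool) {
	i := sort.Search(len(table), func(i int) bool { return pc < table[i].endPC })
	if i == len(table) {
		return 0, false
	}
	return table[i].val, true
}

// inlNode is one node of an inline tree: the inlined function, its parent
// node (-1 means the outer function itself) and the position of the call.
type inlNode struct {
	fn       string
	parent   int
	callFile string
	callLine int
}

// funcTables groups the hypothetical tables of one compiled function.
type funcTables struct {
	name               string
	files              []string
	pcToInline         []row // index into tree, or -1 when not inlined
	pcToLine, pcToFile []row
	tree               []inlNode
}

// frame is one logical stack frame.
type frame struct {
	fn, file string
	line     int
}

// frames expands one PC into logical frames, innermost first.
func (t funcTables) frames(pc int) []frame {
	idx, ok1 := lookup(t.pcToInline, pc)
	line, ok2 := lookup(t.pcToLine, pc)
	fileIdx, ok3 := lookup(t.pcToFile, pc)
	if !ok1 || !ok2 || !ok3 {
		return nil // pc is outside this function
	}
	cur := frame{file: t.files[fileIdx], line: line}
	var out []frame
	for idx >= 0 {
		n := t.tree[idx]
		cur.fn = n.fn
		out = append(out, cur)
		// The caller's position is the call site recorded in the tree.
		cur = frame{file: n.callFile, line: n.callLine}
		idx = n.parent
	}
	cur.fn = t.name
	return append(out, cur)
}

// sumTables holds invented values shaped like the example:
// add inlined at main.go:14 and sub inlined at main.go:16.
var sumTables = funcTables{
	name:       "main.sum",
	files:      []string{"main.go", "op.go"},
	pcToInline: []row{{0x20, -1}, {0x28, 0}, {0x30, -1}, {0x38, 1}, {0x50, -1}},
	pcToLine:   []row{{0x20, 12}, {0x28, 4}, {0x30, 12}, {0x38, 8}, {0x50, 20}},
	pcToFile:   []row{{0x20, 0}, {0x28, 1}, {0x30, 0}, {0x38, 1}, {0x50, 0}},
	tree: []inlNode{
		{fn: "main.add", parent: -1, callFile: "main.go", callLine: 14},
		{fn: "main.sub", parent: -1, callFile: "main.go", callLine: 16},
	},
}

func main() {
	for _, pc := range []int{0x10, 0x24, 0x34} {
		fmt.Printf("pc %#x: %v\n", pc, sumTables.frames(pc))
	}
}
```

A PC inside the inlined body of add yields two frames (add in op.go, then sum at its call site in main.go), which mirrors the shape of the stack trace shown earlier.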

Impact

Inlining is important and can be critical for applications that need high performance. A function call has an overhead (creating a new stack frame, saving and restoring registers) that inlining avoids. However, copying the body instead of calling the function increases the binary size. Here is a comparison on the go1 benchmark suite with and without inlining, where old is with inlining and new is without:

```
name                     old time/op    new time/op    delta
BinaryTree17-8           2.34s ± 2%     2.43s ± 3%     +3.77%
Fannkuch11-8             2.21s ± 1%     2.26s ± 1%     +2.01%
FmtFprintfEmpty-8        33.6ns ± 6%    35.2ns ± 3%    +4.85%
FmtFprintfString-8       55.3ns ± 3%    62.8ns ± 1%    +13.48%
FmtFprintfInt-8          63.1ns ± 3%    70.0ns ± 2%    +11.04%
FmtFprintfIntInt-8       95.9ns ± 3%    102.3ns ± 3%   +6.68%
FmtFprintfPrefixedInt-8  105ns ± 4%     111ns ± 1%     +5.83%
FmtFprintfFloat-8        165ns ± 4%     175ns ± 1%     +6.16%
FmtManyArgs-8            405ns ± 2%     427ns ± 0%     +5.38%
GobDecode-8              4.69ms ± 2%    4.78ms ± 4%    +1.77%
GobEncode-8              3.84ms ± 2%    3.93ms ± 3%    ~
Gzip-8                   210ms ± 3%     208ms ± 1%     ~
Gunzip-8                 28.1ms ± 7%    29.4ms ± 1%    +4.69%
HTTPClientServer-8       70.0µs ± 2%    70.9µs ± 1%    +1.21%
JSONEncode-8             7.28ms ± 5%    7.00ms ± 2%    -3.91%
JSONDecode-8             33.9ms ± 3%    33.1ms ± 1%    -2.32%
Mandelbrot200-8          3.74ms ± 0%    3.74ms ± 1%    ~
```

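For this benchmark suite, performance with inlining is roughly 5 to 6% better than without. A few benchmarks, such as JSONEncode and JSONDecode, are slightly faster without inlining, and a ~ marks a difference that is not statistically significant.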
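To observe the call overhead on a single function, here is a minimal sketch using testing.Benchmark outside of go test. The functions addInlinable and addNoInline are hypothetical: they share the same body, but the //go:noinline directive prevents the compiler from inlining the second one. The timings depend entirely on your machine and compiler version, so no particular numbers should be expected. Comparing whole programs built with and without -gcflags=-l, which disables inlining, is another way to run this kind of comparison.

```go
package main

import (
	"fmt"
	"testing"
)

// addInlinable is a tiny leaf function that the compiler will very likely inline.
func addInlinable(a, b float32) float32 { return a + b }

// addNoInline has the same body, but the directive below tells the
// compiler not to inline it, so every call keeps its call overhead.
//
//go:noinline
func addNoInline(a, b float32) float32 { return a + b }

// sink stores the results so the compiler cannot discard the loops.
var sink float32

func main() {
	// testing.Init registers the testing flags; it is needed when
	// testing.Benchmark is called outside of "go test".
	testing.Init()

	withInlining := testing.Benchmark(func(b *testing.B) {
		var t float32
		for i := 0; i < b.N; i++ {
			t = addInlinable(t, 1)
		}
		sink = t
	})
	withoutInlining := testing.Benchmark(func(b *testing.B) {
		var t float32
		for i := 0; i < b.N; i++ {
			t = addNoInline(t, 1)
		}
		sink = t
	})

	fmt.Println("inlined:    ", withInlining)
	fmt.Println("not inlined:", withoutInlining)
}
```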