Illustration created for “A Journey With Go”, made from the original Go Gopher, created by Renee French.

ℹ️ This article is based on Go 1.12.

Go provides lightweight and smart goroutine management. Lightweight because a goroutine stack starts at only 2 KB, and smart because that stack can grow and shrink automatically according to our needs.

Regarding the size of the stack, we can find it in runtime/stack.go :

// The minimum size of stack used by Go code
_StackMin = 2048

Note that this value has changed over time:

Go 1.2: the goroutine stack was increased from 4 KB to 8 KB.

Go 1.4: the goroutine stack was decreased from 8 KB to 2 KB.

The stack size changed because the stack allocation strategy changed. We will come back to this topic later in this article.

This default stack size is sometimes not enough to run our program. When that happens, Go automatically adjusts the size of the stack.
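To get a feel for this, here is a minimal sketch of a program whose frames cannot fit in a 2 KB stack. The function name sumTo and the recursion depth are illustrative choices, not something taken from the runtime. The program needs no special setup, since the stack is grown on demand:

```go
package main

import "fmt"

// sumTo is a hypothetical helper: it recurses n levels deep, so every level
// adds one more frame to the goroutine stack.
//
//go:noinline
func sumTo(n int64) int64 {
	if n == 0 {
		return 0
	}
	return n + sumTo(n-1)
}

func main() {
	// 100000 nested frames need far more than the initial 2 KB stack.
	fmt.Println(sumTo(100000))
}
```

The program prints 5000050000, the sum of the integers from 1 to 100000.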

Dynamic stack size

Go can grow the stack automatically, but the compiler is also able to determine that a function will never need a bigger stack. Let’s take an example and analyze how it works:

package main

import "strconv"

func main() {
	a := 1
	b := 2

	r := max(a, b)
	println(`max: ` + strconv.Itoa(r))
}

func max(a int, b int) int {
	if a >= b {
		return a
	}

	return b
}

This first example just returns the larger of two integers. To see how Go manages the allocation of the goroutine’s stack, we can look at the assembly generated by the compiler with the command go build -gcflags -S main.go. The output below, trimmed to the lines related to stack allocation, shows what Go is doing:

"".main STEXT size=186 args=0x0 locals=0x70
0x0000 00000 (/go/src/main.go:5) TEXT "".main(SB), ABIInternal, $112-0
[...]
0x00b0 00176 (/go/src/main.go:5) CALL runtime.morestack_noctxt(SB)
[...]
0x0000 00000 (/go/src/main.go:13) TEXT "".max(SB), NOSPLIT|ABIInternal, $0-24

Two elements of this output relate to stack changes:

- CALL runtime.morestack_noctxt: this function grows the stack when the goroutine needs more space.

- NOSPLIT: this flag means that the stack overflow check is not needed for the function. It has the same effect as the compiler directive //go:nosplit.
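The directive mentioned in the last bullet can also be written by hand. The sketch below uses a hypothetical function add. In the output above the compiler already marked a small leaf function like max as NOSPLIT on its own, so the directive is shown here only for illustration:

```go
package main

import "fmt"

// add is a hypothetical leaf function. The nosplit directive asks the
// compiler to omit the stack overflow check from its prologue, and noinline
// keeps it as a real call so that it still appears in the assembly output.
//
//go:nosplit
//go:noinline
func add(a, b int) int {
	return a + b
}

func main() {
	fmt.Println(add(1, 2))
}
```

Building this file with go build -gcflags -S should show the NOSPLIT flag on the TEXT line of add, in the same way as for max.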

If we look at the function runtime.morestack_noctxt, we see that it calls the function newstack from runtime/stack.go:

func newstack() {
	[...]
	// Allocate a bigger segment and move the stack.
	oldsize := gp.stack.hi - gp.stack.lo
	newsize := oldsize * 2
	if newsize > maxstacksize {
		print("runtime: goroutine stack exceeds ", maxstacksize, "-byte limit\n")
		throw("stack overflow")
	}

	// The goroutine must be executing in order to call newstack,
	// so it must be Grunning (or Gscanrunning).
	casgstatus(gp, _Grunning, _Gcopystack)

	// The concurrent GC will not scan the stack while we are doing the copy since
	// the gp is in a Gcopystack status.
	copystack(gp, newsize, true)
	if stackDebug >= 1 {
		print("stack grow done\n")
	}
	casgstatus(gp, _Gcopystack, _Grunning)
}

The size of the current stack is first calculated from the boundaries gp.stack.hi and gp.stack.lo, which hold the addresses of the two ends of the stack:

type stack struct {
	lo uintptr
	hi uintptr
}

The current size is then multiplied by 2 and checked against the maximum allowed size, which depends on the architecture:

// Max stack size is 1 GB on 64-bit, 250 MB on 32-bit.
// Using decimal instead of binary GB and MB because
// they look nicer in the stack overflow failure message.
if sys.PtrSize == 8 {
	maxstacksize = 1000000000
} else {
	maxstacksize = 250000000
}
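This limit can also be changed at run time with debug.SetMaxStack from the standard runtime/debug package, which returns the previous setting. The helper withMaxStack below is a hypothetical wrapper written for this sketch. It applies a temporary limit around a function and then restores the old one:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// withMaxStack is a hypothetical helper: it sets the maximum stack size of a
// single goroutine to limit bytes, runs f, then restores the previous limit.
// It returns the limit that was in place before the call.
func withMaxStack(limit int, f func()) (previous int) {
	previous = debug.SetMaxStack(limit)
	defer debug.SetMaxStack(previous)
	f()
	return previous
}

func main() {
	prev := withMaxStack(64<<20, func() {})
	fmt.Println("default limit in bytes:", prev)
}
```

The value printed depends on the architecture, as described in the runtime comment above.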

Now that we know the behavior, we can write a simple example to verify all of that. To get debug output, we set the constant stackDebug, which we saw in the newstack function, to 1 and run:

package main

func main() {
	var x [10]int
	a(x)
}

//go:noinline
func a(x [10]int) {
	println(`func a`)
	var y [100]int
	b(y)
}

//go:noinline
func b(x [100]int) {
	println(`func b`)
	var y [1000]int
	c(y)
}

//go:noinline
func c(x [1000]int) {
	println(`func c`)
}

The directive //go:noinline prevents the compiler from inlining these functions into main. If they were inlined, we would not see the stack grow in each function prologue.

Here is part of the debug output we got:

runtime: newstack sp=0xc00002e6d8 stack=[0xc00002e000, 0xc00002e800]
stack grow done
func a
runtime: newstack sp=0xc000076888 stack=[0xc000076000, 0xc000077000]
stack grow done
runtime: newstack sp=0xc00003f888 stack=[0xc00003e000, 0xc000040000]
stack grow done
runtime: newstack sp=0xc000081888 stack=[0xc00007e000, 0xc000082000]
stack grow done
func b
runtime: newstack sp=0xc0000859f8 stack=[0xc000082000, 0xc00008a000]
func c

We can see that the stack has grown 4 times. Indeed, the function prologue grows the stack as much as necessary to fit the needs of the function. As we have seen in the code, the stack size is defined by the boundaries of the stack, so we can calculate the stack size at each step. Each newstack line prints the current boundaries as stack=[...]:

runtime: newstack sp=0xc00002e6d8 stack=[0xc00002e000, 0xc00002e800]
0xc00002e800 - 0xc00002e000 = 2048

runtime: newstack sp=0xc000076888 stack=[0xc000076000, 0xc000077000]
0xc000077000 - 0xc000076000 = 4096

runtime: newstack sp=0xc00003f888 stack=[0xc00003e000, 0xc000040000]
0xc000040000 - 0xc00003e000 = 8192

runtime: newstack sp=0xc000081888 stack=[0xc00007e000, 0xc000082000]
0xc000082000 - 0xc00007e000 = 16384

runtime: newstack sp=0xc0000859f8 stack=[0xc000082000, 0xc00008a000]
0xc00008a000 - 0xc000082000 = 32768
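The same subtractions can be checked with a few lines of Go. This is only a sketch: the type bounds and the variable traced are names made up here, and the addresses are copied from the trace above.

```go
package main

import "fmt"

// bounds mirrors the lo/hi pair of the runtime's stack struct. It uses
// uint64 instead of uintptr so the 64-bit addresses also compile on 32-bit.
type bounds struct {
	lo, hi uint64
}

// size returns the number of bytes between the two boundaries.
func (b bounds) size() uint64 {
	return b.hi - b.lo
}

// traced holds the five stack=[lo, hi] pairs printed in the debug trace.
var traced = []bounds{
	{0xc00002e000, 0xc00002e800},
	{0xc000076000, 0xc000077000},
	{0xc00003e000, 0xc000040000},
	{0xc00007e000, 0xc000082000},
	{0xc000082000, 0xc00008a000},
}

func main() {
	for _, b := range traced {
		fmt.Printf("%#x - %#x = %d\n", b.hi, b.lo, b.size())
	}
}
```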

This look into the internals showed us that the stack of a goroutine starts at 2 KB and is grown as much as necessary by the function prologue, added at compile time, until there is enough memory or the stack limit is reached.
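The doubling policy can be modelled in a few lines. The sketch below is a simplified model and not the runtime code: the function grow and its parameters are hypothetical, and it loops until the requested size fits, whereas the runtime doubles once per call to newstack.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	stackMin     = 2048       // same value as _StackMin
	maxStackSize = 1000000000 // the 64-bit limit quoted earlier
)

// grow doubles size until it reaches needed bytes and returns every
// intermediate size. It fails as soon as a doubling exceeds maxStackSize.
func grow(size, needed uint64) ([]uint64, error) {
	var steps []uint64
	for size < needed {
		size *= 2
		if size > maxStackSize {
			return steps, errors.New("stack overflow")
		}
		steps = append(steps, size)
	}
	return steps, nil
}

func main() {
	steps, err := grow(stackMin, 20000)
	fmt.Println(steps, err)
}
```

With a need of 20000 bytes, the model goes through 4096, 8192, 16384 and 32768 bytes, the same four growths as in the trace.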

Stack allocation management

Dynamic growth is not the only aspect that can impact our applications. The way the stack is allocated can have a great impact as well. Let’s try to understand how it is managed from the full trace of the first two stack growths:

runtime: newstack sp=0xc00002e6d8 stack=[0xc00002e000, 0xc00002e800]
copystack gp=0xc000000300 [0xc00002e000 0xc00002e6e0 0xc00002e800] -> [0xc000076000 0xc000076ee0 0xc000077000]/4096
stackfree 0xc00002e000 2048
stack grow done

runtime: newstack sp=0xc000076888 stack=[0xc000076000, 0xc000077000]
copystack gp=0xc000000300 [0xc000076000 0xc000076890 0xc000077000] -> [0xc00003e000 0xc00003f890 0xc000040000]/8192
stackfree 0xc000076000 4096
stack grow done

The first line shows the boundaries of the current stack, stack=[0xc00002e000, 0xc00002e800]. The runtime then copies it to a new stack twice as big, copystack [0xc00002e000 [...] 0xc00002e800] -> [0xc000076000 [...] 0xc000077000], 4096 bytes long as we have seen previously. Finally, the previous stack is freed: stackfree 0xc00002e000. Here is a diagram that helps to visualize what is happening:

Golang stack growth with contiguous stack
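The three addresses printed by copystack are the low boundary, the stack pointer and the high boundary. Since the stack grows downward, the used part sits just below the high boundary and keeps the same distance from it after the copy. The sketch below models only this address arithmetic; the type segment and the function relocate are hypothetical names, and the addresses come from the trace.

```go
package main

import "fmt"

// segment describes a stack by its boundaries and its stack pointer.
// The field names are illustrative, not the runtime's.
type segment struct {
	lo, sp, hi uint64
}

// relocate returns the layout of a stack of newSize bytes starting at newLo
// once the used bytes [sp, hi) of old have been copied below its top.
func relocate(old segment, newLo, newSize uint64) segment {
	used := old.hi - old.sp
	newHi := newLo + newSize
	return segment{lo: newLo, sp: newHi - used, hi: newHi}
}

func main() {
	old := segment{lo: 0xc00002e000, sp: 0xc00002e6e0, hi: 0xc00002e800}
	moved := relocate(old, 0xc000076000, 4096)
	fmt.Printf("[%#x %#x %#x]\n", moved.lo, moved.sp, moved.hi)
}
```

This prints [0xc000076000 0xc000076ee0 0xc000077000], which matches the right-hand side of the first copystack line.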

The function copystack copies the entire stack and adjusts all addresses so that they point into the new stack. We can verify that easily with a small modification of our code:

func main() {
	var x [10]int
	println(&x)
	a(x)
	println(&x)
}

It now prints the address of the value:

0xc00002e738

[...]

0xc000089f38

The address 0xc00002e738 falls within the first stack we saw, stack=[0xc00002e000, 0xc00002e800], while 0xc000089f38 falls within the last stack boundaries in the debug trace, stack=[0xc000082000, 0xc00008a000]. This confirms that all values have been moved from stack to stack.

It is also interesting to note that the stack will shrink, if needed, when garbage collection is triggered.

In our example, after the function call, there are no valid frames in the stack other than the one for main, so the runtime is able to shrink it when the garbage collector runs. To see that, we can force the garbage collector to run:

package main

import "runtime"

// Functions a, b and c are unchanged from the previous example.

func main() {
	var x [10]int
	println(&x)
	a(x)
	runtime.GC()
	println(&x)
}

The debug trace now displays the shrink of the stack:

func c

shrinking stack 32768->16384

copystack gp=0xc000000300 [0xc000082000 0xc000089e60 0xc00008a000] -> [0xc00007e000 0xc000081e60 0xc000082000]/16384

As we can see, the stack size has been divided by 2 and the stack has re-used a previous stack address, stack=[0xc00007e000, 0xc000082000]. Here again, we can see in shrinkstack() in runtime/stack.go that the shrink always divides the current size by 2:

oldsize := gp.stack.hi - gp.stack.lo

newsize := oldsize / 2
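As a rough model of one shrink step, the sketch below halves the size under two guards. Both guards are assumptions made for this sketch and are not quoted in this article: never go below a minimum size, and only shrink when less than a quarter of the stack is in use. The function name shrink and its parameters are hypothetical.

```go
package main

import "fmt"

// shrink returns the stack size after one shrink attempt.
// Assumptions of this model: the stack never goes below min bytes, and it is
// left untouched when a quarter or more of it is in use.
func shrink(size, used, min uint64) uint64 {
	newSize := size / 2
	if newSize < min {
		return size
	}
	if used >= size/4 {
		return size
	}
	return newSize
}

func main() {
	// In the trace, hi - sp = 0xc00008a000 - 0xc000089e60 = 416 bytes in use.
	fmt.Println(shrink(32768, 416, 2048))
}
```

With the values from the trace, the model returns 16384, the size reported by the line shrinking stack 32768->16384.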

Contiguous stack VS segmented stack

The strategy of copying the stack into a bigger space is called contiguous stack, as opposed to segmented stack. Go moved to contiguous stacks in Go 1.3. In order to see the difference, we will run the same example with Go 1.2. Here again, we need to update the constant stackDebug to display the trace. Since the runtime was written in C in this version, we have to compile Go from source. Here is the result:

func a
runtime: newstack framesize=0x3e90 argsize=0x320 sp=0x7f8875953848 stack=[0x7f8875952000, 0x7f8875953fa0]
-> new stack [0xc21001d000, 0xc210021950]
func b
func c
runtime: oldstack gobuf={pc:0x400cff sp:0x7f8875953858 lr:0x0} cret=0x1 argsize=0x320

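The current stack stack=[0x7f8875952000, 0x7f8875953fa0] belongs to an 8 KB segment: the printed range is 8096 bytes, and the structure stored at the top of the stack takes the remaining 96 bytes, for 8192 bytes in total. The new segment is 18864 bytes: 18768 bytes in the printed range, plus the same 96 bytes at the top of the stack. The size of the memory to allocate is computed as follows: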

// allocate new segment.
framesize += argsize;
framesize += StackExtra; // room for more functions, Stktop.
if(framesize < StackMin)
	framesize = StackMin;
framesize += StackSystem;

As for the constants, StackExtra is set to 2048, StackMin is set to 8192, and StackSystem ranges from 0 to more than 512 depending on the platform.

So, our new stack is composed of: 16016 (frame size) + 800 (arguments) + 2048 (StackExtra) + 0 (StackSystem) = 18864 bytes.
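The computation above can be transcribed into Go to check the total. This is a sketch: segmentSize is a hypothetical name, the constants are the values quoted in the text, and StackSystem is taken as 0 as in this example.

```go
package main

import "fmt"

// Constants as quoted in the text for Go 1.2, with StackSystem equal to 0.
const (
	stackExtra  = 2048
	stackMin    = 8192
	stackSystem = 0
)

// segmentSize follows the C snippet step by step and returns the number of
// bytes to allocate for a new segment.
func segmentSize(framesize, argsize uint64) uint64 {
	framesize += argsize
	framesize += stackExtra // room for more functions, Stktop
	if framesize < stackMin {
		framesize = stackMin
	}
	framesize += stackSystem
	return framesize
}

func main() {
	// framesize=0x3e90 (16016) and argsize=0x320 (800) come from the trace.
	fmt.Println(segmentSize(0x3e90, 0x320))
}
```

It prints 18864, the size of the new segment seen in the trace.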

Once all the functions have returned, the new stack is freed (log line runtime: oldstack). This behavior was one of the reasons that pushed the Go team to move to contiguous stacks:

Current split stack mechanism has a “hot split” problem — if the stack is almost full, a call will force a new stack chunk to be allocated. When that call returns, the new stack chunk is freed. If the same call happens repeatedly in a tight loop, the overhead of the alloc/free causes significant overhead

Go had to increase the minimum stack size to 8 KB in Go 1.2 for this reason, and was later able to reduce it back to 2 KB after the implementation of the contiguous stack.
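The hot split problem can be illustrated with a toy model that only counts allocations. Everything in it is hypothetical (function names, parameters, sizes), and it ignores details such as the stack shrinking during garbage collection. It compares a call made repeatedly at the edge of the stack under both strategies:

```go
package main

import "fmt"

// segmentedAllocs counts segment allocations when the same call is made
// calls times with only free bytes left in the current segment. Each call
// whose frame does not fit gets a new segment, freed again on return.
func segmentedAllocs(calls int, free, frame uint64) int {
	allocs := 0
	for i := 0; i < calls; i++ {
		if frame > free {
			allocs++ // allocated now, freed when the call returns
		}
	}
	return allocs
}

// contiguousAllocs counts stack copies for the same workload when the stack
// is doubled on overflow and the bigger stack is kept afterwards.
func contiguousAllocs(calls int, size, used, frame uint64) int {
	allocs := 0
	for i := 0; i < calls; i++ {
		for used+frame > size {
			size *= 2
			allocs++
		}
	}
	return allocs
}

func main() {
	fmt.Println(segmentedAllocs(1000, 64, 128))
	fmt.Println(contiguousAllocs(1000, 2048, 1984, 128))
}
```

In this model, the segmented strategy allocates 1000 times for 1000 calls, while the contiguous strategy allocates once.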

Here is our previous diagram updated for the segmented stack:

Golang stack growth with segmented stack

Conclusion

Stack management in Go is efficient and quite easy to understand. Go is not the only language that has chosen not to use segmented stacks: Rust also decided against this solution for the same reasons.

If you want to go deeper into the stack details, I also suggest you read the blog post by Dave Cheney that talks about the redzone, along with the post from Bill Kennedy that explains the frames in the stack.