I like the tone in blogs where the author doesn’t know something, then works through it in the blog until both they and the reader know it. This isn’t one of those. In this case I know something, and I’ve realised not everyone does, particularly if they’ve come to Go from Python or Ruby, where this kind of stuff barely matters, rather than from C, where it constantly punches you in the face.

I’m going to try to explain how Go lays out structures in memory, and what they look like in terms of bits and bytes. Hopefully I’ll succeed, otherwise reading this will be very dull and confusing.

Imagine you have a structure like the following.

type MyData struct {
	aByte   byte
	aShort  int16
	anInt32 int32
	aSlice  []byte
}

Then what actually is this structure? Fundamentally, it’s a description of how you lay out data in memory. But what does that mean, and how does the compiler lay things out? Let’s have a look. First let’s use reflection to examine the fields in the structure.

Upon Reflection

Here’s some code that uses reflection to find out the size of our fields, and their offset (where they lie in memory relative to the start of the structure). Reflection is cool. It tells us what the compiler thinks about types, including structures.

// Requires the "fmt" and "reflect" packages.
// First ask Go to give us some information about the MyData type
typ := reflect.TypeOf(MyData{})
fmt.Printf("Struct is %d bytes long\n", typ.Size())

// We can run through the fields in the structure in order
n := typ.NumField()
for i := 0; i < n; i++ {
	field := typ.Field(i)
	fmt.Printf("%s at offset %v, size=%d, align=%d\n",
		field.Name, field.Offset, field.Type.Size(),
		field.Type.Align())
}

And here’s the result. As well as the offset and size of each field, I’ve also printed the align for each field, which I’ll obliquely refer to later.

Struct is 32 bytes long

aByte at offset 0, size=1, align=1

aShort at offset 2, size=2, align=2

anInt32 at offset 4, size=4, align=4

aSlice at offset 8, size=24, align=8

aByte is the first field in our structure, at offset 0. It uses 1 byte of memory.

aShort is the second field. It uses 2 bytes of memory. Mysteriously it is at offset 2. Why is this? The answer is a mixture of safety, efficiency and convention. CPUs are better at accessing 2 byte numbers that lie at addresses that are a multiple of 2 bytes (on a “2-byte boundary”), and accessing 4 byte quantities that lie on a 4-byte boundary, etc, up to the CPU’s natural integer size, which on modern CPUs is 8 bytes (64 bits).

On some older RISC CPUs accessing mis-aligned numbers caused a fault: on some UNIX systems this would be a SIGBUS, and it would stop your program (or the kernel) dead in its tracks. Some systems had the ability to handle these faults and fix-up the misalignment: your code would run, but it would run slowly as additional code would be run by the OS to fix up the mistake. I believe Intel & ARM CPUs just handle any misalignment on-chip: perhaps we’ll test that, and any performance impact, in a later post.
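The rule that every field sits at a multiple of its own alignment can be checked without reflection. The following is a minimal sketch using unsafe.Offsetof and unsafe.Alignof; the helper names isAligned and fieldsAligned are made up for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// MyData is the first version of the structure from the text.
type MyData struct {
	aByte   byte
	aShort  int16
	anInt32 int32
	aSlice  []byte
}

// isAligned reports whether offset is a whole multiple of align.
// (Hypothetical helper, not part of any library.)
func isAligned(offset, align uintptr) bool {
	return offset%align == 0
}

// fieldsAligned checks every field of MyData against its own alignment.
func fieldsAligned() bool {
	var d MyData
	return isAligned(unsafe.Offsetof(d.aByte), unsafe.Alignof(d.aByte)) &&
		isAligned(unsafe.Offsetof(d.aShort), unsafe.Alignof(d.aShort)) &&
		isAligned(unsafe.Offsetof(d.anInt32), unsafe.Alignof(d.anInt32)) &&
		isAligned(unsafe.Offsetof(d.aSlice), unsafe.Alignof(d.aSlice))
}

func main() {
	fmt.Println("all fields aligned:", fieldsAligned())
}
```

The exact offsets depend on the platform, but the check itself should hold everywhere, because the compiler inserts padding precisely to make it true.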

Anyway, alignment is the reason the Go compiler skips a byte before placing the field aShort so that it sits on a 2-byte boundary. And because of this we can squeeze another field into the structure without making it any larger. Here’s a new version of our structure with a new field anotherByte immediately after aByte .

type MyData struct {
	aByte       byte
	anotherByte byte
	aShort      int16
	anInt32     int32
	aSlice      []byte
}

If we run the reflection code again we see that anotherByte fits in the spare space between aByte and aShort. It sits at offset 1, and aShort is still at offset 2. And now is probably the time to pay attention to that mysterious align field I referred to earlier. This tells us, and the Go compiler, how the field needs to be aligned.

Struct is 32 bytes long

aByte at offset 0, size=1, align=1

anotherByte at offset 1, size=1, align=1

aShort at offset 2, size=2, align=2

anInt32 at offset 4, size=4, align=4

aSlice at offset 8, size=24, align=8
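The offsets above can be reproduced by hand: round the running offset up to the field's alignment, place the field, then move on by its size. Here is a rough sketch of that rule, fed with the sizes and alignments printed above. The field type and the alignUp and layout functions are invented for this example; this is not the compiler's actual code, just the usual padding rule as I understand it.

```go
package main

import "fmt"

// field describes one struct member by the size and alignment reported above.
type field struct {
	name  string
	size  uintptr
	align uintptr
}

// alignUp rounds offset up to the next multiple of align.
// It assumes align is a power of two.
func alignUp(offset, align uintptr) uintptr {
	return (offset + align - 1) &^ (align - 1)
}

// layout returns the offset of each field and the total size of the struct.
// The total is rounded up to the largest alignment of any field.
func layout(fields []field) (offsets []uintptr, total uintptr) {
	var offset, maxAlign uintptr = 0, 1
	for _, f := range fields {
		offset = alignUp(offset, f.align)
		offsets = append(offsets, offset)
		offset += f.size
		if f.align > maxAlign {
			maxAlign = f.align
		}
	}
	return offsets, alignUp(offset, maxAlign)
}

func main() {
	fields := []field{
		{"aByte", 1, 1},
		{"anotherByte", 1, 1},
		{"aShort", 2, 2},
		{"anInt32", 4, 4},
		{"aSlice", 24, 8},
	}
	offsets, total := layout(fields)
	for i, f := range fields {
		fmt.Printf("%s at offset %d\n", f.name, offsets[i])
	}
	fmt.Printf("Struct is %d bytes long\n", total)
}
```

Removing the anotherByte entry gives the same offsets for the remaining fields and the same total of 32, which is the point made above: the new field only used space that was padding anyway.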

Show me the memory!

But what does our structure actually look like in memory? Let’s see if we can find out. First let’s build an instance of MyData with some values filled in. I’ve picked values that should be easy to spot in memory.

data := MyData{
	aByte:   0x1,
	aShort:  0x0203,
	anInt32: 0x04050607,
	aSlice: []byte{
		0x08, 0x09, 0x0a,
	},
}

Now some code to access the bytes that make up this structure. We want to take this instance of our structure, find its address in memory, and print out the bytes in that memory.

We use the alarmingly named unsafe package to help us do this. This lets us bypass the Go type system to convert a pointer to our structure to a 32 byte array, which will show us the bytes that make up the memory behind our structure.

// Requires the "fmt" and "unsafe" packages.
dataBytes := (*[32]byte)(unsafe.Pointer(&data))
fmt.Printf("Bytes are %#v\n", dataBytes)

We run our unsafe code, cross our fingers, and nothing bad happens. This is the result. The first byte of the dump is the first field of our structure, aByte. This is hopefully what you expect: the single byte aByte = 0x01 at offset 0.

Bytes are &[32]uint8{0x1, 0x0, 0x3, 0x2, 0x7, 0x6, 0x5, 0x4, 0x5a, 0x5, 0x1, 0x20, 0xc4, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
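The hard-coded 32 in the cast is only right on a platform where the structure really is 32 bytes long. As a hedged variation, unsafe.Sizeof yields a compile-time constant for a type like this, so it can be used as the array length instead. The rawBytes helper below is a name I have made up for the sketch.

```go
package main

import (
	"fmt"
	"unsafe"
)

// MyData is the first version of the structure from the text.
type MyData struct {
	aByte   byte
	aShort  int16
	anInt32 int32
	aSlice  []byte
}

// rawBytes returns a copy of the memory behind d.
// The array length comes from unsafe.Sizeof rather than a literal 32,
// so the view always covers exactly the structure and nothing more.
func rawBytes(d *MyData) []byte {
	view := (*[unsafe.Sizeof(MyData{})]byte)(unsafe.Pointer(d))
	return append([]byte(nil), view[:]...)
}

func main() {
	data := MyData{
		aByte:   0x1,
		aShort:  0x0203,
		anInt32: 0x04050607,
		aSlice:  []byte{0x08, 0x09, 0x0a},
	}
	b := rawBytes(&data)
	fmt.Printf("%d bytes: % x\n", len(b), b)
}
```

The pointer bytes inside the dump will differ from run to run, so only the first eight bytes are likely to match the output shown above.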

And the least shall be first

Next we look at aShort. This is at offset 2 with length 2. If you remember, aShort = 0x0203, but the dump shows 0x3 followed by 0x2: the bytes are in the other order. This is because most modern CPUs are little-endian: the lowest-order bytes of the value come first in memory.

Bytes are &[32]uint8{0x1, 0x0, 0x3, 0x2, 0x7, 0x6, 0x5, 0x4, 0x5a, 0x5, 0x1, 0x20, 0xc4, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}

The same thing happens for anInt32 = 0x04050607. The lowest-order byte comes first in memory, so offsets 4 to 7 hold 0x7, 0x6, 0x5, 0x4.

Bytes are &[32]uint8{0x1, 0x0, 0x3, 0x2, 0x7, 0x6, 0x5, 0x4, 0x5a, 0x5, 0x1, 0x20, 0xc4, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
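The same byte order can be produced on purpose, on any CPU, with the standard encoding/binary package. This small sketch encodes the two values from the text as little-endian; the helper names littleEndian16 and littleEndian32 are mine.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// littleEndian16 returns the two bytes of v, lowest-order byte first.
func littleEndian16(v uint16) []byte {
	buf := make([]byte, 2)
	binary.LittleEndian.PutUint16(buf, v)
	return buf
}

// littleEndian32 returns the four bytes of v, lowest-order byte first.
func littleEndian32(v uint32) []byte {
	buf := make([]byte, 4)
	binary.LittleEndian.PutUint32(buf, v)
	return buf
}

func main() {
	fmt.Printf("% x\n", littleEndian16(0x0203))     // 03 02
	fmt.Printf("% x\n", littleEndian32(0x04050607)) // 07 06 05 04
}
```

Unlike the unsafe dump, this does not depend on the machine it runs on: the byte order is chosen explicitly.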

Mysterious interlude

Now what do we see next? This is aSlice = []byte{0x08, 0x09, 0x0a} , 24 bytes at offset 8. I don’t see any sign of my sequence 0x08, 0x09, 0x0a anywhere in this. What’s going on?

Bytes are &[32]uint8{0x1, 0x0, 0x3, 0x2, 0x7, 0x6, 0x5, 0x4, 0x5a, 0x5, 0x1, 0x20, 0xc4, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}

The Go reflect package has the answer. A slice is represented in Go by the following structure, which starts with a pointer Data to the memory holding the data in the slice; then the length Len of the useful data in that memory, and the size Cap of the piece of memory.



type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}

If we feed this into our code we get the following offsets and sizes. The Data pointer and the two lengths are 8 bytes each, with 8 byte alignment.

Struct is 24 bytes long

Data at offset 0, size=8, align=8

Len at offset 8, size=8, align=8

Cap at offset 16, size=8, align=8

If we look again at the memory behind our structure we can see that Data holds the address 0x000000c42001055a (the 8 bytes at offset 8, read lowest-order byte first). After that we see both the Len and Cap are 3, the length of our data.

Bytes are &[32]uint8{0x1, 0x0, 0x3, 0x2, 0x7, 0x6, 0x5, 0x4, 0x5a, 0x5, 0x1, 0x20, 0xc4, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
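To double-check that reading, here is a sketch that takes the dump exactly as printed above and decodes the 24 bytes at offset 8 as three little-endian 64-bit numbers. The dump variable and the decodeSliceHeader function are names made up for this example, and the address is of course just the one from that particular run.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dump is the byte dump of the structure copied from the output above.
var dump = []byte{
	0x1, 0x0, 0x3, 0x2, 0x7, 0x6, 0x5, 0x4,
	0x5a, 0x5, 0x1, 0x20, 0xc4, 0x0, 0x0, 0x0,
	0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
	0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
}

// decodeSliceHeader reads Data, Len and Cap from 24 bytes of slice header,
// assuming a little-endian machine with 8-byte pointers and ints.
func decodeSliceHeader(b []byte) (data, length, capacity uint64) {
	data = binary.LittleEndian.Uint64(b[0:8])
	length = binary.LittleEndian.Uint64(b[8:16])
	capacity = binary.LittleEndian.Uint64(b[16:24])
	return data, length, capacity
}

func main() {
	data, length, capacity := decodeSliceHeader(dump[8:32])
	fmt.Printf("Data=0x%016x Len=%d Cap=%d\n", data, length, capacity)
}
```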

We can access these data bytes directly with the following code. This first gets us direct access to the slice header, then prints out the memory that Data points to.

// Requires the "fmt", "reflect" and "unsafe" packages.
dataslice := *(*reflect.SliceHeader)(unsafe.Pointer(&data.aSlice))
fmt.Printf("Slice data is %#v\n",
	(*[3]byte)(unsafe.Pointer(dataslice.Data)))

And this is what we see.

Slice data is &[3]uint8{0x8, 0x9, 0xa}

And that’s plenty enough for now. Hit the “like” button if you, erm…, liked reading this.