Introduction

Go is a strongly typed language. This means that you can’t, for example, concatenate a string with an integer without first converting the integer to a string. To enforce this, the runtime needs a way to track all the different types. In Go, every type has a definition that is included in the binary. By parsing these definitions, it is possible to reconstruct all the types inside the binary, which can aid the analysis of a suspicious application or malware. This post walks through where this data is located and how to extract and parse it so that the type definitions can be reconstructed for all the types in the binary.

It all starts with moduledata

As described in a previous blog post, the moduledata structure holds pointers to some very important data structures in the Go binary. For recovering type-information, we are mainly interested in two of them: types and typelinks. Below is the current moduledata structure as of this writing.

type moduledata struct {
	pclntable    []byte
	ftab         []functab
	filetab      []uint32
	findfunctab  uintptr
	minpc, maxpc uintptr

	text, etext           uintptr
	noptrdata, enoptrdata uintptr
	data, edata           uintptr
	bss, ebss             uintptr
	noptrbss, enoptrbss   uintptr
	end, gcdata, gcbss    uintptr
	types, etypes         uintptr

	textsectmap []textsect
	typelinks   []int32 // offsets from types
	itablinks   []*itab

	ptab []ptabEntry

	pluginpath string
	pkghashes  []modulehash

	modulename   string
	modulehashes []modulehash

	hasmain uint8 // 1 if module contains the main function, 0 otherwise

	gcdatamask, gcbssmask bitvector

	typemap map[typeOff]*_type // offset to *_rtype in previous module

	bad bool // module failed to load and should be ignored

	next *moduledata
}

The moduledata structure has been relatively stable over the last few releases of the Go compiler. Version 1.8 added the textsectmap field, which means the offset of the typelinks slice differs between 1.7 and 1.8+. The fields that precede it are otherwise identical in both layouts. The moduledata structure for 1.7 is shown below.

type moduledata struct {
	pclntable    []byte
	ftab         []functab
	filetab      []uint32
	findfunctab  uintptr
	minpc, maxpc uintptr

	text, etext           uintptr
	noptrdata, enoptrdata uintptr
	data, edata           uintptr
	bss, ebss             uintptr
	noptrbss, enoptrbss   uintptr
	end, gcdata, gcbss    uintptr
	types, etypes         uintptr

	typelinks []int32 // offsets from types
	itablinks []*itab

	modulename   string
	modulehashes []modulehash

	gcdatamask, gcbssmask bitvector

	typemap map[typeOff]*_type // offset to *_rtype in previous module

	next *moduledata
}
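Every field that precedes typelinks in the two definitions above is either a pointer-sized word or a slice, so the byte offsets of types and typelinks can be worked out by counting words. The sketch below does this counting. The function name is hypothetical, and it assumes that a slice header occupies three words (pointer, length, capacity) and that the binary uses one of the two layouts shown above.

```go
package main

import "fmt"

// Word counts derived from the two moduledata definitions shown above.
// Assumption: a slice header takes three words (pointer, len, cap).
const (
	wordsBeforeTypes = 3*3 + 1 + 2*6 + 3 // pclntable through gcbss: 25 words
	wordsTypesEtypes = 2                 // types, etypes
	wordsTextsectmap = 3                 // slice added in Go 1.8
)

// typesAndTypelinksOffsets returns the byte offsets of the types field and of
// the typelinks slice header inside moduledata. ptrSize is 4 or 8.
func typesAndTypelinksOffsets(ptrSize int, go18OrLater bool) (typesOff, typelinksOff int) {
	typesOff = wordsBeforeTypes * ptrSize
	words := wordsBeforeTypes + wordsTypesEtypes
	if go18OrLater {
		words += wordsTextsectmap
	}
	return typesOff, words * ptrSize
}

func main() {
	for _, go18 := range []bool{false, true} {
		typesOff, linksOff := typesAndTypelinksOffsets(8, go18)
		fmt.Printf("go1.8+=%v types=%#x typelinks=%#x\n", go18, typesOff, linksOff)
	}
}
```

On a 64-bit binary this places types at offset 0xc8 in both layouts, and typelinks at 0xd8 for 1.7 and 0xf0 for 1.8+.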

All the type-information is located in the types data. The types data not only holds the type-information, but it also holds other data about the types. To find the type-information, the typelinks slice is needed. This slice holds offsets from the beginning of the types data to where the information about each type is stored. Unfortunately, the slice does not contain an offset for every type, but it is still possible to find all types by starting from this slice.
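As a minimal sketch of that lookup, the snippet below converts typelinks entries into addresses by adding each offset to the start of the types data. The function name, the bounds check against etypes, and the sample addresses are illustrative assumptions, not values from a real binary.

```go
package main

import "fmt"

// resolveTypelinks turns the int32 offsets stored in typelinks into virtual
// addresses by adding each one to the start of the types data. Offsets that
// would land outside [types, etypes) are skipped.
func resolveTypelinks(types, etypes uint64, typelinks []int32) []uint64 {
	addrs := make([]uint64, 0, len(typelinks))
	for _, off := range typelinks {
		if off < 0 {
			continue
		}
		addr := types + uint64(off)
		if addr >= etypes {
			continue
		}
		addrs = append(addrs, addr)
	}
	return addrs
}

func main() {
	// Made-up values for illustration only.
	addrs := resolveTypelinks(0x4a0000, 0x4c0000, []int32{0x40, 0x1a0, 0x7fffff})
	for _, a := range addrs {
		fmt.Printf("type at %#x\n", a)
	}
}
```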

Parsing the type-information

The offsets in typelinks point to a data structure that describes the type. Go uses this data structure to track all the different types within the binary. The structure is defined in three places: the compiler, the reflect package, and the runtime package. In the runtime package the structure is named _type, and in the reflect package it is called rtype. The definition of the rtype structure is shown below.

type rtype struct {
	size       uintptr
	ptrdata    uintptr  // number of bytes in the type that can contain pointers
	hash       uint32   // hash of type; avoids computation in hash tables
	tflag      tflag    // extra type-information flags
	align      uint8    // alignment of variable with this type
	fieldAlign uint8    // alignment of struct field with this type
	kind       uint8    // enumeration for C
	alg        *typeAlg // algorithm table
	gcdata     *byte    // garbage collection data
	str        nameOff  // string form
	ptrToThis  typeOff  // type for pointer to this type, may be zero
}

As said earlier, all types in the binary have a corresponding _type / rtype structure. This includes all the primitive types and user-defined types. The kind field is an enum value corresponding to the underlying primitive type. All the possible options are shown below.

const (
	Invalid Kind = iota
	Bool
	Int
	Int8
	Int16
	Int32
	Int64
	Uint
	Uint8
	Uint16
	Uint32
	Uint64
	Uintptr
	Float32
	Float64
	Complex64
	Complex128
	Array
	Chan
	Func
	Interface
	Map
	Ptr
	Slice
	String
	Struct
	UnsafePointer
)
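To make the layout concrete, here is a minimal sketch that decodes the rtype fields from raw bytes. It assumes a little-endian binary with 8-byte pointers, which makes the structure 48 bytes long. The typeHeader type and the helper names are hypothetical. The reflect package masks the kind byte with kindMask, (1 << 5) - 1, because the upper bits carry flags, so the sketch does the same. Using reflect.Kind to print the name is a convenience that works because its values match the enum above.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"reflect"
)

// rtypeSize64 is the size of rtype when uintptr and pointers are 8 bytes.
const rtypeSize64 = 48

// kindMask removes the flag bits that share the kind byte with the enum.
const kindMask = (1 << 5) - 1

// typeHeader holds the rtype fields that matter for type recovery.
type typeHeader struct {
	Size      uint64
	PtrData   uint64
	Hash      uint32
	TFlag     uint8
	Kind      reflect.Kind
	Str       int32 // nameOff: offset from the start of the types data
	PtrToThis int32 // typeOff: may be zero
}

// parseTypeHeader decodes a little-endian, 64-bit rtype from b.
func parseTypeHeader(b []byte) (typeHeader, error) {
	if len(b) < rtypeSize64 {
		return typeHeader{}, errors.New("buffer too short for rtype")
	}
	le := binary.LittleEndian
	return typeHeader{
		Size:      le.Uint64(b[0:8]),
		PtrData:   le.Uint64(b[8:16]),
		Hash:      le.Uint32(b[16:20]),
		TFlag:     b[20],
		Kind:      reflect.Kind(b[23] & kindMask),
		Str:       int32(le.Uint32(b[40:44])),
		PtrToThis: int32(le.Uint32(b[44:48])),
	}, nil
}

// sampleHeader builds a synthetic rtype describing a 16-byte struct.
func sampleHeader() []byte {
	b := make([]byte, rtypeSize64)
	binary.LittleEndian.PutUint64(b[0:8], 16)
	b[20] = 0x07                         // tflag: uncommon | extraStar | named
	b[23] = uint8(reflect.Struct) | 1<<5 // kind plus one flag bit
	binary.LittleEndian.PutUint32(b[40:44], 0x1234)
	return b
}

func main() {
	h, err := parseTypeHeader(sampleHeader())
	if err != nil {
		panic(err)
	}
	fmt.Printf("kind=%s size=%d tflag=%#x str=%#x\n", h.Kind, h.Size, h.TFlag, h.Str)
}
```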

Another interesting field is str. This value is an offset from the beginning of the types data to a packed byte structure that holds the type’s name and other string information. For example, the primitive type Int has the name int, but derived types are different. Say you have defined a type superInt as below. Its name would be superInt, while its kind is Int.

type superInt int
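The packed name structure can be decoded with a few lines of code. The sketch below assumes the encoding documented in the reflect package for Go 1.7 through 1.16: one flag byte, a two-byte big-endian length, and then the name bytes. Later releases changed the length encoding, so treat this as an illustration for the versions discussed here. The decodedName type and the sample data are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// decodedName holds the pieces of one packed name record.
type decodedName struct {
	Name       string
	Exported   bool // flag bit 0
	HasTag     bool // flag bit 1: tag data follows the name
	HasPkgPath bool // flag bit 2: a pkgPath nameOff follows
}

// decodeName reads the name record at offset off in the types data.
// Assumed layout: 1 flag byte, 2-byte big-endian length, then the bytes.
func decodeName(types []byte, off int32) (decodedName, error) {
	if off < 0 || int(off)+3 > len(types) {
		return decodedName{}, errors.New("name offset out of range")
	}
	b := types[off:]
	n := int(b[1])<<8 | int(b[2])
	if 3+n > len(b) {
		return decodedName{}, errors.New("name length out of range")
	}
	return decodedName{
		Name:       string(b[3 : 3+n]),
		Exported:   b[0]&(1<<0) != 0,
		HasTag:     b[0]&(1<<1) != 0,
		HasPkgPath: b[0]&(1<<2) != 0,
	}, nil
}

// sampleTypes returns synthetic types data with one name record at offset 8.
func sampleTypes() []byte {
	data := make([]byte, 8)
	data = append(data, 0x00, 0x00, 0x0e) // flags, then length 14
	return append(data, "*main.superInt"...)
}

func main() {
	n, err := decodeName(sampleTypes(), 8)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q exported=%v\n", n.Name, n.Exported)
}
```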

The tflag field is a bitmask that signals which extra data, if any, follows the structure, as described in the source code snippet shown below.

// tflag is used by an rtype to signal what extra type-information is
// available in the memory directly following the rtype value.
//
// tflag values must be kept in sync with copies in:
//	cmd/compile/internal/gc/reflect.go
//	cmd/link/internal/ld/decodesym.go
//	runtime/type.go
type tflag uint8

const (
	// tflagUncommon means that there is a pointer, *uncommonType,
	// just beyond the outer type structure.
	//
	// For example, if t.Kind() == Struct and t.tflag&tflagUncommon != 0,
	// then t has uncommonType data and it can be accessed as:
	//
	//	type tUncommon struct {
	//		structType
	//		u uncommonType
	//	}
	//	u := &(*tUncommon)(unsafe.Pointer(t)).u
	tflagUncommon tflag = 1 << 0

	// tflagExtraStar means the name in the str field has an
	// extraneous '*' prefix. This is because for most types T in
	// a program, the type *T also exists and reusing the str data
	// saves binary size.
	tflagExtraStar tflag = 1 << 1

	// tflagNamed means the type has a name.
	tflagNamed tflag = 1 << 2
)
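When parsing, these bits are applied to values that have already been read from the binary. A small sketch, with hypothetical helper names, of how the flags can be used: one helper strips the extraneous '*' from a decoded name, the other reports whether uncommonType data should be expected after the type structure.

```go
package main

import (
	"fmt"
	"strings"
)

// Flag values copied from the tflag definition above.
const (
	tflagUncommon  uint8 = 1 << 0
	tflagExtraStar uint8 = 1 << 1
	tflagNamed     uint8 = 1 << 2
)

// typeString returns the name as it would appear in source code, removing
// the leading '*' when tflagExtraStar says it is extraneous.
func typeString(raw string, tflag uint8) string {
	if tflag&tflagExtraStar != 0 {
		return strings.TrimPrefix(raw, "*")
	}
	return raw
}

// hasUncommon reports whether uncommonType data follows the type structure.
func hasUncommon(tflag uint8) bool {
	return tflag&tflagUncommon != 0
}

func main() {
	flags := tflagUncommon | tflagExtraStar | tflagNamed
	fmt.Println(typeString("*main.superInt", flags), hasUncommon(flags))
	fmt.Println(typeString("*main.superInt", 0), hasUncommon(0))
}
```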

An uncommonType

As mentioned in the previous section, some types can be uncommon types. So what are uncommon types? It turns out that they are more common than you might think. In Go, any type can have methods associated with it, as in the example shown below.

type T struct{}

func (t T) myMethod() {}

In the code snippet, myMethod is a method on the type T. This makes T an uncommon type. In other words, uncommon types are types with methods.

Information about the type’s methods is defined in the uncommonType structure. As described in the section above, this structure is located right after the type structure. The layout of the uncommonType structure is shown below. It holds information about the import path, the number of methods (total and exported), and an offset from this structure to an array of method data structures. This is the current definition of the structure as of the release of Go 1.13beta1, and its general shape has been like this since the first release of Go 1.7. Versions before 1.7 have a very different look.

type uncommonType struct {
	pkgPath nameOff // import path; empty for built-in types like int, string
	mcount  uint16  // number of methods
	xcount  uint16  // number of exported methods
	moff    uint32  // offset from this uncommontype to [mcount]method
	_       uint32  // unused
}

Go 1.7beta1 was the first release with the new design of this structure. Its uncommonType is shown below. It is much smaller than the current one, but it essentially holds the same information. This structure definition is unique and does not exist in any binaries produced by other versions of the Go compiler.

type uncommonType struct {
	pkgPath nameOff // import path; empty for built-in types like int, string
	mcount  uint16  // number of methods
	moff    uint16  // offset from this uncommontype to [mcount]method
}

The general shape of the current structure first appeared in Go 1.7beta2. It is the same size as the current structure, but the slot that now holds xcount is unused. For extracting the methods, this has no noticeable effect.

type uncommonType struct {
	pkgPath nameOff // import path; empty for built-in types like int, string
	mcount  uint16  // number of methods
	_       uint16  // unused
	moff    uint32  // offset from this uncommontype to [mcount]method
	_       uint32  // unused
}

One of the fields in the structure, moff, points to an array of method structures. The definition of this structure is shown below.

// Method on non-interface type
type method struct {
	name nameOff // name of method
	mtyp typeOff // method type (without receiver)
	ifn  textOff // fn used in interface call (one-word receiver)
	tfn  textOff // fn used for normal method call
}
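Putting the two structures together, the method array can be walked as in the sketch below. It assumes the 16-byte uncommonType layout used from Go 1.7beta2 onward, a little-endian binary, and that nameOff, typeOff, and textOff are each 4 bytes wide. The function names and the sample buffer are made up for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	uncommonSize = 16 // pkgPath(4) + mcount(2) + xcount(2) + moff(4) + unused(4)
	methodSize   = 16 // four 4-byte offsets: name, mtyp, ifn, tfn
)

// method holds the raw offsets of one method record.
type method struct {
	Name, Mtyp, Ifn, Tfn int32
}

// parseMethods reads the uncommonType at data[uoff:] and returns the method
// records it points to. moff is relative to the uncommonType itself.
func parseMethods(data []byte, uoff int) ([]method, error) {
	if uoff < 0 || uoff+uncommonSize > len(data) {
		return nil, errors.New("uncommonType out of range")
	}
	le := binary.LittleEndian
	mcount := int(le.Uint16(data[uoff+4:]))
	moff := int(le.Uint32(data[uoff+8:]))
	start := uoff + moff
	if start+mcount*methodSize > len(data) {
		return nil, errors.New("method array out of range")
	}
	methods := make([]method, mcount)
	for i := range methods {
		m := data[start+i*methodSize:]
		methods[i] = method{
			Name: int32(le.Uint32(m[0:])),
			Mtyp: int32(le.Uint32(m[4:])),
			Ifn:  int32(le.Uint32(m[8:])),
			Tfn:  int32(le.Uint32(m[12:])),
		}
	}
	return methods, nil
}

// sampleUncommon builds an uncommonType followed directly by two methods.
func sampleUncommon() []byte {
	le := binary.LittleEndian
	b := make([]byte, uncommonSize+2*methodSize)
	le.PutUint16(b[4:], 2)            // mcount
	le.PutUint16(b[6:], 1)            // xcount
	le.PutUint32(b[8:], uncommonSize) // moff: methods start right after
	le.PutUint32(b[16:], 0x100)       // method 0: name
	le.PutUint32(b[24:], 0x58930)     // method 0: ifn
	le.PutUint32(b[28:], 0x58930)     // method 0: tfn
	le.PutUint32(b[32:], 0x110)       // method 1: name; other fields stay zero
	return b
}

func main() {
	methods, err := parseMethods(sampleUncommon(), 0)
	if err != nil {
		panic(err)
	}
	for i, m := range methods {
		fmt.Printf("method %d: name=%#x mtyp=%#x ifn=%#x tfn=%#x\n",
			i+1, m.Name, m.Mtyp, m.Ifn, m.Tfn)
	}
}
```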

The mtyp field is an offset to the function type for the method. It is a _type/rtype structure with the kind value Func. More on this type later. Both the ifn and tfn fields are offsets into the text section of the binary. This is where the function code is located.

When analyzing real binaries, it turns out that some methods do not have a method type or an offset in the text section. Below is an analysis of a binary. In the snippet, the method array for *strconv.decimal is walked and the values are printed. It can be seen that most of them do not have a method type and some of the functions do not have offsets to function code.

*strconv.decimal has 9 methods
Method 1
	name: Assign
	Function at 0x58930 and 0x58930
Method 2
	name: Round
	Function at 0x59170 and 0x59170
Method 3
	name: RoundDown
	Function at 0x592d0 and 0x592d0
Method 4
	name: RoundUp
	Function at 0x59320 and 0x59320
Method 5
	name: RoundedInteger
	Function at 0x0 and 0x0
Method 6
	name: Shift
	Function at 0x590a0 and 0x590a0
Method 7
	name: String
	Method type: func() string
	Function at 0x58310 and 0x58310
Method 8
	name: floatBits
	Function at 0x0 and 0x0
Method 9
	name: set
	Method type: func(string) bool
	Function at 0x0 and 0x0

The symbols in the binary, shown below, also confirm that some functions are missing.

0x00458720 sym.strconv.__decimal_.String
0x00458bf0 sym.strconv.__decimal_.Assign
0x00459130 sym.strconv.__decimal_.Shift
0x00459200 sym.strconv.__decimal_.Round
0x004592d0 sym.strconv.__decimal_.RoundUp
0x00459710 sym.strconv.__extFloat_.FixedDecimal
0x00459c10 sym.strconv.__extFloat_.ShortestDecimal
0x0045e210 sym.type..hash.strconv.decimal
0x0045e270 sym.type..eq.strconv.decimal

It turns out that the Go compiler prunes methods that are not used. While not all information is always present, the name of the method is still available, which can be used for further analysis.

Some of Go’s Types

Each primitive type has a corresponding data type in the runtime. All of these data types are structures with the _type / rtype as the first field. It is an anonymous field and hence embedded. This means that, when parsing the type data, the extra data for the specific type is usually located right after the _type / rtype data. The kind field can be used to figure out what type it is and what data follows the _type / rtype structure.

Struct type

The structType data type, shown below, is used to store information about each type derived from the primitive struct type. It has two extra fields, pkgPath and fields. The pkgPath field is the import name of the package, while fields is a slice of structField, also shown below, which stores information about the fields. The structField structure has three fields. The first is the name of the field, the second is a pointer to a _type / rtype structure that can be used to determine the type of the field, and the last is an integer that encodes the offset and whether the field is embedded/anonymous.

// structType represents a struct type.
type structType struct {
	rtype
	pkgPath name
	fields  []structField // sorted by offset
}

// Struct field
type structField struct {
	name        name    // name is always non-empty
	typ         *rtype  // type of field
	offsetEmbed uintptr // byte offset of field<<1 | isEmbedded
}

func (f *structField) offset() uintptr {
	return f.offsetEmbed >> 1
}

func (f *structField) embedded() bool {
	return f.offsetEmbed&1 != 0
}
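Once the field names and field types have been resolved, the struct definition can be printed back as source code. The sketch below shows one way to do that with a hypothetical field type that already holds resolved strings. Embedded fields are printed with only their type, and the decoded byte offset is added as a comment.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical, already resolved view of one structField.
type field struct {
	Name        string
	Type        string
	OffsetEmbed uintptr // byte offset of field<<1 | isEmbedded
}

// renderStruct rebuilds a source-level definition from the parsed fields.
func renderStruct(name string, fields []field) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "type %s struct {\n", name)
	for _, f := range fields {
		offset := f.OffsetEmbed >> 1
		if f.OffsetEmbed&1 != 0 {
			// Embedded field: only the type is written in source code.
			fmt.Fprintf(&sb, "\t%s // offset %d\n", f.Type, offset)
		} else {
			fmt.Fprintf(&sb, "\t%s %s // offset %d\n", f.Name, f.Type, offset)
		}
	}
	sb.WriteString("}")
	return sb.String()
}

func main() {
	fmt.Println(renderStruct("counter", []field{
		{Name: "Mutex", Type: "sync.Mutex", OffsetEmbed: 0<<1 | 1},
		{Name: "count", Type: "int", OffsetEmbed: 8 << 1},
	}))
}
```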

If the struct type has some methods attached to it, it is an uncommon type. In this scenario, the uncommon data structure is right after the structType data as shown below.

type structTypeUncommon struct {
	structType
	u uncommonType
}

Pointer type

Pointers to types have their own type called ptrType, shown in the code block below. It essentially just adds a pointer to the _type / rtype of the type it points to. This means, for example, that *int and *uint are two different types and each has its own ptrType structure stored in the binary.

// ptrType represents a pointer type.
type ptrType struct {
	rtype
	elem *rtype // pointer element (pointed at) type
}

One note when it comes to methods: if a pointer receiver is used when defining a method, as seen in the example below, the method will be attached to *myThing and not myThing.

type myThing struct{}

func (m *myThing) DoSomething() {}

Interface type

The data structure for interfaces is simple and is shown below. It essentially has two additional fields: one for the import path and one that is a slice of imethod. The imethod structure, also shown below, provides information about the functions that need to be implemented to satisfy the interface. The first field in the imethod structure is the name of the function. The second field is the offset to a _type / rtype structure. This structure is of the kind Func and hence provides information about the function definition, i.e., the types of the function arguments and return values.

// interfaceType represents an interface type.
type interfaceType struct {
	rtype
	pkgPath name      // import path
	methods []imethod // sorted by hash
}

// imethod represents a method on an interface type
type imethod struct {
	name nameOff // name of method
	typ  typeOff // .(*FuncType) underneath
}

Map type

The map type is probably the most complex structure of all the types. It is shown below. It has information about a bunch of sizes that are used under the hood. Luckily, these are set by the compiler and the programmer has no control over them, so they can be ignored. The fields that are of interest are key and elem. By parsing these values, it is possible to reconstruct the source code representation of the type definition. The fields are pointers to two _type / rtype structures and essentially correspond to map[key]elem.

// mapType represents a map type.
type mapType struct {
	rtype
	key        *rtype // map key type
	elem       *rtype // map element (value) type
	bucket     *rtype // internal bucket structure
	keysize    uint8  // size of key slot
	valuesize  uint8  // size of value slot
	bucketsize uint16 // size of bucket
	flags      uint32
}

Slice and array type

The slice and array types are very similar; both are shown below. The element type is recorded in the elem field and, for arrays, the length is stored in the len field.

// sliceType represents a slice type.
type sliceType struct {
	rtype
	elem *rtype // slice element type
}

// arrayType represents a fixed array type.
type arrayType struct {
	rtype
	elem  *rtype // array element type
	slice *rtype // slice type
	len   uintptr
}
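Because pointer, map, slice, and array types all refer to other types through key and elem, the source-level spelling of a type can be rebuilt recursively. The sketch below uses a hypothetical parsedType tree as the in-memory result of parsing. It is one possible representation, not something found in the Go source.

```go
package main

import (
	"fmt"
	"reflect"
)

// parsedType is a hypothetical in-memory form of a recovered type.
type parsedType struct {
	Kind reflect.Kind
	Name string      // used for primitive and named types
	Key  *parsedType // map key type
	Elem *parsedType // element type of pointer, slice, array, and map
	Len  uintptr     // array length
}

// String rebuilds the source-level spelling of the type.
func (p *parsedType) String() string {
	switch p.Kind {
	case reflect.Ptr:
		return "*" + p.Elem.String()
	case reflect.Slice:
		return "[]" + p.Elem.String()
	case reflect.Array:
		return fmt.Sprintf("[%d]%s", p.Len, p.Elem.String())
	case reflect.Map:
		return "map[" + p.Key.String() + "]" + p.Elem.String()
	default:
		return p.Name
	}
}

// sampleType builds map[string][]*[4]uint8 by hand.
func sampleType() *parsedType {
	u8 := &parsedType{Kind: reflect.Uint8, Name: "uint8"}
	arr := &parsedType{Kind: reflect.Array, Elem: u8, Len: 4}
	ptr := &parsedType{Kind: reflect.Ptr, Elem: arr}
	sl := &parsedType{Kind: reflect.Slice, Elem: ptr}
	str := &parsedType{Kind: reflect.String, Name: "string"}
	return &parsedType{Kind: reflect.Map, Key: str, Elem: sl}
}

func main() {
	fmt.Println(sampleType())
}
```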

Channel type

Similar to the array, slice, and map types, the chanType also has a field called elem to track what type is sent over the channel. It also has an enum to indicate whether the channel can only receive, only send, or both send and receive.

// chanType represents a channel type.
type chanType struct {
	rtype
	elem *rtype  // channel element type
	dir  uintptr // channel direction (ChanDir)
}
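The dir value uses the same encoding as reflect.ChanDir, so the standard library can be used to print it. In the sketch below, the helper name is hypothetical and the element type is assumed to have been resolved to a string already.

```go
package main

import (
	"fmt"
	"reflect"
)

// chanTypeString renders a channel type from the raw dir value and the
// already resolved name of the element type.
func chanTypeString(dir uintptr, elem string) string {
	return reflect.ChanDir(dir).String() + " " + elem
}

func main() {
	// 1 is receive-only, 2 is send-only, 3 is both directions.
	for _, dir := range []uintptr{1, 2, 3} {
		fmt.Println(chanTypeString(dir, "int"))
	}
}
```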

Function type

Since functions in Go are first-class citizens, there is also a type definition for function types. The following code snippet, taken from the standard library, describes the type. Because any type can have methods, which makes it an uncommon type, function types can have methods too. The comment in the snippet describes how the data is stored in the binary when this happens. The funcType has just two additional fields after the rtype / _type structure: a uint16 for the number of function arguments and a uint16 for the number of function return values. The type-information for the function arguments and return values is stored in an array right after the funcType data structure.

// funcType represents a function type.
//
// A *rtype for each in and out parameter is stored in an array that
// directly follows the funcType (and possibly its uncommonType). So
// a function type with one method, one input, and one output is:
//
//	struct {
//		funcType
//		uncommonType
//		[2]*rtype    // [0] is in, [1] is out
//	}
type funcType struct {
	rtype
	inCount  uint16
	outCount uint16 // top bit is set if last input parameter is ...
}
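The location of the parameter array follows from the sizes of the structures involved. The sketch below assumes a 64-bit binary, where funcType is 56 bytes (a 48-byte rtype plus two uint16 fields, padded to 8-byte alignment) and uncommonType is 16 bytes. The funcLayout type and the function name are hypothetical. The top bit of outCount is split off as the variadic flag, as the comment on the field describes.

```go
package main

import "fmt"

const (
	funcTypeSize64 = 56 // 48-byte rtype + two uint16, padded to 8 bytes
	uncommonSize   = 16
	ptrSize64      = 8
)

// funcLayout describes where the parameter types of a function type live.
type funcLayout struct {
	InsAddr  uint64 // address of the first input *rtype
	OutsAddr uint64 // address of the first output *rtype
	NumIn    int
	NumOut   int
	Variadic bool
}

// funcParamLayout computes the layout for a funcType found at typeAddr.
func funcParamLayout(typeAddr uint64, hasUncommon bool, inCount, outCount uint16) funcLayout {
	addr := typeAddr + funcTypeSize64
	if hasUncommon {
		addr += uncommonSize
	}
	return funcLayout{
		InsAddr:  addr,
		OutsAddr: addr + uint64(inCount)*ptrSize64,
		NumIn:    int(inCount),
		NumOut:   int(outCount & (1<<15 - 1)),
		Variadic: outCount&(1<<15) != 0,
	}
}

func main() {
	// Made-up address; two inputs, one output, variadic last input.
	l := funcParamLayout(0x4a1000, true, 2, 1|1<<15)
	fmt.Printf("ins=%#x outs=%#x in=%d out=%d variadic=%v\n",
		l.InsAddr, l.OutsAddr, l.NumIn, l.NumOut, l.Variadic)
}
```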

Conclusion

All the types used by a Go application are stored within a types section inside the binary. By parsing this data, it is possible to fully recover all the type definitions. This includes unexported (private) types and fields.