This article describes some techniques used in the Swift language Runtime Library.

What’s the Runtime library?

The runtime library for the Swift language is named swiftCore, and it is required to run any Swift application. You rarely need to think about it, but it is dynamically linked into every executable. Since ABI stability was achieved in Swift 5.0, the runtime library has been installed at /usr/lib/swift/libswiftCore.dylib from macOS 10.14.4 onward.

The Swift runtime library includes some functions to implement Swift’s dynamic language features.

The two aspects of language features

Language features fall into two categories according to when they are used.

Static language features

Dynamic language features

Static language features are used before execution: syntax, the type system, and so on. Dynamic language features are used during execution: the runtime type system, exception handling, binary compatibility, and so on.

The main purpose of the runtime library is to support these dynamic language features. Their implementation could be inlined into the binary that swiftc generates, but some of it is complex enough that inlining would increase code size, so it is factored out into the runtime library.

Example: Memory allocation

Let’s follow how Swift allocates memory as an example of a runtime function.

Most Swift structs have a fixed size, so the compiler generates code that allocates a fixed amount of memory.

In this example, the pet variable has type Pet, so the compiler knows its size at compile time.

    // 24 bytes
    struct Pet {
        let name: String // 16 bytes
        let age: Int     // 8 bytes
    }

    let pet = Pet(name: ..., age: ...) // allocate 24 bytes
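The sizes in the comments can be checked with MemoryLayout from the standard library. This is a minimal sketch; the 16-byte and 8-byte figures assume a 64-bit platform.

```swift
struct Pet {
    let name: String // 16 bytes on 64-bit platforms
    let age: Int     // 8 bytes on 64-bit platforms
}

// The size of a struct is known statically, so no type information
// has to be consulted at runtime to reserve storage for it.
let petSize = MemoryLayout<Pet>.size
let petStride = MemoryLayout<Pet>.stride
print(petSize, petStride)
```

MemoryLayout is evaluated from the static type alone, which is exactly the property that class instances lack.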

On the other hand, some allocations cannot have a fixed size.

Consider the allocation of a class instance.

    // 8 bytes
    class View {
        var point: (x: Float, y: Float) // 8 bytes
        required init(...) { }

        func copy() -> Self {
            return Self(...) // malloc n bytes?
        }
    }

    // 24 bytes
    class TextView: View {
        var text: String // 16 bytes
        required init(...) { }
    }

    let view: View = TextView(...) // Can't determine the size of allocation by type
    let anotherView: View = view.copy()

In this example, the View type implements a copy method that creates an instance of the Self type. Self can be either View or TextView, and the compiler cannot statically determine which one it will be at runtime.

For code that uses Self to execute correctly, each instance has to carry its own type information at runtime.
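The behaviour described above can be observed directly. The following is a minimal sketch; the initializer signature and the default value of text are assumptions added for illustration, since the original example elides them.

```swift
class View {
    var point: (x: Float, y: Float)

    required init(point: (x: Float, y: Float)) {
        self.point = point
    }

    func copy() -> Self {
        // Self is resolved from the instance's dynamic type at runtime.
        return Self(point: point)
    }
}

class TextView: View {
    var text: String = "" // hypothetical default, not in the original example

    required init(point: (x: Float, y: Float)) {
        super.init(point: point)
    }
}

let view: View = TextView(point: (x: 1, y: 2))
let anotherView: View = view.copy()

// The static type is View, but the dynamic type decides what gets allocated.
print(type(of: anotherView))
```

Even though both variables are declared as View, copy() has to allocate enough memory for a TextView, which only the instance's own type information can tell it.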

Metadata

The type information used at runtime is called Type Metadata in Swift. It is basically invisible to Swift users and cannot be used directly from Swift.

Type Metadata stores the size of the type, function tables for dynamic dispatch, generic type parameters, a pointer to the Type Descriptor, and so on.

The runtime library implements dynamic language features using this metadata.

In the memory allocation example, the type metadata embedded in the instance is extracted and passed to the runtime function, so the instance size can be determined dynamically.

A Type Descriptor holds only the information that does not depend on a specific instantiation of the type, so it is not affected by generic type parameters.

Metadata for a generic type can be created dynamically at runtime for specific type parameters. Thanks to the two-layer structure of Type Metadata and Type Descriptor, however, the same Type Descriptor can be reused by every metadata record of that generic type.

Background knowledge

TEXT segment

First, I need to describe the layout of an object file to show how effective these techniques are.

Here I cover only the Mach-O format used on macOS, but other binary file formats do not differ much.

Basically, an executable file is divided into TEXT and DATA segments. The TEXT segment contains machine code and the DATA segment contains global variables. The major difference between them is whether they are writable while the program runs: the TEXT segment is read-only during execution, but the DATA segment is writable.

The read-only TEXT segment is very useful when using dynamic libraries or forking processes.

When the same dynamic library is loaded into two processes, the program loader can share the memory allocated for the TEXT segment because it is read-only. In other words, the larger the share of the TEXT segment in a binary, the more efficiently memory is used.

Swift uses effective methods to put as much metadata as possible in the TEXT segment.

Data structure techniques

Here is the main part of this article!

Relative Pointer

A Relative Pointer is a pointer that stores the offset from the address of the pointer itself to the target address.

All pointers included in Swift metadata are in this Relative Pointer format.

    struct RelativePointer<Pointee> {
        var offset: Int32

        mutating func pointee() -> Pointee {
            withUnsafePointer(to: &self) { [offset] pointer in
                let rawPointer = UnsafeRawPointer(pointer)
                let advanced = rawPointer.advanced(by: Int(offset))
                return advanced.assumingMemoryBound(to: Pointee.self)
            }.pointee
        }
    }
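To see the offset arithmetic in isolation, the following minimal sketch places an offset field and its target in one heap allocation. The function names and the buffer layout are assumptions made for illustration; they are not part of the Swift runtime.

```swift
/// Resolves a 32-bit relative offset stored at `field` into the address it refers to.
func resolveRelative(_ field: UnsafeRawPointer) -> UnsafeRawPointer {
    let offset = field.load(as: Int32.self)
    return field + Int(offset)
}

func relativePointerDemo() -> Int64 {
    // One allocation holds both the offset field (byte 0) and the target (byte 16).
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: 24, alignment: 8)
    defer { buffer.deallocate() }

    // The offset field says "the target is 16 bytes after me".
    buffer.storeBytes(of: Int32(16), as: Int32.self)
    // The target value itself.
    buffer.storeBytes(of: Int64(42), toByteOffset: 16, as: Int64.self)

    let target = resolveRelative(UnsafeRawPointer(buffer))
    return target.load(as: Int64.self)
}

print(relativePointerDemo())
```

The stored bytes never mention an absolute address, so they stay valid wherever the buffer happens to be allocated.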

There are several advantages to using RelativePointer instead of normal pointers.

Smaller binary size

Fewer relocations at load time

Metadata can be placed in the TEXT segment

First, a Relative Pointer relies on a linker relocation that computes the address difference between two symbols. This relocation assumes that the result fits in a signed 32-bit integer, so while a normal pointer consumes 64 bits, a Relative Pointer needs only half that, 32 bits.

Second, normal pointers are relocated at launch, which adds overhead before the program starts. The address difference between two symbols, by contrast, is computed at link time, so it adds no overhead at load time.

Third, with no relocation records to process at load time, metadata becomes position-independent data. In other words, like the executable code, it can be placed in the TEXT segment and share memory with other processes.

Of course, adding the offset on every dereference has a cost, but Swift accepts it in exchange for the advantages above.

Indirect Pointer

There are also several techniques used with Relative Pointer. One example is Indirect Pointer.

Indirect Pointer is a technique for representing both pointers that go through the GOT (Global Offset Table) and normal pointers with the same type. It uses the lower bits that are always zero because of alignment.

For example, a 32-bit numeric type is aligned so that its address is a multiple of 4.

In other words, the lower 2 bits of the address are always 0. The Relative Pointer itself is also a 32-bit signed integer stored at a 4-byte-aligned address, so the lower 2 bits of the difference between the two addresses are 0 as well.

The Swift runtime library uses these low-order bits to store some state in the pointer.

An Indirect Pointer uses the lowest bit to indicate whether the pointer refers to its target through the GOT.

When it does, the pointer has to be dereferenced twice.

    struct RelativeIndirectablePointer<Pointee>
    /* where alignof(Pointee) >= 2 */
    {
        let offsetWithIndirectFlag: Int32

        mutating func pointee() -> Pointee {
            // Copy the flag and offset out first so the closure does not touch self.
            let indirect = isIndirect
            // Clearing the flag bit is a no-op for direct pointers.
            let offset = offsetWithIndirectFlag & ~isIndirectMask
            return withUnsafePointer(to: &self) { pointer -> UnsafePointer<Pointee> in
                let rawPointer = UnsafeRawPointer(pointer)
                let advanced = rawPointer.advanced(by: Int(offset))
                if indirect {
                    // The offset leads to a GOT entry that holds the real address.
                    let got = advanced.assumingMemoryBound(to: UnsafePointer<Pointee>.self)
                    return got.pointee
                } else {
                    return advanced.assumingMemoryBound(to: Pointee.self)
                }
            }.pointee
        }

        var isIndirect: Bool {
            offsetWithIndirectFlag & isIndirectMask != 0
        }

        var isIndirectMask: Int32 { 0x01 }
    }
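The direct and indirect paths can be exercised with a small buffer in which one slot stands in for a GOT entry. This is only a sketch under that assumption; resolveIndirectable and indirectPointerDemo are hypothetical names, and a real GOT is produced by the linker rather than written by hand.

```swift
/// Resolves an offset whose lowest bit marks an indirect (GOT-style) reference.
func resolveIndirectable(_ field: UnsafeRawPointer) -> UnsafeRawPointer {
    let raw = field.load(as: Int32.self)
    let isIndirect = raw & 1 != 0
    let slot = field + Int(raw & ~1)
    // Direct: the slot is the target. Indirect: the slot holds the target's address.
    return isIndirect ? slot.load(as: UnsafeRawPointer.self) : slot
}

func indirectPointerDemo() -> (direct: Int64, indirect: Int64) {
    // Layout: offset field at byte 0, stand-in GOT entry at byte 8, target at byte 16.
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: 24, alignment: 8)
    defer { buffer.deallocate() }

    buffer.storeBytes(of: Int64(99), toByteOffset: 16, as: Int64.self)
    buffer.storeBytes(of: UnsafeRawPointer(buffer + 16),
                      toByteOffset: 8, as: UnsafeRawPointer.self)

    // Direct reference: low bit clear, offset points straight at the target.
    buffer.storeBytes(of: Int32(16), as: Int32.self)
    let direct = resolveIndirectable(UnsafeRawPointer(buffer)).load(as: Int64.self)

    // Indirect reference: low bit set, offset points at the stand-in GOT entry.
    buffer.storeBytes(of: Int32(8 | 1), as: Int32.self)
    let indirect = resolveIndirectable(UnsafeRawPointer(buffer)).load(as: Int64.self)

    return (direct, indirect)
}
```

Both calls end up at the same target; the only difference is the extra load taken when the flag bit is set.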

Int Paired Pointer

The Int Paired Pointer goes one step further and stores a small integer in the unused lower bits.

    struct RelativeDirectPointerIntPair<Pointee, IntTy: BinaryInteger>
    /* where alignof(Pointee) >= 2 */
    {
        let offsetWithInt: Int32

        mutating func pointee() -> Pointee {
            // Mask out the small integer to recover the plain offset.
            let offset = offsetWithInt & ~intMask
            return withUnsafePointer(to: &self) { pointer in
                let rawPointer = UnsafeRawPointer(pointer)
                let advanced = rawPointer.advanced(by: Int(offset))
                return advanced.assumingMemoryBound(to: Pointee.self)
            }.pointee
        }

        var value: IntTy {
            IntTy(offsetWithInt & intMask)
        }

        var intMask: Int32 {
            Int32(
                min(
                    MemoryLayout<Pointee>.alignment,
                    MemoryLayout<Int32>.alignment
                ) - 1
            )
        }
    }
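The bit manipulation itself does not need any pointers to demonstrate. A minimal sketch, assuming 4-byte alignment so that two low bits are free; pack and unpack are hypothetical helper names:

```swift
/// Packs a small integer into the low bits of an aligned offset.
func pack(offset: Int32, value: Int32, mask: Int32) -> Int32 {
    precondition(offset & mask == 0, "offset must leave the masked bits clear")
    precondition(value & ~mask == 0, "value must fit in the masked bits")
    return offset | value
}

/// Splits a packed word back into its offset and small integer.
func unpack(_ packed: Int32, mask: Int32) -> (offset: Int32, value: Int32) {
    return (packed & ~mask, packed & mask)
}

let intMask: Int32 = 0b11 // 4-byte alignment leaves two spare bits
let packed = pack(offset: -8, value: 2, mask: intMask)
let unpacked = unpack(packed, mask: intMask)
print(unpacked.offset, unpacked.value)
```

A negative offset is used on purpose: masking works the same way on two's-complement values, so backward references can carry a small integer too.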



Symbolic Reference

A Symbolic Reference is one form of mangled name.

Normally, when metadata is retrieved from a mangled type name, the runtime demangles the name and searches for the metadata by type name. However, if the target object lives in the same module as the referrer, it is more efficient to reference it directly without demangling the name.

The normal mangler turns a type name into a unique identifier, whereas a symbolic reference embeds the address of the target object as part of the mangled string.

A Symbolic Reference starts with a control character to distinguish it from a normal mangled string; 0x01 to 0x0C are reserved for Symbolic References.

A 4-byte Relative Pointer is embedded after the control character and is decoded by the runtime library.

This mechanism makes it possible to refer to metadata defined in the same module without any search cost.
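The decoding step can be sketched as follows. This is an illustration of the format described above, not the runtime's actual parser: parseSymbolicReference is a hypothetical name, the byte array stands in for a mangled string, and little-endian byte order is assumed.

```swift
/// Returns the control byte and the 32-bit relative offset if `name`
/// starts with a symbolic reference, or nil for an ordinary mangled name.
func parseSymbolicReference(_ name: [UInt8]) -> (kind: UInt8, offset: Int32)? {
    guard name.count >= 5, (0x01...0x0C).contains(name[0]) else { return nil }
    // Assemble the four bytes after the control character (little-endian).
    var raw: UInt32 = 0
    for i in 0..<4 {
        raw |= UInt32(name[1 + i]) << UInt32(8 * i)
    }
    return (name[0], Int32(bitPattern: raw))
}

// A control character followed by the bytes of the offset -16.
let symbolic: [UInt8] = [0x01, 0xF0, 0xFF, 0xFF, 0xFF]
// An ordinary mangled name starts with a printable character.
let ordinary = Array("Si".utf8)
```

Because the four offset bytes are raw binary, any of them may be zero, so such a string cannot be treated as null-terminated.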

__swift_instantiateConcreteTypeFromMangledName

This function is used to get metadata from a type name. The argument is a {i32, i32} structure that caches the result of this function.

This argument structure can be expressed in C++ as follows.

    union Input {
        struct {
            RelativePointer<CChar> typeName; // 4 bytes
            int32_t negativeTypeNameLength;  // 4 bytes
        } nonCached;

        TypeMetadata *cached; // 8 bytes
    };

The cache object has two states, uncached and cached, and the 64 bits are used differently in each. Since the layout depends on endianness, this article assumes a little-endian system.

In the uncached state, the first 32 bits hold a relative pointer to the type name, and the last 32 bits hold the length of the type name, negated. Because the type name can contain a Symbolic Reference, null bytes may appear as part of the embedded address. The null character therefore cannot serve as a terminator, which is why the length has to be stored.

In the cached state, the absolute pointer to the metadata is stored using all 64 bits.

These two states can be distinguished by reading the cache object as a signed 64-bit integer: if it is negative, the object is uncached, and if it is positive, it is cached. This works because the negated length of the type name occupies the upper 32 bits, so the whole 64 bits read as a signed integer are always negative in the uncached state.

It is more efficient to test the whole 64 bits as one integer than to extract and compare only the upper 32 bits.
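The sign test can be reproduced with plain integer arithmetic. A minimal sketch, assuming the little-endian layout described above, a non-zero name length, and a cached pointer whose top bit is clear; the function names and the sample address are hypothetical.

```swift
/// Builds the uncached form: relative name offset in the low 32 bits,
/// negated name length in the high 32 bits.
func makeUncachedWord(nameOffset: Int32, nameLength: Int32) -> Int64 {
    let low = UInt64(UInt32(bitPattern: nameOffset))
    let high = UInt64(UInt32(bitPattern: -nameLength))
    return Int64(bitPattern: (high << 32) | low)
}

/// A negative word means the metadata has not been cached yet.
func isUncached(_ word: Int64) -> Bool {
    return word < 0
}

let uncached = makeUncachedWord(nameOffset: 128, nameLength: 5)
let cached = Int64(0x0000_0001_0000_4000) // hypothetical metadata address

// The length is recovered by negating the upper half again.
let storedLength = -Int32(truncatingIfNeeded: uncached >> 32)
let storedOffset = Int32(truncatingIfNeeded: uncached)
```

A single signed comparison on the full word is enough to pick the branch, and the two halves are only split apart on the slow, uncached path.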

Summary

In this way, the runtime library uses many techniques to make the most of memory. On the other hand, most of the structures are built on Relative Pointer, so supporting pointer sizes larger than 32 bits requires adjusting the runtime structures in many places. Any such adjustment has to keep the metadata layout emitted by the compiler in sync with the runtime library. In the current implementation, however, nothing enforces this synchronization; both sides have to be adjusted by hand.

Since this greatly affects the portability of Swift, we need some kind of system for managing the metadata layout centrally. For example, a code generator that produces C structures for the runtime library from the LLVM IR emitted by the compiler might solve this problem. (This may be difficult for Swift metadata because of the variable-length layout of TrailingObjects.)

Although LLVM plays a big role as a foundation for compiler implementation, I have rarely seen a library that supports defining the interface between a runtime library and a compiler. If technology at that layer matures, more languages may be able to achieve ABI stability.