The Research Problems of Implementing Go Russ Cox Google

About the Talk I gave this talk at Google's Cambridge, Massachusetts office at an event for area Ph.D. students. The purpose of the event and the talk was to give a sense of some of the research that goes on at Google. The talk presents some research questions motivated by Go. We have answered some well, but others remain open. 2

About Go Go is an open source programming language that makes it easy to build simple, reliable, and efficient software. Design began in late 2007. Robert Griesemer, Rob Pike, Ken Thompson

Russ Cox, Ian Lance Taylor Became open source in November 2009. Developed entirely in the open; very active community.

Language stable as of Go 1, early 2012.

Work continues. 3

Motivation for Go 4

Motivation for Go Started as an answer to software problems at Google: multicore processors

networked systems

massive computation clusters

scale: 10⁷ lines of code

scale: 10³ programmers

scale: 10⁶⁺ machines (design point) Deployed: parts of YouTube, dl.google.com, Blogger, Google Code, Google Fiber, ... 5

Go A simple but powerful and fun language. start with C, remove complex parts

add interfaces, concurrency

also: garbage collection, closures, reflection, strings, ... For more background on design: Less is exponentially more

Go at Google: Language Design in the Service of Software Engineering 6

Research and Go Go is designed for building production systems at Google. Goal: make that job easier, faster, better.

Non-goal: break new ground in programming language research Plenty of research questions about how to implement Go well. Concurrency

Polymorphism

Garbage collection

Program translation 7

Concurrency 8

Concurrency Go provides two important concepts: A goroutine is a thread of control within the program, with its own local variables and stack. Cheap, easy to create. A channel carries typed messages between goroutines. 9

Concurrency package main import "fmt" func main() { c := make(chan string) go func() { c <- "Hello" c <- "World" }() fmt.Println(<-c, <-c) } 10

Concurrency: CSP Channels adopted from Hoare's Communicating Sequential Processes. Orthogonal to rest of language

Can keep familiar model for computation

Focus on composition of regular code Go enables simple, safe concurrent programming.

It doesn't forbid bad programming. Caveat: not purely memory safe; sharing is legal.

Passing a pointer over a channel is idiomatic. Experience shows this is practical. 11

Concurrency Sequential network address resolution, given a work list: // +build ignore,OMIT package main import ( "fmt" "math/rand" "time" ) func lookup() { for _, w := range worklist { w.addrs, w.err = LookupHost(w.host) } } func main() { rand.Seed(time.Now().UnixNano()) t0 := time.Now() lookup() fmt.Printf("

") for _, w := range worklist { if w.err != nil { fmt.Printf("%s: error: %v

", w.host, w.err) continue } fmt.Printf("%s: %v

", w.host, w.addrs) } fmt.Printf("total lookup time: %.3f seconds

", time.Since(t0).Seconds()) } var worklist = []*Work{ {host: "fast.com"}, {host: "slow.com"}, {host: "fast.missing.com"}, {host: "slow.missing.com"}, } type Work struct { host string addrs []string err error } func LookupHost(name string) (addrs []string, err error) { t0 := time.Now() defer func() { fmt.Printf("lookup %s: %.3f seconds

", name, time.Since(t0).Seconds()) }() h := hosts[name] if h == nil { h = failure } return h(name) } type resolver func(string) ([]string, error) var hosts = map[string]resolver{ "fast.com": delay(10*time.Millisecond, fixedAddrs("10.0.0.1")), "slow.com": delay(2*time.Second, fixedAddrs("10.0.0.4")), "fast.missing.com": delay(10*time.Millisecond, failure), "slow.missing.com": delay(2*time.Second, failure), } func fixedAddrs(addrs ...string) resolver { return func(string) ([]string, error) { return addrs, nil } } func delay(d time.Duration, f resolver) resolver { return func(name string) ([]string, error) { time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2)))) return f(name) } } func failure(name string) ([]string, error) { return nil, fmt.Errorf("unknown host %v", name) } 12

Concurrency Parallel network address resolution, given a work list: // +build ignore,OMIT package main import ( "fmt" "math/rand" "time" ) func lookup() { done := make(chan bool, len(worklist)) for _, w := range worklist { go func(w *Work) { w.addrs, w.err = LookupHost(w.host) done <- true }(w) } for i := 0; i < len(worklist); i++ { <-done } } func main() { rand.Seed(time.Now().UnixNano()) t0 := time.Now() lookup() fmt.Printf("

") for _, w := range worklist { if w.err != nil { fmt.Printf("%s: error: %v

", w.host, w.err) continue } fmt.Printf("%s: %v

", w.host, w.addrs) } fmt.Printf("total lookup time: %.3f seconds

", time.Since(t0).Seconds()) } var worklist = []*Work{ {host: "fast.com"}, {host: "slow.com"}, {host: "fast.missing.com"}, {host: "slow.missing.com"}, } type Work struct { host string addrs []string err error } func LookupHost(name string) (addrs []string, err error) { t0 := time.Now() defer func() { fmt.Printf("lookup %s: %.3f seconds

", name, time.Since(t0).Seconds()) }() h := hosts[name] if h == nil { h = failure } return h(name) } type resolver func(string) ([]string, error) var hosts = map[string]resolver{ "fast.com": delay(10*time.Millisecond, fixedAddrs("10.0.0.1")), "slow.com": delay(2*time.Second, fixedAddrs("10.0.0.4")), "fast.missing.com": delay(10*time.Millisecond, failure), "slow.missing.com": delay(2*time.Second, failure), } func fixedAddrs(addrs ...string) resolver { return func(string) ([]string, error) { return addrs, nil } } func delay(d time.Duration, f resolver) resolver { return func(name string) ([]string, error) { time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2)))) return f(name) } } func failure(name string) ([]string, error) { return nil, fmt.Errorf("unknown host %v", name) } 13

Implementing Concurrency Challenge: Make channel communication scale start with one global channel lock

per-channel locks, locked in address order for multi-channel operations Research question: lock-free channels? 14

Polymorphism 15

Interfaces An interface defines a set of methods. package io type Writer interface { Write(data []byte) (n int, err error) } 16

Interfaces A type implements the interface by implementing the methods. package bytes type Buffer struct { ... } func (b *Buffer) Write(data []byte) (n int, err error) { ... } 17

Interfaces An implementation of an interface can be assigned to a variable of that interface type. package fmt func Fprintf(w io.Writer, format string, args ...interface{}) 18

Interfaces // +build ignore,OMIT package main import ( "bytes" "fmt" "io" "os" ) var _ = io.Copy func main() { b := new(bytes.Buffer) var w io.Writer w = b fmt.Fprintf(w, "hello, %s

", "world") os.Stdout.Write(b.Bytes()) } 19

Interface Advantages no dependence between interface and implementation

easy testing

avoids overdesign, rigid hierarchy of inheritance-based OO The source of all generality in the Go language. 20

Implementing Interfaces How do you make method dispatch efficient? b := new(bytes.Buffer) var w io.Writer w = b fmt.Fprintf(w, "hello, %s

", "world") ... w.Write(text) // what happens here? At w.Write call, how does the runtime find the method to call? 21

Implementing Interfaces How do you make method dispatch efficient? b := new(bytes.Buffer) var w io.Writer w = b // do the work here! fmt.Fprintf(w, "hello, %s

", "world") ... w.Write(text) // plain indirect function call Interface holds two words: "itable" and actual value (or pointer to value). Itable contains type information plus list of function pointers for methods in interface. w.itable.fn[1](w.data, text) Conversion sites usually trivial to cache. 22

Interfaces for Algorithms package sort type Interface interface { Len() int // return number of elements, len(x) Less(i, j int) bool // report whether x[i] < x[j] Swap(i, j int) // x[i], x[j] = x[j], x[i] } func Sort(data Interface) Requires some boilerplate for each use: type bySubject []Thread func (x bySubject) Less(i, j int) bool { return x[i].Subject < x[j].Subject } func (x bySubject) Swap(i, j int) { x[i], x[j] = x[j], x[i] } func (x bySubject) Len() int { return len(x) } sort.Sort(bySubject(threads)) 23

Polymorphism: Can we do better? func Sort(data []T, less func(x, y *T) bool) sort.Sort(threads, func(x, y *Thread) bool { return x.Subject < y.Subject }) Research question: what's a reasonable semantics?

Research question: what's a reasonable implementation? C says don't bother.

C++ makes many copies of the same function.

Java boxes everything implicitly: one function, but expensive data model.

Java discards run-time type information. Do you want slow programmers, slow compilers and bloated binaries, or slow execution? 24

Garbage Collection 25

Garbage Collection Garbage collection simplifies APIs. In C and C++, too much API design (and too much programming effort!) is about memory management. Fundamental to concurrency: too hard to track ownership otherwise. Fundamental to interfaces: memory management details do not bifurcate otherwise-similar APIs. Of course, adds cost, latency, complexity in run time system. 26

Avoiding Garbage Collection Observation: garbage collection is a service, and like any service it can be overloaded, oversubscribed. Go lets you limit allocation by controlling memory layout. type Point struct { X, Y int } type Rectangle struct { Min, Max Point } 27

Implementing Garbage Collection Language decision: interior pointers are allowed, as are foreign pointers Cannot reuse Java GC algorithms directly.

But gives programmer more control over allocation. Allocator: objects are allocated in pages with other objects of the same size. Current GC: stop the world, parallel mark, start the world, background sweep. Research question: how to make collector lower latency, possibly incremental? 28

Program Translation 29

Program Translation Go programs can be parsed without context (unlike C and C++).

Go ships with a standard program formatter. Makes automated edits indistinguishable from manual edits. $ cat x.go package main var b bytes.Buffer $ gofmt -r 'bytes.Buffer -> bytes.Writer' x.go package main var b bytes.Writer $ More advanced rewrites: "go fix" for API adjustments. 30

Program Translation Research Question: What about translating other programs to Go? Exploring the conversion of C programs to Go today. Decide return type (for example, int vs bool).

Decide which variables are pointers vs arrays.

Decide which functions are really methods.

Decide natural package boundaries. What about other languages? 31

Research and Go Plenty of research questions about how to implement Go well. Concurrency

Polymorphism

Garbage collection

Program translation 32