The Research Problems of Implementing Go

Russ Cox

Google

About the Talk

I gave this talk at Google's Cambridge, Massachusetts office at an event for area Ph.D. students. The purpose of the event and the talk was to give a sense of some of the research that goes on at Google. The talk presents some research questions motivated by Go. We have answered some well, but others remain open.

About Go

Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.

Design began in late 2007.

Became open source in November 2009.

Developed entirely in the open; very active community.
Language stable as of Go 1, early 2012.
Work continues.

Motivation for Go

Motivation for Go

Started as an answer to software problems at Google:

Deployed: parts of YouTube, dl.google.com, Blogger, Google Code, Google Fiber, ...

Go

A simple but powerful and fun language.

For more background on design:

Research and Go

Go is designed for building production systems at Google.

Plenty of research questions about how to implement Go well.

Concurrency

Concurrency

Go provides two important concepts:

A goroutine is a thread of control within the program, with its own local variables and stack. Cheap, easy to create.

A channel carries typed messages between goroutines.

Concurrency

package main

import "fmt"

func main() {
    c := make(chan string)
    go func() {
        c <- "Hello"
        c <- "World"
    }()
    fmt.Println(<-c, <-c)
}

Concurrency: CSP

Channels adopted from Hoare's Communicating Sequential Processes.

Go enables simple, safe concurrent programming.
It doesn't forbid bad programming.

Caveat: not purely memory safe; sharing is legal.
Passing a pointer over a channel is idiomatic.

Experience shows this is practical.

Concurrency

Sequential network address resolution, given a work list:

// +build OMIT

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func lookup() {
    for _, w := range worklist {
        w.addrs, w.err = LookupHost(w.host)
    }
}

func main() {
	rand.Seed(time.Now().UnixNano())

	t0 := time.Now()
	lookup()

	fmt.Printf("\n")
	for _, w := range worklist {
		if w.err != nil {
			fmt.Printf("%s: error: %v\n", w.host, w.err)
			continue
		}
		fmt.Printf("%s: %v\n", w.host, w.addrs)
	}
	fmt.Printf("total lookup time: %.3f seconds\n", time.Since(t0).Seconds())
}

var worklist = []*Work{
	{host: "fast.com"},
	{host: "slow.com"},
	{host: "fast.missing.com"},
	{host: "slow.missing.com"},
}

type Work struct {
	host  string
	addrs []string
	err   error
}

func LookupHost(name string) (addrs []string, err error) {
	t0 := time.Now()
	defer func() {
		fmt.Printf("lookup %s: %.3f seconds\n", name, time.Since(t0).Seconds())
	}()
	h := hosts[name]
	if h == nil {
		h = failure
	}
	return h(name)
}

type resolver func(string) ([]string, error)

var hosts = map[string]resolver{
	"fast.com":         delay(10*time.Millisecond, fixedAddrs("10.0.0.1")),
	"slow.com":         delay(2*time.Second, fixedAddrs("10.0.0.4")),
	"fast.missing.com": delay(10*time.Millisecond, failure),
	"slow.missing.com": delay(2*time.Second, failure),
}

func fixedAddrs(addrs ...string) resolver {
	return func(string) ([]string, error) {
		return addrs, nil
	}
}

func delay(d time.Duration, f resolver) resolver {
	return func(name string) ([]string, error) {
		time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2))))
		return f(name)
	}
}

func failure(name string) ([]string, error) {
	return nil, fmt.Errorf("unknown host %v", name)
}

Concurrency

Parallel network address resolution, given a work list:

// +build OMIT

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func lookup() {
    done := make(chan bool, len(worklist))

    for _, w := range worklist {
        go func(w *Work) {
            w.addrs, w.err = LookupHost(w.host)
            done <- true
        }(w)
    }

    for i := 0; i < len(worklist); i++ {
        <-done
    }
}

func main() {
	rand.Seed(time.Now().UnixNano())

	t0 := time.Now()
	lookup()

	fmt.Printf("\n")
	for _, w := range worklist {
		if w.err != nil {
			fmt.Printf("%s: error: %v\n", w.host, w.err)
			continue
		}
		fmt.Printf("%s: %v\n", w.host, w.addrs)
	}
	fmt.Printf("total lookup time: %.3f seconds\n", time.Since(t0).Seconds())
}

var worklist = []*Work{
	{host: "fast.com"},
	{host: "slow.com"},
	{host: "fast.missing.com"},
	{host: "slow.missing.com"},
}

type Work struct {
	host  string
	addrs []string
	err   error
}

func LookupHost(name string) (addrs []string, err error) {
	t0 := time.Now()
	defer func() {
		fmt.Printf("lookup %s: %.3f seconds\n", name, time.Since(t0).Seconds())
	}()
	h := hosts[name]
	if h == nil {
		h = failure
	}
	return h(name)
}

type resolver func(string) ([]string, error)

var hosts = map[string]resolver{
	"fast.com":         delay(10*time.Millisecond, fixedAddrs("10.0.0.1")),
	"slow.com":         delay(2*time.Second, fixedAddrs("10.0.0.4")),
	"fast.missing.com": delay(10*time.Millisecond, failure),
	"slow.missing.com": delay(2*time.Second, failure),
}

func fixedAddrs(addrs ...string) resolver {
	return func(string) ([]string, error) {
		return addrs, nil
	}
}

func delay(d time.Duration, f resolver) resolver {
	return func(name string) ([]string, error) {
		time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2))))
		return f(name)
	}
}

func failure(name string) ([]string, error) {
	return nil, fmt.Errorf("unknown host %v", name)
}

Implementing Concurrency

Challenge: Make channel communication scale

Research question: lock-free channels?

Polymorphism

Interfaces

An interface defines a set of methods.

package io

type Writer interface {
    Write(data []byte) (n int, err error)
}

Interfaces

A type implements the interface by implementing the methods.

package bytes

type Buffer struct {
    ...
}

func (b *Buffer) Write(data []byte) (n int, err error) {
    ...
}

Interfaces

An implementation of an interface can be assigned to a variable of that interface type.

package fmt

func Fprintf(w io.Writer, format string, args ...interface{})

Interfaces

// +build OMIT

package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

var _ = io.Copy

func main() {
    b := new(bytes.Buffer)
    var w io.Writer
    w = b
    fmt.Fprintf(w, "hello, %s\n", "world")
    os.Stdout.Write(b.Bytes())
}

Interface Advantages

The source of all generality in the Go language.

Implementing Interfaces

How do you make method dispatch efficient?

b := new(bytes.Buffer)
var w io.Writer
w = b
fmt.Fprintf(w, "hello, %s\n", "world")
    ... w.Write(text) // what happens here?

At w.Write call, how does the runtime find the method to call?

Implementing Interfaces

How do you make method dispatch efficient?

b := new(bytes.Buffer)
var w io.Writer
w = b                 // do the work here!
fmt.Fprintf(w, "hello, %s\n", "world")
    ... w.Write(text) // plain indirect function call

Interface holds two words: "itable" and actual value (or pointer to value).

Itable contains type information plus list of function pointers for methods in interface.

w.itable.fn[1](w.data, text)

Conversion sites usually trivial to cache.

Interfaces for Algorithms

package sort

type Interface interface {
    Len() int           // return number of elements, len(x)
    Less(i, j int) bool // report whether x[i] < x[j]
    Swap(i, j int)      // x[i], x[j] = x[j], x[i]
}

func Sort(data Interface)

Requires some boilerplate for each use:

type bySubject []Thread

func (x bySubject) Less(i, j int) bool { return x[i].Subject < x[j].Subject }
func (x bySubject) Swap(i, j int)      { x[i], x[j] = x[j], x[i] }
func (x bySubject) Len() int           { return len(x) }

sort.Sort(bySubject(threads))

Polymorphism: Can we do better?

func Sort(data []T, less func(x, y *T) bool)

sort.Sort(threads, func(x, y *Thread) bool {
    return x.Subject < y.Subject
})

Research question: what's a reasonable semantics?
Research question: what's a reasonable implementation?

Do you want slow programmers, slow compilers and bloated binaries, or slow execution?

Garbage Collection

Garbage Collection

Garbage collection simplifies APIs.

Fundamental to concurrency: too hard to track ownership otherwise.

Fundamental to interfaces: memory management details do not bifurcate otherwise-similar APIs.

Of course, adds cost, latency, complexity in run time system.

Avoiding Garbage Collection

Observation: garbage collection is a service, and like any service it can be overloaded, oversubscribed.

Go lets you limit allocation by controlling memory layout.

type Point struct {
    X, Y int
}

type Rectangle struct {
    Min, Max Point
}

Implementing Garbage Collection

Language decision: interior pointers are allowed, as are foreign pointers

Allocator: objects are allocated in pages with other objects of the same size.

Current GC: stop the world, parallel mark, start the world, background sweep.

Research question: how to make collector lower latency, possibly incremental?

Program Translation

Program Translation

Go programs can be parsed without context (unlike C and C++).
Go ships with a standard program formatter.

Makes automated edits indistinguishable from manual edits.

$ cat x.go
package main

var b bytes.Buffer

$ gofmt -r 'bytes.Buffer -> bytes.Writer' x.go
package main

var b bytes.Writer

$

More advanced rewrites: "go fix" for API adjustments.

Program Translation

Research Question: What about translating other programs to Go?

Exploring the conversion of C programs to Go today.

What about other languages?

Research and Go

Plenty of research questions about how to implement Go well.

Thank you

Russ Cox

Google

Use the left and right arrow keys or click the left and right edges of the page to navigate between slides.
(Press 'H' or navigate to hide this message.)