Go for C programmers

ACCU, Silicon Valley Chapter, Nov. 14, 2012

Robert Griesemer

Google Inc.

Organization of this talk

Part 1: Introduction to Go

Break

Part 2: Object-oriented and concurrent programming in Go

Motivation

Have

Incomprehensible and unsafe code.

Extremely slow builds.

Missing concurrency support.

Outdated tools.

Want

Readable, safe, and efficient code.

A build system that scales.

Good concurrency support.

Tools that can operate at Google-scale.

Meet Go

Compact, yet expressive

Statically typed and garbage collected

Object- but not type-oriented

Strong concurrency support

Efficient implementation

Rich standard library

Fast compilers

Scalable tools

Hello, World

package main

import "fmt"

func main() {
    fmt.Println("Hello, 世界")
}

Syntax

Syntax

Syntax is not important... - unless you are a programmer.
Rob Pike.

The readability of programs is immeasurably more important than their writeability.
Hints on Programming Language Design
C. A. R. Hoare 1973

Too verbose

scoped_ptr<logStats::LogStats>
    logStats(logStats::LogStats::NewLogStats(FLAGS_logStats, logStats::LogStats::kFIFO));

Too dense

(n: Int) => (2 to n) |> (r => r.foldLeft(r.toSet)((ps, x) =>
    if (ps(x)) ps -- (x * x to n by x) else ps))

Just right

t := time.Now()
switch {
case t.Hour() < 12:
    return "morning"
case t.Hour() < 18:
    return "afternoon"
default:
    return "evening"
}

Reading Go code

Packages

A Go program consists of packages.

A package consists of one or more source files (.go files).

Each source file starts with a package clause followed by declarations.

package main

import "fmt"

func main() {
    fmt.Println("Hello, 世界")
}

By convention, all files belonging to a single package are located in a single directory.

Declarations, "Pascal-style" (left to right)

Pattern: keyword names [type] [initializer]

import "fmt"

const digits = "0123456789abcdef"

type Point struct {
    x, y int
    tag  string
}

var s [32]byte

var msgs = []string{"Hello, 世界", "Ciao, Mondo"}

func itoa(x, base int) string

Why?

p, q *Point

func adder(delta int) func(x int) int

Constants

In Go, constants are mathematically precise.

There is no need for qualifiers (-42LL, 7UL, etc.)

const (
    MaxUInt = 1<<64 - 1
    Pi      = 3.14159265358979323846264338327950288419716939937510582097494459
    Pi2     = Pi * Pi
    Delta   = 2.0
)

Only when used, constants are truncated to size:

// +build OMIT

package main

import "fmt"

// START1 OMIT
const (
	MaxUInt = 1<<64 - 1
	Pi      = 3.14159265358979323846264338327950288419716939937510582097494459
	Pi2     = Pi * Pi
	Delta   = 2.0
)

// STOP1 OMIT

func main() {
    var x uint64 = MaxUInt
    var pi2 float32 = Pi2
    var delta int = Delta
	fmt.Println("x =", x)
	fmt.Println("pi2 =", pi2)
	fmt.Println("delta =", delta)
}

Huge win in readability and ease of use.

Types

Familiar:
- Basic types, arrays, structs, pointers, functions.

But:
- string is a basic type.
- No automatic conversion of basic types in expressions.
- No pointer arithmetic; pointers and arrays are different.
- A function type represents a function; context and all.

New:
- Slices instead of array pointers + separate length: []int
- Maps because everybody needs them: map[string]int
- Interfaces for polymorphism: interface{}
- Channels for communication between goroutines: chan int

Variables

Familiar:

var i int
var p, q *Point
var threshold float64 = 0.75

New: Type can be inferred from type of initializing expression.

var i = 42       // type of i is int
var z = 1 + 2.3i // type of z is complex128

Shortcut (inside functions):

    i := 42 // type of i is int

It is safe to take the address of any variable:

    return &i

Functions

Functions may have multiple return values:

func atoi(s string) (i int, err error)

Functions are first-class citizens (closures):

func adder(delta int) func(x int) int {
    f := func(x int) int {
        return x + delta
    }
    return f
}
// +build OMIT

package main

import "fmt"

// START1 OMIT
func adder(delta int) func(x int) int {
	f := func(x int) int { // HL
		return x + delta // HL
	} // HL
	return f
}

// STOP1 OMIT

func main() {
    var inc = adder(1)
    fmt.Println(inc(0))
    fmt.Println(adder(-1)(10))
}

Statements

    t := x
    switch {
    case x == 0:
        return "0"
    case x < 0:
        t = -x
    }
    var s [32]byte
    i := len(s)
    for t != 0 { // Look, ma, no ()'s!
        i--
        s[i] = digits[t%base]
        t /= base
    }
    if x < 0 {
        i--
        s[i] = '-'
    }
    return string(s[i:])

Go statements for C programmers

Cleanups:

New:

Assignments

Assignments may assign multiple values simultaneously:

a, b = x, y

Equivalent to:

t1 := x
t2 := y
a = t1
b = t2

For instance:

a, b = b, a       // swap a and b
i, err = atoi(s)  // assign results of atoi
i, _ = atoi(991)  // discard 2nd value

Switch statements

Switch statements may have multiple case values, and break is implicit:

switch day {
case 1, 2, 3, 4, 5:
    tag = "workday"
case 0, 6:
    tag = "weekend"
default:
    tag = "invalid"
}

The case values don't have to be constants:

switch {
case day < 0 || day > 6:
    tag = "invalid"
case day == 0 || day == 6:
    tag = "weekend"
default:
    tag = "workday"
}

For loops

    for i := 0; i < len(primes); i++ {
        fmt.Println(i, primes[i])
    }

A range clause permits easy iteration over arrays and slices:

// +build OMIT

package main

import "fmt"

var primes = [...]int{2, 3, 5, 7, 11, 13, 17, 19}

func _() {
	// START1 OMIT
	for i := 0; i < len(primes); i++ {
		fmt.Println(i, primes[i])
	}
	// STOP1 OMIT

	// START2 OMIT
	var sum int
	for _, x := range primes {
		sum += x
	}
	// STOP2 OMIT
}

func main() {
    for i, x := range primes {
        fmt.Println(i, x)
    }
}

Unused values are discarded by assigning to the blank (_) identifier:

    var sum int
    for _, x := range primes {
        sum += x
    }

Dependencies

Dependencies in Go

An import declaration is used to express a dependency on another package:

import "net/rpc"

Here, the importing package depends on the Go package "rpc".

The import path ("net/rpc") uniquely identifies a package; multiple packages may have the same name, but they all reside at different locations (directories).

By convention, the package name matches the last element of the import path (here: "rpc").

Exported functionality of the rpc package is available via the package qualifier (rpc):

rpc.Call

A Go import declaration takes the place of a C include.

Naming: An excursion

How names work in a programming language is critical to readability.
Scopes control how names work.
Go has very simple scope hierarchy:

Locality of names

Upper case names are exported: Name vs. name.

The package qualifier is always present for imported names.

(First component of) every name is always declared in the current package.

One of the best (and hardest!) decisions made in Go.

Locality scales

No surprises when importing:

Names do not leak across boundaries.

In C, C++, Java the name y could refer to anything.
In Go, y (or even Y) is always defined within the package.
In Go, x.Y is clear: find x locally, Y belongs to it, and there is only one such Y.

Immediate consequences for readability.

Back to imports

Importing a package means reading a package's exported API.

This export information is self-contained. For instance:

B's export data contains all necessary information about C. There is no need for a compiler to read the export data of C.

This has huge consequences for build times!

Dependencies in C

.h files are not self-contained.

As a result, a compiler ends up reading core header files over and over again.

ifdef still requires the preprocessor to read a lot of code.

No wonder it takes a long time to compile...

At Google scale: dependencies explode, are exponential, become almost non-computable.

A large Google C++ build can read the same header file tens of thousands of times!

Tools

A brief overview

Two compilers: gc, gccgo

Support for multiple platforms: x86 (32/64bit), ARM (32bit), Linux, BSD, OS X, ...

Automatic formatting of source code: gofmt

Automatic documentation extraction: godoc

Automatic API adaption: gofix

All (and more!) integrated into the go command.

Building a Go program

A Go program can be compiled and linked without additional build information (make files, etc.).

By convention, all files belonging to a package are found in the same directory.

All depending packages are found by inspecting the import paths of the top-most (main) package.

A single integrated tool takes care of compiling individual files or entire systems.

The go command

Usage:

go command [arguments]

Selected commands:

build       compile packages and dependencies
clean       remove object files
doc         run godoc on package sources
fix         run go tool fix on packages
fmt         run gofmt on package sources
get         download and install packages and dependencies
install     compile and install packages and dependencies
list        list packages
run         compile and run Go program
test        test packages
vet         run go tool vet on packages

Break

Object-oriented programming

What is object-oriented programming?

"Object-oriented programming (OOP) is a programming paradigm using objects – usually instances of a class – consisting of data fields and methods together with their interactions – to design applications and computer programs. Programming techniques may include features such as data abstraction, encapsulation, messaging, modularity, polymorphism, and inheritance. Many modern programming languages now support forms of OOP, at least as an option."

(Wikipedia)

OOP requires very little extra programming language support

We only need

Claim: Data abstraction, encapsulation, and modularity are mechanisms
in their own right, not OOP specific, and a modern language (OO or not)
should have support for them independently.

Object-oriented programming in Go

Methods without classes

Interfaces without hierarchies

Code reuse without inheritance

Specifically:

Methods

package main

import "fmt"

type Point struct{ x, y int }

func PointToString(p Point) string {
    return fmt.Sprintf("Point{%d, %d}", p.x, p.y)
}

func (p Point) String() string {
    return fmt.Sprintf("Point{%d, %d}", p.x, p.y)
}

func main() {
    p := Point{3, 5}
    fmt.Println(PointToString(p)) // static dispatch
    fmt.Println(p.String())       // static dispatch
    fmt.Println(p)
}

Methods can be attached to any type

package main

import "fmt"

type Celsius float32
type Fahrenheit float32

func (t Celsius) String() string           { return fmt.Sprintf("%g°C", t) }
func (t Fahrenheit) String() string        { return fmt.Sprintf("%g°F", t) }
func (t Celsius) ToFahrenheit() Fahrenheit { return Fahrenheit(t*9/5 + 32) }

func main() {
    var t Celsius = 21
    fmt.Println(t.String())
    fmt.Println(t)
    fmt.Println(t.ToFahrenheit())
}

Interfaces

type Stringer interface {
     String() string
}

type Reader interface {
     Read(p []byte) (n int, err error)
}

type Writer interface {
     Write(p []byte) (n int, err error)
}

type Empty interface{}

An interface defines a set of methods.
A type that implements all methods of an interface is said to implement the interface.
All types implement the empty interface interface{}.

Dynamic dispatch

// +build OMIT

package main

import "fmt"

type Point struct{ x, y int }

func (p Point) String() string { return fmt.Sprintf("Point{%d, %d}", p.x, p.y) }

type Celsius float32
type Fahrenheit float32

func (t Celsius) String() string           { return fmt.Sprintf("%g°C", t) }
func (t Fahrenheit) String() string        { return fmt.Sprintf("%g°F", t) }
func (t Celsius) ToFahrenheit() Fahrenheit { return Fahrenheit(t*9/5 + 32) }

func main() {
    type Stringer interface {
        String() string
    }

    var v Stringer
    var corner = Point{1, 1}
    var boiling = Celsius(100)

    v = corner
    fmt.Println(v.String()) // dynamic dispatch
    fmt.Println(v)

    v = boiling.ToFahrenheit()
    fmt.Println(v.String()) // dynamic dispatch
    fmt.Println(v)
}

A value (here: corner, boiling) of a type (Point, Celsius) that implements
an interface (Stringer) can be assigned to a variable (v) of that interface type.

Composition and chaining

Typically, interfaces are small (1-3 methods).

Pervasive use of key interfaces in the standard library make it easy to chain APIs together.

package io
func Copy(dst Writer, src Reader) (int64, error)

The io.Copy function copies by reading from any Reader and writing to any Writer.

Interfaces are often introduced ad-hoc, and after the fact.

There is no explicit hierarchy and thus no need to design one!

cat

package main

import (
    "flag"
    "io"
    "os"
)

func main() {
    flag.Parse()
    for _, arg := range flag.Args() {
        f, err := os.Open(arg)
        if err != nil {
            panic(err)
        }
        defer f.Close()
        _, err = io.Copy(os.Stdout, f)
        if err != nil {
            panic(err)
        }
    }
}

Interfaces in practice

Methods on any types and ad hoc interfaces make for a light-weight OO programming style.

Go interfaces enable post-facto abstraction.

No explicit type hierarchies.

"Plug'n play" in a type-safe way.

Concurrency

What is concurrency?

Concurrency is the composition of independently executing computations.

Concurrency is a way to structure software, particularly as a way to write clean code that interacts well with the real world.

It is not parallelism.

Concurrency is not parallelism

Concurrency is not parallelism, although it enables parallelism.

If you have only one processor, your program can still be concurrent but it cannot be parallel.

On the other hand, a well-written concurrent program might run efficiently in parallel on a multiprocessor. That property could be important...

For more on that distinction, see the link below. Too much to discuss here.

A model for software construction

Easy to understand.

Easy to use.

Easy to reason about.

You don't need to be an expert!

(Much nicer than dealing with the minutiae of parallelism (threads, semaphores, locks, barriers, etc.))

There is a long history behind Go's concurrency features, going back to Hoare's CSP in 1978 and even Dijkstra's guarded commands (1975).

Basic Examples

A boring function

We need an example to show the interesting properties of the concurrency primitives.

To avoid distraction, we make it a boring example.

func main() {
    f("Hello, World", 500*time.Millisecond)
}
// +build OMIT

package main

import (
	"fmt"
	"time"
)

// START1 OMIT
func main() {
	f("Hello, World", 500*time.Millisecond)
}

// STOP1 OMIT

func f(msg string, delay time.Duration) {
    for i := 0; ; i++ {
        fmt.Println(msg, i)
        time.Sleep(delay)
    }
}

Ignoring it

The go statement runs the function as usual, but doesn't make the caller wait.

It launches a goroutine.

The functionality is analogous to the & on the end of a shell command.

// +build OMIT

package main

import (
	"fmt"
	"time"
)

func main() {
    go f("three", 300*time.Millisecond)
    go f("six", 600*time.Millisecond)
    go f("nine", 900*time.Millisecond)
}

func f(msg string, delay time.Duration) {
	for i := 0; ; i++ {
		fmt.Println(msg, i)
		time.Sleep(delay)
	}
}

Ignoring it a little less

When main returns, the program exits and takes the function f down with it.

We can hang around a little, and on the way show that both main and the launched goroutine are running.

// +build OMIT

package main

import (
	"fmt"
	"time"
)

func main() {
    go f("three", 300*time.Millisecond)
    go f("six", 600*time.Millisecond)
    go f("nine", 900*time.Millisecond)
    time.Sleep(3 * time.Second)
    fmt.Println("Done.")
}

func f(msg string, delay time.Duration) {
	for i := 0; ; i++ {
		fmt.Println(msg, i)
		time.Sleep(delay)
	}
}

Goroutines

What is a goroutine? It's an independently executing function, launched by a go statement.

It has its own call stack, which grows and shrinks as required.

It's very cheap. It's practical to have thousands, even hundreds of thousands of goroutines.

It's not a thread.

There might be only one thread in a program with thousands of goroutines.

Instead, goroutines are multiplexed dynamically onto threads as needed to keep all the goroutines running.

But if you think of it as a very cheap thread, you won't be far off.

Channels

A channel in Go provides a connection between two goroutines, allowing them to communicate.

    // Declaring and initializing.
    var c chan int
    c = make(chan int)
    // or
    c := make(chan int)
    // Sending on a channel.
    c <- 1
    // Receiving from a channel.
    // The "arrow" indicates the direction of data flow.
    value = <-c

Using channels

A channel connects the main and f goroutines so they can communicate.

// +build OMIT

package main

import (
	"fmt"
	"time"
)

func main() {
    c := make(chan string)
    go f("three", 300*time.Millisecond, c)
    for i := 0; i < 10; i++ {
        fmt.Println("Received", <-c) // Receive expression is just a value.
    }
    fmt.Println("Done.")
}

// START2 OMIT
func f(msg string, delay time.Duration, c chan string) {
	for i := 0; ; i++ {
		c <- fmt.Sprintf("%s %d", msg, i) // Any suitable value can be sent. // HL
		time.Sleep(delay)
	}
}

// STOP2 OMIT
func f(msg string, delay time.Duration, c chan string) {
    for i := 0; ; i++ {
        c <- fmt.Sprintf("%s %d", msg, i) // Any suitable value can be sent.
        time.Sleep(delay)
    }
}

Synchronization

When the main function executes <–c, it will wait for a value to be sent.

Similarly, when the function f executes c<–value, it waits for a receiver to be ready.

A sender and receiver must both be ready to play their part in the communication. Otherwise we wait until they are.

Thus channels both communicate and synchronize.

Channels can be unbuffered or buffered.

Using channels between many goroutines

// +build OMIT

package main

import (
	"fmt"
	"time"
)

func main() {
    c := make(chan string)
    go f("three", 300*time.Millisecond, c)
    go f("six", 600*time.Millisecond, c)
    go f("nine", 900*time.Millisecond, c)
    for i := 0; i < 10; i++ {
        fmt.Println("Received", <-c)
    }
    fmt.Println("Done.")
}

// START2 OMIT
func f(msg string, delay time.Duration, c chan string) {
	for i := 0; ; i++ {
		c <- fmt.Sprintf("%s %d", msg, i) // Any suitable value can be sent. // HL
		time.Sleep(delay)
	}
}

// STOP2 OMIT

A single channel may be used to communicate between many (not just two) goroutines; many goroutines may communicate via one or multiple channels.

This enables a rich variety of concurrency patterns.

Elements of a work-stealing scheduler

func worker(in chan int, out chan []int) {
    for {
        order := <-in           // Receive a work order.
        result := factor(order) // Do some work.
        out <- result           // Send the result back.
    }
}

The worker uses two channels to communicate:
- The in channel waits for some work order.
- The out channel communicates the result.
- As work load, a worker (very slowly) computes the list of prime factors for a given order.

A matching producer and consumer

func producer(out chan int) {
    for order := 0; ; order++ {
        out <- order // Produce a work order.
    }
}

func consumer(in chan []int, n int) {
    for i := 0; i < n; i++ {
        result := <-in // Consume a result.
        fmt.Println("Consumed", result)
    }
}

The producer produces and endless supply of work orders and sends them out.

The consumer receives n results from the in channel and then terminates.

Putting it all together

// +build OMIT

package main

import "fmt"
import "time"

func main() {
    start := time.Now()

    in := make(chan int)    // Channel on which work orders are received.
    out := make(chan []int) // Channel on which results are returned.
    go producer(in)
    go worker(in, out) // Launch one worker.
    consumer(out, 100)

    fmt.Println(time.Since(start))
}

// START2 OMIT
func worker(in chan int, out chan []int) {
	for {
		order := <-in           // Receive a work order. // HL
		result := factor(order) // Do some work. // HL
		out <- result           // Send the result back. // HL
	}
}

// STOP2 OMIT

// START3 OMIT
func producer(out chan int) {
	for order := 0; ; order++ {
		out <- order // Produce a work order. // HL
	}
}

func consumer(in chan []int, n int) {
	for i := 0; i < n; i++ {
		result := <-in // Consume a result. // HL
		fmt.Println("Consumed", result)
	}
}

// STOP3 OMIT

// factor returns a list containing x and its prime factors.
func factor(x int) (list []int) {
	list = append(list, x)
	for f := 2; x >= f; f++ {
		for x%f == 0 {
			x /= f
			list = append(list, f)
			// Slow down so we can see what happens.
			time.Sleep(50 * time.Millisecond)
		}
	}
	return
}

We use one worker to handle the entire work load.

Because there is only one worker, we see the result coming back in order.

This is running rather slow...

Using 10 workers

// +build OMIT

package main

import "fmt"
import "time"

func main() {
	start := time.Now()

    in := make(chan int)
    out := make(chan []int)
    go producer(in)
    // Launch 10 workers.
    for i := 0; i < 10; i++ {
        go worker(in, out)
    }
    consumer(out, 100)

	fmt.Println(time.Since(start))
}

// START2 OMIT
func worker(in chan int, out chan []int) {
	for {
		order := <-in           // Receive a work order. // HL
		result := factor(order) // Do some work. // HL
		out <- result           // Send the result back. // HL
	}
}

// STOP2 OMIT

// START3 OMIT
func producer(out chan int) {
	for order := 0; ; order++ {
		out <- order // Produce a work order. // HL
	}
}

func consumer(in chan []int, n int) {
	for i := 0; i < n; i++ {
		result := <-in // Consume a result. // HL
		fmt.Println("Consumed", result)
	}
}

// STOP3 OMIT

// factor returns a list containing x and its prime factors.
func factor(x int) (list []int) {
	list = append(list, x)
	for f := 2; x >= f; f++ {
		for x%f == 0 {
			x /= f
			list = append(list, f)
			// Slow down so we can see what happens.
			time.Sleep(50 * time.Millisecond)
		}
	}
	return
}

A ready worker will read the next order from the in channel and start working on it. Another ready worker will proceed with the next order, and so forth.

Because we have many workers and since different orders take different amounts of time to work on, we see the results coming back out-of-order.

On a multi-core system, many workers may truly run in parallel.

This is running much faster...

The Go approach

Don't communicate by sharing memory, share memory by communicating.

There's much more...

Links

Go Home Page:

Go Tour (learn Go in your browser)

Package documentation:

Articles galore:

Thank you

Robert Griesemer

Google Inc.