Cancelation, Context, and Plumbing

GothamGo 2014

Sameer Ajmani

This talk was presented at GothamGo in New York City, November 2014.

Introduction

In Go servers, each incoming request is handled in its own goroutine.

Handler code needs access to request-specific values.

When the request completes or times out, its work should be canceled.

Cancelation

Abandon work when the caller no longer needs the result.

Efficiently canceling unneeded work saves resources.

Cancelation is advisory

Cancelation does not stop execution or trigger panics.

Cancelation informs code that its work is no longer needed.

Code checks for cancelation and decides what to do:
shut down, clean up, return errors.
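
A minimal sketch of what "checking for cancelation" can look like, written against the Context type introduced later in this talk. It assumes the standard library context package, which offers the same API as golang.org/x/net/context; sumUntilCanceled, runLive, and runCanceled are hypothetical names chosen for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// sumUntilCanceled adds up items, checking for cancelation before each
// unit of work. Nothing interrupts it: it notices the signal and decides
// to stop, returning the partial sum and ctx.Err().
func sumUntilCanceled(ctx context.Context, items []int) (int, error) {
	total := 0
	for _, n := range items {
		select {
		case <-ctx.Done():
			return total, ctx.Err()
		default:
		}
		total += n
	}
	return total, nil
}

// runLive runs the work with a Context that is not canceled during the call.
func runLive() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	return sumUntilCanceled(ctx, []int{1, 2, 3})
}

// runCanceled cancels the Context before the work starts.
func runCanceled() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return sumUntilCanceled(ctx, []int{1, 2, 3})
}

func main() {
	fmt.Println(runLive())     // 6 <nil>
	fmt.Println(runCanceled()) // 0 context canceled
}
```

The function keeps full control: it could just as well finish a cleanup step before returning, which is what "advisory" means here.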

Cancelation is transitive

Cancelation affects all APIs on the request path

Network protocols support cancelation.

APIs above network need cancelation, too.

And all the layers atop those, up to the UI.

Goal: provide a uniform cancelation API that works across package boundaries.

Cancelation APIs

Many Go APIs support cancelation and deadlines already.

Go APIs are synchronous, so cancelation comes from another goroutine.

Method on the connection or client object:

// goroutine #1
result, err := conn.Do(req)

// goroutine #2
conn.Cancel(req)

Method on the request object:

// goroutine #1
result, err := conn.Do(req)

// goroutine #2
req.Cancel()

Cancelation APIs (continued)

Method on the pending result object:

// goroutine #1
pending := conn.Start(req)
...
result, err := pending.Result()

// goroutine #2
pending.Cancel()

Different cancelation APIs in each package are a headache.

We need one that's independent of package or transport:

// goroutine #1
result, err := conn.Do(x, req)

// goroutine #2
x.Cancel()

Context

A Context carries a cancelation signal and request-scoped values to all functions running on behalf of the same task. It's safe for concurrent access.

type Context interface {
    Done() <-chan struct{}                   // closed when this Context is canceled
    Err() error                              // why this Context was canceled
    Deadline() (deadline time.Time, ok bool) // when this Context will be canceled
    Value(key interface{}) interface{}       // data associated with this Context
}

Idiom: pass ctx as the first argument to a function.

import "golang.org/x/net/context"

// ReadFile reads file name and returns its contents.
// If ctx.Done is closed, ReadFile returns ctx.Err immediately.
func ReadFile(ctx context.Context, name string) ([]byte, error)

Examples and discussion in blog.golang.org/context.
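
One possible way to honor the ReadFile contract above, sketched with the standard library context package. This is an assumption about how such a function could be written, not the implementation from the talk; readFile and readCanceled are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"os"
)

// readFile is a hypothetical implementation of the ReadFile signature.
func readFile(ctx context.Context, name string) ([]byte, error) {
	// Fail fast if the caller has already given up.
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	type result struct {
		data []byte
		err  error
	}
	c := make(chan result, 1) // buffered so the reader never blocks on send
	go func() {
		data, err := os.ReadFile(name)
		c <- result{data, err}
	}()
	select {
	case <-ctx.Done():
		// The read keeps running in the background; only the caller
		// is released early.
		return nil, ctx.Err()
	case r := <-c:
		return r.data, r.err
	}
}

// readCanceled calls readFile with a Context that is already canceled.
func readCanceled(name string) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := readFile(ctx, name)
	return err
}

func main() {
	fmt.Println(readCanceled("any-file.txt")) // context canceled
}
```

Because os.ReadFile takes no Context, this sketch can only stop waiting for the read; it cannot abort the read itself. That gap is the reason lower-level APIs need cancelation support too.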

Contexts are hierarchical

Context has no Cancel method; obtain a cancelable Context using WithCancel:

// WithCancel returns a copy of parent whose Done channel is closed as soon as
// parent.Done is closed or cancel is called.
func WithCancel(parent Context) (ctx Context, cancel CancelFunc)

Passing a Context to a function does not pass the ability to cancel that Context.

// goroutine #1
ctx, cancel := context.WithCancel(parent)
...
data, err := ReadFile(ctx, name)

// goroutine #2
cancel()

Contexts form a tree, any subtree of which can be canceled.
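
A small sketch of the tree behavior, assuming the standard library context package (same API as golang.org/x/net/context). treeDemo is a hypothetical helper that cancels one child, then the parent, and records the effect of each step.

```go
package main

import (
	"context"
	"fmt"
)

// treeDemo builds a parent Context with two children, cancels one child,
// then cancels the parent, and reports what each step affected.
func treeDemo() (parentAfterA, bAfterA, bAfterParent error) {
	parent, cancelParent := context.WithCancel(context.Background())
	defer cancelParent()
	childA, cancelA := context.WithCancel(parent)
	childB, cancelB := context.WithCancel(parent)
	defer cancelB()

	// Canceling childA affects only its own subtree.
	cancelA()
	<-childA.Done()
	parentAfterA, bAfterA = parent.Err(), childB.Err()

	// Canceling the parent cancels every Context derived from it.
	cancelParent()
	<-childB.Done()
	bAfterParent = childB.Err()
	return parentAfterA, bAfterA, bAfterParent
}

func main() {
	fmt.Println(treeDemo()) // <nil> <nil> context canceled
}
```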

Why does Done return a channel?

Closing a channel works well as a broadcast signal.

After the last value has been received from a closed channel c, any receive from c will succeed without blocking, returning the zero value for the channel element.

Any number of goroutines can select on <-ctx.Done().

Examples and discussion in blog.golang.org/pipelines.
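
The broadcast property can be sketched as follows, assuming the standard library context package. broadcast is a hypothetical helper: it parks n goroutines on ctx.Done() and releases all of them with a single cancel call.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// broadcast starts n goroutines that wait on ctx.Done(), cancels the
// Context once, and returns how many goroutines observed the signal.
func broadcast(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	var (
		wg    sync.WaitGroup
		woken int64
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-ctx.Done() // blocks until the channel is closed
			atomic.AddInt64(&woken, 1)
		}()
	}
	cancel() // one close releases every receiver
	wg.Wait()
	return int(atomic.LoadInt64(&woken))
}

func main() {
	fmt.Println(broadcast(5)) // 5
}
```

Sending a value on the channel would wake only one receiver; closing it wakes them all, regardless of how many there are.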

Using close requires care.

Done returns a receive-only channel, so callers cannot close it themselves. It is closed via the cancel function returned by WithCancel, which ensures the channel is closed exactly once.

Context values

Contexts carry request-scoped values across API boundaries.

RPC clients encode Context values onto the wire.

RPC servers decode them into a new Context for the handler function.
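
Within a single process, request-scoped values are typically stored with context.WithValue under an unexported key type, as in the examples at blog.golang.org/context. The sketch below assumes the standard library context package; key, userKey, withUser, userFrom, greet, and serve are hypothetical names, and the RPC encoding step is not shown.

```go
package main

import (
	"context"
	"fmt"
)

// key is unexported so that no other package can collide with userKey.
type key int

const userKey key = 0

// withUser returns a copy of ctx that carries user.
func withUser(ctx context.Context, user string) context.Context {
	return context.WithValue(ctx, userKey, user)
}

// userFrom extracts the user from ctx, if one was stored.
func userFrom(ctx context.Context) (string, bool) {
	user, ok := ctx.Value(userKey).(string)
	return user, ok
}

// greet stands in for code deep in the call stack. It sees the value
// through any Context derived from the one that stored it.
func greet(ctx context.Context) string {
	if user, ok := userFrom(ctx); ok {
		return "hello, " + user
	}
	return "hello, anonymous"
}

// serve plays the role of a request handler. An empty user means that
// no user was attached to the request.
func serve(user string) string {
	ctx := context.Background()
	if user != "" {
		ctx = withUser(ctx, user)
	}
	ctx, cancel := context.WithCancel(ctx) // derived Contexts inherit values
	defer cancel()
	return greet(ctx)
}

func main() {
	fmt.Println(serve("gopher")) // hello, gopher
	fmt.Println(serve(""))       // hello, anonymous
}
```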

Replicated Search

Example from Go Concurrency Patterns.

// Search runs query on a backend and returns the result.
type Search func(query string) Result
type Result struct {
    Hit string
    Err error
}

// First runs query on replicas and returns the first result.
func First(query string, replicas ...Search) Result {
    c := make(chan Result, len(replicas))
    search := func(replica Search) { c <- replica(query) }
    for _, replica := range replicas {
        go search(replica)
    }
    return <-c
}

Remaining searches may continue running after First returns.

Cancelable Search

// Search runs query on a backend and returns the result.
type Search func(ctx context.Context, query string) Result

// First runs query on replicas and returns the first result.
func First(ctx context.Context, query string, replicas ...Search) Result {
    c := make(chan Result, len(replicas))
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()
    search := func(replica Search) { c <- replica(ctx, query) }
    for _, replica := range replicas {
        go search(replica)
    }
    select {
    case <-ctx.Done():
        return Result{Err: ctx.Err()}
    case r := <-c:
        return r
    }
}
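
To see the cancelation take effect, First can be run against fake backends. The sketch below repeats the types and First from the slides so that it compiles on its own, and assumes the standard library context package; fakeReplica and demo are hypothetical helpers, and the delays are arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type Result struct {
	Hit string
	Err error
}

type Search func(ctx context.Context, query string) Result

// First is the cancelable version shown above.
func First(ctx context.Context, query string, replicas ...Search) Result {
	c := make(chan Result, len(replicas))
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	search := func(replica Search) { c <- replica(ctx, query) }
	for _, replica := range replicas {
		go search(replica)
	}
	select {
	case <-ctx.Done():
		return Result{Err: ctx.Err()}
	case r := <-c:
		return r
	}
}

// fakeReplica returns a hypothetical backend that answers after delay, or
// gives up as soon as ctx is canceled and reports its name on canceled.
func fakeReplica(name string, delay time.Duration, canceled chan<- string) Search {
	return func(ctx context.Context, query string) Result {
		select {
		case <-time.After(delay):
			return Result{Hit: name + ": " + query}
		case <-ctx.Done():
			canceled <- name
			return Result{Err: ctx.Err()}
		}
	}
}

// demo returns the winning hit and the name of the replica that was canceled.
func demo() (hit, canceledReplica string) {
	canceled := make(chan string, 1)
	r := First(context.Background(), "golang",
		fakeReplica("fast", time.Millisecond, canceled),
		fakeReplica("slow", time.Hour, canceled))
	return r.Hit, <-canceled
}

func main() {
	fmt.Println(demo()) // fast: golang slow
}
```

The deferred cancel in First is what releases the slow replica: as soon as the first result is returned, every other search sees ctx.Done() closed instead of sleeping for its full delay.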

Context plumbing

Goal: pass a Context parameter from each inbound RPC at a server through the call stack to each outgoing RPC.

func (*ServiceA) HandleRPC(ctx context.Context, a Args) {
    f(a)
}

func f(a Args) {
    x.M(a)
}

func (x *X) M(a Args) {
    // TODO(sameer): pass a real Context here.
    serviceB.IssueRPC(context.TODO(), a)
}

Context plumbing (after)

func (*ServiceA) HandleRPC(ctx context.Context, a Args) {
    f(ctx, a)
}

func f(ctx context.Context, a Args) {
    x.M(ctx, a)
}

func (x *X) M(ctx context.Context, a Args) {
    serviceB.IssueRPC(ctx, a)
}

Problem: Existing and future code

Google has millions of lines of Go code.

We've retrofitted the internal RPC and distributed file system APIs to take a Context.

Lots more to do, growing every day.

Why not use (something like) thread local storage?

C++ and Java pass request state in thread-local storage.

Requires no API changes, but ...
requires custom thread and callback libraries.

Mostly works, except when it doesn't. Failures are hard to debug.

Serious consequences if credential-passing bugs affect user privacy.

"Goroutine-local storage" doesn't exist, and even if it did,
request processing may flow between goroutines via channels.

We won't sacrifice clarity for convenience.

In Go, pass Context explicitly

Easy to tell when a Context passes between functions, goroutines, and processes.

Invest up front to make the system easier to maintain.

Go's awesome tools can help.

Automated refactoring

Initial State:

Pass context.TODO() to outbound RPCs.

context.TODO() is a sentinel for static analysis tools. Use it wherever a Context is needed but there isn't one available.

Iteration:

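For each function F(x) whose body contains context.TODO(), add a ctx parameter to F, replace context.TODO() in its body with ctx, and change each call F(x) to F(context.TODO(), x).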

Repeat until context.TODO() is gone.
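
One iteration might look like the sketch below, which assumes the standard library context package. issueRPC, F, and caller are hypothetical names; the state before the iteration is described in comments because both versions of F cannot coexist in one package.

```go
package main

import (
	"context"
	"fmt"
)

// issueRPC stands in for an outbound RPC that needs a Context (hypothetical).
func issueRPC(ctx context.Context, x int) error {
	return ctx.Err()
}

// Before this iteration, F had the signature F(x int) and its body was
//
//	return issueRPC(context.TODO(), x)
//
// After: F accepts ctx and passes it along, so context.TODO() no longer
// appears in its body.
func F(ctx context.Context, x int) error {
	return issueRPC(ctx, x)
}

// caller previously called F(x). The iteration rewrote that call to
// F(context.TODO(), x), which makes caller the function to process in
// the next iteration.
func caller(x int) error {
	return F(context.TODO(), x)
}

func main() {
	fmt.Println(caller(42)) // <nil>
}
```

Each pass moves context.TODO() one level up the call stack, until it reaches a function that already has a real Context to pass.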

Finding relevant functions

The golang.org/x/tools/cmd/callgraph tool constructs the call graph of a Go program.

It uses whole-program pointer analysis to find dynamic calls (via interfaces or function values).

For context plumbing:

Find all functions on call paths from Context suppliers (inbound RPCs) to Context consumers (context.TODO).

Updating function calls

To change all F(x) to F(context.TODO(), x):

gofmt -r

Works well for simple replacements:

gofmt -r 'pkg.F(a) -> pkg.FContext(context.TODO(), a)'

But this is too imprecise for methods. There may be many methods named M:

gofmt -r 'x.M(y) -> x.MContext(context.TODO(), y)'

We want to restrict the transformation to specific method signatures.

The eg tool

The golang.org/x/tools/cmd/eg tool performs precise example-based refactoring.

The before expression specifies a pattern and the after expression its replacement.

To replace x.M(y) with x.MContext(context.TODO(), y):

package P

import (
    "xpkg"
    "ypkg"

    "golang.org/x/net/context"
)

func before(x xpkg.X, y ypkg.Y) error {
    return x.M(y)
}

func after(x xpkg.X, y ypkg.Y) error {
    return x.MContext(context.TODO(), y)
}

Dealing with interfaces

We need to update dynamic calls to x.M(y).

If M is called via interface I, then I.M also needs to change. The eg tool can update call sites with receiver type I.

When we change I, we need to update all of its implementations.

Find types assignable to I using golang.org/x/tools/go/types.

More to do here.
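
Finding the types assignable to an interface can be sketched with the go/types package now in the standard library (the talk refers to its predecessor, golang.org/x/tools/go/types). The source string, the names I, A, B, and C, and the implementers helper are all made up for illustration; a real tool would load whole packages instead of one string.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// src is a tiny hypothetical package: A and *B have method M, C does not.
const src = `package p

type I interface{ M(y int) error }

type A struct{}

func (A) M(y int) error { return nil }

type B struct{}

func (*B) M(y int) error { return nil }

type C struct{}
`

// implementers type-checks src and returns the names of the package-level
// concrete types T for which T or *T is assignable to I.
func implementers() (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return "", err
	}
	conf := types.Config{} // no importer needed: src has no imports
	pkg, err := conf.Check("p", fset, []*ast.File{file}, nil)
	if err != nil {
		return "", err
	}
	scope := pkg.Scope()
	iface := scope.Lookup("I").Type()

	var names []string
	for _, name := range scope.Names() { // Names returns sorted names
		tn, ok := scope.Lookup(name).(*types.TypeName)
		if !ok || types.IsInterface(tn.Type()) {
			continue
		}
		T := tn.Type()
		if types.AssignableTo(T, iface) || types.AssignableTo(types.NewPointer(T), iface) {
			names = append(names, name)
		}
	}
	return strings.Join(names, ","), nil
}

func main() {
	fmt.Println(implementers()) // A,B <nil>
}
```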

What about the standard library?

The Go 1.0 compatibility guarantee means we will not break existing code.

Interfaces like io.Reader and io.Writer are widely used.

For Google files, we used a currying approach:

f, err := file.Open(ctx, "/gfs/cell/path")
...
fio := f.IO(ctx)  // returns an io.ReadWriteCloser that passes ctx
data, err := ioutil.ReadAll(fio)
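
The currying idea can be sketched as an adapter that binds a Context to a Context-aware reader and exposes a plain io.Reader. This is an illustration under assumptions, not Google's file package: File, ctxReader, and readAll are hypothetical, the adapter covers only Read, and the standard library context package is used.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// File is a hypothetical file type whose Read takes a Context.
type File struct {
	r io.Reader
}

// Read refuses to do work once ctx is canceled.
func (f *File) Read(ctx context.Context, p []byte) (int, error) {
	if err := ctx.Err(); err != nil {
		return 0, err
	}
	return f.r.Read(p)
}

// IO binds ctx to f and returns a plain io.Reader that passes ctx on
// every call, so f can be handed to Context-unaware code.
func (f *File) IO(ctx context.Context) io.Reader {
	return ctxReader{ctx: ctx, f: f}
}

type ctxReader struct {
	ctx context.Context
	f   *File
}

func (r ctxReader) Read(p []byte) (int, error) {
	return r.f.Read(r.ctx, p)
}

// readAll reads a sample file through io.ReadAll, optionally canceling
// the Context first.
func readAll(canceled bool) (string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if canceled {
		cancel()
	}
	f := &File{r: strings.NewReader("hello")}
	data, err := io.ReadAll(f.IO(ctx))
	return string(data), err
}

func main() {
	fmt.Println(readAll(false)) // hello <nil>
	fmt.Println(readAll(true))  //  context canceled
}
```

The io.Reader signature stays untouched, so existing helpers keep working, while the bound Context still reaches every underlying read.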

For versioned public packages, add Context parameters in a new API version and provide eg templates to insert context.TODO().

More to do here.

Conclusion

Cancelation needs a uniform API across package boundaries.

Retrofitting code is hard, but Go is tool-friendly.

New code should use Context.

Thank you

Sameer Ajmani
