Automatic Stackdriver Tracing for gRPC

In monolithic systems, it is relatively easy to collect diagnostic data from the building blocks of a program. All modules live within one process and share common resources to report logs and errors.

Once you split your system into microservices, it becomes harder to follow a call from the user's entry point until a response is served. To address this problem, Google built Dapper to instrument and analyze its production services. Dapper-like distributed tracing systems allow you to trace a user request from the entry point all the way to the response.

Distributed tracing helps us to:

- Diagnose and improve latency problems.
- See integration problems that are only visible in production.
- See fundamental architectural problems, e.g. critical bottlenecks that were not obvious without looking at the tracing data.

As a gRPC user, you are deploying distributed production services, and being able to trace a user request end-to-end can easily become a critical requirement.

In this article, we are going to modify the helloworld example from the gRPC Go package to add tracing.

Import the trace package:

```go
import "cloud.google.com/go/trace"
```

Initiate a trace client:

```go
ctx := context.Background()
tc, err := trace.NewClient(ctx, "project-id")
if err != nil {
	log.Fatal(err)
}
```

See the examples to learn how to set up authentication. The example above uses Application Default Credentials.

To initialize the greeter client, use the Stackdriver Trace client interceptor we are providing:

```go
conn, err := grpc.Dial(address,
	grpc.WithInsecure(),
	grpc.WithUnaryInterceptor(tc.GRPCClientInterceptor()))
if err != nil {
	log.Fatalf("did not connect: %v", err)
}
defer conn.Close()
c := pb.NewGreeterClient(conn)
```

All the outgoing requests from c will be automatically traced:

```go
span := tc.NewSpan("/foo")
defer span.FinishWait() // Use span.Finish() if your client is a long-running process.
ctx = trace.NewContext(ctx, span)
r, err := c.SayHello(ctx, &pb.HelloRequest{Name: name})
if err != nil {
	log.Fatalf("could not greet: %v", err)
}
```

On the server side, to receive traces (and keep propagating them), use the server interceptor we are providing when initializing the server:

```go
s := grpc.NewServer(grpc.UnaryInterceptor(tc.GRPCServerInterceptor()))
```

Then, the server handlers will be able to access the trace.Span instances from the current calling context:

```go
func (s *server) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
	span := trace.FromContext(ctx)
	// TODO: Use the span directly, or keep using the context
	// to make more outgoing calls from this handler.
	// If you don't finish the span, it will be auto-finished
	// once this function returns.
	_ = span // Placeholder so the example compiles; remove once span is used.
	return &pb.HelloReply{Message: "Hello " + in.Name}, nil
}
```

A single hop from client to server looks like this on the Stackdriver Trace console:

Things get more exciting as you begin to depend on more services to serve your user requests:

Similar to the gRPC interceptors, I also contributed a few HTTP utilities to enable tracing support for your HTTP-speaking microservices. See NewHTTPClient and HTTPHandler for more information and examples.

What’s next?

In the past few months, I have been privileged to work on Go distributed tracing APIs on a part-time basis. We experimented a lot, addressed many critical open questions, and worked hard to achieve a very minimal backend-agnostic tracing API for the entire Go ecosystem.

Achieving common APIs will make distributed tracing more accessible, make our libraries trace-aware, and create opportunities to reuse our utilities. I am looking forward to sharing this work in the upcoming weeks.