September 12, 2020 at 06:37

In my posts about embedding in Go last month, I provided multiple examples of different kinds of embeddings from the Go standard library. How did I find these examples?

I wish I could say it all comes from a deep familiarity with the breadth and depth of the standard library; instead, I combined the programming virtues of laziness and impatience and wrote a tool that found these examples for me.

In this post, I'm going to describe this tool and how you may go about writing such tools of your own to analyze real-world Go codebases to glean any insights you may be interested in.

The task

Let's start by describing the requirement: we're interested in finding all instances of embeddings in Go code, and moreover, we'd like to know what kinds of embeddings they are and call it out in some way; i.e. distinguish interface-in-interface embeddings from struct-in-struct embeddings, and so on.

I wrote earlier about the various compilation steps Go source code goes through. Many of these are available for Go tool writers as well, and it's worth spending a bit of time thinking about the level of information we need for our tool. For a deeper exploration of what it takes to analyze Go source code, I highly recommend reading this document.

Just parsing the Go source code of a project won't do, because we'll need type information. Take this example struct from part 3 of the embedding post:

    type StatsConn struct {
    	net.Conn

    	BytesRead uint64
    }

We can figure out that net.Conn is an embedding from parsing this code and looking at the AST. But what kind of embedding is it? Is net.Conn an interface or a struct? For this, we'll have to run the AST through Go type checking; moreover, in the general case this ought to be cross-package, or even cross-module type checking because the embedded type net.Conn could be defined in a different package or module.

Therefore, our tool should be able to perform cross-module type checking. If this sounds tricky, that's because it is! But worry not, Go has just the package to help us.

Finding embeddings

It's time to show the code of the "find embeddings" tool. The full source code is available on GitHub. We'll start with the setup for configuring XTGP (the golang.org/x/tools/go/packages package):

    // Imports needed by the code shown in this section.
    import (
    	"flag"
    	"fmt"
    	"go/ast"
    	"go/token"
    	"go/types"
    	"log"

    	"golang.org/x/tools/go/packages"
    )

    const mode packages.LoadMode = packages.NeedName |
    	packages.NeedTypes |
    	packages.NeedSyntax |
    	packages.NeedTypesInfo

    func main() {
    	flag.Usage = func() {
    		out := flag.CommandLine.Output()
    		fmt.Fprintln(out, "usage: find-embeddings [options] <module dir>\n")
    		fmt.Fprintln(out, "Options:")
    		flag.PrintDefaults()
    	}
    	pattern := flag.String("pattern", "./...", "Go package pattern")
    	flag.Parse()
    	if flag.NArg() != 1 {
    		log.Fatal("Expecting a single argument: directory of module")
    	}

    	var fset = token.NewFileSet()
    	cfg := &packages.Config{Fset: fset, Mode: mode, Dir: flag.Args()[0]}
    	pkgs, err := packages.Load(cfg, *pattern)
    	if err != nil {
    		log.Fatal(err)
    	}

    	for _, pkg := range pkgs {
    		findInPackage(pkg, fset)
    	}
    }

The main entry point to XTGP is packages.Load, which takes a packages.Config object for configuration. The most important field to pay attention to is Mode, which specifies what XTGP should load. It's tempting to just ask for "everything", but this isn't necessarily the best approach in the general case, as it may take quite a while for large projects.

For example, in our case we don't need NeedImports | NeedDeps, which would bring in the type-checked ASTs of all the transitive dependencies of our code. This is an expensive operation, as you can imagine! All we need for our tool is to look at dependencies sufficiently to glean the type information of their exported types; luckily, in Go this information is available cheaply (to support Go's famously fast parallel compilation process).

Once we have the packages loaded, we get a slice of packages.Package values, through which we can perform our analysis. We invoke findInPackage for each such package.

    func findInPackage(pkg *packages.Package, fset *token.FileSet) {
    	for _, fileAst := range pkg.Syntax {
    		ast.Inspect(fileAst, func(n ast.Node) bool {
    			if structTy, ok := n.(*ast.StructType); ok {
    				findInFields(structTy.Fields, n, pkg.TypesInfo, fset)
    			} else if interfaceTy, ok := n.(*ast.InterfaceType); ok {
    				findInFields(interfaceTy.Methods, n, pkg.TypesInfo, fset)
    			}
    			return true
    		})
    	}
    }

This function has two important tasks:

1. Invoke ast.Inspect to run a visitor function on every AST node in the package. Our visitor focuses on either an *ast.StructType or *ast.InterfaceType to look deeper into struct/interface declarations.
2. Deal with a difference in how struct vs. interface fields are accessed (Fields field for *ast.StructType, Methods field for *ast.InterfaceType).

Let's move on to findInFields:

    func findInFields(fl *ast.FieldList, n ast.Node, tinfo *types.Info, fset *token.FileSet) {
    	type FieldReport struct {
    		Name string
    		Kind string
    		Type types.Type
    	}
    	var reps []FieldReport

    	for _, field := range fl.List {
    		// Embedded fields are the ones without names.
    		if field.Names == nil {
    			tv, ok := tinfo.Types[field.Type]
    			if !ok {
    				log.Fatal("not found", field.Type)
    			}

    			embName := fmt.Sprintf("%v", field.Type)

    			_, hostIsStruct := n.(*ast.StructType)
    			var kind string

    			// The underlying type tells us what is being embedded.
    			switch typ := tv.Type.Underlying().(type) {
    			case *types.Struct:
    				if hostIsStruct {
    					kind = "struct (s@s)"
    				} else {
    					kind = "struct (s@i)"
    				}
    				reps = append(reps, FieldReport{embName, kind, typ})
    			case *types.Interface:
    				if hostIsStruct {
    					kind = "interface (i@s)"
    				} else {
    					kind = "interface (i@i)"
    				}
    				reps = append(reps, FieldReport{embName, kind, typ})
    			default:
    			}
    		}
    	}

    	if len(reps) > 0 {
    		// nodeString is a helper from the full source; it is not shown here.
    		fmt.Printf("Found at %v\n%v\n", fset.Position(n.Pos()), nodeString(n, fset))

    		for _, report := range reps {
    			fmt.Printf("--> field '%s' is embedded %s: %s\n", report.Name, report.Kind, report.Type)
    		}
    		fmt.Println("")
    	}
    }

This function is conceptually simple; it iterates over a slice of fields, focusing only on fields that are unnamed (i.e. embedded). For each field, it looks at its underlying type and its kind: is it a struct type, or an interface type? This is where inter-package type analysis is critical, because in the general case we have no way of knowing the type of fields without understanding the types imported from other packages.

This is it! There's a bit of extra logic in findInFields to collect all embedded fields of a given struct/interface into a single place, but otherwise it does what we need, including distinguishing between the kinds of embedding. This simple tool can now be run on the Go standard library or real-world large projects (like k8s or hugo) and report all the embeddings found therein.

Finding embeddings using go/analysis

The example shown above uses the "raw" XTGP API to load packages. An alternative approach is to use the go/analysis framework, which saves us from some of the boilerplate:

    import (
    	"go/ast"

    	"golang.org/x/tools/go/analysis"
    	"golang.org/x/tools/go/analysis/singlechecker"
    )

    var EmbedAnalysis = &analysis.Analyzer{
    	Name: "embedanalysis",
    	Doc:  "reports embeddings",
    	Run:  run,
    }

    func main() {
    	singlechecker.Main(EmbedAnalysis)
    }

    func run(pass *analysis.Pass) (interface{}, error) {
    	for _, file := range pass.Files {
    		ast.Inspect(file, func(n ast.Node) bool {
    			if structTy, ok := n.(*ast.StructType); ok {
    				findInFields(structTy.Fields, n, pass.TypesInfo, pass.Fset)
    			} else if interfaceTy, ok := n.(*ast.InterfaceType); ok {
    				findInFields(interfaceTy.Methods, n, pass.TypesInfo, pass.Fset)
    			}
    			return true
    		})
    	}

    	return nil, nil
    }

Note how short the main function becomes; by delegating to the go/analysis framework, we no longer need to explicitly initialize go/packages or handle command-line flags. The singlechecker helper from go/analysis does this for us.

The rest of the code is very similar to the previous sample. run is the moral equivalent of findInPackage and does pretty much the same work, except that it has to operate on pass.Files instead of pkg.Syntax. It invokes findInFields for every struct or interface, and this function is exactly the same as shown above.