January 21, 2020

nullprogram.com/blog/2020/01/21/

This article was discussed on Hacker News, on reddit, and on Lobsters.

Regardless of your opinions of the Go programming language, the primary implementation of Go, gc, is an incredible piece of software engineering. Everyone ought to be blown away by it! Yet not only is it undervalued in general, even the Go community itself doesn’t fully appreciate it. It’s not perfect, but it has unique features never before seen in a toolchain.

In this article, when I say “Go” I’m referring to the gc compiler.

Building Go

Since Go 1.5, Go is implemented in Go. It also has no external dependencies, so to build Go only a Go compiler is required. On my laptop, building the latest version of Go takes only 43 seconds. That includes the compiler, linker, all cross-compilers, and the standard library.

$ tar xzf go1.13.6.src.tar.gz
$ cd go/src/
$ ./make.bash

Cross-compiling Go itself, meaning building a Go toolchain for another platform supported by Go, only requires setting a couple of environment variables (GOOS, GOARCH). So, in a mere 43 seconds I can compile an entire toolchain for any supported host! If you already have a Go compiler on your system, there’s no reason to bother with binary releases. Just grab the source and build it. Anyone can manage their own toolchain with ease!
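The same two variables also select the target when building ordinary programs with an installed toolchain. As a minimal sketch, the program below reports the target it was compiled for. The function name buildTarget is made up for this illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// buildTarget returns the GOOS/GOARCH pair this binary was compiled for.
// runtime.GOOS and runtime.GOARCH are fixed at build time, so they reflect
// the cross-compilation target, not the machine that ran the compiler.
func buildTarget() string {
	return runtime.GOOS + "/" + runtime.GOARCH
}

func main() {
	fmt.Println("built for", buildTarget())
}
```

Built with, say, GOOS=linux GOARCH=arm64 go build, the resulting binary should report linux/arm64 regardless of the host it was built on.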

Anyone who’s ever built a GCC or Clang+LLVM toolchain, especially anyone who’s built cross-compiler toolchains, should find this situation totally bonkers. How could it possibly be so easy and so fast? GCC’s configure script wouldn’t even finish before Go was already built.

Yes, this comparison is a bit apples and oranges. Both GCC and LLVM are more advanced compilers and produce much more efficient code, so of course there’s more to them, and of course they take longer to build. But does that completely justify the difference? This goes double for GCC and LLVM cross-compiler toolchains, which are, for the most part, very complex and difficult to build.

If you don’t already have Go, all you need is a C compiler and the Go 1.4 source code. Bootstrapping through Go 1.4 is easy, and I’ve done it a number of times. I keep a copy of the Go 1.4 tarball just for this reason.

How Go could improve: The linker could be better. Binaries are already too big, and getting bigger with each release. This problem is acknowledged by the Go developers:

The original linker was also simpler than it is now and its implementation fit in one Turing award winner’s head, so there’s little abstraction or modularity. Unfortunately, as the linker grew and evolved, it retained its lack of structure, and our sole Turing award winner retired.

The story for native interop (cgo) isn’t great either and requires trading away Go’s biggest strengths.

Package Management

Go has decentralized package management or, more accurately, module management. There’s no central module manager or module registration. To use a Go module, it need only be hosted on a reachable network with a valid HTTPS certificate. Each module is named by a module path that includes its network location. This means there’s no land grab for popular module names.
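To make "the path includes its network location" concrete, here is a rough sketch of the discovery exchange for a module on a custom domain when Go fetches it directly rather than through a proxy. Go requests the module path over HTTPS with ?go-get=1 appended, and the server answers with a go-import meta tag naming the repository. The helper names and the repository URL git.example.com/foo.git are assumptions for illustration only.

```go
package main

import "fmt"

// discoveryURL returns the URL requested to find out where the module
// named by modulePath is actually hosted.
func discoveryURL(modulePath string) string {
	return "https://" + modulePath + "?go-get=1"
}

// goImportMeta returns the HTML tag a server includes in its response to
// map a module path prefix to a version control system and repository.
func goImportMeta(prefix, vcs, repoURL string) string {
	return fmt.Sprintf(`<meta name="go-import" content="%s %s %s">`, prefix, vcs, repoURL)
}

func main() {
	fmt.Println(discoveryURL("example.com/foo"))
	// The repository location below is hypothetical.
	fmt.Println(goImportMeta("example.com/foo", "git", "https://git.example.com/foo.git"))
}
```

Because the name is the location, nothing needs to be registered anywhere: whoever controls example.com decides what example.com/foo resolves to.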

An organization using Go does not need to trust an external package repository (PyPI, etc.), nor do they need to run an internal package repository for their own internal packages. In general it’s sufficient just to leverage the organization’s already-existing source control system.

Dependencies are locked to a particular version cryptographically. The upstream source cannot change a published module for those that already depend on it. They could still publish a new version with hostile changes, but one should be cautious about updating dependencies, which is a deliberate action, or even about having dependencies in the first place.
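The locking works by recording a hash of each dependency (in go.sum) and comparing it on later fetches. The sketch below is a simplified, in-memory take on the "h1:" scheme: hash every file with SHA-256, write one summary line per file in sorted order, then hash the summary. The function name hashTree and the sample file contents are assumptions for illustration. The real tool hashes the files of a module zip, not a map.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hashTree computes an "h1:"-style digest over a set of files, given as a
// map from file name to contents. Any change to any file, or to the set of
// file names, changes the result.
func hashTree(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; the digest must not be

	summary := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		// One line per file: hex digest, two spaces, file name.
		fmt.Fprintf(summary, "%x  %s\n", sum[:], name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

func main() {
	// Hypothetical module contents.
	module := map[string][]byte{
		"example.com/foo@v1.0.0/go.mod": []byte("module example.com/foo\n"),
		"example.com/foo@v1.0.0/foo.go": []byte("package foo\n"),
	}
	recorded := hashTree(module)

	// Upstream quietly alters the already-published version.
	module["example.com/foo@v1.0.0/foo.go"] = []byte("package foo // hostile\n")
	fmt.Println("still matches recorded hash:", hashTree(module) == recorded)
}
```

A mismatch like the one above is what stops a build when a published version has been altered after the fact.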

With decentralized module management, you might think that each dependency host is a single point of failure, and you would be exactly right. If any dependency disappears, you can no longer build in a fresh checkout. Go has a solution for this: a module proxy. Before fetching the dependency directly, Go (optionally, configured via GOPROXY) checks with a module proxy that may have cached the dependency. This eliminates the single point of failure. Google hosts a free module proxy service for the internet, but organizations should probably run their own module proxy internally, at least for external dependencies. This neatly solves the left-pad problem.
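A module proxy is just an HTTP server with a predictable URL layout, which is part of why running one internally is practical. As a sketch, the code below builds the URL from which a given module version's zip would be fetched. The proxy host proxy.example.com, the module example.com/Foo/bar, and the helper names are hypothetical. The escaping rule (each uppercase letter becomes "!" plus its lowercase form) is how the proxy protocol keeps paths safe on case-insensitive file systems.

```go
package main

import (
	"fmt"
	"strings"
)

// escape replaces each uppercase ASCII letter with '!' followed by its
// lowercase form, as the module proxy protocol does for paths and versions.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			r += 'a' - 'A'
		}
		b.WriteRune(r)
	}
	return b.String()
}

// proxyURL returns the URL of one artifact of a module version on a proxy.
// ext is "info", "mod", or "zip".
func proxyURL(proxy, module, version, ext string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + escape(module) + "/@v/" + escape(version) + "." + ext
}

func main() {
	// Both the proxy and the module here are made up for illustration.
	fmt.Println(proxyURL("https://proxy.example.com", "example.com/Foo/bar", "v1.2.3", "zip"))
}
```

Since the layout is this plain, a proxy can in principle be little more than a static file tree that has cached each version once.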

Honestly, this is a breath of fresh air. Decentralized modules are a great idea and avoid most of the issues of centralized package repositories.

How Go could improve: Go’s module management is a little too gung-ho about HTTPS and certificates. The module documentation is still incomplete, and the only way to get answers to some of my questions was either to find the relevant source code in Go or to simply experiment.

Normally I could experiment using my local system, but Go refuses to do anything with modules unless I go through HTTPS with valid certificates. Needing to do a bunch of pointless configuration (creating a dummy CA, dummy localhost certificates, and setting it all up) really kills my momentum and motivation, and it delayed me in learning the new module system. Before modules, Go supported an -insecure flag, which was great for this sort of experimentation, but they removed it out of fear of misuse. I’ll decide my own risks, thank you very much.

An example of a question without a documented answer: If my module path is example.com/foo but my web server 301 redirects this request to example.com/foo/, will Go follow this redirect and re-append ?go-get=1? (Yes.) Did I want to configure an HTTPS server just to test this? (No.)

Update: I’ve been alerted that Go 1.14 will introduce GOINSECURE as a finer-grained form of the old -insecure option. This nicely solves my experimentation issue!

Vendoring

I still haven’t even gotten to one of the most powerful and unique module features, one which the Go developers initially didn’t want to include. If you have a vendor/ directory at the root of your module, and you use -mod=vendor when compiling, Go will look in that directory for modules. Go’s build system before modules (GOPATH) had a similar mechanism.

This is called vendoring and the practice pre-dates Go itself. Just check your dependency sources directly into source control alongside your own sources and hook them into your build. Organizations will often use this internally to lock down dependencies and to avoid depending on external resources. Typically, vendoring is a lot of work. The project’s build system must cooperate with the dependency’s build system. Then eventually you may want to update a vendored dependency, which may require more build changes.

These issues have led to the rise of header libraries and amalgamations in C and C++: libraries that are trivial to integrate into any project.

Go’s module system fully automates vendoring, which it can do because it already orchestrates builds. A single command populates the vendor/ directory with all of the module’s current dependencies:

$ go mod vendor

Normally you might follow this up by checking it into source control, but that’s not the only way it’s useful. Instead a project could merely include the vendor/ directory in its release source tarball. That tarball would be the entire, standalone source for that release. With all external dependencies packed into the tarball, the program could be built entirely offline on any system with a Go compiler. This is incredibly useful for me personally.

Some open source projects not written in Go have dependencies-included releases like this, but it’s a ton of work. So, of course, it’s usually not done. However, any Go project (not using cgo) can accomplish this trivially without even thinking about it. This is such a big deal, and nobody’s talking about it!

There’s lots of discussion about Go the programming language, but I hardly see discussion about the amazing engineering that’s gone into Go itself. It’s an under-appreciated piece of technology!