An unexpected side effect of using Bazel was that it ended up defining a user interface to our entire development system. Bazel can be told to run scripts as well as build them, so we configured our deploy script to depend on the built Docker images of our services. That meant the deploy script could be written assuming it had access to all the freshly built services, and one “run” command kicked off whatever needed to be done to make that true. Because Bazel is careful to encode every dependency a target might have, building all the services (including the most recent gRPC definitions), putting them into Docker images, tagging those images, pushing them to Google Cloud, and finally deploying them all to a test environment could be done with a single command: bazel run deploy:deploy-to-test. It didn’t matter if none of those services had been built yet on this machine, or if we had been building and deploying all day. Bazel built only what was necessary, all wrapped up in the one command.
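To make the "one command pulls in everything it needs" idea concrete, here is a minimal sketch in Python. It is a toy model, not how Bazel is implemented, and every target label except the deploy target is a hypothetical name invented for illustration. It walks a dependency graph from the deploy target and orders the work so that each target comes after the things it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Toy dependency graph: target -> direct dependencies.
# Only the deploy target name comes from the article; the rest are hypothetical.
DEPS = {
    "//deploy:deploy-to-test": ["//controller:image", "//model:image"],
    "//controller:image": ["//controller:server"],
    "//model:image": ["//model:server"],
    "//controller:server": ["//proto:api"],
    "//model:server": ["//proto:api"],
    "//proto:api": [],
}


def transitive_deps(target, deps):
    """Return the target plus everything it depends on, directly or indirectly."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps[node])
    return seen


def build_order(target, deps):
    """Order the needed targets so each one comes after its dependencies."""
    needed = transitive_deps(target, deps)
    graph = {t: deps[t] for t in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in build_order("//deploy:deploy-to-test", DEPS):
        print(step)
```

Asking for the deploy target yields the gRPC definitions first and the deploy script last, while asking for a single image yields only that image's own chain.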

That made our CI configuration story very straightforward. In our case, it took only a few lines of code to configure an environment in CircleCI, and then a single Bazel command built everything and deployed it in CI, just like we did locally. This harmony of local and CI environments made it easy to avoid builds or deployments that worked locally but failed in CI.

We configured Circle to deploy to a testing environment once tests passed on every branch in GitHub. When a pull request was merged to master, that triggered pushing that code directly to stage and then (if the smoke tests passed) production. Our Circle environment mirrored our development environment, and the commands we ran there were the same as the ones we ran locally.
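The promotion rules described above can be sketched as a small decision function. This is an illustration only, not the actual CircleCI configuration: the function name and environment labels are hypothetical, and it assumes that the test suite also gates merges to master and that the smoke tests run against stage.

```python
def environments_to_deploy(branch, tests_passed, smoke_tests_passed=False):
    """Return the environments a CI run deploys to, in order.

    Assumptions (not confirmed by the article): tests gate every branch,
    including master, and smoke tests are evaluated after the stage deploy.
    """
    if not tests_passed:
        return []
    if branch != "master":
        # Feature branches only ever reach the testing environment.
        return ["test"]
    environments = ["stage"]
    if smoke_tests_passed:
        environments.append("production")
    return environments


if __name__ == "__main__":
    print(environments_to_deploy("my-feature", tests_passed=True))
    print(environments_to_deploy("master", tests_passed=True, smoke_tests_passed=True))
```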

Figure 5. CircleCI config file. Note how simple the commands are and that they are almost entirely Bazel invocations. CircleCI has no knowledge of build steps or dependencies beyond this.

It wasn’t all roses. Bazel still felt fairly young, and every time we used it for something new it took a day or so to figure out how to make it work. Some compatibility features simply weren’t implemented, and the docs occasionally lagged. Releases didn’t seem to be a huge priority either: for one of our issues we were told to build Bazel from top of tree ourselves. We tried a handful of different methods of parameterizing our Kubernetes configuration before settling on something dead simple. And while Kubernetes is a very nice abstraction, it is still complex. We spent several hours staring at the config, wondering why two services couldn’t talk to each other.

Another issue is that most commercial CI vendors expect a single repository in a single language per project. Fortunately, by picking a dedicated build system like Bazel and using containers, we could sidestep this problem and have the CI systems trigger the same commands we used when building locally. Across the board, once things worked, they kept working, which made the struggles worth it.

The Sum of Its Parts

What did this all add up to? To add a new parameter to the communication between two services, our dev environment allowed us to work on both services in tandem. First, we updated the gRPC service definition to include the new parameter. Then, we worked on either the sender or the receiver first, briefly taking advantage of the fact that gRPC tolerates version mismatches: a receiver built against the older definition silently ignores parameters it does not recognize.
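The reason sender and receiver can be updated in either order is that fields are matched by number, and numbers a receiver does not know are skipped. The sketch below is a toy model of that behavior using plain dictionaries. It is not the protobuf wire format or the gRPC API, and the field names are hypothetical.

```python
# Schemas map field numbers to field names. Both are hypothetical examples.
OLD_SCHEMA = {1: "user_id"}
NEW_SCHEMA = {1: "user_id", 2: "limit"}  # the newly added parameter


def encode(message, schema):
    """Turn {name: value} into {field_number: value} using the sender's schema."""
    number_for = {name: number for number, name in schema.items()}
    return {number_for[name]: value for name, value in message.items()}


def decode(wire, schema):
    """Turn {field_number: value} into {name: value} using the receiver's schema.

    Unknown field numbers are skipped; missing fields fall back to None,
    which stands in here for a default value.
    """
    decoded = {name: None for name in schema.values()}
    for number, value in wire.items():
        if number in schema:
            decoded[schema[number]] = value
    return decoded


if __name__ == "__main__":
    # New sender, old receiver: the extra field is ignored.
    print(decode(encode({"user_id": 7, "limit": 10}, NEW_SCHEMA), OLD_SCHEMA))
    # Old sender, new receiver: the new field takes its default.
    print(decode(encode({"user_id": 7}, OLD_SCHEMA), NEW_SCHEMA))
```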

Steps to build and deploy
Step 1: > bazel run deploy:deploy-to-minikube
Step 2: > echo "Done."

Crucially, we could get started on both together, get the controller sending something, get the model receiving it, and have them both up and running with one command: bazel run deploy:deploy-to-minikube. That command generated the gRPC definitions for each language, rebuilt the services and their Docker images, and had the Kubernetes Deployments in Minikube restart their pods on the new images. The first build might take minutes while it compiled all the dependencies, but for incremental changes Bazel rebuilt only what had changed, so it took around 10 seconds.
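One way to picture why incremental builds are fast is content hashing: a target needs rebuilding only if its own inputs, or the inputs of something it depends on, have changed. The following is a minimal sketch under that assumption. It is a simplification, not Bazel's actual caching scheme, and the target names and source strings are hypothetical.

```python
import hashlib

# Hypothetical targets and stand-in "source contents".
DEPS = {
    "proto": [],
    "controller": ["proto"],
    "model": ["proto"],
    "deploy": ["controller", "model"],
}
SOURCES = {"proto": "v1", "controller": "c1", "model": "m1", "deploy": "d1"}


def digest(target, sources, deps, memo):
    """Hash a target's own source together with the digests of its dependencies."""
    if target not in memo:
        h = hashlib.sha256(sources[target].encode())
        for dep in sorted(deps[target]):
            h.update(digest(dep, sources, deps, memo).encode())
        memo[target] = h.hexdigest()
    return memo[target]


def stale_targets(sources, deps, cache):
    """Return targets whose digest differs from the cached one, and update the cache."""
    memo = {}
    stale = set()
    for target in deps:
        current = digest(target, sources, deps, memo)
        if cache.get(target) != current:
            stale.add(target)
            cache[target] = current
    return stale


if __name__ == "__main__":
    cache = {}
    print(sorted(stale_targets(SOURCES, DEPS, cache)))  # first build: everything
    edited = dict(SOURCES, controller="c2")
    print(sorted(stale_targets(edited, DEPS, cache)))   # only what the edit touches
```

In this model, editing the controller invalidates the controller and the deploy target but leaves the model alone, while editing the shared definitions invalidates everything downstream.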

Now we had a tight development loop where it was easy to make changes to either side of this new feature and immediately interact with it in the full Gyroscope environment, locally. If we were working on something complex, we could even write a short test script in any language that could use the same gRPC definitions to talk to our service in isolation.

This is how we laid the engineering foundation (i.e., meeting properties 1 through 4) for creating a sustainable data science system. We’ve also given you a framework to think about how your data science system stacks up: 4 questions to answer, 8 properties to meet, and 9 core stack components. If yours doesn’t stack up well, consider swapping components out piecewise; there are great options out there now. Indeed, we’re beginning to see convergence on the principal components required for a modern stack (i.e., Bazel, Docker, Kubernetes, gRPC), but there remains a long road toward integrating them in a way that is effective, scalable, and follows best practices. Truss was instrumental in helping Gyroscope achieve that today.