For the past 8 years, Netflix has been building and evolving a robust microservice architecture in AWS. Throughout this evolution, we learned how to build reliable, performant services in AWS. Our microservice architecture decouples engineering teams from each other, allowing them to build, test and deploy their services as often as they want. This flexibility enables teams to maximize their delivery velocity. Velocity and reliability are paramount design considerations for any solution at Netflix.

In our architecture, each microservice provides its consumers with a client library that handles all of the inter-process communication (IPC) logic. This benefits both the service owner and its consumers. In addition to consuming client libraries, the majority of microservices are built on top of our runtime platform framework, which is composed of internal and open source libraries.

While service teams do have the flexibility to release as they please, their velocity can often be hampered by updates to any of the libraries they depend on. An upcoming product feature may require a number of microservices to pick up the latest version of a shared library or client library. Updating dependency versions carries risk.

Or put simply, managing dependencies is hard.

Updating your project’s dependencies can introduce a number of issues, including:

Breaking API changes: This is the best-case scenario, since the problem surfaces as a compilation failure that breaks your build. Semantic versioning combined with dependency locking and dynamic version selectors should be sufficient for most teams to prevent this from happening, assuming cultural rigor around semver. However, locking yourself into a major version makes it that much harder to upgrade the company’s codebase, leading to prolonged maintenance of older libraries and configuration drift.

Transitive dependency updates: Due to the JVM’s flat classpath, only a single version of a class can exist within an application. Build tools like Gradle and Maven resolve version conflicts so that only one version of each library is included. This also means that your application now contains code running against a transitive dependency version it has never been tested against.

Breaking functional changes: Welcome to the world of software development! Ideally, this is mitigated by proper testing, and library owners are able to run their consumers’ contract tests to understand the functionality that is expected of them.
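To make the transitive update problem concrete, here is a minimal Python sketch of "highest version wins" conflict resolution, which is Gradle’s default strategy (Maven instead picks the declaration nearest to the root of the tree). It is a simplification under stated assumptions: versions are plain numeric MAJOR.MINOR.PATCH strings, the graph is a plain dict, and every module and library name (`client-a`, `shared-json`, and so on) is hypothetical. It reports each consumer that will run against a version it did not declare, and flags when that gap crosses a major version.

```python
from typing import Dict, List, Tuple

# Hypothetical dependency requests: module -> [(library, declared version)].
Requests = Dict[str, List[Tuple[str, str]]]


def parse(version: str) -> Tuple[int, ...]:
    """Turn '2.5.0' into (2, 5, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_highest(requests: Requests) -> Dict[str, str]:
    """Keep exactly one version per library: the highest one requested anywhere."""
    chosen: Dict[str, str] = {}
    for deps in requests.values():
        for lib, version in deps:
            if lib not in chosen or parse(version) > parse(chosen[lib]):
                chosen[lib] = version
    return chosen


def untested_pairings(requests: Requests, chosen: Dict[str, str]):
    """Report consumers that will run against a version they did not declare.

    Each entry is (consumer, library, declared, resolved, crosses_major).
    """
    report = []
    for consumer, deps in requests.items():
        for lib, declared in deps:
            resolved = chosen[lib]
            if resolved != declared:
                crosses_major = parse(resolved)[0] != parse(declared)[0]
                report.append((consumer, lib, declared, resolved, crosses_major))
    return report


if __name__ == "__main__":
    requests = {
        "my-service": [("client-a", "2.1.0"), ("client-b", "1.4.0")],
        "client-a": [("shared-json", "2.5.0")],
        "client-b": [("shared-json", "3.0.0")],
    }
    chosen = resolve_highest(requests)
    for row in untested_pairings(requests, chosen):
        print(row)
```

Here `client-a` was built and tested against `shared-json` 2.5.0, but at runtime it shares the classpath with 3.0.0, a major version it has never seen.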

To address the challenges of managing dependencies at scale, we have observed companies moving towards two approaches: share little and monorepos.

The share little approach (or don’t use shared libraries) has recently been popularized by the broader microservice movement. It states that no code should be shared between microservices; services should only be coupled via their HTTP APIs. Some recommendations even go as far as to say that copy and paste is preferable to shared libraries. This is the most extreme approach to decoupling.

The monorepo approach dictates that all source code for the organization live in a single source repository. Any code change should be compiled and tested against everything in the repository before being pushed to HEAD. There are no versions of internal libraries, just what is on HEAD, and commits are gated before they make it there. Third-party library versions are generally limited to one or two “approved” versions.
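A rule that limits third-party libraries to a small set of approved versions is typically enforced as a check that runs before a commit reaches HEAD. The following is a minimal Python sketch of such a gate, assuming a hypothetical `APPROVED` table and a simple map from module to declared third-party dependencies; the names `policy_violations` and `commit_allowed` are illustrative and do not describe any particular company’s tooling.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical repo-wide policy: third-party library -> approved versions.
APPROVED: Dict[str, Set[str]] = {
    "shared-json": {"2.5.0", "3.0.0"},
    "http-core": {"4.4.0"},
}


def policy_violations(module_deps: Dict[str, List[Tuple[str, str]]],
                      approved: Dict[str, Set[str]]) -> List[str]:
    """Return one message per third-party dependency that breaks the policy."""
    violations: List[str] = []
    for module, deps in sorted(module_deps.items()):
        for lib, version in deps:
            allowed = approved.get(lib)
            if allowed is None:
                violations.append(f"{module}: {lib} is not an approved library")
            elif version not in allowed:
                violations.append(
                    f"{module}: {lib} {version} is not one of {sorted(allowed)}")
    return violations


def commit_allowed(module_deps: Dict[str, List[Tuple[str, str]]],
                   approved: Dict[str, Set[str]] = APPROVED) -> bool:
    """Gate: the commit may reach HEAD only if there are no violations."""
    return not policy_violations(module_deps, approved)


if __name__ == "__main__":
    change = {
        "svc-a": [("shared-json", "1.0.0")],
        "svc-b": [("http-core", "4.4.0")],
    }
    for message in policy_violations(change, APPROVED):
        print(message)
    print("allowed:", commit_allowed(change))
```

The gate is deliberately binary: a single unapproved version blocks the whole commit, which is exactly the trade of freedom for consistency that the monorepo approach makes.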

While both approaches address the problems of managing dependencies at scale, they also impose certain challenges. The share little approach favors decoupling and engineering velocity, while sacrificing code reuse and consistency. The monorepo approach favors consistency and risk reduction, while sacrificing freedom by gating the deployment of changes. Adopting either approach would entail significant changes to our development infrastructure and runtime architecture. Additionally, both solutions would challenge our culture of Freedom and Responsibility.

The challenge we’ve posed to ourselves is this:

Can we provide engineers at Netflix the benefits of a monorepo while still maintaining the flexibility of distributed repositories?

Using the monorepo as our requirements specification, we began exploring alternative approaches to achieving the same benefits. What are the core problems that a monorepo approach strives to solve? Can we develop a solution that works within the confines of a traditional binary integration world, where code is shared?

Our approach, while still experimental, can be distilled into three key features:

Publisher feedback: Provide the owner of shared code with fast feedback about which of their consumers, both direct and transitive, they just broke. Also, allow teams to block releases based on downstream breakages. Currently, our engineering culture puts sole responsibility on consumers to resolve these issues. By giving library owners feedback on their impact on the rest of Netflix, we expect them to take on additional responsibility.

Managed source: Provide consumers with a means to safely increment library versions automatically as new versions are released. Since we are already testing each new library release against all downstream consumers, why not bump consumer versions and safely accelerate version adoption?

Distributed refactoring: Provide owners of shared code with a means to quickly find and globally refactor consumers of their API. We have started by issuing pull requests en masse to all Git repositories containing a consumer of a particular Java API. We’ve run some early experiments and expect to invest more in this area going forward.
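Publisher feedback and managed source both start from the same question: who consumes this library, directly or transitively? The Python sketch below is one possible answer under stated assumptions: the reverse dependency graph is a plain dict, `build_passes` is a hypothetical stand-in for building a consumer against the candidate release in CI, and all project names are made up. It walks the graph breadth-first, splits broken consumers into direct and transitive, optionally blocks the release, and lists passing direct consumers as candidates for an automatic version bump.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical reverse graph: library -> projects that declare it directly.
Consumers = Dict[str, List[str]]


@dataclass
class Feedback:
    broken_direct: List[str] = field(default_factory=list)
    broken_transitive: List[str] = field(default_factory=list)
    safe_to_bump: List[str] = field(default_factory=list)
    release_blocked: bool = False


def downstream(graph: Consumers, library: str) -> Dict[str, int]:
    """Breadth-first walk of the reverse dependency graph.

    Maps each downstream project to its distance from the library:
    1 is a direct consumer, anything larger is a transitive one.
    """
    depth = {library: 0}
    queue = deque([library])
    while queue:
        node = queue.popleft()
        for consumer in graph.get(node, []):
            if consumer not in depth:  # visited check also guards against cycles
                depth[consumer] = depth[node] + 1
                queue.append(consumer)
    del depth[library]
    return depth


def publisher_feedback(graph: Consumers, library: str,
                       build_passes: Callable[[str], bool],
                       block_on_breakage: bool = False) -> Feedback:
    """Classify every downstream project against a candidate release."""
    feedback = Feedback()
    for project, distance in sorted(downstream(graph, library).items()):
        if build_passes(project):
            if distance == 1:
                feedback.safe_to_bump.append(project)
        elif distance == 1:
            feedback.broken_direct.append(project)
        else:
            feedback.broken_transitive.append(project)
    broken = feedback.broken_direct or feedback.broken_transitive
    feedback.release_blocked = block_on_breakage and bool(broken)
    return feedback


if __name__ == "__main__":
    graph = {
        "shared-json": ["client-a", "client-b"],
        "client-a": ["service-x"],
        "client-b": ["service-y", "service-x"],
    }
    failing = {"client-b", "service-y"}
    print(publisher_feedback(graph, "shared-json",
                             lambda project: project not in failing,
                             block_on_breakage=True))
```

In this simplified model only direct consumers declare the library’s version, so they are the only ones a managed-source bot would bump; transitive consumers pick up the change through their own direct dependencies.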

We are just starting our journey. Our publisher feedback service is currently being alpha tested by a number of service teams and we plan to broaden adoption soon, with managed source not far behind. Our initial experiments with distributed refactoring have helped us understand how best to rapidly change code globally. We also see an opportunity to reduce the size of the overall dependency graph by leveraging tools we build in this space. We believe that expanding and cultivating this capability will allow teams at Netflix to achieve true organization-wide continuous integration and reduce, if not eliminate, the pain of managing dependencies.

If this challenge is of interest to you, we are actively hiring for this team.

— Mike McGarr, Dianne Marsh and the Developer Productivity team