Freezing Python’s Dependency Hell in 2018

A simpler solution for a complicated problem

The challenge of managing dependencies in Python has been described by many different people. That storied past has left a trail of conflicting posts across the web. Even with state-of-the-art best practices, you can still end up in Dependency Hell when adding new dependencies, because pip does not yet resolve dependencies: the issue asking for a resolver was first reported in 2013 and is still open. The usual series of workarounds ends up like this relevant XKCD:

At Instacart, we’re automating our best practices in Lore, so Data Scientists and Machine Learning Engineers can trivially replicate their work on any computer in any environment, without spending time in Dependency Hell. This helps multiple people collaborate on a single project and switch from one project to another as easily as changing directories. It also eliminates random production issues from unintentional changes to secondary dependencies.

When every App maintains its own virtualenv, individual contributors are empowered to manage their dependency updates reliably, without having to update the entire company’s codebase to the latest version. There are still Python 2 vs 3 debates ten years after Python 3’s release, partly because old monolithic code bases hold new projects back.

Best Practices (in 2018)

Don’t rely on humans to follow best practices. Write code to do it for them.

1. Use a fresh virtualenv for each project
2. pip freeze > requirements.txt on every change
3. Specify your exact Python version in runtime.txt
4. Project code should be organized in a Python module
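The runtime.txt practice is easy to check mechanically. Here is a minimal sketch, assuming runtime.txt holds a single line in the python-3.6.6 form; the function names parse_runtime and runtime_matches are hypothetical and are not part of Lore:

```python
import sys


def parse_runtime(text):
    """Parse a runtime.txt line such as 'python-3.6.6' into (3, 6, 6)."""
    name, _, version = text.strip().partition("-")
    if name != "python" or not version:
        raise ValueError("expected 'python-X.Y.Z', got %r" % text)
    return tuple(int(part) for part in version.split("."))


def runtime_matches(text, current=None):
    """Return True when the interpreter version equals the pinned one."""
    if current is None:
        # Default to the interpreter that is running this code.
        current = tuple(sys.version_info[:3])
    return parse_runtime(text) == tuple(current)
```

A CI step could read runtime.txt, call runtime_matches on its contents, and fail the build when the result is False.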

Lore’s open source command line takes care of everything required to satisfy these steps, without adding any environment variables, updating PATHs, or extra commands. It’s a natural workflow that uses the current working directory, and it’s trivial to install: pip install lore

All lore commands will pass extra arguments to their delegate. Using lore means you don’t need some combination of brew, apt-get, anaconda, miniconda, pipenv, pyenv, pyvenv, venv, virtualenv etc. Lore is lightweight and modular by design and will not add any other entries to your App’s requirements.txt. It stands on the shoulders of pip, pyenv and virtualenv behind the scenes to avoid reinventing those wheels.

Modern Software Architecture

Why not use ____?

brew, apt-get and other OS package managers don’t allow you to specify your Python minor or patch versions, and will force upgrades on you regularly.

Docker sort of solves this problem, by freezing your OS image, but this still doesn’t allow specific control of Python or dependency versions.

Pyenv gives us fine grained control of multiple Python versions, but doesn’t deal with package dependencies.

Pipfile looks promising for managing package dependencies, but is under active development. We may adopt this as an alternative if/when it reaches maturity, but for the time being we use requirements.txt.

autoenv, direnv, .venv and others that automagically change your $PATH or other environment variables prevent access to your system Python (and packages) when you’re in those project directories, which will break any shell script that uses #!/usr/bin/env python.

Anaconda requires a large installation up front, and while monolithic dependency management that just works is great if you’re the only person working on the code, it makes it harder for other people to replicate your work, unless you also use requirements.txt. That means pip is all that is actually necessary for other contributors to collaborate.

(pyenv + virtualenv + pip) or (miniconda + environment.yml) start to look like minimum viable products, but they rely on people to actually know and consistently use their best practices. Many people don’t and won’t, because frankly, we’re concerned with bigger things. These workflows rely on senior team members to catch and corral.

In addition, there is nuance around whether or not you should freeze all packages to patch versions. The hope is that if you don’t freeze any versions, you get free upgrades from all those upstream library developers. In reality, what you’ll notice are the bugs and breaking changes that randomly get introduced into your continuous integration pipeline, or that the next developer to check out your project needs to spend 30 minutes figuring out which dependency versions actually work, rather than which are the most recent. It’s difficult to track down the source of these breakages, because they’re not in your own code and the changes were not tracked or intentional. The same logic applies to patch version changes in Python itself.
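One way to keep unfrozen versions from slipping in is to lint requirements.txt. This is a minimal sketch that only understands simple name==version lines; the helper name unpinned_requirements is hypothetical, and URL or editable requirements would need a real parser:

```python
def unpinned_requirements(lines):
    """Return requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        _, sep, version = line.partition("==")
        # No '==', an empty version, or a wildcard all leave room for drift.
        if not sep or not version or "*" in version:
            loose.append(line)
    return loose
```

Running this over the file in CI and failing on a non-empty result turns an unintentional upgrade into a visible, tracked change.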

The nuance is that library maintainers, rather than application developers, should be encouraged to white list ranges of tested dependency versions to reduce the likelihood of causing downstream dependency conflicts with other libraries.
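To see why ranges help libraries while exact pins suit applications, consider this small sketch. It models each declared requirement as a half-open [minimum, below) range of purely numeric versions; parse_version and intersect are hypothetical helpers, not a real resolver:

```python
def parse_version(text):
    """Turn '1.14.5' into (1, 14, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def intersect(ranges):
    """Intersect half-open [minimum, below) ranges; return None if empty."""
    lowest = max(parse_version(low) for low, _ in ranges)
    highest = min(parse_version(high) for _, high in ranges)
    return (lowest, highest) if lowest < highest else None
```

Two libraries that each accept a range usually leave some overlap for the application to pin within, while two libraries that each pin a different exact version can never be installed together.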

How To

If you want to use these best practices for any project, it takes about 2 minutes to complete the one-time setup:

If you’re creating a brand new App, lore init my_app will create the directory my_app with a template scaffold from scratch. For existing projects, --bare skips scaffold creation. Anyone who checks out a Lore App will instantly be at home. When they change to the directory and run lore test for the first time, all dependencies will be installed in a brand new virtualenv on their machine.

Lore produces reliable builds for CI testing and deployment as well. Python versions and virtualenv packages are Russian Doll cached on the machine for fast and efficient repeatability across many projects.

When you import lore, all dependencies will be checked to fail fast if there is a version mismatch or unsatisfied requirement. In development or test environments, new requirements are automatically added to requirements.txt and pushed down the CI pipeline.
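The idea of failing fast at import time can be illustrated with the standard library. This is a sketch of the general technique, not Lore’s implementation; it assumes Python 3.8 or newer for importlib.metadata, and the names installed_version and check_requirements are hypothetical:

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(pins, lookup=installed_version):
    """Raise RuntimeError listing every pin that is missing or mismatched.

    pins maps a distribution name to the exact version string required.
    """
    problems = []
    for name, wanted in sorted(pins.items()):
        found = lookup(name)
        if found is None:
            problems.append("%s==%s is not installed" % (name, wanted))
        elif found != wanted:
            problems.append("%s==%s required, found %s" % (name, wanted, found))
    if problems:
        raise RuntimeError("; ".join(problems))
```

Calling check_requirements at the top of a package’s __init__.py surfaces a mismatch as one clear error at startup instead of an obscure failure deep inside a library call.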

Lore is rigorous. If you manually launch a python process from outside the virtualenv and try to import an App’s module, it will reboot Python with the correct version in the correct env with the correct dependencies, or die trying (with a helpful error message). Nobody should be wasting time chasing spurious errors caused by subtle dependency bugs.
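Rebooting Python into the right environment comes down to replacing the current process with the virtualenv’s interpreter. The following is a rough sketch of that technique, assuming a POSIX virtualenv layout with the interpreter at bin/python; the function names are hypothetical and this is not Lore’s actual code:

```python
import os
import sys


def venv_python(env_dir):
    """Path to the interpreter inside a virtualenv (POSIX layout assumed)."""
    return os.path.join(env_dir, "bin", "python")


def reexec_command(env_dir, argv, prefix=None):
    """Return the command to relaunch under env_dir, or None if already there."""
    if prefix is None:
        prefix = sys.prefix  # points at the virtualenv when one is active
    if os.path.realpath(prefix) == os.path.realpath(env_dir):
        return None
    return [venv_python(env_dir)] + list(argv)


def ensure_env(env_dir):
    """Replace the current process with the virtualenv's interpreter if needed."""
    command = reexec_command(env_dir, sys.argv)
    if command is not None:
        if not os.path.exists(command[0]):
            sys.exit("No interpreter at %s; create the virtualenv first" % command[0])
        os.execv(command[0], command)  # does not return on success
```

Because os.execv replaces the running process, the relaunched program keeps the same arguments, which is also why this approach costs one extra interpreter startup.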

Of course, all of this is configurable via environment variables, configuration directories, hidden .env files, or Python code. We believe strongly in convention over configuration, and also that rules are meant to be broken.

Full disclosure

Lore dependency management is limited on Windows to the currently installed system Python version, since pyenv is not Windows compatible. We’d love to fix this.

Lore adds a few hundred milliseconds to application startup, because it reboots Python into the virtualenv. If that time matters to you, launch lore directly in the correct virtualenv with the appropriate path like ~/.pyenv/versions/3.6.6/envs/my_app/bin/lore. You can find this path and more in lore env.