Making conda fast again

Explaining how we’ve created the mamba prototype, a solver for conda environments that is hopefully fast enough to support a conda-forge with hundreds of thousands of packages.

You might have seen the announcement on Twitter: at QuantStack we’ve been working on a prototype of a conda-compatible package manager called mamba. Conda is a great tool for distributing data science packages. The community-led conda-forge comprises tons of awesome packages. The Anaconda company supplies us with recent and well-integrated compilers. And conda-build is simply amazing for building binaries across different platforms (Windows, Linux and OS X). At QuantStack we use conda all the time to package Python, C++, Julia and R packages, and ship them to clients around the world.

However, due to the growth of conda-forge (it now has over 60,000 packages for Linux), many users have become all too familiar with the “conda: Solving environment” spinner. It’s been frustratingly slow for a while now.

At QuantStack, one of our main areas of expertise is building high-performance applications for customers, mainly in C++.

To make conda faster, we propose to:

Build a Python extension in C++ using pybind11 and compile it with all optimizations enabled.

Use the existing libsolv library, which powers package managers such as Fedora’s DNF and openSUSE’s zypper and, like conda, performs SAT solving to satisfy all package dependencies correctly.

Parse repodata.json (already 35 MB of JSON for conda-forge) with simdjson, a library designed for very high-speed JSON parsing.
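To make the solving step more concrete, here is a minimal sketch in plain Python of what “satisfy all package dependencies” means. It is not mamba’s code and uses neither libsolv nor simdjson: the standard-library json module stands in for simdjson, and a brute-force backtracking search stands in for libsolv’s SAT solver. The index is a made-up miniature shaped like the "packages" section of repodata.json; the package names (app, lib, zlib), the helper functions, and the simplified spec syntax (a name optionally followed by one comparison such as >=2.0) are all hypothetical and much narrower than real conda match specs.

```python
import json
import operator

# Hypothetical miniature index shaped like the "packages" section of a
# conda repodata.json (filename -> name, version, build, depends).
REPODATA = json.loads("""
{"packages": {
  "app-1.0-0.tar.bz2":  {"name": "app",  "version": "1.0", "build": "0", "depends": ["lib >=2.0", "zlib"]},
  "app-2.0-0.tar.bz2":  {"name": "app",  "version": "2.0", "build": "0", "depends": ["lib >=3.0"]},
  "lib-2.5-0.tar.bz2":  {"name": "lib",  "version": "2.5", "build": "0", "depends": ["zlib <1.3"]},
  "lib-3.0-0.tar.bz2":  {"name": "lib",  "version": "3.0", "build": "0", "depends": ["zlib >=1.3"]},
  "zlib-1.2-0.tar.bz2": {"name": "zlib", "version": "1.2", "build": "0", "depends": []},
  "zlib-1.3-0.tar.bz2": {"name": "zlib", "version": "1.3", "build": "0", "depends": []}
}}
""")

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}


def parse_version(v):
    """Simplified assumption: versions are dot-separated integers."""
    return tuple(int(part) for part in v.split("."))


def parse_spec(spec):
    """Split a simplified spec like 'lib >=2.0' into (name, predicate)."""
    parts = spec.split()
    if len(parts) == 1:
        return parts[0], lambda v: True
    # Two-character operators are listed before one-character ones in OPS.
    for op, fn in OPS.items():
        if parts[1].startswith(op):
            bound = parse_version(parts[1][len(op):])
            return parts[0], lambda v, fn=fn, bound=bound: fn(v, bound)
    raise ValueError(f"unsupported spec: {spec}")


def build_index(repodata):
    """Group records by package name, newest version first."""
    index = {}
    for record in repodata["packages"].values():
        index.setdefault(record["name"], []).append(record)
    for records in index.values():
        records.sort(key=lambda r: parse_version(r["version"]), reverse=True)
    return index


def solve(index, specs, chosen=None):
    """Exhaustive backtracking: pick at most one record per name so that
    every pending spec holds. Returns {name: record} or None."""
    chosen = chosen or {}
    if not specs:
        return chosen
    name, ok = parse_spec(specs[0])
    rest = specs[1:]
    if name in chosen:
        # Name already decided: the spec must agree with that choice.
        version = parse_version(chosen[name]["version"])
        return solve(index, rest, chosen) if ok(version) else None
    for record in index.get(name, []):
        if ok(parse_version(record["version"])):
            # Tentatively choose this record and queue its dependencies.
            result = solve(index, rest + record["depends"],
                           {**chosen, name: record})
            if result is not None:
                return result
    return None  # every candidate failed: backtrack


index = build_index(REPODATA)
solution = solve(index, ["app", "zlib <1.3"])
print({name: rec["version"] for name, rec in sorted(solution.items())})
```

Here the newest app (2.0) needs lib >=3.0, which needs zlib >=1.3 and clashes with the requested zlib <1.3, so the search backtracks and settles on app 1.0, lib 2.5 and zlib 1.2. This kind of search can blow up exponentially on a real index; SAT solvers such as libsolv encode the same constraints as clauses and prune the search far more aggressively, which is why reusing an existing solver is attractive.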

With the prototype, we can solve environments in seconds, as demonstrated in the following video: