Python packaging has been discussed a lot recently, but the articles usually only focus on publishing (open source) code to PyPI.

But what do you do when your organization uses Python for in-house development and you can’t (or don’t want to) make everything open source? Where do you store and manage your code? How do you distribute your packages?

In this article, I describe how we solve this problem with GitLab, Conda and a few other tools.

You can find all code and examples referenced in this article under gitlab.com/ownconda. These tools and examples use the own prefix to make a clear distinction between our own and third-party code. I will not necessarily update and fix the code, but it is released under the Blue Oak license so you can copy and use it. Any feedback is welcome, nonetheless.

In this section, I’ll briefly explain the reasons why we are using Conda and GitLab.

Pip

Pip is the official package installer for Python. It supports Python source distributions and (binary) wheel packages. Pip only installs files into the current environment’s site-packages directory and can optionally create entry points in its bin directory. You can use virtualenv to isolate different projects from one another, and Devpi to host your own package index. Devpi can both mirror/cache PyPI and store your own packages. The Python packaging ecosystem is overseen by the Python Packaging Authority (PyPA) working group.

Conda

Conda stems from the scientific community and is being developed by Anaconda. In contrast to Pip, Conda is a full-fledged package manager similar to apt or dnf. Like virtualenv, Conda can create isolated virtual environments. Conda is not directly compatible with Python’s setup.py or pyproject.toml files. Instead, you have to create a Conda recipe for every package and build it with conda-build. This is a bit more involved because you have to convert every package that you find on PyPI, but it also lets you patch and extend every package. With very little effort, you can create a self-extracting Python distribution with a selection of custom packages (similar to the Miniconda distribution).

Because we need to package more than just Python, we chose Conda. This decision dates back at least to Conda v2.1, which was released in 2013. At that time, projects like conda-forge weren’t even in sight.

Conda-forge is a (relatively) new project that has a huge library of Conda recipes and packages. However, if you want full control over your own packages, you may want to host and build everything yourself.

GitLab

Though you could use private repositories from one of the well-known cloud services, you should probably use a self-hosted service to retain full control over your code. In some countries, it may even be forbidden to use a US cloud service for your organization’s data. What we need, then, is a self-hosted service for repository hosting and CI/CD pipelines.

The only tool that (currently) meets these requirements is GitLab. It has a lot more features that are very useful for organization-wide use, e.g., LDAP and Kerberos support, issue labels and boards, Mattermost integration, and Git LFS support. And, more importantly, it also has a really nice UX and is one of the few pieces of software that I actually enjoy using.

GitLab has a free core edition and some paid versions that add more features and support.

The subject of packaging consists of several components: the platforms on which your code needs to build and run, the package manager and repository, the management of external and internal packages, a custom Python distribution, and a means to keep an overview of all packages and their dependencies. I will go into detail about each aspect in the following sections.

Runtime and build environment

Our packages need to run on Fedora desktop systems and on CentOS 7. Packages built on CentOS also run on Fedora, so we only have a single build environment: CentOS 7.

We use different Docker images for our build pipeline and some deployments. The most important ones are centos7-ownconda-runtime and centos7-ownconda-develop. The former contains only a minimal setup to install and run Conda packages, while the latter includes all build dependencies, conda-build, and the ownconda tools. If your OS landscape is more heterogeneous, you may need to add more build environments, which makes things a bit more complicated, especially if you need to support macOS or even Windows.

To build Docker images in our GitLab pipelines, we use Docker-in-Docker. That means that the GitLab runners start Docker containers that can access /var/run/docker.sock to run docker build (a sketch of such a job follows below).

GitLab provides a Docker registry that allows any project to host its own images. However, if a project is private, other projects’ pipelines cannot access its images. For this reason, we have decided to serve Docker images from a separate host.
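A minimal sketch of what such an image-building job could look like in a project’s .gitlab-ci.yml. The job name and registry host are made up for illustration, and it assumes the runner is configured to mount /var/run/docker.sock into its containers:

    build-image:
      image: docker:stable
      stage: build
      script:
        # The mounted Docker socket lets this job drive the host's Docker
        # daemon directly.
        - docker build -t docker.services.own/centos7-ownconda-develop:latest .
        - docker push docker.services.own/centos7-ownconda-develop:latest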

3rd party packages

We re-package all external dependencies as Conda packages and host them in our own Conda repository. This has several benefits:

We can prohibit installing software from sources other than our internal Conda repository.

If users want to depend on new libraries, we can propose alternatives that we might already have on our index. This keeps our tree of dependencies a bit smaller.

We cannot accidentally depend on packages with “bad” licenses.

We can add patches to fix bugs or extend the functionality of a package (e.g., we added our internal root certificate to Certifi).

We can reduce network traffic to external servers and are less dependent on their availability.

Recipe organization

We can either put the recipe for every package into its own repository (which is what conda-forge does) or use a single repository for all recipes (which is what we do). The multi-repository approach makes it easier to build only those packages that have changed. It also makes it easier to manage access levels if you have a lot of contributors who each maintain only a few packages. The single-repository approach has less overhead if you only have a few maintainers who take care of all the recipes. To identify updated recipes that need re-building, we can use ownconda’s show-updated-recipes command.

Linking against system packages

With Conda, we can (and must) decide whether we want to link against system packages (e.g., ones installed with yum) or use other Conda packages to satisfy a package’s dependencies. One extreme would be to build only the Python packages on our own and completely depend on system packages for all C libraries. The other extreme would be to build everything on our own, even glibc and gcc. The former has a lot less overhead but becomes more fragile the more heterogeneous your runtime environments get. The latter is a lot more complicated and involved but gives you more control and reliability.

We decided to take the middle ground between these two extremes: we build many libraries on our own but rely on the system’s gcc, glibc, and X11 libraries. This is quite similar to what the manylinux standard for Python wheels does. Recipes must list the system libraries that they link against. The rules for valid system libraries are encoded in ownconda validate-recipes and enforced by conda-build’s --error-overlinking option.

Recipe management

Recipes for Python packages can easily be created with ownconda pypi-recipe. This is similar to conda skeleton pypi but tailored to our needs. Recipes for other packages have to be created manually.

We also implemented an update check for our recipes. Every recipe contains a script called update_check.py which uses one of the update checkers provided by the ownconda tools (a self-contained sketch of such a script follows at the end of this section). These checkers can query PyPI, GitHub release lists, and (FTP) directory listings, or crawl an entire website. The command ownconda check-for-updates runs the update scripts and compares the version numbers they find against the recipes’ current versions. It can also print URLs to the packages’ changelogs:

    $ ownconda check-for-updates --verbose .
    [████████████████████████████████████] 100%
    Package: latest version (current version)
    freetype 2.10.0 (2.9.1): https://www.freetype.org/index.html#news
    python-attrs 19.1.0 (18.2.0): http://www.attrs.org/en/stable/changelog.html
    python-certifi 2019.3.9 (2018.11.29): https://github.com/certifi/python-certifi/commits/master
    ...
    qt5 5.12.2 (5.12.1): https://wiki.qt.io/Qt_5.12.2_Change_Files
    readline 8.0.0 (7.0.5): https://tiswww.case.edu/php/chet/readline/CHANGES

We can then update all recipes with ownconda update-recipes:

    $ ownconda update-recipes python-attrs
    ...
    python-attrs
    cd /data/ssd/home/stefan/Projects/ownconda/external-recipes && /home/stefan/ownconda/bin/python -m own_conda_tools pypi-recipe attrs -u
    diff --git a/python-attrs/meta.yaml b/python-attrs/meta.yaml
    index 7d167a8..9b3ea20 100644
    --- a/python-attrs/meta.yaml
    +++ b/python-attrs/meta.yaml
    @@ -1,10 +1,10 @@
     package:
       name: attrs
    -  version: 18.2.0
    +  version: 19.1.0

     source:
    -  url: https://files.pythonhosted.org/packages/0f/9e/26b1d194aab960063b266170e53c39f73ea0d0d3f5ce23313e0ec8ee9bdf/attrs-18.2.0.tar.gz
    -  sha256: 10cbf6e27dbce8c30807caf056c8eb50917e0eaafe86347671b57254006c3e69
    +  url: https://files.pythonhosted.org/packages/cc/d9/931a24cc5394f19383fbbe3e1147a0291276afa43a0dc3ed0d6cd9fda813/attrs-19.1.0.tar.gz
    +  sha256: f0b870f674851ecbfbbbd364d6b5cbdff9dcedbc7f3f5e18a6891057f21fe399

     build:
    -  number: 1
    +  number: 0
    ...

Example recipes

You can find the recipes for all packages required to run the ownconda tools here. As a bonus, I also added the recipes for NumPy and PyQt5.
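To make the update-check idea mentioned above concrete, here is a self-contained sketch of the kind of query a recipe’s update_check.py performs for a PyPI-hosted package. It deliberately does not reproduce the real ownconda checker API (whose names I’m not certain of) and instead queries PyPI’s public JSON API directly; the package name is just an example:

    #!/usr/bin/env python3
    """Hypothetical stand-in for a recipe's update_check.py.

    The real scripts use the update checkers shipped with the ownconda
    tools; this sketch only illustrates the kind of lookup they perform
    for a PyPI-hosted package.
    """
    import json
    import urllib.request

    PACKAGE = "attrs"  # the upstream PyPI name this recipe tracks


    def latest_version(package: str) -> str:
        """Return the latest release version via PyPI's JSON API."""
        url = f"https://pypi.org/pypi/{package}/json"
        with urllib.request.urlopen(url) as response:
            data = json.load(response)
        return data["info"]["version"]


    if __name__ == "__main__":
        print(latest_version(PACKAGE))

ownconda check-for-updates then compares version numbers like this one against the version currently recorded in the recipe.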

Internal projects

Internal packages are structured in a similar way to most projects that you see on PyPI: we put the source code into src, the pytest tests into tests, and the Sphinx docs into docs.

We do not use namespace packages because they can lead to various nasty bugs. Instead, we just prefix all packages with own_ to avoid name clashes with other packages and to easily tell internal and external packages apart.

A project usually contains at least these files and directories. The biggest difference to “normal” Python projects is the additional Conda recipe in each project. It contains all metadata and the requirements. The setup.py contains only the minimum amount of information needed to get the package installed via pip:

Conda-build runs it to build the Conda package.

ownconda develop runs it to install the package in editable mode.
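A minimal sketch of what such a setup.py can look like, assuming the src layout described above; the package name and entry point are made up for illustration, and the real files may differ:

    from setuptools import find_packages, setup

    # Minimal on purpose: all metadata and requirements live in the Conda
    # recipe; this file only exists so Conda-build and "ownconda develop"
    # can install the package via pip.
    setup(
        name="own_example",  # hypothetical package name
        packages=find_packages("src"),
        package_dir={"": "src"},
        entry_points={
            # Hypothetical console script; real projects declare their own.
            "console_scripts": ["own-example = own_example.cli:main"],
        },
    )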

ownconda develop also creates/updates a Conda environment for the current project and installs all requirements that it collects from the project’s recipe.

Projects also contain a .gitlab-ci.yml which defines the GitLab CI/CD pipeline. Most projects have at least a build, a test, and an upload stage. The test stage is split into parallel steps for various test tools (e.g., pytest, pylint, and bandit). Projects can optionally build documentation and upload it to our docs server. The ownconda tools provide helpers for all of these steps (a sketch of a full pipeline follows after the Git-flow list below):

ownconda build builds the package.

ownconda test runs pytest.

ownconda lint runs pylint.

ownconda sec-check runs bandit.

ownconda upload uploads the package to the package index.

ownconda make-docs builds and uploads the documentation.

We also use our own Git flow:

Development happens in a develop branch. Builds from this branch are uploaded into a staging Conda channel.

Larger features can optionally branch off into a feature branch. Their builds are not uploaded into a public Conda channel.

Stable develop states get merged into the master branch. Builds are uploaded into our stable Conda channel.
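Here is the hedged pipeline sketch promised above, tying the helper commands to the branch-to-channel mapping. The image reference and job layout are assumptions, and the exact arguments of the ownconda commands may differ:

    image: docker.services.own/centos7-ownconda-develop:latest

    stages: [build, test, upload]

    build:
      stage: build
      script: [ownconda build]

    # The test stage fans out into parallel jobs, one per tool.
    pytest:
      stage: test
      script: [ownconda test]

    pylint:
      stage: test
      script: [ownconda lint]

    bandit:
      stage: test
      script: [ownconda sec-check]

    upload:
      stage: upload
      only: [develop, master]
      # Assumed behavior: the helper uploads develop builds to the staging
      # channel and master builds to the stable channel.
      script: [ownconda upload]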

Since we continuously deploy packages, we don’t put a lot of effort into versioning. The package version consists of a major release, which rarely changes, and the number of commits since the last tagged major release. The GitLab pipeline ID is used as a build number:

    Version: $GIT_DESCRIBE_TAG.$GIT_DESCRIBE_NUMBER
    Build:   py37_$CI_PIPELINE_ID

The required values are automatically exported by Conda and GitLab as environment variables.
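For illustration, this is roughly how those values might be wired into a recipe’s meta.yaml, assuming the recipe’s source points at the project’s Git repository (conda-build then exposes the GIT_DESCRIBE_* values to the recipe’s Jinja context). The package name is made up:

    package:
      name: own_example  # hypothetical
      version: "{{ GIT_DESCRIBE_TAG }}.{{ GIT_DESCRIBE_NUMBER }}"

    build:
      # The GitLab pipeline ID becomes the build number and build string.
      number: {{ environ.get('CI_PIPELINE_ID', 0) }}
      string: py37_{{ environ.get('CI_PIPELINE_ID', 0) }}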



Package and documentation hosting

Hosting a Conda repository is very easy. In fact, you can just run python -m http.server in your local Conda base directory if you have previously built any packages. You can then use it like this: conda search --override-channels --channel=http://localhost:8000/conda-bld PKG.

A Conda repository consists of one or more channels. Each channel is a directory that contains a noarch directory and additional platform directories (like linux-64). You put your packages into these directories and run conda index channel/platform to create an index for each platform (with newer versions of conda-build you can omit the platform). The noarch directory must always exist, even if you put all your packages into the linux-64 directory.

The base URL for our Conda channels is https://forge.services.own/conda/channel. You can put a static index.html into each channel’s directory that parses the repo data and displays it nicely: a JavaScript reads and renders the contents of a channel’s repodata.json.

The upload service (for packages created in GitLab pipelines) resides under https://forge.services.own/upload/<channel>. It is a simple web application that stores the uploaded file in channel/linux-64 and runs conda index. For packages uploaded to the stable channel, it also creates a hard link in a special archive channel. Every week, we prune our channels with ownconda prune-index. In case we accidentally prune too aggressively, we have the option to restore packages from the archive.

We also host our own Read-the-Docs-like service. GitLab pipelines can upload Sphinx documentation to https://forge.services.own/docs via ownconda make-docs.

Note: The server name forge does not refer to conda-forge but to SourceForge.net, which was quite popular back in the day.
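A minimal sketch of bootstrapping and serving such a channel locally, assuming a recent conda-build and Python 3.7+; the paths and channel name are illustrative:

    mkdir -p channel/linux-64 channel/noarch   # noarch must exist, even if empty
    cp ~/conda-bld/linux-64/*.tar.bz2 channel/linux-64/
    conda index channel                        # newer conda-build indexes all platforms at once
    python -m http.server --directory channel  # serve the channel at http://localhost:8000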

Python distribution

With Constructor, you can easily create your own self-extracting Python distribution. These distributions are similar to Miniconda, but you can customize them to your needs. A constructor file is a simple YAML file with some metadata (e.g., the distribution name and version) and the list of packages that should be included. You can also specify a post-install script. The command constructor <distdir>/construct.yaml will then download all packages and bundle them into a self-extracting Bash script. We upload the installer scripts to our Conda index, too.

Instead of managing multiple construct.yaml files manually, we create them dynamically in a GitLab pipeline, which makes building multiple similar distributions (e.g., for different Python versions) a bit easier.
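A hedged sketch of what such a construct.yaml might look like; the distribution name, version, channel URL, and package selection are invented for illustration:

    name: ownconda        # distribution name (hypothetical)
    version: "2019.04"    # distribution version (hypothetical)

    channels:
      - https://forge.services.own/conda/stable   # assumed stable channel URL

    specs:
      - python 3.7*
      - conda
      - own-conda-tools   # hypothetical package selection

    post_install: post_install.sh   # optional post-install script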

Deployment

We are currently on the road from copy-stuff-with-fabric-to-vms to docker-kubernetes-yay-land. I am not going to go into too much detail here; this topic is not directly related to packaging and is worth its own article.

Most of our deployments are now Ansible-based. Projects contain an ansible directory with the required playbooks and other files. Shared roles are managed in a separate ownsible project. The Ansible deployments are usually part of the GitLab CI/CD pipeline. Some run automatically, some need to be triggered manually.

Some newer projects already use Docker-based deployments. Docker images are built as part of the pipeline and uploaded into our Docker registry, from which they are then pulled for deployments.