In recent adventures for work, I once again found myself on a team with an interesting and relatively new-to-me problem that I hadn’t been forced to consider quite so much before.

This team is responsible for supplying a variety of web apps built on a modern stack (mostly Celery, Django, nginx and Redis), but has almost no control over the infrastructure on which they run, and boy, is some of that infrastructure old and stinky.

We have no root access to these servers; most software configuration requires a ticket with a lead time of 48 hours or more, the watchful eye of a crusty old administrator, and an obtuse change management process. The machines are so old that many still run on real hardware, and those that are VMs still run some ancient variety of Red Hat Linux with, if we’re lucky, Python 2.4 installed.

Naturally the Python world moves quickly, and, as anyone who has ever worked on real code behind closed doors (as opposed to the joyous freshness of hobby projects that can run the latest and greatest) can attest, the Python community’s attitude towards compatibility is more or less atrocious. The thought of deploying a modern Django app using Python 2.4 is simply out of the question, and even if Python itself weren’t a problem, outdated core libraries (such as OpenSSL) would soon become one.

This won’t be about compatibility (oh my, could I spend words on that), but instead how to sanely solve the unfortunate situation of deploying modern apps to ancient infrastructure, when you have no control over the base Linux system, and when that Linux system sucks.

A Bad Situation

Naturally if the base system is unusable you are going to have to replace it, or, say, rewrite your shiny new application in Perl 4. But how? On arrival, I was greeted with a selection of huge shell scripts running under Jenkins that manually fetched individual source tarballs for Python, Nginx, OpenSSL, and every other dependent library from the Internet, then ran ./configure --prefix=/home/ourapp, make install, and so on.

The result was effectively a miniature Linux distribution, hand-cobbled together as and when needs arose. Should a new feature of an application require, say, the Python ImageMagick package, one could easily expect to spend half a day editing shell scripts in a massively wasteful loop: retrying the build, iteratively updating it to include base library dependencies, until everything built and the desired Python binding could be installed.

That wasn’t the worst part. The cruddy old base system was building under Docker, using a slightly more modern version of Red Hat Linux than installed in production. Some of the builds were succeeding only because a newer-than-available library dependency was satisfied by the Docker image, and so errors like the one below were commonplace:

```
libpcre.so.1: cannot open shared object file: No such file or directory
```

The usual solution to these library problems was to simply copy the versions installed in the Docker container, resulting in a final hand-cobbled base system that was a mixture of outdated self-built programs relying on a random smattering of libraries sourced from some untrusted third-party Docker image.

In other, yet more horrific cases, the problem was solved by symlinking older versions of a library to a different filename, explicitly violating the library author’s signal that binary compatibility had changed, and provoking some delightfully impossible-to-debug, unpredictable crash sometime deep in the future.
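Reconstructed from memory, the hack looked something like this. It is demonstrated here in a scratch directory, and libfoo is a hypothetical stand-in for whichever real library was involved:

```shell
# Recreating the anti-pattern in a scratch directory; libfoo is a
# hypothetical stand-in for the real libraries involved. Do not do this.
scratch=$(mktemp -d)
touch "$scratch/libfoo.so.2"              # the only version actually installed
ln -s libfoo.so.2 "$scratch/libfoo.so.3"  # lie to the linker about the ABI version
ls -l "$scratch"
```

The runtime linker happily loads the mislabeled library, and any ABI mismatch surfaces only later, as memory corruption rather than a clean load-time error.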

In a setup like this, asking a simple question like “do our apps have any security vulnerabilities?” becomes an impossible half-day’s work to answer. “Let me just check the package manage.. oh, we don’t have one of those. Actually we do, but there are two package managers to consult depending on which library is involved, one of which we have no control over, and a huge ugly shell script to manually audit.”

Since this home-grown combobulation of binaries depended on libraries that weren’t installed in system paths, it wasn’t even possible to run them without first sourcing a shell script to set LD_LIBRARY_PATH so the runtime linker could find the libraries in our nonstandard location.
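The wrapper in question looked roughly like this; the exact variable list here is illustrative, not the real script:

```shell
# Sketch of the env script every process had to source before running
# anything; paths follow the /home/ourapp prefix used throughout.
APP_ROOT=/home/ourapp/root
export LD_LIBRARY_PATH="$APP_ROOT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH="$APP_ROOT/bin:$PATH"
```

Forget to source it (in a cron job, say, or an init script) and every binary in the tree fails with the shared-object errors shown above.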

In short, the situation was a total mess.

Enter pkgsrc

The pkgsrc project is a huge collection of package descriptions for modern software releases, and forms the basis for NetBSD’s equivalent to Linux’s apt-get or yum commands. For Mac users it is most similar to MacPorts or Homebrew, but unlike those systems it targets not only NetBSD but also Linux, OS X, and a bunch of lesser-loved operating systems. To get an idea of the scope of pkgsrc you can browse the collection.

Unlike the mess that came before it, pkgsrc is actively maintained by a large community; the program and library combinations in its tree can be relied on to be tested and intercompatible; there are quarterly stable releases; and it even comes with tooling to scan for security vulnerabilities.
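That vulnerability tooling is exposed through the pkg_admin command installed during bootstrap. Something like the following answers the half-day question from earlier in seconds (guarded here so the sketch is a no-op on machines without a pkgsrc bootstrap):

```shell
# Vulnerability scanning with pkgsrc's pkg_admin tool; guarded so this
# sketch does nothing on a machine that has no pkgsrc bootstrap yet.
PKG_ADMIN=/home/ourapp/root/sbin/pkg_admin
if [ -x "$PKG_ADMIN" ]; then
    "$PKG_ADMIN" fetch-pkg-vulnerabilities  # download the current advisories list
    "$PKG_ADMIN" audit                      # report any vulnerable installed packages
fi
```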

Best of all, it does not require root to install. After a relatively short bootstrap process completes, a comprehensive modern UNIX system can be installed into a non-root filesystem location on any supported OS, such as our application’s home directory on its ancient production machine.

Since pkgsrc is a real grown up solution, we can ask it to build from source, say, ImageMagick, and it will automatically figure out all the dependencies required to build ImageMagick, and recursively build them in a supported configuration that is in use by many people.

The level of foresight on the maintainers’ part is quite amazing. To address accidentally picking up random dependencies from our Docker image, pkgsrc can, on a per-dependency basis, prefer any system-installed version if available (the default), prefer a fresh pkgsrc-built and managed version, or simply always build fresh, up-to-date tools and libraries from the tree. The last is exactly what we want, as it produces a base system that depends on the installed Linux distribution only for its kernel and C library, which are very stable dependencies.

For icing on the cake, packages build by default with useful compile-time features such as consistent use of the ELF rpath header, allowing programs to run in the target environment without requiring LD_LIBRARY_PATH to be set at all, since the built binaries contain internal annotations telling the runtime linker where its library dependencies are to be found.

Linker flags enabling this feature neatly percolate down through the Python build process, leaving a configuration of distutils that knows how to build Python extension modules with the same suite of flags, without any further deviation from the usual “/home/ourapp/root/bin/pip install numpy”.

pkgsrc Crash Course

All that is required to get a build up and running is a compiler suite (e.g. the build-essential metapackage on Debian), and a copy of the pkgsrc tree tarball.

```shell
wget https://ftp.netbsd.org/pub/pkgsrc/pkgsrc-2016Q1/pkgsrc-2016Q1.tar.bz2
tar jxvf pkgsrc-2016Q1.tar.bz2
```

Once the tarball is unpacked, a short bootstrap shell script must be run in order to build a copy of bmake, the make implementation used by the tree, along with pkgutils, a small binary package administration tool akin to rpm or dpkg.

```shell
SH=/bin/bash pkgsrc/bootstrap/bootstrap \
    --unprivileged \
    --prefix /home/ourapp/root \
    --make-jobs 4
```

Important note about --prefix: the supplied directory must be writable by your application’s user account in all its target environments. For example, if your staging and production environments have different usernames, to avoid having to rebuild pkgsrc with a different configuration for each target, it may be wise to simply update every environment to have the same username. Where this isn’t possible, the $ORIGIN trick described later may be interesting.

Configuration of the build is done by way of <prefix>/etc/mk.conf, where global bmake variables are set. The defaults require no editing in the usual case, but for our purposes, we want to enable that tasty feature of force-building all dependencies from scratch:

```shell
echo PREFER_PKGSRC = yes >> /home/ourapp/root/etc/mk.conf
```

For demonstration purposes, and to show just how easy it is to get a perfect result, let’s also tell the nginx package to enable its uwsgi protocol module when it gets built:

```shell
echo PKG_OPTIONS.nginx = uwsgi >> /home/ourapp/root/etc/mk.conf
```

Now all that’s left is to build some useful software! That is done by running bmake install from inside the software’s directory in the tree. Of course, since bmake is not installed globally, we must use an absolute path:

```shell
cd pkgsrc/www/nginx
/home/ourapp/root/bin/bmake install
```

On a fast machine, by the time lunch is finished you should return to an up-to-date build of Nginx installed in /home/ourapp/root/bin, with all dependencies (such as PCRE and OpenSSL) correctly built and linked, so that invoking /home/ourapp/root/bin/nginx is all that’s required to start a vulnerability-free, correctly compiled Nginx.

Best of all, the binaries in /home/ourapp/root can reasonably be expected to run on any version of Linux available in the past 15 years: simply extract a tarball of the build to the target machine.
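The packaging step is nothing fancier than tar. Sketched here against a scratch directory so it can run anywhere; on the real systems, the archive would be created relative to / on the build host and extracted at / on the target machine:

```shell
# Demonstrating the ship-a-tarball step with a scratch stand-in for the
# real filesystems; BUILD_HOST and TARGET play the role of / on the
# build machine and the production machine respectively.
BUILD_HOST=$(mktemp -d)
TARGET=$(mktemp -d)
mkdir -p "$BUILD_HOST/home/ourapp/root/bin"
echo 'fake nginx' > "$BUILD_HOST/home/ourapp/root/bin/nginx"

# On the build host: archive the entire prefix, paths stored relative to /.
tar -C "$BUILD_HOST" -czf ourapp-system.tar.gz home/ourapp/root

# On the target machine: extract to the identical absolute path.
tar -C "$TARGET" -xzf ourapp-system.tar.gz
```

The identical path matters: the rpath baked into the binaries points at /home/ourapp/root/lib, which is exactly why the $ORIGIN trick discussed later is attractive when the prefix cannot be kept constant.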

Putting It All Together

Below is the final Jenkins script that builds the base system for one of our applications. The final /home/ourapp/root is quite huge, coming in close to 500MB, but that includes every tool required during the build, along with every library we never knew we depended on, built from a recent supported version.

The script expects to run under Docker, with a --volume mounted from /workspace that points back to the Jenkins job’s workspace directory.

```shell
#!/bin/bash -ex

INSTALL_PREFIX=/home/ourapp/root

mkdir -p /workspace/output
trap 'chown -R $(stat -c "%u:%g" /workspace) /workspace/output' EXIT

curl -sS https://ftp.netbsd.org/pub/pkgsrc/pkgsrc-2016Q1/pkgsrc-2016Q1.tar.bz2 \
    | tar -jx

export SH=/bin/bash
pkgsrc/bootstrap/bootstrap \
    --unprivileged \
    --prefix $INSTALL_PREFIX \
    --make-jobs 4

cat >> $INSTALL_PREFIX/etc/mk.conf <<-EOF
PREFER_PKGSRC = yes
PKG_OPTIONS.nginx = uwsgi
EOF

$INSTALL_PREFIX/bin/bmake -C pkgsrc/lang/python27 install
$INSTALL_PREFIX/bin/bmake -C pkgsrc/devel/py-pip install
$INSTALL_PREFIX/bin/bmake -C pkgsrc/devel/py-readline install
$INSTALL_PREFIX/bin/bmake -C pkgsrc/databases/py-ldap install
$INSTALL_PREFIX/bin/bmake -C pkgsrc/databases/redis install
$INSTALL_PREFIX/bin/bmake -C pkgsrc/databases/py-sqlite3 install
$INSTALL_PREFIX/bin/bmake -C pkgsrc/databases/py-mysqldb install
$INSTALL_PREFIX/bin/bmake -C pkgsrc/www/py-uwsgi install
$INSTALL_PREFIX/bin/bmake -C pkgsrc/www/nginx install
$INSTALL_PREFIX/bin/bmake -C pkgsrc/textproc/libxml2 install

ln -s $INSTALL_PREFIX/bin/python{2.7,}
ln -s $INSTALL_PREFIX/bin/pip{2.7,}
ln -s $INSTALL_PREFIX/bin/uwsgi{-2.7,}

tar czf /workspace/output/ourapp-system.tar.gz $INSTALL_PREFIX
```

The Jenkins job output is an ourapp-system.tar.gz to be extracted on the target machine.

Risks

Of course nothing comes without downsides. The major risk with pkgsrc is the sheer complexity of the makefiles that implement it. If your use case isn’t covered by the provided tree, then significant engineering and forward maintenance may be required to modify it and keep your modifications up to date.

An example of such a modification would be if your project needs an exact library version that differs from that available in the pkgsrc release tree. Thankfully such needs are rare, but it was overwhelmingly my primary concern while evaluating pkgsrc for this solution. Another complexity scare story is that when pkgsrc goes wrong, boy, does it generate some inscrutable errors.

I stopped worrying when I realized that these problems are just as hard to solve with apt-get or yum, that C compilers are quite capable of producing equally weird errors, and, after a coworker pointed it out, that pkgsrc need not be a 100% solution: a broken package can simply be compiled by hand just as it was in the past, with the remainder handled automatically.

Despite saying that, one ongoing minor source of worry is the size of the pkgsrc user community relative to a typical Linux package manager’s community. Stack Overflow is much less likely to contain an answer for the exact error you’re encountering.

I’m reassured by the knowledge that the small community that does exist around NetBSD is both extremely helpful and vastly more clueful on average than the typical user support communities available for mainstream Linux.

Cutting Down On Build Size

One area that did not concern me so much is the size of the built tree. 500MB is nothing for us, but it is easy to imagine cases where this could be troublesome. Thankfully there are plenty of options for addressing it, such as:

- More granular PREFER_PKGSRC settings to avoid rebuilding tools like perl that really don’t need to be rebuilt. In our case, it was simpler to just take a slightly bloated tarball full of supported but useless-at-runtime tools than it was to try finding the minimum set required.

- Use of the pkg_delete command to remove tools that aren’t required after the build. I just couldn’t be bothered with this; it’s ongoing maintenance for maintenance’s sake.

- Brute-force rm -rf of useless directories such as locale files, /home/ourapp/root/share/doc, share/man, etc. Again, the cost of the fat tarball must be weighed against the unpredictable man-hours involved in rebuilding everything in an emergency because an important file was accidentally deleted and a new application release triggered latent breakage.

The $ORIGIN Trick

There is one final avenue I itched to explore, but could not find the personal motivation nor justification for business hours to chase. That is the use of the runtime linker’s $ORIGIN feature, which allows the ELF rpath header to be set relative to the location of the binary containing the header. In other words, it allows for (almost) fully relocatable builds, where the filesystem path of the build tree is no longer a fixed variable.

A working build that used $ORIGIN would have allowed our group to have a single pkgsrc build for all our applications (which run with a variety of uncontrollable home directory paths), with the minor downside that a variety of installed applications might randomly fail since they hard-coded paths to their configuration files (and suchlike) using the build-time supplied --prefix .

Again, this solution was not fully explored, but it seems in principle quite possible to implement, with careful testing of the produced tree and explicit configuration of paths on the command line and in configuration files for every program (such as Nginx).

After building a tree, one need simply use the chrpath utility to update the ELF headers of every installed program and library in the build, so that each includes an rpath relative to the program or library’s own location.

For example, /home/ourapp/root/bin/nginx normally builds with an rpath of /home/ourapp/root/lib. Using $ORIGIN, that would change to $ORIGIN/../lib. Now the runtime linker knows how to find PCRE and OpenSSL independent of the absolute filesystem prefix.
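The rpath depends only on how deep the file sits beneath the prefix: one ../ per directory level between the file and the prefix root. The rule can be sketched as a small shell function (rpath_for is a made-up helper name, not a pkgsrc tool):

```shell
# Compute the $ORIGIN-relative rpath for a file at a given path below
# the prefix root: one "../" per directory level between the file and
# the prefix. rpath_for is a hypothetical helper for illustration.
rpath_for() {
    rel=${1#./}              # strip any leading ./
    dir=${rel%/*}            # directory part, e.g. bin or lib/engines
    up=''
    oldIFS=$IFS; IFS=/
    for component in $dir; do up="../$up"; done
    IFS=$oldIFS
    printf '$ORIGIN/%slib\n' "$up"
}

rpath_for ./bin/nginx              # -> $ORIGIN/../lib
rpath_for ./lib/engines/libfoo.so  # -> $ORIGIN/../../lib
```

This is exactly the depth computation performed by the Python script below.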

I never finished testing this approach, but in case the reader finds it desirable, the following short script, when run from inside --prefix, takes care of setting $ORIGIN correctly for every binary regardless of its subpath depth under --prefix:

```python
import os
import subprocess

def chrpath(path):
    # One "../" per directory level between the file and the prefix root.
    depth = path.count('/') - 1
    libdir = '$ORIGIN/%slib' % ('../' * depth,)
    subprocess.Popen(['chrpath', path, '-cr', libdir]).wait()

# Walk the prefix, rewriting the rpath of every shared library and
# everything that looks like an executable.
for dirpath, dirnames, filenames in os.walk('.'):
    for filename in filenames:
        path = os.path.join(dirpath, filename)
        if path.endswith('.so') or 'bin/' in path:
            chrpath(path)
```

I knew this would take forever to write up: I started thinking about it a week ago, and finally spent three hours on it tonight. It is in response to, and hopefully neatly complements, Mahmoud Hashemi’s post Python Packaging at PayPal.