blog | oilshell.org

Rewriting Python's Build System From Scratch

What am I working on? In the last post, I linked to bash scripts that built a slice of the CPython VM.

I'm now using these scripts as the foundation for a rewrite of Python's autotools-based build system. This will be the build system for Oil and OVM. Roughly speaking, OVM is the part of Oil that is native code.

Why do this?

1. To support deploying Oil as a single executable file, i.e. an app bundle.
2. Because our problem is simpler. OVM has fewer dependencies, doesn't need to support non-Unix systems, and doesn't yet have shared library extensions.
3. To practice writing nontrivial Makefiles from scratch. This is research for combining shell and make.

I think writing good build systems is hard, and this experience is supporting that belief. Read on for details.

bin/oil as an App Bundle

To end users and distro packagers, Oil should look like a C program. The use of CPython is an implementation detail. Thus, bin/oil will be a single file with two components:

1. Native code. CPython without its parser, plus Oil-specific code. For example, we have an extension module to wrap libc called pylibc.
2. Architecture-independent bytecode. I'm precompiling .pyc files to reduce startup time. We also need the .py source to show code snippets in tracebacks.

Python app bundles are familiar to me because a decade ago I submitted a patch to Python to allow it to directly execute .zip files with .py and .pyc files (python-dev thread).

I'm taking it further now:

1. Rather than concatenating a shell script + .zip file, I'm concatenating an ELF file + .zip file. The .zip file contains the bytecode.
2. I'm statically linking the pylibc extension module (but not libc itself).
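To illustrate why the concatenation trick works, here is a minimal sketch using only the Python 3 standard library. It is not Oil's actual build code: make_bundle and read_member are hypothetical names, and the prefix bytes are a placeholder rather than a real ELF file. The point it shows is that a .zip reader locates the archive from the end of the file, so arbitrary bytes can sit in front of it.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def make_bundle(prefix: bytes, files: dict, out_path: Path) -> None:
    """Write `prefix` followed by a .zip archive containing `files`.

    `prefix` stands in for the native executable (an assumption for this
    sketch); `files` maps archive member names to their text contents.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    # Plain concatenation: native part first, archive second.
    out_path.write_bytes(prefix + buf.getvalue())


def read_member(bundle: Path, name: str) -> str:
    """Read one archive member back out of the concatenated file."""
    # zipfile finds the archive through the end-of-central-directory record
    # at the end of the file and accounts for the bytes in front of it.
    with zipfile.ZipFile(bundle) as zf:
        return zf.read(name).decode("utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bundle = Path(tmp) / "oil-demo"
        make_bundle(
            b"\x7fELF (placeholder, not a real executable)\n",
            {"__main__.py": "print('hello from the bundle')\n"},
            bundle,
        )
        print(read_member(bundle, "__main__.py"), end="")
```

The same property is what lets a shell script + .zip file work as a single file. How the native half finds and loads the archive at startup is outside the scope of this sketch.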

Why a single file? I want to be completely divorced from Python packaging. I don't want anything to do with site-packages and so forth, which are known to cause deployment problems.

(Historical note: Phillip J. Eby rewrote the patch to support directories with __main__.py in addition to .zip files. This was available in Python 2.6, but it was relatively unknown until Python 3.5 added the zipapp module, a result of PEP 441.)

Portability, Dependencies, Extensibility

Building OVM is a different problem than building Python. This section compares them along three dimensions.

Platform Portability:

Python is over 25 years old, and portable to many non-Unix systems, including Windows and the classic Mac OS.

Oil is a Unix shell, so it only needs to run on Unix.

(I suppose Oil will run on Windows 10 via the confusingly-named Windows Subsystem for Linux. I find it amazing that this Windows kernel feature is marketed as Bash on Windows. A program that is over three decades old is now a hot new feature!)

Dependencies and System Requirements:

Python has a batteries included philosophy, which means its standard library has many dependencies: libffi for ctypes, sqlite, etc.

A Unix shell depends only on the oldest and most portable parts of Unix: fork(), exec(), pipe(), read(), write(), etc. The "standard library" consists of separate programs like coreutils or busybox.

Extensibility with Dynamic Linking:

A large part of Python's build system is dealing with dynamically-loaded extension modules like _json.so. The Python-2.7.13/setup.py file that controls this is ~2300 lines long!

I know exactly which extension modules I need for Oil, so it doesn't need to be configurable.

In summary, these three differences mean that Python requires a complex build system, but Oil can have a relatively simple one.

I don't think Oil will need autotools. If you can think of a reason it might, leave a comment. The main issue I see is detecting the readline library, but that can be done with a POSIX shell script written by hand.

Research for Boil

Boil is the name I have in mind for the Make-like build tool in Oil. Its relationship to Make is similar to that of Oil and bash, although not identical.

This is a big topic, so for now I'll list a few problems that I found with Python's Makefile. These problems appear in most Makefiles because the solutions are hard to express in Make.

1. Proper incremental builds when changing CFLAGS. It's annoying to do a make clean between debug and release builds.
2. Support for build variants like AddressSanitizer (ASAN) and code coverage. After shaking bugs out of toybox and bwk with ASAN, I'm now a big fan of it (example toybox patch). I noticed that Python's coverage support is broken with respect to shared libraries. You don't get coverage for, say, _json.so because setup.py didn't pass the right flags.
3. Python duplicates its C header dependencies in its Makefile, rather than using automatic dependency generation, i.e. the gcc -M scheme. I plan to reorganize the code for OVM, and this may be a nuisance.
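One common way to approach the CFLAGS and build-variant problems is to give every flag set its own output directory, so that debug, release, and ASAN objects never overwrite each other and no make clean is needed when switching. The sketch below shows only the path computation, in Python. It is an assumption about how a tool could do this, not a description of Boil or of Oil's Makefile; variant_dir, object_path, and the _build directory name are all hypothetical.

```python
import hashlib
from pathlib import Path


def variant_dir(build_root: str, variant: str, cflags: list) -> Path:
    """Return an output directory unique to this variant name and flag list."""
    # Hash the exact flag list so that any change to CFLAGS selects a fresh
    # directory instead of silently reusing stale object files.
    joined = "\0".join(cflags).encode("utf-8")
    digest = hashlib.sha256(joined).hexdigest()[:12]
    return Path(build_root) / f"{variant}-{digest}"


def object_path(build_root: str, variant: str, cflags: list, source: str) -> Path:
    """Map a C source path to its object file inside the variant directory."""
    return variant_dir(build_root, variant, cflags) / Path(source).with_suffix(".o")


if __name__ == "__main__":
    variants = {
        "dbg": ["-O0", "-g"],
        "opt": ["-O2"],
        "asan": ["-O0", "-g", "-fsanitize=address"],
    }
    for name, flags in variants.items():
        print(object_path("_build", name, flags, "Python/ceval.c"))
```

The trade-off is disk space: each variant keeps a full set of objects. In exchange, switching between variants costs nothing, and editing the flags of one variant rebuilds only that variant.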

Status and Conclusion

I'm knee-deep in this work right now. It's not finished by any means, but I'm happy to slow down and examine Make more closely. I revisited the research on Make I did before starting the blog last year.

I would say that shell has bad syntax and good semantics, but Make has both bad syntax and bad semantics. It's not statically parseable, and there are many weird mechanisms like double-colon rules, order-only dependencies, .ONESHELL, .SECONDEXPANSION, etc.

Boil is still two steps away. A rough plan is:

1. Release OSH, built with a plain Makefile written from scratch.
2. Implement the Oil parser.
3. Add Boil to Oil.
4. At some point: Rewrite the Makefile in Boil.

So Make and Boil will be a major blog topic, but the shape of the work means that I have to jump from topic to topic. I'll continue to regularly organize posts by topic and highlight them on the blog index.