Ab Origine - Latin, "From the beginning".

Build the simplest linux system capable of compiling itself.

Cross compile it to every target supported by QEMU.

Boot it under QEMU (or real hardware).

Build/test everything else natively on target.

Creating system images.

Aboriginal Linux is a shell script that builds the smallest/simplest linux system capable of rebuilding itself from source code. This currently requires seven packages: linux, busybox, uClibc, binutils, gcc, make, and bash. The results are packaged into a system image with shell scripts to boot it under QEMU. (It works fine on real hardware too.)

The build supports most architectures QEMU can emulate (x86, arm, powerpc, mips, sh4, sparc...). The build runs as a normal user (no root access required) and should run on any reasonably current distro, downloading and compiling its own prerequisites from source (including cross compilers).

The build is modular; each section can be bypassed or replaced if desired. The build offers a number of configuration options, but if you don't want to run the build yourself you can download binary system images to play with, built for each target with the default options.

(Note: the goal of the 2.0 release is to migrate from busybox, uClibc, and gcc/binutils to toybox, musl-libc, and llvm/lld.)

Using system images.

Each system image tarball contains a wrapper script ./run-emulator.sh which boots it to shell prompt. (This requires the emulator QEMU to be installed on the host.) The emulated system's /dev/console is routed to stdin and stdout of the qemu process, so you can just type at it and log the output with "tee". Exiting the shell causes the emulator to shut down and exit.
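A minimal sketch of logging a session with "tee", as described above. The helper name log_session and the log file name are illustrative assumptions, not something shipped in the system image tarball.

```shell
# Run a command, passing its output through to the terminal while also
# saving a copy to a log file. log_session is a hypothetical helper.
log_session() {
  logfile="$1"
  shift
  "$@" 2>&1 | tee "$logfile"
}

# Typical use, from the directory the system image tarball extracted to:
#   log_session boot.log ./run-emulator.sh
```

Because only the output side is piped, the emulated console still reads what you type from the terminal.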

The wrapper script ./dev-environment.sh calls run-emulator.sh with extra options to tell QEMU to allocate more memory, attach 2 gigabytes of persistent storage to /home in the emulated system, and to hook distcc up to the cross compiler to move the heavy lifting of compilation outside the emulator (if distccd and the appropriate cross compiler are available on the host system).
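Since distcc acceleration only applies when distccd and a matching cross compiler exist on the host, it can be handy to check for both up front. The following is a sketch under assumptions: have_all is a hypothetical helper, and the cross compiler name in the usage comment is a placeholder for whatever prefix your target uses.

```shell
# Succeed only if every named program can be found in $PATH.
have_all() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || return 1
  done
  return 0
}

# Hypothetical check before expecting distcc acceleration; "armv5l-gcc" is
# an assumed cross compiler name, substitute the one for your target.
#   if have_all distccd armv5l-gcc; then echo "distcc can be used"; fi
```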

The wrapper script ./native-build.sh calls dev-environment.sh with a build control image attached to /mnt in the emulated system, allowing the init script to run /mnt/init instead of launching a shell prompt, providing fully automated native builds. The "static tools" (dropbear, strace) and "linux from scratch" (a chroot tarball) builds are run each release as part of testing, with the results uploaded to the website.
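The choice between an automated build and an interactive prompt can be pictured roughly as follows. This is a reconstruction from the description above, not the actual init script; the function name and the fallback shell path are assumptions.

```shell
# Print the program an init script would hand control to: the build control
# image's init if one is present under the mount point, otherwise an
# interactive shell. Hypothetical logic, for illustration only.
choose_entry_point() {
  mount_dir="${1:-/mnt}"
  if [ -f "$mount_dir/init" ]; then
    echo "$mount_dir/init"
  else
    echo "/bin/sh"
  fi
}
```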

For more information, see Getting Started or the presentation slides Developing for non-x86 Targets using QEMU.

Prebuilt binary images are available for each target, based on the current Aboriginal Linux release. This includes cross compilers, native compilers, root filesystems suitable for chroot, and system images for use with QEMU.

The binary README describes each tarball. The release notes explain recent changes.

Even if you plan to build your own images from source code, you should probably start by familiarizing yourself with the (known working) binary releases.

To build a system image for a target, download the Aboriginal Linux source code and run "./build.sh" with the name of the target to build (or with no arguments to list available targets). See the "config" file in the source for various environment variables you can export to control the build. See the source README for additional usage instructions, and the release notes for recent changes.
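To build more than one target in a row, a small wrapper along these lines may help. It is only a sketch: build_targets is a hypothetical helper, and the target names in the usage comment are placeholders.

```shell
# Run a build command once per target, stopping at the first failure.
# The first argument is the build command (normally ./build.sh) and the
# remaining arguments are target names.
build_targets() {
  build_cmd="$1"
  shift
  for target in "$@"; do
    echo "=== building $target ==="
    "$build_cmd" "$target" || {
      echo "build failed for $target" >&2
      return 1
    }
  done
}

# Example (TARGET1 and TARGET2 are placeholders; run ./build.sh with no
# arguments to list the real target names):
#   build_targets ./build.sh TARGET1 TARGET2
```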

Aboriginal Linux is a build system for creating bootable system images, which can be configured to run either on real hardware or under emulators (such as QEMU). It is intended to reduce or even eliminate the need for further cross compiling, by doing all the cross compiling necessary to bootstrap native development on a given target. (That said, most of what the build does is create and use cross compilers: we cross compile so you don't have to.)

The build system is implemented as a series of bash scripts which run to create the various binary images. The "build.sh" script invokes the other stages in the correct order, but the stages are designed to run individually. (Nothing build.sh itself does is actually important.)

Aboriginal Linux is designed as a series of orthogonal layers (the stages called by build.sh), to increase flexibility and minimize undocumented dependencies. Each layer can be either omitted or replaced with something else. The list of layers is in the source README.
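The idea of omitting a layer can be sketched as a driver that runs stage scripts in order and skips any you name. This is not how build.sh is written; STAGE_DIR, STAGE_SHELL, and SKIP are hypothetical variables invented for the example, and the authoritative stage list and order are in the source README.

```shell
# Run stage scripts in order from STAGE_DIR, omitting any named in SKIP.
# STAGE_SHELL defaults to bash because the stages are bash scripts.
run_stages() {
  for stage in "$@"; do
    case " ${SKIP:-} " in
      *" $stage "*)
        echo "skipping $stage"
        continue
        ;;
    esac
    echo "running $stage"
    "${STAGE_SHELL:-bash}" "${STAGE_DIR:-.}/$stage" || return 1
  done
}

# Example: you supply your own packages directory, so skip the download.
#   SKIP="download.sh"
#   run_stages download.sh host-tools.sh simple-cross-compiler.sh
```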

The project maintains a development repository using the Mercurial source control system. This includes RSS feeds for each checkin and for new releases.

Questions about Aboriginal Linux should be addressed to the project's mailing list, or to the maintainer (rob at landley dot net) who has a blog that often includes notes about ongoing Aboriginal Linux development.

Design goals

In addition to implementing the above, Aboriginal Linux tries to support a number of use cases:

Eliminate the need for cross compiling

Allow package maintainers to reproduce/fix bugs on more architectures

Automated cross-platform regression testing and portability auditing.

Use current vanilla packages, even on obscure targets.

Provide a minimal self-hosting development environment.

Cleanly separate layers

Document how to put together a development environment.

Eliminate the need for cross compiling

We cross compile so you don't have to: Moore's Law has made native compiling under emulation a reasonable approach to cross-platform support. If you need to scale up development, Aboriginal Linux lets you throw hardware at the scalability problem instead of engineering time, using distcc acceleration and distributed package build clusters to compile entire distribution repositories on racks of cheap x86 cloud servers.

But using distcc to call outside the emulator to a cross compiler still acts like a native build. It does not reintroduce the complexities of cross compiling, such as keeping multiple compiler/header/library combinations straight, or preventing configure from confusing the system you build on with the system you deploy on.

Allow package developers and maintainers to reproduce and fix bugs on architectures they don't have access to or experience with.

Bug reports can include a link to a system image and a reproduction sequence (wget source, build, run this test). This provides the maintainer both a way to demonstrate the issue, and a native development environment in which to build and test their fix. No special hardware is required for this, just an open source emulator (generally QEMU) and a system image to run under it.

Use wget to fetch your source, configure and make your package as normal using standard tool names (strip, ld, as, etc), even build and test on a laptop in an airplane without internet access (10.0.2.2 is qemu's alias for the host's 127.0.0.1).
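As a sketch of the offline workflow, the emulated system can fetch source from a web server running on the host's loopback interface. The helper name, port 8000, and the package name below are all assumptions for illustration.

```shell
# Build a URL for a file served from the host's 127.0.0.1, as seen from
# inside the emulator. The second argument is the port (assumed default
# 8000).
host_url() {
  echo "http://10.0.2.2:${2:-8000}/$1"
}

# Inside the emulated system, with a web server running on the host
# (hello-1.0 is a placeholder package name):
#   wget "$(host_url hello-1.0.tar.gz)"
#   tar xzf hello-1.0.tar.gz && cd hello-1.0
#   ./configure && make
```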

Automated cross-platform regression testing and portability auditing. Aboriginal Linux lets you build the same package across multiple architectures, and run the result immediately inside the emulator. You can even set up a cron job to build and test regular repository snapshots of a package's development version automatically, and report regressions when they're fresh, when the developers remember what they did, and when there are few recent changes that may have introduced the bug.
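One way to report regressions while they're fresh is to remember the previous run's result and flag only the transition from passing to failing. The sketch below assumes a one-word "pass" or "fail" result stored in a state file; the function name and the script named in the crontab comment are hypothetical.

```shell
# Record the current result and print REGRESSION if the previous run
# passed and this one failed; otherwise print the current result.
record_result() {
  state_file="$1"
  current="$2"
  previous=""
  if [ -f "$state_file" ]; then
    previous=$(cat "$state_file")
  fi
  echo "$current" > "$state_file"
  if [ "$previous" = "pass" ] && [ "$current" = "fail" ]; then
    echo "REGRESSION"
  else
    echo "$current"
  fi
}

# A nightly crontab entry could then look something like this, where
# nightly-test.sh is a hypothetical script that fetches a snapshot, builds
# and tests it under the emulator, and calls record_result:
#   0 3 * * * $HOME/nightly-test.sh
```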

Use current vanilla packages, even on obscure targets. Nonstandard hardware often receives less testing than common desktop and server platforms, so regressions accumulate. This can lead to a vicious cycle where everybody sticks with private forks of old versions because making the new ones work is too much trouble, and the new ones don't work because nobody's testing and fixing them. The farther you fall behind, the harder it is to catch up again, but only the most recent version accepts new patches, so even the existing fixes don't go upstream. Worst of all, working in private forks becomes the accepted norm, and developers stop even trying to get their patches upstream. Aboriginal Linux uses the same (current) package versions across all architectures, in as similar a configuration as possible, and with as few patches as we can get away with. We (intentionally) can't upgrade a package for one target without upgrading it for all of them, so we can't put off dealing with less-interesting targets. This means any supported target stays up to date with current packages in unmodified "vanilla" form, providing an easy upgrade path to the next version and the ability to push your own changes upstream relatively easily.

Provide a minimal self-hosting development environment.

"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." - Antoine de Saint Exupery

Most build environments provide dozens of packages, ignoring the questions "do you actually need that?" and "what's it for?" in favor of offering rich functionality. Aboriginal Linux provides the smallest, simplest starting point capable of rebuilding itself under itself, and of bootstrapping up to build arbitrarily complex environments (such as Linux From Scratch) by building and installing additional packages. (The one package we add which is not strictly required for this, distcc, is installed in its own subdirectory which is only optionally added to the $PATH.)

This minimalist approach makes it possible to regression test for environmental dependencies. Sometimes new releases of packages simply won't work without perl, or zlib, or some other dependency that previous versions didn't have, not because they meant to but because they were never tested in a build environment that didn't have them, so the dependency leaked in. By providing a build environment that contains only the bare essentials (relying on you to build and install whatever else you need), Aboriginal Linux lets you document exactly what dependencies packages actually require, figure out what functionality the additional packages provide, and measure the costs and benefits of the extra code. (Note: the command logging wrapper record-commands.sh can actually show which commands were used out of the $PATH when building any package.)

Cleanly separate layers. The entire build is designed to let you use only the parts of it you want, and skip or replace the rest. The top level "build.sh" script calls other scripts in sequence, each of which is designed to work independently. The only place package versions are mentioned is "download.sh"; the rest of the build is version-agnostic. All download.sh does is populate the "packages" directory, and if you want to provide your own you never need to run this script.

The "host-tools.sh" script protects the build from variations in the host system, both by building known versions of command line tools (in build/host) and adjusting the $PATH to point only to that directory, and by unsetting all environment variables that aren't in a whitelist. If you want to use the host system's unfiltered environment instead, just skip running host-tools.sh.

If you supply your own cross compilers in the $PATH (with the prefixes the given target expects), you can skip the simple-cross-compiler.sh command. Similarly you can provide your own simple root filesystem, your own native compiler, or your own kernel image. You can use your own script to package them if you like.
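The kind of isolation host-tools.sh is described as providing can be approximated with "env -i", which starts a command from an empty environment. This is a different technique from unsetting variables one at a time, and the names run_isolated and TOOLS_DIR, along with the two-variable whitelist, are assumptions of this sketch rather than what the script actually does.

```shell
# Run a command with an emptied environment: $PATH points only at the
# tools directory, and HOME and TERM are the only variables passed through.
run_isolated() {
  env -i \
    PATH="${TOOLS_DIR:-build/host}" \
    HOME="${HOME:-/}" \
    TERM="${TERM:-dumb}" \
    "$@"
}

# Example: run a build step so that stray host variables cannot leak in.
#   TOOLS_DIR="$PWD/build/host"
#   run_isolated make
```

A variable such as CFLAGS exported in your login shell never reaches the command, which is the point of the whitelist.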

Document how to put together a development environment. The build system is designed to be readable. That's why it's written in Bash (rather than something more powerful like Python): so it can act as documentation. Each shell script collects the series of commands you need to run in order to configure, build, and install the appropriate packages, in the order you need to install them in to satisfy their dependencies. The build is organized as a series of orthogonal stages. These are called in order from build.sh, but may be run (and understood) independently. Dependencies between them are kept to a minimum, and stages which depend on the output of previous stages document this at the start of the file. The scripts are also extensively commented to explain why they do what they do, and there's design documentation on the website.

What's next?

Now that the 1.0 release is out, what are the project's new goals?

Move from busybox, uClibc, and gcc/binutils to toybox, musl, and llvm (then qcc).

Now that we've got a simple development environment working, we can make it simpler by moving to better packages. Most of this project's new development effort is going into the upstream versions of those packages until they're ready for use here. In the meantime we're maintaining what works, but only really upgrading the kernel version and slowly switching from busybox to toybox one command at a time.

uClibc: The uClibc project's chronic development problems resulted in multiple year-long gaps between releases, and after the May 2012 release more than three years went by without a release, during which time musl-libc went from "git init" to a 1.0 release. At this point it doesn't matter if uClibc did get another release out, it's over, musl is the more interesting project. (The main limitation of musl is lack of target support, but it's easy to port musl to new targets and very hard to clean up the mess uClibc has become.)

toybox: The maintainer of Aboriginal Linux used to maintain busybox, but left that project and went on to create toybox for reasons explained at length elsewhere (video, outline, merged into Android).

The toybox 1.0 release should include a shell capable of replacing bash, and may include a make implementation (or in qcc, below). This would eliminate two more packages currently used by Aboriginal Linux.

llvm: When gcc and binutils went GPLv3, Aboriginal Linux froze on the last GPLv2 releases, essentially maintaining its own fork of those projects. Several other projects did the same but most of those have since switched to llvm.

Unfortunately, configuring and building llvm is unnecessarily hard (among other things because it's not just implemented in C++ but the 2011 C++ spec, so you need gcc 4.7 or newer to bootstrap it), and nobody seems to have worked out how to Canadian cross native compilers out of it yet. But other alternatives like pcc or tinycc are both less capable and less actively developed; since the FSF fell on its sword with GPLv3, the new emerging standard is LLVM.

qcc: In the long run, we'd like to put together a new compiler, qcc, but won't have development effort to spare for it before toybox's 1.0 release. Its goal is to combine tinycc and QEMU's Tiny Code Generator into a single multicall binary toolchain (cc, ld, as, strip and so on in a single executable replacing both the gcc and binutils packages) that supports all the output formats QEMU can emulate. (As a single-pass compiler with no intermediate format it wouldn't optimize well, but could bootstrap a native compiler that would.)
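The "multicall" idea is that one executable decides what to do based on the name it was invoked under. The sketch below illustrates only that dispatch step, in shell; qcc itself does not exist yet, and the printed messages are placeholders standing in for real work.

```shell
# Select behavior from the name the program was invoked under, the way a
# multicall binary does. The first argument plays the role of argv[0].
multicall() {
  invoked_as=$(basename "$1")
  shift
  case "$invoked_as" in
    cc)    echo "compile: $*" ;;
    ld)    echo "link: $*" ;;
    as)    echo "assemble: $*" ;;
    strip) echo "strip: $*" ;;
    *)     echo "unknown tool: $invoked_as" >&2; return 1 ;;
  esac
}

# A real multicall program would pass its own name and arguments:
#   multicall "$0" "$@"
```

Installing such a program under several names (usually as symlinks) is what lets a single file stand in for a whole package of tools.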

Additional goals for qcc would be to absorb ccwrap.c, grow built-in distcc equivalent functionality, and an updated rewrite of cfront to compile C++ code (and thus natively bootstrap LLVM).

Finishing the full development slate would bring the total number of Aboriginal Linux packages down to four: linux, toybox, musl, and qcc.

(Yes, reducing dependency on GPL software and avoiding GPLv3 entirely is a common theme of the above package switches, there's a reason for that: audio, outline, see also Android self-hosting below.)

Untangle distro build system hairballs into distinct layers.

The goal here is to separate what packages you can build from where and how you can build them.

For years, Red Hat only built under Red Hat, Debian only built under Debian, even Gentoo assumed it was building under Gentoo. Building their packages required using their root filesystem, and the only way to get their root filesystem was by installing their package binaries built under their root filesystem. The circular nature of this process meant that porting an existing distribution to a new architecture, or making it use a new C library, was extremely difficult at best.

This led cross compiling build systems to add their own package builds ("the buildroot trap"), and wind up maintaining their own repository of package build recipes, configurations, and dependencies. Their few hundred packages never approached the tens of thousands in full distribution repositories, but the effort of maintaining and upgrading packages would come to dominate the project's development effort until developers left to form new projects and start the cycle over again.

This massive and perpetual reinventing of wheels is wasteful. Each of the proliferating build systems (buildroot, openembedded, yocto/meego/tizen, and many more) has its own set of supported boards and its own half-assed package repository, with no ability to mix and match.

The proper way to deal with this is to separate the layers so you can mix and match. Choice of toolchain (and C library), "board support" (kernel configuration, device tree, module selection), and package repository (which existing distro you want to use), all must become independent. Until these are properly separated, your choice of cross compiler limits what boards you can boot the result on (even if the binaries you're building would run in a chroot on that hardware), and either of those choices limits what packages you can install into the resulting system.

This means Aboriginal Linux needs to be able to build _just_ toolchains and provide them to other projects (done), and to accept external toolchains (implemented but not well tested; most other projects produce cross compilers but not native compilers).

It also needs build control images to automatically bootstrap a Debian, Fedora, or Gentoo chroot starting from the minimal development environment Aboriginal Linux creates (possibly through an intermediate Linux From Scratch build, followed by fixups to make debian/fedora/gentoo happy with the chroot). It must be able to do this on an arbitrary host, using the existing toolchain and C library in an architecture-agnostic way. (If the existing system is a musl libc built for a microblaze processor, the new chroot should be too.)

None of these distributions make it easy: it's not documented, and it breaks. Some distributions didn't think things through: Gentoo hardwires the list of supported architectures into every package in the repository, for no apparent reason. Adding a new architecture requires touching every package's metadata. Others are outright lazy; building an allnoconfig Red Hat Enterprise 6.2 kernel under SLES11p2 is kind of hilariously bad: "make clean" spits out an error because the code it added to detect compiler version (something upstream doesn't need) gets confused by "gcc 4.3", which has no .0 on the end so the patchlevel variable is blank. Even under Red Hat's own filesystem, "make allnoconfig" breaks on the first C file, and requires almost two dozen config symbols to be switched on to finish the compilation, because they never tested anything but the config they ship. Making something like that work on a Hexagon processor, or making their root filesystem work with a vanilla kernel, is a daunting task.
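The "gcc 4.3" failure comes from assuming a version string always has three fields. A minimal sketch of a more tolerant parse follows; it is not the code from any distribution's kernel tree, and the function name is made up for the example.

```shell
# Split a compiler version into major, minor, and patchlevel, treating any
# missing field as 0 so that "4.3" parses the same way as "4.3.0".
# The body runs in a subshell so the IFS and globbing changes stay local.
parse_version() (
  set -f
  IFS=.
  set -- $1
  echo "${1:-0} ${2:-0} ${3:-0}"
)

# Example:
#   set -- $(parse_version 4.3)
#   major=$1 minor=$2 patchlevel=$3
```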

Make Android self-hosting (musl, toybox, qcc).

Smartphones are replacing the PC, and if Android doesn't become self-hosting we may be stuck with locked down iPhone derivatives in the next generation.

Mainframe -> minicomputer -> microcomputer (PC) -> smartphone

Mainframes were replaced by minicomputers, which were replaced by microcomputers, which are being replaced by smartphones. (Nobody needed to stand in line to pick up a printout when they could sign up for a timeslot at a terminal down the hall. Nobody needed the terminal down the hall when they had a computer on their desk. Now nobody needs the computer on their desk when they have one in their pocket.)

Each time the previous generation got kicked up into the "server space", only accessed through the newer machines. (This time around kicking the PC up into the server space is called "the cloud".)

Smartphones have USB ports, which charge the phone and transfer data. Using a smartphone as a development workstation involves plugging it into a USB hub, adding a USB keyboard, USB mouse, and USB to HDMI converter to plug it into a television. The rest is software.

The smartphone needs to "grow up and become a real computer" the same way the PC did. The PC originally booted into "ROM Basic" just like today's Android boots into Dalvik Java: as the platform matures it must outgrow this to run native code written in all sorts of languages. PC software was once cross compiled from minicomputers, but as it matured it grew to host its own development tools, powerful enough to rebuild the entire operating system.

To grow up, Android phones need to become usable as development workstations, meaning the OS needs a self-hosting native development environment. This has four parts:

Kernel (we're good)

C library (bionic->musl, not uclibc)

Posix command line (toolbox->toybox, not busybox)

Compiler (qcc, llvm, open64, pcc...)

The Android kernel is a Linux derivative that adds features without removing any, so it's already good enough for now. Convergence to vanilla linux is important for long-term sustainability, but not time critical. (It's not part of "beating iPhone".)

Android's "no GPL in userspace" policy precludes it from shipping many existing Linux packages as part of the base install: no BusyBox or GNU tools, no glibc or uClibc, and no gcc or binutils. All of those are excluded from the Android base install, meaning they will never come bundled with the base operating system or preinstalled on devices, so we must find alternatives.

Android's libc is called "bionic", and is a minimal stub sufficient to run Dalvik, and not much more. Its command line is called "toolbox" and is also a minimal stub providing little functionality. Part of this is intentional: Google is shipping a billion broadband-connected unix machines, none of which are administered by a competent sysadmin. So for security reasons, Android is locked down with minimal functionality outside the Java VM sandbox, providing less of an attack surface for viruses and trojans. In theory the Linux Containers infrastructure may eventually provide a solution for sandboxing applications, but the base OS needs to be pretty bulletproof if a billion people are going to run code they don't deeply understand connected to broadband internet 24/7.

Thus replacement packages for the C library and posix command line should be clean simple code easy to audit for security concerns. But it must also provide functionality that bionic and toolbox do not attempt, and do not provide a good base for. The musl libc and toybox command line package should be able to satisfy these requirements.

The toolchain is a harder problem. The leading contender (LLVM) is sponsored by Apple for use in Mac OSX and the iPhone's iOS. The iPhone is ahead of Android here, and although Android can use this it has other problems (implemented in C++ so significantly more complicated from a system dependency standpoint, making it difficult to bootstrap and impossible to audit).

The simplest option would be to combine the TinyCC project with QEMU's Tiny Code Generator (TCG). The licensing of the current TinyCC is incompatible with Android's userspace but permission has been obtained from Fabrice Bellard to BSD-license his original TinyCC code as used in Rob's TinyCC fork. This could be used to implement a "qcc" capable of producing code for every platform qemu supports. The result would be simple and auditable, and compatibly licensed with Android userspace. Unfortunately, such a project is understaffed, and wouldn't get properly started until after the 1.0 release of Toybox.

Other potential compiler projects include Open64 and PCC. Neither of these has built a bootable Linux kernel, without which a self-bootstrapping system is impossible. (This is a good smoketest for a mature compiler: if it can't build the kernel, it probably can't build userspace packages of the complexity people actually write.)

This is time critical due to network effects, which create positive feedback loops benefiting the most successful entrant and creating natural "standards" (which become self-defending monopolies if owned by a single player.) Whichever platform has the most users attracts the most development effort, because it has the most potential customers. The platform all the software ships on first (often only) is the one everybody wants to have. Other benefits to being biggest include the large start-up costs and much lower incremental costs of electronics manufacturing: higher unit volume makes devices cheaper to produce. Amortizing research and development budgets over a larger user base means the technology may actually advance faster (more effort, anyway)...

Technological transitions produce "S curves", where a gradual increase gives way to exponential increase (the line can go nearly vertical on a graph) and then eventually flattens out again producing a sort of S shape. During the steep part of the S-curve acquiring new customers dominates. Back in the early microcomputer days a lot more people had no computer than had an Atari 800 or Commodore 64 or Apple II or IBM PC, so each vendor focused on selling to the computerless rather than converting customers from other vendors. Once the pool of "people who haven't got the kind of computer we're selling today but would like one if they did" was exhausted (even if only temporarily, waiting for computers to get more powerful and easier to use), the largest players starved the smaller ones of new sales, until only the PC and Macintosh were left. (And the Macintosh switched over to PC hardware components to survive, offering different software and more attractive packaging of the same basic components.)

The same transition is inevitable for smartphones as the pool of "people with no smartphone, but who would like one if they had it" runs out. At that point, the largest platform will suck users away from smaller platforms. If the winner is Android we can open up the hardware and software. If the winner is iPhone, we're stuck with decades of Microsoft-like monopoly, except this time the vendor isn't hamstrung by their own technical incompetence.

The PC lasted over 30 years from its 1981 introduction until smartphones seriously started displacing it. Smartphones themselves will probably last about as long. Once the new standard "clicks", we're stuck with it for a long time. Now is when we can influence this decision. Linux's 15 consecutive "year of the linux desktop" announcements (spanning the period of Microsoft Bob, Windows Millennium, and Windows Vista) show how hard displacing an entrenched standard held in place by network effects actually is.

Several reasons.