blog | oilshell.org

Success with Aboriginal, Alpine, and Debian Linux

Russian Translation by Vlad Brown

Three months ago, in Roadmap #5, I wrote that OSH will be a better shell for building Linux distributions. It will run existing code, including bash scripts, but it's stricter and easier to debug.

In the last month, I've made significant progress toward this goal. I fixed dozens of bugs, implemented new features, and simplified the codebase.

OSH can now run thousands of lines of shell scripts that build three distros: Aboriginal Linux, Alpine Linux, and Debian. This post describes what I did, and the technical work that was involved.

Recap of Recent Progress

I haven't written about Linux distributions in a while. What happened?

OSH was able to run abuild -h back in October, but its parsing speed made debugging sessions unpleasant. On a fast machine, it took more than 1600 milliseconds to parse abuild!

So I pushed two tasks onto the stack, for a total of three:

1. Run abuild from Alpine Linux.
2. Optimize the parser so running abuild isn't painful.
3. Fix bugs in the parser before optimizing it.

The two releases since October popped #3 and #2 off the stack:

- In OSH 0.2, I fixed the bugs revealed by torturing the parser with a million lines of shell. I also introduced parser benchmarks.
- OSH 0.3 sped up the parser by 6-7x. I introduced more benchmarks, including ones that measure execution speed.

Now OSH can parse abuild in about 250 milliseconds. That's still too slow, but it's not blocking progress.

I plan to release OSH 0.4 at the end of this month. It will be able to run not just abuild, but also shell scripts from Aboriginal Linux and Debian.

After that, the stack will be empty again. I had to shave some yaks, but I didn't lose sight of the goal!

What Is a Linux Distro?

I didn't understand how Linux distros worked until pretty recently. It's useful to think of them as having (at least) these four components:

1. A set of source tarballs from "upstream" sources: e.g. GNU, Linux, Apache, or LLVM.
2. A "meta" build system that turns source tarballs into binary packages. This build system invariably uses shell scripts. Sometimes GNU make is used; sometimes Python is used; but there are always shell scripts.
3. A script to create the root file system, i.e. to "bootstrap" the system so that it can build its own packages.
4. A package manager that allows end users to install binary packages. For Debian-derived systems, this is apt; for Red Hat-derived systems (CentOS, Fedora, etc.), it's yum.

What's the Difference Between Distros?

I'm pleased by the diversity of the three distros I worked with because it gives me confidence that OSH is working:

- Debian: arguably the most popular distro, and one of the oldest. It has many derivatives like Ubuntu. The debootstrap script I ran normally runs under dash, which is one of the most incompatible shells.
- Alpine Linux: a "modern" distro for embedded systems and containers. It runs busybox ash.
- Aboriginal Linux: an educational project with a minimalist/embedded slant. However, it runs under bash, and uses many "bash-isms".

So not only am I testing shell scripts by different authors, I'm also testing OSH for compatibility with scripts written for different shell dialects.

Here is some more background on these projects and detail on what I did:

Debian

debootstrap assembles the Debian root file system from .deb packages. Roughly speaking, .debs are tarballs of binaries, scripts, and metadata. I parsed debootstrap with OSH back in October 2016.

It's ~2600 lines of shell (excerpt). I worked with this script a few years ago, and I remember it looking scary. There were weird incantations that I didn't understand. Now it's easy to read, which I think means I've spent too much time with shell :-)

What Now Works: I used OSH to build an Ubuntu Xenial image, chroot into it, and run commands. The sections below describe the fixes required to make this work.

Alpine Linux

Alpine Linux started out as a distro for embedded systems like routers, but it's also now used for containers in the cloud. Docker, Inc. sponsors it, and postmarketOS is based on it.

It uses musl libc and busybox rather than GNU libc and GNU coreutils/findutils/etc. The former projects are popular on embedded systems.

Overall, Alpine and Debian have a similar architecture, but Alpine is smaller, more security-focused, and has a more consistent style.

- abuild is ~2700 lines of shell that builds .apk packages and metadata (excerpt).
- alpine-chroot-install serves the same purpose as debootstrap. It assembles a system image from a set of binary packages.

What Now Works:

1. I ran alpine-chroot-install with OSH, and successfully built a system image.
2. I entered the image using chroot, and built OSH in this environment. This OSH build is linked against musl libc.
3. I built three .apk packages with abuild running under OSH-musl.
4. I ran abuild verify to check that the packages looked reasonable.

Aboriginal Linux

Aboriginal Linux isn't a distro, per se. It's an educational project that looks like a distro. It answers the question: What is the smallest number of packages that will create a Linux system that can rebuild itself?

The project is now defunct. But the code still works, and I still find it interesting, e.g. from a security point of view.

It's ~3700 lines of bash (excerpt). It was the first project I parsed with OSH.

What Now Works:

I built the i686 target using OSH. This builds a complete system image from source code. In contrast, debootstrap assembles an image from binary packages. I booted the resulting image in QEMU and got a shell prompt!

In summary, I tested OSH on a diverse set of shell scripts found in the wild, and fixed what was necessary to make them run.

I started this process after the last release, and I honestly didn't know how long it would take. There were more problems than I expected, but I was also able to fix them more quickly than expected.

Features Added

What features were missing?

Tracing Support

Some errors I ran into had obvious causes. For example, OSH would throw NotImplementedError when a program used ${s:1:2} (string slicing). Getting past this error by implementing slicing was simple.

Other errors required debugging thousands of lines of other people's shell scripts. So I needed to learn more about bash and debugging. This tip on making xtrace useful helped me. In bash, you can set the $PS4 variable so that traces include the filename and line number.

So I mimicked these debugging features in OSH:

- Implement set -x / xtrace, with $PS4 support.
- Add support for $SHELLOPTS, so you can inherit xtrace. Shell scripts often invoke other shell scripts, and this is bash's way to preserve -x across invocations.
- Add variables that are useful in the PS4 string: $LINENO, and my own $SOURCE_NAME.
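In bash, the xtrace tip amounts to something like PS4='+ ${BASH_SOURCE}:${LINENO}: '. The hypothetical Python sketch below (not OSH's implementation) shows the two mechanisms from the list: formatting a trace line with a PS4-style prefix, and reading $SHELLOPTS, the colon-separated option list that bash enables at startup when it finds the variable in its environment. The {source}/{lineno} template is an assumption standing in for real $PS4 expansion.

```python
import shlex


def options_from_shellopts(environ):
    """Parse $SHELLOPTS, a colon-separated list of enabled option names.

    When bash starts with SHELLOPTS in its environment, it enables each
    listed option, which is how an exported SHELLOPTS carries xtrace into
    child bash processes.
    """
    value = environ.get("SHELLOPTS", "")
    return {name for name in value.split(":") if name}


def trace_line(ps4_template, source_name, lineno, argv):
    """Format one xtrace line: a PS4-style prefix, then the quoted argv.

    Hypothetical: a real shell expands $PS4 like any other string; here a
    str.format() template with {source} and {lineno} stands in for that.
    """
    prefix = ps4_template.format(source=source_name, lineno=lineno)
    return prefix + " ".join(shlex.quote(arg) for arg in argv)
```

For example, trace_line("+ {source}:{lineno}: ", "build.sh", 42, ["echo", "hello world"]) returns "+ build.sh:42: echo 'hello world'", which is the filename-and-line-number format that makes traces of long scripts navigable.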

Note that bash actually has a debugger called bashdb! Describing the way it works would be another post. In short, it uses hooks specified with the trap builtin, as well as several $BASH_* variables.

Shell Options for Strict Behavior

A recurring theme was relaxing OSH's strict behavior in order to accommodate common shell usage. However, I added the ability to opt in to the strict behavior, with set -o strict-control-flow, strict-array, and strict-errexit.

I'll address this topic in another blog post, but feel free to leave comments if you're curious.

Overhaul of Word Splitting and Evaluation

POSIX has quirky rules for the $IFS variable, which determines:

- How unquoted words are split, and
- How the read builtin splits fields.

I replaced the buggy regex-based IFS splitting with an explicit state machine. This is an interesting piece of code which I may explain in another blog post. It's in core/legacy.py. It turned a lot of red tests green.
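The code in core/legacy.py isn't reproduced here, so the following is a minimal, hypothetical sketch of the same idea: POSIX field splitting of an unquoted expansion, written as an explicit state machine. It ignores quoting, backslashes, and the read builtin's extra rules (such as the last variable receiving the rest of the line).

```python
# IFS whitespace: space, tab, and newline, but only if they appear in IFS.
IFS_WHITESPACE = " \t\n"


def ifs_split(s, ifs=" \t\n"):
    """Split the result of an unquoted expansion into fields.

    States:
      START        at the beginning; leading IFS whitespace is skipped
      FIELD        inside a field
      AFTER_WS     after IFS whitespace that ended a field; a following
                   non-whitespace IFS char joins it as a single delimiter
      AFTER_DELIM  after a non-whitespace delimiter; another one means
                   an empty field
    """
    ws = {c for c in ifs if c in IFS_WHITESPACE}
    other = {c for c in ifs if c not in IFS_WHITESPACE}
    fields, current, state = [], [], "START"
    for ch in s:
        if ch in ws:
            if state == "FIELD":
                fields.append("".join(current))
                current = []
                state = "AFTER_WS"
            # Otherwise whitespace is skipped and the state is unchanged.
        elif ch in other:
            if state == "FIELD":
                fields.append("".join(current))
                current = []
            elif state in ("START", "AFTER_DELIM"):
                # A delimiter with no field before it yields an empty field.
                fields.append("")
            # In AFTER_WS, the whitespace and this char form one delimiter.
            state = "AFTER_DELIM"
        else:
            current.append(ch)
            state = "FIELD"
    if state == "FIELD":
        fields.append("".join(current))  # a trailing delimiter adds no field
    return fields
```

With IFS=":", the value "a::b:" splits into ['a', '', 'b']: the doubled colon produces an empty field, but the trailing colon doesn't. Keeping these cases in named states is what makes them easier to get right than a single regex.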

Two Kinds of C-Escaped Strings

echo -e 'foo\n' and $'foo\n' are both ways to write C-escaped strings. Their relationship is the same as the relationship between [ and [[: the former is dynamically parsed, and the latter is statically parsed.

(For example, dynamic parsing allows this: char=n; echo -e "1\\${char}2", but static parsing doesn't.)

I implemented these with similar, but not identical, lexers, using the style described in my posts on lexing. I again found that metaprogramming is useful for avoiding code duplication.
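As a hypothetical illustration (not OSH's lexers), the sketch below generates two regex-based escape decoders from one shared table, a small-scale version of using metaprogramming to avoid duplication. It covers only a subset of escapes, and the one difference it models is octal syntax: bash's echo -e takes \0nnn, while $'...' takes \nnn. A shell would apply the $'...' decoder to source text while lexing, and the echo -e decoder to argument values at runtime.

```python
import re

# Escapes shared by both dialects (a subset, for illustration).
SIMPLE_ESCAPES = {"\\": "\\", "a": "\a", "b": "\b", "n": "\n", "r": "\r", "t": "\t"}

# The dialects differ in octal syntax: echo -e uses \0nnn, $'...' uses \nnn.
OCTAL_RULES = {
    "echo_e": r"0(?P<oct>[0-7]{0,3})",
    "dollar_single": r"(?P<oct>[0-7]{1,3})",
}


def make_decoder(dialect):
    """Generate a decoder for one dialect from the shared tables."""
    simple = "|".join(re.escape(c) for c in SIMPLE_ESCAPES)
    pattern = re.compile(
        r"\\(?:(?P<simple>" + simple + r")"
        + r"|x(?P<hex>[0-9a-fA-F]{1,2})"
        + r"|" + OCTAL_RULES[dialect] + r")"
    )

    def replace(m):
        if m.group("simple") is not None:
            return SIMPLE_ESCAPES[m.group("simple")]
        if m.group("hex") is not None:
            return chr(int(m.group("hex"), 16))
        # Works on str, so values above 0x7F become code points, not raw bytes.
        return chr(int(m.group("oct") or "0", 8) & 0xFF)

    # Unrecognized escapes don't match the pattern, so they are left as-is.
    return lambda text: pattern.sub(replace, text)


decode_echo_e = make_decoder("echo_e")
decode_dollar_single = make_decoder("dollar_single")
```

Because decode_echo_e runs on a value, it handles the char=n example: the argument "1\\" + char + "2" is only assembled at runtime, and decoding it yields "1", a newline, and "2".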

Stripping Glob Prefixes and Suffixes With POSIX APIs

This is another feature that touches on some computer science. I discovered that semantics that originate with ksh can't be efficiently expressed with POSIX APIs:

- POSIX fnmatch() does glob-style string matching, but it doesn't return the position of the match.
- POSIX regexec() does return match positions, but it doesn't support non-greedy matching like Python's regex API does.

In theory, Python's API should be able to efficiently express the semantics of ${s%suffix} vs. ${s%%suffix}, so OSH used the strategy of translating globs to Python regexes. For example, the expression ${s%%*suffix} could be implemented with the regex .*?(suffix).

However, abuild uses character classes in globs, e.g. ${i%%[<>=]*}, which aren't straightforward to translate.

So I reimplemented these operators using the conventional, inefficient algorithm: a linear number of calls to fnmatch(), one for each position in the string (in the worst case)!

This makes the overall algorithm quadratic. If fnmatch() isn't linear, which it often isn't, then stripping glob prefixes and suffixes will be even slower than quadratic.

However, this issue doesn't appear to arise in practice, as all shells use the slow algorithm. Of course, Oil will provide string manipulation functions that aren't slow in theory. I want the language to be safe to use in adversarial contexts.
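Here is a minimal sketch of the conventional algorithm described above, with Python's standard fnmatch.fnmatchcase() standing in for POSIX fnmatch(). It's an illustration, not OSH's code, and Python's glob dialect differs from POSIX in some details.

```python
from fnmatch import fnmatchcase


def strip_prefix(s, pat, longest=False):
    """${s#pat} (shortest) or ${s##pat} (longest): remove a matching prefix."""
    n = len(s)
    # Try prefix lengths in order: shortest first for #, longest first for ##.
    ends = range(n, -1, -1) if longest else range(n + 1)
    for i in ends:
        if fnmatchcase(s[:i], pat):  # one whole-string match per position
            return s[i:]
    return s  # no match: the value is unchanged


def strip_suffix(s, pat, longest=False):
    """${s%pat} (shortest) or ${s%%pat} (longest): remove a matching suffix."""
    n = len(s)
    # The suffix starting at i has length n - i, so "longest" means smallest i.
    starts = range(n + 1) if longest else range(n, -1, -1)
    for i in starts:
        if fnmatchcase(s[i:], pat):
            return s[:i]
    return s
```

With abuild's pattern, strip_suffix("foo>=1.2", "[<>=]*", longest=True) returns "foo", like ${i%%[<>=]*}. In the worst case every position is tried, so there are n + 1 calls, each matching a string of up to n characters: quadratic before counting the matcher's own cost.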

Minor Features

Running the distro scripts required several other shell features. In most cases, I had already done the hard part: representing code with the lossless syntax tree. The implementation often "falls out" after choosing a good representation.

- Slicing of strings and arrays: ${s:1:2} and ${a[@]:1:2}.
- Process substitution: diff <(sort left.txt) <(sort right.txt). This feature is inherently flaky because it doesn't wait() on the forked process, and it didn't set $! until bash 4.4.
- The type builtin without -t. abuild unfortunately matches the output of type with a regex.
- More of the test builtin:
  - -L and -h are aliases to check if a path is a symlink.
  - [ -t 1 ] to check if stdout is a TTY. There is no color in abuild without this!
  - -nt and -ot to compare timestamps on files.
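The test operators in the last item map onto ordinary system calls. This is a hypothetical Python sketch (the function names are invented for illustration, not OSH's); the handling of missing files in -nt and -ot follows the rules documented in the bash manual.

```python
import os


def _mtime_ns(path):
    """Modification time in nanoseconds, or None if the file doesn't exist."""
    try:
        return os.stat(path).st_mtime_ns
    except OSError:
        return None


def eval_unary(op, arg):
    """A few unary operators of the test / [ builtin."""
    if op in ("-L", "-h"):  # aliases: is arg a symbolic link?
        return os.path.islink(arg)
    if op == "-t":  # is file descriptor arg open and a terminal?
        try:
            return os.isatty(int(arg))
        except ValueError:
            return False
    raise ValueError("unsupported unary operator: " + op)


def eval_binary(left, op, right):
    """File timestamp comparisons, including bash's rules for missing files."""
    a, b = _mtime_ns(left), _mtime_ns(right)
    if op == "-nt":  # newer, or left exists and right doesn't
        return a is not None and (b is None or a > b)
    if op == "-ot":  # older, or right exists and left doesn't
        return b is not None and (a is None or a < b)
    raise ValueError("unsupported binary operator: " + op)
```

In this sketch, [ -t 1 ] corresponds to eval_unary("-t", "1"), which is true only when stdout is a terminal.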

Shell WTFs

Reimplementing these shell quirks was both fun and depressing. As penance, I've been maintaining a wiki page of Shell WTFs (which is not well-organized).

I could blog every day about one of these and not be done for months. But I remind myself that my goal is to improve shell with the Oil language, not dwell on the past. Legacy behavior is only useful as far as it gives users an upgrade path to Oil.

Bugs Fixed

In addition to implementing features, I also found and fixed bugs in OSH.

File Descriptor Usage

As far as I know, a shell must handle file descriptors differently than any other Unix program. It can't open any files in the descriptor range 3-9, because shell scripts may use them directly.

- The file descriptors for the main program and for source'd scripts are now moved out of the way with dup2(), immediately after open().
- I fixed a crash in statements like echo hi 6>&1, which debootstrap uses.

To debug these issues, I used the /proc/$$/fd/ mechanism mentioned in OSH Runs Real Shell Programs. It's a nice way of showing the file descriptor state of a process.
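The two fixes above can be sketched in Python as follows. This is a hypothetical illustration, not OSH's code: the post describes using dup2(), while this sketch uses fcntl's F_DUPFD, which returns the lowest free descriptor at or above a minimum without clobbering an open one. The minimum of 10 is an assumption chosen to stay clear of the 3-9 range that scripts may use.

```python
import fcntl
import os

MIN_SHELL_FD = 10  # assumption: scripts may use 3-9, so the shell stays above


def move_out_of_the_way(fd):
    """Duplicate fd to the lowest free descriptor >= MIN_SHELL_FD; close fd."""
    new_fd = fcntl.fcntl(fd, fcntl.F_DUPFD, MIN_SHELL_FD)
    os.close(fd)
    return new_fd


def open_script(path):
    """Open a script for reading, then move its descriptor above 0-9."""
    return move_out_of_the_way(os.open(path, os.O_RDONLY))


def redirect_dup(target_fd, source_fd):
    """Apply target_fd>&source_fd (e.g. 6>&1); return a saved copy or None."""
    try:
        saved = fcntl.fcntl(target_fd, fcntl.F_DUPFD, MIN_SHELL_FD)
    except OSError:  # target_fd wasn't open, so there's nothing to save
        saved = None
    os.dup2(source_fd, target_fd)
    return saved


def undo_redirect(target_fd, saved):
    """Restore target_fd to its state before redirect_dup()."""
    if saved is None:
        os.close(target_fd)
    else:
        os.dup2(saved, target_fd)
        os.close(saved)
```

Calling redirect_dup(6, 1) before a command and undo_redirect() after it is roughly what echo hi 6>&1 needs: save whatever fd 6 was, point it at stdout, then put it back. Listing /proc/$$/fd/ before and after is a quick way to check that nothing leaked.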

Bugs Related to CPython's Buffering

In The Riskiest Part of the Project, I mentioned several difficulties with using CPython to write a Unix shell.

I encountered another problem: Python does its own buffering of file I/O. I believe this is on top of libc's buffering, although I haven't looked into it deeply.

- sys.stdout.flush() is required after type prints its output; otherwise $() may be incorrectly evaluated. Hat tip to timetoplatypus for mentioning this with respect to the dirs builtin.
- The read builtin can't use Python's f.readline(). The descriptor that underlies the sys.stdin file object changes when you redirect, which interacts badly with buffering.

Instead, I have to read a byte at a time from file descriptor 0 . This seems inefficient, but I noticed that dash, mksh, and zsh all do the same thing (in C). For example, try:

$ strace zsh -c 'read x <<< "hello world"'
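A minimal Python sketch of the byte-at-a-time approach, using unbuffered os.read() on a raw descriptor. Reading one byte at a time guarantees the builtin never consumes input past the newline, so whatever reads the descriptor next still sees the rest. This is an illustration, not OSH's code, and it ignores read's IFS splitting and backslash handling.

```python
import os


def read_line(fd=0):
    """Read one line from fd without consuming anything past the newline.

    Returns (line_bytes, found_newline). A buffered reader like
    f.readline() may pull extra bytes into its own buffer, where the
    next command reading fd would never see them.
    """
    buf = bytearray()
    while True:
        byte = os.read(fd, 1)  # one system call per byte, like the strace shows
        if not byte:  # EOF
            return bytes(buf), False
        if byte == b"\n":
            return bytes(buf), True
        buf += byte
```

The cost is one read() system call per byte, which is the inefficiency mentioned above; the benefit is that the descriptor's position is exactly where the next reader expects it.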

Other Bugs

- Fix precedence of && and ||. Confusingly, they have equal precedence in the command language, but the normal unequal precedence in the [[ expression language.
- Fix the scope of variables set with FOO=bar myfunc. Shells differ in behavior here!
- Fix ${x/pat/replace} when x is undefined. (This case revealed a bug in mksh.)
- Fix a crash when cd-ing away from a directory that's been removed.
- readonly R; unset R should return 1 and respect errexit, not unconditionally fail. Although I think programming errors are different than runtime errors, even in dynamically-typed languages, errexit will be on by default in Oil. (It would also be nice to make this a statically-detected error.)
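To make the precedence difference in the first item concrete, here is a hypothetical Python sketch. In the command language, commands are modeled as zero-argument callables that return True on success; in the [[ language, operands are plain booleans.

```python
def run_and_or(first, rest):
    """Command language: && and || have equal precedence, left to right.

    first and each cmd in rest are zero-argument callables returning True
    on success. A command is skipped when the previous status makes it
    irrelevant, and the status carries over unchanged.
    """
    ok = first()
    for op, cmd in rest:
        if (op == "&&" and ok) or (op == "||" and not ok):
            ok = cmd()
    return ok


def eval_dbracket(values, ops):
    """[[ ]] expressions: && binds tighter than ||, as in C.

    values are booleans and ops the operators between them, so the result
    is an OR of AND-groups. (Short-circuiting doesn't change the result
    here, because plain booleans have no side effects.)
    """
    groups = [[values[0]]]
    for op, value in zip(ops, values[1:]):
        if op == "&&":
            groups[-1].append(value)
        else:  # "||" starts a new AND-group
            groups.append([value])
    return any(all(group) for group in groups)
```

The same shape gives different answers: in the command language, true || false && false groups as (true || false) && false and fails, while in [[ ]] it groups as true || (false && false) and succeeds.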

What Was Not Done

I punted on a few things that weren't strictly necessary to build the distros, or which had easy workarounds:

- The trap builtin is unimplemented; warnings are printed on stderr.
- alias is also unimplemented. I changed a couple of aliases in alpine-chroot-install to functions. Trivia: bash is the only shell that doesn't expand aliases by default; it requires shopt -s expand_aliases.
- set -h / hashall is a stub that does nothing. This option is used by Aboriginal and affects bash's $PATH cache, which I don't yet understand.

Also note that these OSH builds are in a sense "shallow". I changed the shebang lines of the top-level scripts, which are thousands of lines long, but they often invoke more shell scripts with a #!/bin/bash or #!/bin/sh shebang line.

For example, building any Linux distro will require running dozens of configure scripts. Fortunately, OSH can already run those.

What's Next?

As mentioned, the upcoming OSH 0.4 release will include all this work.

After concentrating so much on the code, I now have several writing tasks backed up:

- Why Write a New Shell? After every release, I receive questions about the project's motivations. I need to explain each goal concisely and link to them all from a single place.
- Project Retrospective. The work described in this post is a major milestone. It's worth reviewing how we got here. And what's left?
- Lexing Posts. I have unpublished drafts of posts in this series (see the lexing tag). Now that OSH is in better shape, I'd like to resume writing about shell-the-good-parts. The first two posts are now a year old!

If I have time: a review of academic papers about shell. nickpsecurity brought an interesting paper to my attention, and I followed the citations and read two more papers. I responded in comments on lobste.rs and reddit. There is more to say about them!