
Oil 0.8.pre4 - The Biggest Shell Programs in the World

This is the latest version of Oil, a Unix shell that's our upgrade path from bash:

Oil version 0.8.pre4 - Source tarballs and documentation.

To build and run it, follow the instructions in INSTALL.txt. The wiki has tips on How To Test OSH.

The Highlights

This release has four user-facing themes:

Fixes to Oil so that it can run neofetch, a bash program that's more than 10,000 lines long. Thanks to Crestwave for the tough job of debugging neofetch under Oil, and for patching it upstream!

Fixes and features toward running ble.sh, another one of the biggest shell programs in the world. Thanks to Koichi Murase for the phenomenal testing and bug reports!

Optimizations to the number of processes started. The last post with comics gives background knowledge on this, and the next blog post will explain further.

QSN: Quoted String Notation. A new interchange format that formalizes string literals like 'foo \x00 bar'. More precisely, it adapts Rust's string literal syntax to express arbitrary byte strings (unlike JSON).

It's now used in 7 or 8 places in Oil. It occurs naturally in Unix programs like shells and coreutils.

I'll also write about it in the future.
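To make the idea of a byte-string literal concrete, here is a minimal Python sketch in the spirit of the description above. It is not Oil's implementation: the name qsn_encode_sketch and the exact set of escapes are assumptions for illustration, and it simply escapes every byte outside printable ASCII.

```python
def qsn_encode_sketch(data: bytes) -> str:
    """Render arbitrary bytes as a single-quoted literal with backslash escapes.

    Hypothetical helper for illustration only; the escape set is an assumption.
    """
    simple = {
        0x0A: "\\n",
        0x0D: "\\r",
        0x09: "\\t",
        0x5C: "\\\\",  # backslash
        0x27: "\\'",   # single quote
    }
    parts = []
    for b in data:
        if b in simple:
            parts.append(simple[b])
        elif 0x20 <= b < 0x7F:
            # Printable ASCII is emitted as-is.
            parts.append(chr(b))
        else:
            # Everything else (including NUL and high bytes) becomes \xHH.
            parts.append("\\x%02x" % b)
    return "'" + "".join(parts) + "'"
```

The point of the sketch is that any byte sequence, including a NUL byte, ends up as a single line of printable text, which JSON strings can't express directly.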

Comment on Version Numbering

Despite the pre4 version qualifier, this is by far the best Oil release ever. I use Oil interactively while doing the release, running thousands of lines of its own shell scripts in the process.

I may change the version numbering scheme in the near future to reflect this. Note that this release includes a new $OIL_VERSION variable (issue #683 below).

Closed Issues

Here are some issues addressed in this release. It's an underestimate because I also fixed many bugs under issue 653 to run ble.sh.

You can also view the full changelog.

#712 ternary operator ? should be right associative
#706 unset should unshadow variables higher on the stack (at least for nonlocals)
#705 read fails on empty lines
#702 Can't escape closing brace with backslash or single quotes in parameter expansion
#700 xtrace output doubles backslashes and single-quotes
#698 Error with backslashes in unquoted variables with globbing off
#695 "${#:+\e}" should evaluate to \e, not e
#694 read only reads a single line even with a different delimiter
#690 ${var@a} to get flags, etc.
#688 ${@:0:1} evaluates to ${@:0}
#683 provide a way to query the version
#679 Run neofetch
#660 ${arr[0]=1} change variable to string rather than assigning cell
#651 cell sublanguage: unset -v 'a[0]' (ble.sh)
#648 Recursive arithmetic evaluation (ble.sh)
#640 arith assignment where var name is dynamic doesn't work
#291 single quotes within double quoted brace sub treated differently for the # ## % %% / operators
#273 Implement $(< file)
#254 Test the number of processes started by various shell snippets

Selected Open Issues

I'm still looking for more help with Oil. Related links:

Under the Hood: The Code Is in Good Shape After 4 Years

I'm still working on translating Oil to C++, which I mentioned in the March recap. One nice side effect is that it forces me to revisit and clean up the code.

For example, the optimizations to start fewer processes were a result of "pulling on a thread": a pesky fork_external parameter that I wanted to get rid of.

Dependency Inversion Leads to Pure Interpreters

Translation also encourages refactoring to dependency inversion, especially of I/O interfaces. This is because I/O is harder to translate than pure computation.

(I mentioned "dependency injection" in both the March Recap and the February Recap, but I now call it inversion. This is to avoid confusion with "DI frameworks", which aren't related to Oil.)

This refactoring will make "pure" subinterpreters possible, which relates to ble.sh (mentioned above), as well as to evaluating untrusted config files (more on this later).
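As a rough illustration of the idea (not Oil's actual code), here is a small Python sketch in which an interpreter receives its file access as a constructor argument. All names here (TinyInterp, DictFiles, NoFiles, and the toy "set" / "include" language) are hypothetical.

```python
class DictFiles:
    """In-memory stand-in for the file system (hypothetical)."""

    def __init__(self, contents):
        self.contents = contents

    def read_text(self, path):
        return self.contents[path]


class NoFiles:
    """I/O object for a 'pure' interpreter: every file access is refused."""

    def read_text(self, path):
        raise PermissionError("pure interpreter: no file access (%s)" % path)


class TinyInterp:
    """Toy interpreter for lines like 'set NAME VALUE' and 'include PATH'.

    It never opens files itself; it only calls the object it was given.
    """

    def __init__(self, files):
        self.files = files
        self.vars = {}

    def run(self, source):
        for line in source.splitlines():
            words = line.split()
            if not words:
                continue
            if words[0] == "set" and len(words) == 3:
                self.vars[words[1]] = words[2]
            elif words[0] == "include" and len(words) == 2:
                # The only I/O in the interpreter goes through self.files.
                self.run(self.files.read_text(words[1]))
            else:
                raise ValueError("bad line: %r" % line)
        return self.vars
```

Because the evaluation logic depends only on the interface, the same class can run with real files, with an in-memory fake in tests, or with NoFiles when the input is untrusted.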

Here are two comments I wrote about dependency inversion. They might help contributors understand Oil's code.

Note, though, that all contributors have implicitly followed this style, so there's nothing unusual about it. Pull requests don't need to follow the style at first, as long as they have tests to ensure that later refactoring doesn't break anything.

Lexer Modes

I like the small size of these diffs, because it's evidence that the lexer mode technique is expressive enough to make subtle fixes to the OSH language:

(Related: How To Parse Shell Like a Programming Language summarizes our strict but powerful parsing model.)
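For readers new to the idea, here is a minimal Python sketch of a mode-based lexer. It is only an illustration under simplifying assumptions: the mode names, token names, and regexes are made up, and the driver loop stands in for the parser, which is what would normally choose the mode.

```python
import re

# Each mode has its own ordered list of (regex, token type). Hypothetical names.
LEXER_MODES = {
    "Outer": [
        (re.compile(r"[a-zA-Z0-9_]+"), "Lit_Chars"),
        (re.compile(r'"'), "Left_DoubleQuote"),
        (re.compile(r"'[^']*'"), "SingleQuoted"),
        (re.compile(r"\s+"), "WS"),
    ],
    "DQ": [
        # Inside double quotes, a single quote is just literal data.
        (re.compile(r'[^"$\\]+'), "Lit_Chars"),
        (re.compile(r'"'), "Right_DoubleQuote"),
    ],
}


def read_token(mode, s, pos):
    """Return (token type, text, new position) using the rules of one mode."""
    for pat, tok_type in LEXER_MODES[mode]:
        m = pat.match(s, pos)
        if m:
            return tok_type, m.group(0), m.end()
    raise ValueError("no token in mode %s at position %d" % (mode, pos))


def lex(s):
    """Tokenize s, switching modes when a double quote opens or closes."""
    mode = "Outer"
    pos = 0
    tokens = []
    while pos < len(s):
        tok_type, text, pos = read_token(mode, s, pos)
        if tok_type == "Left_DoubleQuote":
            mode = "DQ"
        elif tok_type == "Right_DoubleQuote":
            mode = "Outer"
        tokens.append((tok_type, text))
    return tokens
```

In this sketch, changing how a character behaves in one context means editing one small table entry, which is the kind of change that keeps diffs small.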

Conclusion

I'm encouraged by our ability to make quick fixes to run the biggest shell programs in the world!

Please try Oil on your shell scripts and let us know what doesn't work. And let us know if you have questions about how to get started with the code.

The next post is: Oil Starts Fewer Processes Than Other Shells.

Appendix A: More Bad Parts of Shell

I stopped keeping track of #shell-the-bad-parts a while ago, but this release brought to mind several more.

A rant about arrays in bash. That's not even the whole story: We discovered more quirks (and changes between minor versions) while designing a compatible subset of arrays for ble.sh to use.

Shells disagree heavily on the unset builtin, which is probably why POSIX doesn't specify it. Does it reveal variables of the same name higher on the stack? Shell's use of dynamic scope makes this a fundamental question. This issue is related to the "temp binding" issue from June 2019, raised by the Smoosh test suite.

Parsing ${}. The fact that single quotes are literal data in "${x-'default'}" and operators in "${x#'suffix'}" is very confusing (and not documented as far as I can tell). In other words, the same syntax means two different things, which violates Oil's language design principles.
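The unset question can be made concrete with a toy model of dynamic scope. This Python sketch is an illustration only: VarStack, unset_reveal, and unset_shadow are hypothetical names, and it makes no claim about which shell picks which behavior.

```python
_UNSET = object()  # marker: name is declared in a frame but has no value


class VarStack:
    """Toy model of a shell's variable stack under dynamic scope."""

    def __init__(self):
        self.frames = [{}]  # frames[0] is the global frame

    def push_frame(self):
        self.frames.append({})

    def pop_frame(self):
        self.frames.pop()

    def set_local(self, name, value):
        self.frames[-1][name] = value

    def _find_frame(self, name):
        # Dynamic scope: search from the innermost frame outward.
        for frame in reversed(self.frames):
            if name in frame:
                return frame
        return None

    def get(self, name):
        frame = self._find_frame(name)
        if frame is None or frame[name] is _UNSET:
            return None
        return frame[name]

    def unset_reveal(self, name):
        """Remove the innermost cell, so an outer variable becomes visible."""
        frame = self._find_frame(name)
        if frame is not None:
            del frame[name]

    def unset_shadow(self, name):
        """Keep the innermost cell but mark it unset; it still shadows."""
        frame = self._find_frame(name)
        if frame is not None:
            frame[name] = _UNSET


def demo(unset_method_name):
    """Set a global x, shadow it in a new frame, unset it, then look it up."""
    st = VarStack()
    st.set_local("x", "global")
    st.push_frame()
    st.set_local("x", "local")
    getattr(st, unset_method_name)("x")
    return st.get("x")
```

Here demo("unset_reveal") sees the global value again, while demo("unset_shadow") sees no value at all; both are defensible answers to the question of whether unset reveals variables higher on the stack.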

Appendix B: Metrics for the 0.8.pre4 Release

Let's compare this release with the previous one, version 0.8.pre3.

Test Results

Running big shell scripts led to a big increase in the number of OSH spec tests:

Not much work was done on the Oil language. I added failing tests to expose a few issues:

We have ~600 new lines of significant code, e.g. due to QSN, which I still need to write about.

cloc for 0.8.pre3: 15,633 lines of Python and C, 300 lines of ASDL (excluding testdata).

cloc for 0.8.pre4: 16,281 lines of Python and C, 299 lines of ASDL.

And ~1200 new lines of physical code:

Benchmarks

These benchmarks didn't change, which is good. (They're noisy, which I'd like to eventually fix.)

Native Code Metrics

Let's concentrate on the in-progress oil-native translation, rather than the soon-to-obsolete OVM.

This release mainly refactored code, so the number of translated lines hasn't increased that much: