
Oil 0.7.pre9 and a Fast Shell Parser

This is the latest version of Oil, a compatible Unix shell and an experimental new language:

If you're new to the project, see Why Create a New Shell? and the 2019 FAQ.

To build and run it, follow the instructions in INSTALL.txt. Please try it on your shell scripts and report bugs. I'm also looking for feedback on the Oil language, which you can send through Github or Zulip.

oil-native Shows How We'll Optimize Oil

I've warned that Oil is too slow because it's written in an abstract style with a focus on correctness.

This release takes a major step toward speeding it up. There's a new oil-native tarball:

with a demo you can run in 10-20 seconds:

    $ build/mycpp.sh tarball-demo
    ...
    -rwxrwxr-x 1 andy andy 487584 Dec 8 20:47 _bin/osh_parse.opt.stripped

    You can now run _bin/osh_parse.opt.stripped. Example:

    + _bin/osh_parse.opt.stripped -c 'echo "hello $name"'
    (command.Simple
      words: [
        (compound_word
          parts:[
            (word_part.Literal token:(Token id:Id.Lit_Chars val:echo span_id:0))
          ]
        )
        (compound_word parts: [ ... ] )
      ]
    )

What is this?

_bin/osh_parse is Oil's principled parser automatically translated to C++. This listing of the tarball shows that it's pure C++. To build and run it, you need only a C++ compiler and a shell.

It produces syntax trees that are identical to those produced by the current parser. I verified this on our wild corpus, which contains 1.2 million lines of shell.

It's more than twenty times faster than the current Oil parser, according to the benchmarks published with every release for the last two years.

It's 3-4x faster than the zsh parser, but 40-60% the speed of the bash parser. The Oil and zsh parsers both do a lot more work than the bash parser, e.g. for interactivity and error messages. More on this later.

It's a demo, not a working shell. I'll continue to release it to show progress, but please keep using and testing the oil tarball, not the oil-native one.

The next post will cover:

A new tool mycpp, which I used to create this fast parser. It translates statically-typed Python to C++.

Other C++ code generators like ASDL.

The six-step translation process. If you want details sooner, check out the many Zulip threads on #oil-dev about it.

Details on performance.
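To give a flavor of what "statically-typed Python" means, here is a minimal sketch of the kind of annotated function a translator could work from. The function count_words is hypothetical and not taken from the Oil codebase, and the annotation syntax shown is just standard Python type hints, which may differ from what Oil's source uses.

```python
from typing import Dict, List


def count_words(lines: List[str]) -> Dict[str, int]:
    """Count how often each whitespace-separated word occurs."""
    # Every variable has one declared type, so a type checker can verify
    # the whole function before any translation step runs.
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

The point of the sketch is only that fully typed code leaves no variable whose type must be discovered at run time. It does not describe mycpp's actual output.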

Changes and Contributors

The rest of this post summarizes changes in the last four releases. Since the demo of the Oil language in October, most work has been on translation, but there are also some features and bug fixes.

0.7.pre6 on November 11th (changelog)

Contributions:

Aaron Sokoloski fixed Unicode behavior in string operations that take single-character globs, for example ${s#?} and ${x//?/char}. This algorithm was tricky and I didn't know how to do it myself! He also implemented printf %d \'c, which is shell's obscure syntax for the ord(c) function. The Oil language should simply use ord(c).
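Why a single-character glob is tricky with Unicode can be seen in a small sketch. This is not Aaron's algorithm, only an illustration under the assumption that strings are stored as UTF-8 bytes; both function names are made up for this example.

```python
def strip_first_char_bytes(s: bytes) -> bytes:
    """Naive version: drop one byte, which can split a multi-byte character."""
    return s[1:]


def strip_first_char_utf8(s: bytes) -> bytes:
    """Drop one whole UTF-8 encoded character from the front of s."""
    if not s:
        return s
    i = 1
    # UTF-8 continuation bytes look like 10xxxxxx; skip them so the cut
    # lands on a character boundary.
    while i < len(s) and (s[i] & 0xC0) == 0x80:
        i += 1
    return s[i:]


s = "μx".encode("utf-8")            # b'\xce\xbcx': a 2-byte character, then 'x'
naive = strip_first_char_bytes(s)   # b'\xbcx', no longer valid UTF-8
correct = strip_first_char_utf8(s)  # b'x'
code_point = ord("c")               # 99, the value the printf form is meant to give
```

The naive version leaves a stray continuation byte behind, which is the kind of bug a ? glob has to avoid.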



Other:

Reorganize docs and improve the HTML toolchain. The /release/$VERSION/ page has been upgraded, and there's a new documentation index. The contents of most docs are still in progress.

Oil language: tup(42) is a tuple with one element, instead of Python's confusing 42, (with trailing comma). Singleton tuples are rare.

Under the hood (I'll expand on these topics in the next blog post): development of the mycpp translator, development of the mylib runtime, development of the C++ target for ASDL, refactorings to aid translation, and type annotations added to certain files.
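For readers who haven't run into the Python behavior that the tup(42) item refers to, this short sketch shows it in plain Python. The variable names are only for illustration.

```python
not_a_tuple = (42)       # parentheses alone only group: this is the int 42
singleton = 42,          # the trailing comma is what creates the tuple
also_singleton = (42,)   # same value, written with parentheses
from_list = tuple([42])  # an explicit constructor call, closest in spirit to tup(42)
```

All three tuple spellings produce the same one-element value. The first line is the easy mistake.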



0.7.pre7 on December 2nd (changelog)

Contributions:

조성빈 added an uninstall script to undo what install does.

Aaron Sokoloski added type annotations to a few files. This is the first step to making code faster via mycpp.

Other:

Implemented bash's ${!prefix*}, which the homebrew package manager uses to unset variables starting with a certain prefix. I'd like more testing of important shell scripts along these lines. See the #should-run-this label on Github, and How To Test OSH.

Almost all other changes were related to translation and mycpp. It took two months of full-time work! But the results are encouraging.

I made the first oil-native tarball (linked above).
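As a rough model of what the prefix-matching expansion mentioned above computes, here is a sketch that treats the shell's variables as a dictionary. The function names_with_prefix and the sample variables are hypothetical, and sorting the result is a choice made for this sketch rather than a statement about bash or OSH.

```python
from typing import Dict, List


def names_with_prefix(variables: Dict[str, str], prefix: str) -> List[str]:
    """Return the names of all variables that start with prefix."""
    return sorted(name for name in variables if name.startswith(prefix))


shell_vars = {"PKG_CACHE": "/tmp/c", "PKG_DEBUG": "1", "HOME": "/home/andy"}

# Unset everything with the prefix, as a package manager's script might do.
for name in names_with_prefix(shell_vars, "PKG_"):
    del shell_vars[name]
```

After the loop only HOME remains, which is the effect a script gets by expanding the matching names and passing them to unset.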

0.7.pre8 on December 6th (changelog)

Oil now has JSON support! This is natural because the language has Python/JavaScript-like data structures. Built yajl and its corresponding Python binding py-yajl into the app bundle. oilshell/py-yajl is a simplified fork of py-yajl, with just 862 lines of C, compared to the original's 1578 lines.

Improvements to the oil-native release.

0.7.pre9 on December 8th (changelog)

Aaron Sokoloski added more type annotations. We need help, so please join the conversation on Zulip. We'll fill you in on how it works!

Oil language: Changed the "pretty print expression" command from pp f(x) to = f(x). It's like an assignment with nothing on the LHS. As in assignments, everything to the right of = is parsed in expression mode.

The first word of a command can no longer look like =foo. Most likely you want to add a space like = foo, or quote it like '=foo'.

Published benchmarks and metrics for the oil-native tarball.

Documented JSON support.

What's Next?

The C++ translation isn't done, but the oil-native demo has made me optimistic that this large project is feasible. After all, I wrote more than two years ago that using CPython is the riskiest part of the project!

But now we have a concrete path forward. (Appendix A summarizes the path I tried and abandoned.) The parser is about 40% of the Oil codebase, and it took two months to translate. Given that, I expect the rest to take two to six months of continuous work to translate.
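The estimate can be checked with a quick calculation. This sketch assumes the remaining code translates at the same rate per line as the parser did, which is a simplification and not a claim from the measurements.

```python
from fractions import Fraction

parser_share = Fraction(40, 100)  # the parser is about 40% of the codebase
months_for_parser = 2             # time it took to translate the parser

rest_share = 1 - parser_share     # the remaining 60%
months_for_rest = months_for_parser * rest_share / parser_share

print(months_for_rest)  # 3
```

Three months at a constant rate sits inside the two-to-six-month range, which leaves room for the rest of the code being easier or harder to translate than the parser.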

However, at the current rate, this will likely happen more than six months from now, because there are many other parts of the project.

When I was working on translation, I wasn't working on features or bug fixes, for either OSH or Oil. According to Appendix B, the code has changed only a little in the last two months.

When I was working on documentation, I wasn't working on translation, features, or bug fixes. Writing good docs for Oil will also take several months of full-time work.

In short, it would be better if development were more parallel than serial. There are many independent parts of the project.

Help Wanted

The requests in How to Help still stand:

Think of a feature that would motivate you to use Oil. Users are more likely to become developers. Better interactive completion? Oil has had good interactive features for almost a year, but development on them has stalled while I work on other things. Speed? This is blocking me, and why I'm working on translation. But I feel like there should be a feature that motivates people to use a (temporarily) slow shell, since there are many popular apps that are slow.

Try Oil on your shell scripts and report bugs. Scripts you've written yourself or know well are good candidates at first.

Try it interactively and report roadblocks. I think a big hurdle to overcome is assembling a useful oshrc.

If you know Python and shell, consider submitting a patch. Github tells me that 30 people have done this so far. You don't need to know any C++. The addition of mycpp means that simple Python code can be sped up by an order of magnitude for free!

Ask us questions on Zulip. Or feel free to lurk too :-) I don't expect patches to drop out of thin air without help. Here's a thread about the code structure that Aaron started. New patches don't necessarily have to pass type checking. Tests are more important, and we can help you with those too.

I continue to maintain issue labels to help new contributors:

There are several important categories of work:

#should-run-this lists shell scripts to test. Running important programs will make Oil appealing to more users.

Improving the #osh-language.

Improving the #oil-language.

#feature. You can prototype them in plain Python!

We have many #interactive-shell ideas, but not enough hands to implement them.

#documentation.

#devtools, like using Guix or Nix as a virtualenv for C and shell. I'll accept any PR that makes build/dev.sh minimal and then bin/osh run in an isolated environment. These commands generate Python source code, build a C extension, and run the resulting plain Python program. I got a couple of PRs but it wasn't clear how to use them. (This is the "dev build", which is different from the release build or the mycpp build.)

Overall, I think oil-native is evidence that Oil will work. Try it and let me know if you disagree!

An abstract shell interpreter can be turned into a production-quality shell. The shell will be compatible with old programs, but it also has powerful Python-like data structures and JSON support.

In other words, Oil is our upgrade path from bash.

If that appeals to you, consider helping out. The most important thing is to try it and figure out what's preventing you from being a user. Users are more likely to be developers!

Appendix A: What happened to OPy, OVM, and OVM2?

We're still using the OPy bytecode compiler for oil, but it will be retired when Oil becomes a pure C++ program via mycpp, ASDL, and other translators.

OVM is our slice of CPython. We'll keep Python objects like dict, list, tuple, str, int, bool, but we'll remove the bytecode interpreter in favor of native code. This solves the "double interpretation" problem.

As an experiment, I wrote 1000 lines of code toward OVM2, a minimal bytecode interpreter to run Oil. This approach was too much work for the small speed benefit. Relying on MyPy for type checking and the C++ compiler for code generation has produced much better results.

Appendix B: Metrics for Release 0.7.pre9

Let's compare the current release with version 0.7.pre5, released two months ago on October 4th.

The parser benchmarks deserve their own post, so here are the metrics we routinely track.

Most of the development work was on translation, which doesn't affect spec tests:

There were a few new features like the ${!prefix@} implementation. And we add failing tests to expose bad behavior, sometimes before we can fix it.

The Oil language now has JSON support:

Even though there were at least a hundred commits to aid translation, the source code didn't get much bigger:

cloc for 0.7.pre5: 13,925 lines of Python and C, 307 lines of ASDL.

cloc for 0.7.pre9: 14,156 lines of Python and C, 308 lines of ASDL.

Physical code:

Native Code and Bytecode Metrics

The yajl dependency for JSON support added almost 5K lines of C code:

The native code size increased by a corresponding amount:

The bytecode shrank a bit:

These are minor differences compared to the reductions that mycpp should enable in the coming months. All bytecode will be replaced with native code.