
You Can Now Try the Oil Language

(Last updated 2020-08-12)

In August, I published early design notes for the Oil language.

Since then, I've continued prototyping it, and I started at least 30 design discussions on oilshell.zulipchat.com. I appreciate all the feedback, and I'm looking for more!

I just released Oil 0.7.pre5, which contains all this work. Here's a summary:

The syntax of Oil is mostly done. I added a Python-like expression language to a compatible shell.

Unlike shell, Oil has powerful data types like dictionaries, lists, tuples, ints, and floats. Functions operate on any of these types.

The semantics are in a pretty good state, but scope and error handling aren't worked out yet.

I reused the spec test framework for Oil, and almost 200 Oil spec tests now pass.

Everything is still up for discussion. This is an early release. The language is ready to try, but not ready to use :-)

On the other hand, you can use this release to run existing shell scripts. I dogfood Oil, and its stricter semantics have caught several problems in my own shell programs.

There are some new docs like the Eggex manual, but overall, the code is currently way ahead of the documentation.

This post gives an outline of the docs I want to write. Feel free to ask questions on Reddit or on Zulip! They'll help me decide what to write about.

Docs I Want to Write

There are drafts of many of these docs on Zulip.

Oil From One Million Feet

One way to explain Oil is by comparison to other languages.

Shell and Python are the two biggest influences on Oil.

It's also influenced by JavaScript, Ruby, Perl, Julia, R, and Go.

(thread)

Small vs. Big Languages

Oil is a big language because it's meant to "subsume" other languages. For example, it contains all of shell, and much of Python, and they're both big languages.

On the other hand, Oil's implementation is smaller than bash or Python.

I still believe that Shell, Awk, and Make Should Be Combined, although the strategy for getting there has changed.

(thread)

Oil From 10,000 Feet

I drafted a blog post which covered each language feature, the rationale for its design, and outlined future work. Here's the table of contents:

Oil Mostly Borrows From Other Languages
  Differences vs. Python

High-Level Descriptions
  Paradigms and Style
  What Should It Be Used For? (#shell-the-good-parts)
  Links To Older Descriptions

Syntactic Concepts
  Static Parsing
  Parse Options to Take Over @, (), {}, set, and = (Shell Language Deprecations)
  Sigils, Sigil Pairs, and Lexer Modes
  Command vs. Expression Mode (see below)
  Keywords vs. Builtins

Syntax
  The Expression Language Is Mostly Python
  Word Language: Inline calls $f(x), Expression sub $[]
  Static printf, Formatters (deferred)
  Homogeneous Arrays (Word or Expression Syntax)
  New Keywords
    var, const, setvar
    set, setglobal, setref
    = and pass
    proc and func; return
  Dict Literals Look like JavaScript
  String Literals May be Raw or C (in expression mode)
  Docstrings and Multiline Commands (not done)

Runtime Semantics
  shopt -s simple_word_eval Does Static Word Evaluation
  Scope and Namespaces (not done)
  Procs Have Open or Closed Signatures
  Functions Look Like Julia, JavaScript, and Go
  Data Types Are Mostly Python

Special Variables

Shell-Like Builtins
  Builtins Accept Long Options
  New: write, flags to read, use, push, repr

Builtins Can Take Ruby-Like Blocks (partially done)
  cd, env, and shopt Have Their Own Stack
  forkwait and fork builtins Replace () and & Syntax
  each { } Runs Processes in Parallel and Replaces xargs

More Use Cases for Blocks (not done)
  Flag Parsing to Replace getopts
  Configuration Files
  Unit Tests
  Find Dialect (deferred)
  Awk Dialect (deferred)
  Make Dialect (deferred)

Builtin Functions
  Borrowed From Python
  Borrowed From C

Textual Protocols / Interchange Formats (not done)
  JSON
  QTSV

Implementation Status
  Deferred Features

How to Give Good Feedback

Appendix: Why an Upgrade?

It's clear I need to split this into many docs!

OSH vs. Oil

In past releases, OSH has been concrete — you can run your shell scripts with it — but Oil has been vague.

That's now changed! This release has an oil executable, which is a busybox-like symlink to the "app bundle".

Here's how it works: bin/oil is just bin/osh with the addition of shopt -s oil:all . The option group oil:all is a shortcut for around 10 parsing and execution options which gradually upgrade OSH to Oil.
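As a rough illustration of the busybox-like arrangement, here is a minimal Python sketch of a single program that picks its startup options from the name it was invoked as. It is not Oil's actual startup code. The functions `expand_options` and `startup_options` are hypothetical, and apart from simple_word_eval (which this post mentions) the option names are placeholders.

```python
import os

# Placeholder option names, for illustration only. The post says oil:all
# bundles around 10 options; only simple_word_eval is named in it.
OPTION_GROUPS = {
    "oil:all": ["simple_word_eval", "parse_example_1", "parse_example_2"],
}


def expand_options(names):
    """Expand option-group names into individual option names."""
    expanded = []
    for name in names:
        # A name that is not a group stands for itself.
        expanded.extend(OPTION_GROUPS.get(name, [name]))
    return expanded


def startup_options(argv0):
    """Choose startup options from the name the binary was invoked as."""
    applet = os.path.basename(argv0)
    if applet == "oil":
        # Same interpreter as osh, plus the equivalent of `shopt -s oil:all`.
        return expand_options(["oil:all"])
    return []


print(startup_options("bin/osh"))
print(startup_options("bin/oil"))
```

The point of the design is that both names lead to the same code, so the only difference between them is a set of options that a script could also turn on itself.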

In the last post, I explained why Oil is now a dialect within OSH. Essentially, I realized that the strategy of creating two different "worlds" makes the shell both harder to implement and harder to use.

The goal of Oil is unchanged: it's your upgrade path out of bash. That path is more seamless if there's a single binary and a single language with a few options.

I'll also describe the oil:basic option group, which lets you use Oil features, but minimizes the breakage in existing shell scripts.

Command vs. Expression Mode

This is an essential syntactic concept. An Oil program starts in command mode, and commands are composed of words:

    echo "hello $name"
    ls | wc -l

However, several keywords and sigils put you in expression mode, where, for example, * means multiplication rather than glob:

    var x = 1 + 2*3 + f(x)   # After =, you're in expression mode
    = myfunc(42, 'foo')      # Pretty-prints the result, without assigning
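To make the split concrete, here is a toy Python sketch that divides one line into the part read as command words and the part read as an expression. It is only a model of the idea, not Oil's parser: the keyword set is an assumption based on the keywords named in this post, `split_modes` is a hypothetical helper, and comments and quoting are ignored.

```python
# Keywords that are followed by an expression after `=`. This set is an
# assumption based on the keywords named in this post.
EXPR_KEYWORDS = {"var", "const", "setvar"}


def split_modes(line):
    """Return (command_part, expression_part) for one line of input."""
    stripped = line.strip()
    if stripped.startswith("= "):
        # A bare `=` evaluates an expression and pretty-prints it.
        return "=", stripped[2:].strip()
    first, _, rest = stripped.partition(" ")
    if first in EXPR_KEYWORDS and "=" in rest:
        lhs, _, rhs = rest.partition("=")
        return f"{first} {lhs.strip()} =", rhs.strip()
    # Everything else stays in command mode: plain words.
    return stripped, ""


for example in ["ls | wc -l", "var x = 1 + 2*3", "= myfunc(42, 'foo')"]:
    print(split_modes(example))
```

In this model the `*` in `echo *.py` never reaches the expression side, which is why it can keep its glob meaning there.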

Inline calls also put you in expression mode:

    echo $strfunc(1 + 2*3)   # Between (), you're in expression mode
    echo @arrayfunc(x, y)

There are also expression substitutions with $[expr] :

    echo "attr = $[obj.attr]"
    echo "key = $[d->key]"
    echo "item = $[array[1 + 2*3]]"
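For readers coming from Python, the nearest familiar analogue is probably an f-string, where the text inside the braces is parsed as an expression rather than as literal characters. The sketch below is an analogy only, not a statement of Oil's semantics. The values of `obj`, `d`, and `array` are made up, and it assumes that d->key reads the entry named key.

```python
from types import SimpleNamespace

# Hypothetical values standing in for the Oil variables in the examples above.
obj = SimpleNamespace(attr="red")
d = {"key": "value"}
array = list(range(10))

# Each {...} is parsed as an expression, much like $[...] in the Oil examples.
print(f"attr = {obj.attr}")
print(f"key = {d['key']}")  # assumes d->key looks up the entry named "key"
print(f"item = {array[1 + 2*3]}")  # index is 1 + 6 = 7
```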

More

If you want a peek at what I'll be writing about, I maintain a Blog TODO thread on Zulip.

Feel free to start new topics with questions. They'll help me decide what to write about. (The "New Topic" button is at the bottom of the screen. Click a message body to reply under the same topic.)

This lobste.rs comment also lists some interesting threads, and things I'm looking for feedback on. At some point I'll post a summary to Zulip, since I know it's a lot to read.

Also see blog posts tagged #oil-language!

What's Next?

This is the general feature set I want for "V1" of Oil. The details will change based on your feedback, but I think the "foundation" of the language will converge pretty soon.

After that, I plan to tackle the riskiest part of the project: Oil is still too slow!

To fix this, I plan to resume the mycpp work I started in April. I think it will yield a reasonable speedup with a reasonable amount of engineering effort, but there's no guarantee.

If it doesn't, I explained on Zulip that Oil Is Made of Ideas. It's Not a Pile of Code in a Particular Language.

So you can reimplement it in another language. Its source code is significantly smaller than bash — i.e. I "compressed" and cleaned up bash for you. And I also added a new and powerful expression language on top!

Appendix A: Source Code Files

Toward that end, I started publishing key source files at the bottom of each release page. Summary:

The unified OSH and Oil lexer is specified with regular expressions, and compiled with re2c.

The Oil expression language is specified with a context-free grammar, using pgen2 syntax. In contrast, the OSH parsers are hand-written.

The unified OSH and Oil syntax tree is specified with Zephyr ASDL.
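To give a feel for what "a lexer specified with regular expressions" means when there are several lexer modes, here is a small Python sketch. Python's `re` module stands in for re2c, the token names and patterns are invented for illustration, and the first matching rule wins, which is simpler than what a generated lexer does.

```python
import re

# Token rules per lexer mode. The token names and patterns are illustrative
# assumptions, not Oil's real lexer definition.
LEXER_MODES = {
    "command": [
        ("Space", r"[ \t]+"),
        ("Glob", r"\*"),
        ("Word", r"[A-Za-z0-9_./-]+"),
    ],
    "expression": [
        ("Space", r"[ \t]+"),
        ("Number", r"[0-9]+"),
        ("Name", r"[A-Za-z_][A-Za-z0-9_]*"),
        ("Times", r"\*"),
        ("Plus", r"\+"),
    ],
}


def tokenize(text, mode):
    """Lex `text` with the rules of one mode; return (kind, value) pairs."""
    rules = [(kind, re.compile(pattern)) for kind, pattern in LEXER_MODES[mode]]
    tokens = []
    pos = 0
    while pos < len(text):
        for kind, regex in rules:
            match = regex.match(text, pos)
            if match:
                if kind != "Space":  # whitespace separates tokens, then is dropped
                    tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} in {mode} mode")
    return tokens


print(tokenize("ls *.py", "command"))
print(tokenize("1 + 2*3", "expression"))
```

The same `*` character comes out as a Glob token in one mode and a Times token in the other, which is the kind of distinction the command vs. expression split relies on.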

More on this later. (Or ask me about it on Zulip.)

Appendix B: Metrics for Release 0.7.pre5

Let's compare the current release with version 0.6.0, released three months ago on July 1st.

There are 115 new tests passing:

In addition, we now have almost 200 Oil spec tests passing:

Oil spec tests for 0.7.pre5: 207 tests, 195 passing, 12 failing

There are ~1500 new significant lines of code in OSH:

cloc for 0.6.0: 12,447 lines of Python and C, 239 lines of ASDL

cloc for 0.7.pre5: 13,925 lines of Python and C, 307 lines of ASDL

And ~2500 new lines of physical code in OSH:

In the oil_lang/ directory, there were 1,175 physical lines, and now there are 3,655. This number isn't that meaningful because some of Oil is in the frontend/ directory.

Nevertheless, it's a good sign that Oil is still a small program, even with the addition of the large Oil language. It lets me make aggressive, global changes to the codebase.

The small size is largely due to the use of the domain-specific languages mentioned above.
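The size deltas in this appendix can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are just for illustration.

```python
# Figures copied from the metrics above.
significant = {"0.6.0": 12_447, "0.7.pre5": 13_925}  # Python and C, from cloc
asdl = {"0.6.0": 239, "0.7.pre5": 307}
oil_lang_physical = {"before": 1_175, "after": 3_655}
oil_spec = {"total": 207, "passing": 195}

new_significant = significant["0.7.pre5"] - significant["0.6.0"]
new_asdl = asdl["0.7.pre5"] - asdl["0.6.0"]
new_physical = oil_lang_physical["after"] - oil_lang_physical["before"]
pass_rate = oil_spec["passing"] / oil_spec["total"]

print(new_significant)      # 1478, consistent with "~1500"
print(new_asdl)             # 68
print(new_physical)         # 2480 more physical lines in oil_lang/
print(f"{pass_rate:.1%}")   # 94.2%
```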

Native Code and Bytecode Metrics

I restored Python's floating point support to the Oil build, so the amount of native code increased:

The binary size also increased:

As well as the bytecode size:

These are minor differences compared to the optimizations and reductions I hope to make in the coming 6 to 12 months.

Acknowledgements

Thanks to Ilya Sher and Kartik Agaram for great discussions on the Oil language. I'm still looking for more feedback!