
Release of OSH 0.6.pre5

This is the latest version of OSH, a bash-compatible shell:

Please try it on your shell scripts and report bugs! To build and run it, follow the instructions in INSTALL.txt.

If you're new to the project, see Why Create a New Shell?.

In the last post on autocompletion, I hinted at what's new in this release. Read on for details.

This post summarizes the last three releases:

0.6.pre3 on August 30th (changelog)

0.6.pre4 on September 9th (changelog)

0.6.pre5 on October 7th (changelog)

Highlights

OSH can run portions of real bash completion scripts. This led to implementing new language features, builtins, and special variables, as well as fixing many bugs.

To make these hairy completion scripts run, I developed shell debugging tools, like an "out of band" logging stream and a crash dump.

OSH now supports alias expansion, which was a major architectural change.

Improved OSH to Oil translations.

1030 spec tests now pass, compared with 880 as of the last release announcement. For reference, there were 249 passing when I started keeping track in March 2017.

Details

0.6.pre3

Ricardo Grant implemented compgen -A function, which auto-completes function names.

Published more test suites. parse-errors and runtime-errors should help us rapidly improve OSH error messages. Others: osh-usage, oshc-deps, and arena.

OSH to Oil translations. This helps with the Oil language design. Fixed all translator crashes revealed by the wild tests. Improved the translation quality in many cases, e.g. of here docs. Compatibility shims sh-expr and shExpr().

Under the hood:

All parsers now use exceptions rather than return codes to indicate syntax errors. When I started the project, I felt that return codes would translate more easily to C or C++, but this was a mistake. Exceptions (or non-local control flow) are essential when writing recursive parsers. Recursive parsers in C typically use longjmp() rather than return codes.

0.6.pre4

Alias expansion, with the alias and unalias builtins. Stubbed out shopt -s expand_aliases for bash compatibility. Like other shells, we always expand aliases.

Wild tests: Categorized some errors as not-shell, not-osh.

More translation improvements: until loop, empty here doc, etc.

Bug fixes: Fix the $IFS word splitting algorithm so x"y " is evaluated correctly. Fix an unhandled exception in source. Make $(x a parse error. Prior to this fix, an actual EOF character could close a command substitution.

Under the hood:

Interleave parsing and execution in the main loop. This is necessary to expand aliases after they've been defined. It's also necessary for the rare shell scripts that "change languages", like the one I found when Parsing 183,000 Lines of Git's Shell Source Code (2016). You can still statically parse code with osh -n, though it necessarily ignores aliases. In contrast, the Oil language won't have aliases and will be entirely statically parsed. According to the bash manual: "For almost every purpose, shell functions are preferred over aliases."

Thread a ParseContext object throughout the parsing code to handle the runtime → parse-time feedback.
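To see why alias expansion forces parsing and execution to interleave, here is a minimal sketch in bash (it is not OSH-specific). The alias names greet and osh_demo_later are made up for illustration. eval stands in for "parse this later": its argument is parsed at run time, so it sees aliases defined earlier, while a brace group is parsed as one unit before anything inside it runs.

```shell
# bash only expands aliases in scripts when this option is on.
shopt -s expand_aliases

alias greet='echo hello'

# eval parses its argument at run time, after `greet` exists, so it expands.
first=$(eval 'greet world')

# The whole brace group is parsed before the alias inside it is defined,
# so `osh_demo_later` is treated as an ordinary (missing) command and fails.
second=$(eval '{ alias osh_demo_later="echo hi"; osh_demo_later; }' 2>/dev/null) \
  || second="not expanded"

echo "$first"
echo "$second"
```

A parser that reads the whole file before executing anything would behave like the brace group for every alias, which is why the main loop has to alternate between the two.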

0.6.pre5

Ricardo Grant implemented declare -F to list functions.

New metrics on the OPy compiler's bytecode:

overview: Compare OPy vs. the CPython compiler.

oil-with-opy: Detailed metrics on OPy-generated bytecode.

oil-with-cpython: Detailed metrics on CPython-generated bytecode.

src-bin-ratio-with-opy: To remind us to keep the Oil binary small.

Analyzing the bytecode with the R language improved my understanding of the compiler that I cobbled together. Explaining the above metrics would make a good blog post. I discovered some inefficiencies, described in issue #180.

Language features for Completion:

Extended Globs like @(*.py|*.sh). We use GNU libc to implement this, so it doesn't work on Alpine Linux or OS X. If you're up for a fun algorithms challenge, please chime in on issue #192. Implementing this requires no understanding of Oil's code.

Associative Arrays: declare -A assoc, ${assoc["foo"]}

LHS Array Assignment like COMPREPLY[j++]=foo.

Partial implementation of ,, and ^^ for lowercase/uppercase conversion.

Variable references like ${!N} can evaluate to $1, $2, etc. when N is an integer. Example: ${!OPTIND}.

Overhaul of regex matching, e.g. [[ foo =~ .*\.py ]]. Quoted parts are properly regex-escaped.

command ls foo disables function lookup. bash treats this as a builtin, but it's more like part of the shell language.

Bug fix: if ! declare ... is no longer run in a subshell. Collateral enhancement: The last command in a pipeline runs in the current shell process where possible. In other words, all pipelines behave as if shopt -s lastpipe were set. (zsh behaves this way by default.)
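For readers who haven't met these features, here is a small bash sketch that exercises several of them together, in the way completion scripts tend to. The function and variable names (is_script, ext_lang, nth_arg) are invented for the example, and it shows bash behavior rather than anything specific to OSH's implementation.

```shell
shopt -s extglob   # enable @(...) patterns outside [[ ]] as well

# Extended glob: true if the name ends in .py or .sh
is_script() {
  [[ $1 == @(*.py|*.sh) ]]
}

# Associative array, keyed by file extension
declare -A ext_lang
ext_lang["py"]="Python"
ext_lang["sh"]="shell"

# LHS array assignment with a post-increment subscript
COMPREPLY=()
j=0
for f in build.sh notes.txt main.py; do
  if is_script "$f"; then
    COMPREPLY[j++]=$f
  fi
done

# Case conversion: ^^ uppercases, ,, lowercases
name=osh
upper=${name^^}
lower=${upper,,}

# Indirect reference to a positional parameter: ${!n} is $1, $2, ... for integer n
nth_arg() {
  local n=$1
  shift
  echo "${!n}"
}

# Regex matching: an unquoted . matches any character, a quoted "." is literal
unquoted=no; [[ fooXpy =~ foo.py ]] && unquoted=yes
quoted=no;   [[ fooXpy =~ foo"."py ]] && quoted=yes
```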

Builtins for completion:

Partially implemented the complete and compgen builtins. complete -F myfunc registers a user-defined completion function. OSH can now invoke such functions correctly in many cases. Logic to complete function names, alias names, shell option names, and help topics. Simple file system completion.

Stubbed out compopt.

shopt -q queries global shell settings.

getopts handles explicit arguments when passed, rather than reading "$@".

Stubbed out shell options in order to run ~/.bashrc: set -v / set -o verbose, shopt -s progcomp, shopt -s hostcomplete.

Completion API variables like COMP_CWORDS, COMP_LINE, etc. Stubbed out COMP_WORDBREAKS.
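As a reminder of what a user-defined completion function looks like, here is a minimal bash sketch. The command mytool, its subcommands, and the function name _mytool_complete are all hypothetical. The shell fills in COMP_WORDS and COMP_CWORD before calling the function, and the function answers by filling COMPREPLY.

```shell
# Completion function for an imaginary `mytool` command.
_mytool_complete() {
  # The word currently being completed.
  local cur=${COMP_WORDS[COMP_CWORD]}
  # compgen -W filters a word list down to the entries that start with $cur.
  COMPREPLY=( $(compgen -W "build test deploy" -- "$cur") )
}

# Register the function so that it is invoked for `mytool <TAB>`.
complete -F _mytool_complete mytool
```

Outside an interactive session, the function can be exercised by setting COMP_WORDS and COMP_CWORD by hand and then inspecting COMPREPLY, which is a convenient way to test a completion script.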

Bug fixes for completion:

The getopts builtin respects dynamic scope when setting OPTIND and other output variables. In other words, it looks up the stack for a variable to mutate.

(( i = 0 )) no longer evaluates i (as is necessary for (( i += 42 )) ).

Following bash, string-to-integer coercions with [ are now stricter than coercions with [[.

Implemented declare -g to force variables to be global (used by bash-completion). This matters in this case:

    f() {
      local x=1
      # The "globals" in lib.sh are now local! Unless declared with -g.
      source lib.sh
    }
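Here is a runnable bash version of that situation, plus the getopts case from the first item. It is only a sketch: the temp file stands in for lib.sh, and the names plain_table, global_table, load_lib, parse_one, and first_flag are made up for the demo.

```shell
# Hypothetical library file, written to a temp path for the demo.
lib=$(mktemp)
cat > "$lib" <<'EOF'
declare -A plain_table=([k]=v)       # local to whichever function sources this
declare -g -A global_table=([k]=v)   # -g forces global scope
EOF

load_lib() {
  local x=1
  source "$lib"
  # Both tables are visible here, inside the function.
  seen_inside="${plain_table[k]}${global_table[k]}"
}
load_lib
rm -f "$lib"
# At this point plain_table is gone, but global_table survives.

# getopts looks up the call stack for `opt`, so it mutates the caller's local.
parse_one() {
  getopts "ab" opt "$@"
}
first_flag() {
  local opt OPTIND=1
  parse_one "$@"
  echo "$opt"
}
```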

Dev Tools:

OSH can produce a JSON crash dump of the interpreter state, which is enabled by the OSH_CRASH_DUMP_DIR environment variable. It still needs work, and I will write more about it when it's usable.

A new --debug-file stream to see internal info and warnings.

--xtrace-to-debug-file so the output of set -o xtrace doesn't go to stderr (which is usually a terminal).

A new repr builtin to inspect the value of variables (not done, may change).

A more faithful implementation of bash's FUNCNAME, BASH_SOURCE, BASH_LINENO. (Note: I don't like their semantics and would change them if given a chance.)

Polish many error messages and add location info, e.g. to bash's questionable string to integer coercions. (Remember, they can be turned off with set -o strict-arith.)
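To make the FUNCNAME, BASH_SOURCE, and BASH_LINENO semantics concrete, here is a small bash stack-trace sketch. The function names print_stack, inner, and outer are invented for the example. The indexing follows the bash manual: FUNCNAME[i] was invoked at line BASH_LINENO[i] of file BASH_SOURCE[i+1], an off-by-one pairing that is easy to get wrong.

```shell
print_stack() {
  local i
  # Frame 0 is print_stack itself, so start at its caller.
  for ((i = 1; i < ${#FUNCNAME[@]}; i++)); do
    # The call site of frame i lives in the *next* frame's source file.
    echo "${FUNCNAME[i]} ${BASH_SOURCE[i+1]:-?}:${BASH_LINENO[i]}"
  done
}

inner() { print_stack; }
outer() { inner; }

outer
```

Each output line names one function on the call stack, innermost first, followed by the file and line it was called from.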

Other:

Re-enable readline history.

Website: I revamped the page that lists all releases.

OPy: Add the ability to remove docstrings with:

$ opyc compile -emit-docstring=0 foo.py

What's Next?

To be honest, I'm burnt out on the interactive shell, and it's not done yet. I need help!

Leave a comment or chat with me on oilshell.zulipchat.com if you're experienced with both Python and shell, and want to help.

Here are some blog posts I should write:

What I've Learned About Shell Auto-Completion (So Far): About bash, zsh, fish, etc.

How I Debug Completion Scripts: The --debug-file flag needs an asciinema demo.

Patches to the bash-completion Project (not upstreamed yet)

Comments about the Oil Project, Software Evolution, and Adoption: I haven't blogged much over the last couple of months, but I have written comments on Hacker News and Lobste.rs that shed light on the project's motivations.

How to Contribute to Oil Without Understanding the Code: Extended glob (#192), fuzzing the parser (#171), cloning find and xargs (#85). Issues labeled "good first issue".



Appendices

Selected Release Metrics

The new functionality above is reflected in the spec test metrics:

I also mentioned improvements in the OSH-to-Oil translations:

(Many translations are still incorrect, but they no longer fail by crashing!)

I want Oil's source to be compact and easily understandable. All these new features only cost us ~700 significant lines of code!

cloc for 0.6.pre2: 9,136 lines of Python and C, 144 lines of ASDL

cloc for 0.6.pre5: 9,839 lines of Python and C, 148 lines of ASDL

Including whitespace and comments:

Slight decrease in native code:

which results in a slightly smaller binary:

But the bytecode size went up, since there are more lines of Python:

Unimplemented Language Features

OSH is at the point where I implement features on demand. I won't go out of my way to implement a feature — a "real" shell script has to motivate it.

The only unhandled exceptions in the spec tests are now NotImplementedError for the following features:

<& redirect: I don't actually understand this operator, e.g. compared with >& .

The bash operator |& to pipe stderr as well as stdout. It's analogous to &> and &>>, which were recently implemented.

${!a[@]} to give the keys of associative arrays. (This is completely distinct from the normal "named reference" behavior.)

Other aspects of associative arrays are not implemented (since associative arrays are new in this release).
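For reference, here is what two of these features do in bash, as a minimal sketch. It shows the behavior OSH would need to match, not anything OSH does as of this release. The names noisy and color are made up.

```shell
# Writes one line to stdout and one to stderr.
noisy() { echo out; echo err >&2; }

# |& is shorthand for 2>&1 |, so both streams reach the right-hand side.
with_shorthand=$(noisy |& sort)
with_longhand=$(noisy 2>&1 | sort)

# ${!a[@]} expands to the keys of an associative array, not its values.
declare -A color=([sky]=blue)
keys=("${!color[@]}")
```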

Hard Bugs Encountered in These Releases

(1) Python's readline binding silently swallows exceptions in registered callbacks. Like the author of this blog post, I found this out the hard way.

(2) I hit another occurrence of the hardest bug: file descriptors silently getting clobbered, leading to a process that can't even print debug messages.

The underlying cause was that import random in Python opens /dev/urandom as a side effect. I fixed the bug by simply removing it, since it was an unnecessary transitive dependency caused by cgi.escape() .

A shell interpreter is unique in that it must know about all file descriptors used by the process it runs in. It can't use libraries and frameworks that open files "behind its back".