
How I Plan to Use Tests: Transforming OSH

The last post describes how I'm using tests to work toward an initial OSH release. This post will describe how I expect to use tests after the release.

The Problem

When I release OSH, you'll notice that it's too big and too slow. It makes me cringe to say that it's bigger and slower than bash, but that's true right now.

This is because it's an unoptimized Python program that I wrote as a prototype.

But I have a plan to make it faster and smaller. I know by now that big programs don't always respect their author's plans, but I've written and tested enough code that I think it will work. (Remember, this is the riskiest part of the project.)

I described some of that work in The OPy Front End is Working. In short, I plan to make OSH faster by gradually transforming the code from Python to OPy. I'm explicitly avoiding a big-bang rewrite.

Free Tests

How does this relate to testing? Here are two different strategies for testing a compiler:

1. Make assertions on the output of the compiler. When compiling to x86, you might test for add and ret instructions, and their operands. When compiling to Python bytecode, you would test for BINARY_ADD and RETURN_VALUE instructions.

2. Run the compiled program and make assertions on its output. OSH is written in Python, so its tests are an implicit test that the CPython compiler and runtime did their jobs correctly.

Although it's not an either-or decision, I believe approach #2 is superior. Testing calcifies interfaces, and Python bytecode isn't a stable interface.
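To make the two strategies concrete, here is a minimal sketch. It assumes Python 3 and the standard dis module (OSH itself targets Python 2.7), and the names add and opnames are hypothetical, chosen only for illustration.

```python
import dis


def add(a, b):
    return a + b


def opnames(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [instr.opname for instr in dis.get_instructions(func)]


# Strategy 1: assert on the compiler's output.
# This is brittle: the addition opcode is BINARY_ADD in older CPython
# releases and BINARY_OP from CPython 3.11 on, so only a prefix is matched.
names = opnames(add)
assert any(name.startswith("BINARY_") for name in names)

# Strategy 2: run the compiled program and assert on its output.
# This holds no matter which bytecodes the compiler chose to emit.
assert add(2, 3) == 5
```

The first assertion already had to be loosened to survive a change in the instruction set, while the second needs no changes at all.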

In fact, my strategy to speed up OSH involves adding and removing bytecodes. This will turn the CPython VM into OVM.

So the spec tests for OSH will actually help me develop OPy. They're almost like free tests. OSH is a non-trivial program that will exercise most parts of the compiler. I'd go as far as to say that the parts it doesn't use can be deleted!

Summary

The main point of this post is that OSH will be too big and slow upon release, but I have a plan to address this.

If you're interested in programming language implementation, I've included two more sections below. Appendix A has some ideas for OPy, and Appendix B is about MicroPython.

I'd like feedback on these ideas in the comments.

Appendix A: Notes on OPy

The following links show what I'm thinking. Note that some threads are from /r/ProgrammingLanguages rather than my own subreddit /r/oilshell.

(1) Comments on Working Toward an OSH Release

In that thread I link to three prior comments, which I've summarized below.

I finally admitted that I'll never get around to writing OSH in native code. But I do want it to be faster, so I need a different solution.

The first release of OSH will be in Python 2.7. After that it will evolve into OPy.

The lexer is very slow because I haven't yet ported it to something like re2c.

There's a link showing what isn't yet parsed in OSH: extended globs, coprocesses, etc.

(2) Comments on OVM Will Be a Slice of the CPython VM

I've tried six strategies to rein in the inherent complexity of a bash-compatible shell.

One of them was a Python-to-Lua-bytecode compiler that didn't really work. I only spent a few weeks on it.

(3) Comments on Cobbling Together a Python Interpreter

OPy will be more static than Python and Lua. The compiler will be bigger, and the runtime will be smaller. Bytecode compilation can be done ahead of time, as in OCaml. (OCaml has a bytecode compiler + runtime as well as native compilers.)

OPy will be dynamically typed, but local variable names and class member names will be statically resolved, roughly in the fashion of the Wren language. The way I'm thinking of this is: get rid of classobject.c in Python, and compile classes down to tupleobject.c. If names are statically resolved, this should be possible.
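As a rough illustration of compiling classes down to tuples, here is a sketch in plain Python 3. It is not OPy code, and every name in it (POINT_FIELDS, resolve, make_point, manhattan) is hypothetical. It only assumes that a class's member names are known before the program runs.

```python
# Hypothetical "compile time": the field layout of a class is fixed up front.
POINT_FIELDS = ("x", "y")


def resolve(fields, name):
    """Compile-time step: turn a member name into an integer offset."""
    return fields.index(name)  # raises ValueError for an unknown member


# A compiler would do these lookups once and bake the numbers into the code.
# Here the constants X and Y stand in for those inlined numbers (plain Python
# still finds the globals X and Y by name, so this only models the idea).
X = resolve(POINT_FIELDS, "x")
Y = resolve(POINT_FIELDS, "y")


def make_point(x, y):
    # An instance is just a tuple; there is no per-instance dict.
    return (x, y)


def manhattan(p):
    # Members are reached by integer offset rather than by string key.
    return abs(p[X]) + abs(p[Y])


print(manhattan(make_point(3, -4)))  # 7
```

A misspelled member name fails in resolve, before any instance exists, which is the kind of error a more static compiler could report ahead of time.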

(4) Comments on Rewriting Python's Build System from Scratch

More elaboration on resolving names statically. At runtime, I want to look things up by number rather than by name. That is, rely less on dict lookups and more on integer offsets.

Python already has a similar optimization in the LOAD_FAST bytecode, compared with LOAD_ATTR, etc. But I think even LOAD_FAST incurs unnecessary overhead at the beginning of function calls, so OPy could be even faster (?).
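The difference between numbered and named lookups can be seen in CPython itself. The sketch below assumes Python 3 and the standard dis module; Box and scaled are hypothetical names. The exact opcode names printed by dis.dis vary between CPython versions, so no disassembly output is shown.

```python
import dis


class Box:
    def __init__(self, width):
        self.width = width


def scaled(box, factor):
    result = box.width * factor
    return result


# Local variables get a numbered slot each; the bytecode refers to the
# slot index, and the names are kept only in this table.
print(scaled.__code__.co_varnames)  # ('box', 'factor', 'result')

# Attribute names stay as strings and are looked up at runtime.
print(scaled.__code__.co_names)  # ('width',)

# The disassembly shows slot-based loads for locals and a name-based
# load for the attribute.
dis.dis(scaled)
```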

Appendix B: MicroPython is Impressive

While I'm looking into the future, I'll mention that I've been reading the source to MicroPython. It's a from-scratch reimplementation of Python 3 for microcontrollers, started in April 2013 and feature-complete in April 2015.

MicroPython: A Journey From Kickstarter to Space is a talk by creator Damien George at PyCon Australia. It recaps the project history and includes a bit about the implementation.

I like what I see so far. It apparently supports much of CPython's functionality, but the source code is smaller and it uses less memory at runtime, in order to fit on a microcontroller.

The first thing I ran into is a lack of Unix libraries. This is because it's designed for microcontrollers, which have no operating system. You can run it on Linux by building out of the unix/ directory, but it's pretty bare-bones.

I think I'll still be using CPython for the foreseeable future, but I will continue studying and experimenting with MicroPython. If you've used it, please leave a comment!

And no, it's not lost on me that this will be the seventh programming platform I've tried for writing a shell!