I'm Attending Recurse Center this Summer

The Recurse Center is an educational retreat for programmers, located in New York City. I've heard about it for years on Hacker News, and I've always liked the idea.

In early February, I applied and was accepted. A few days ago, I finally figured out where I'll be staying! I'll be there from May 21st to August 8th.

This post is a status update: what's happened recently, and what I plan to do in the future.

Plans for Oil

I'm giving myself permission to take a break from Oil for the summer. I've worked on it for almost two full years, and I don't expect a three-month break to derail the project.

(The first post showed the first Python commit on 4/20/2016, but I actually started a C++ version in a different repo on 3/29/2016.)

In particular, the cadence of regular Oil releases will slow. This blog will probably also fall behind — it's behind even without the trip to NYC!

The purpose of my trip is to learn from other people, so it wouldn't make sense to just slog away at Oil.

On the other hand, several aspects of this project are educational, and somebody might be interested in them. So I'll be surprised if the project goes completely dormant.

TODO Before Leaving

I don't want to drop the project abruptly, so here's a TODO list for the next two months.

(1) Make an Oil release. It will be either 0.5.alpha3 or 0.5. I wrote in the 0.4 release announcement that I wanted a significant user-facing feature for 0.5. If that doesn't happen, I'll call it 0.5.alpha3.

(2) Flush the blog backlog. Unfortunately, unpublished drafts of blog posts have piled up. So I plan to summarize several posts in a single post, like I've done in past posts tagged #blog-topics.

If you're interested in more detail, you can leave a question in the comments. I'm prone to leaving long replies. (Here's another one.)

What I've Been Working On

Although I haven't written many blog posts recently, I've continued to work on Oil. The project reached some concrete milestones at the beginning of the year, so I gave myself permission to start several new things at once.

The next two sections describe work in progress, so I'll be brief. Feel free to ask questions if you want details.

User-Facing Features

(1) A shell trace tool. This is a web service that helps you debug your shell programs. You can think of it as sh -x with a better UI. I'm working on it with my friend Eric.

(2) Static analysis of shell scripts. The idea is that if you type oshc deps foo.sh, it will display all the external binaries the script depends on (rsync, curl, etc.). This necessarily involves some heuristics, not all of which I've figured out yet.

A good use case for this is making shell app bundles.
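To make the idea of dependency heuristics concrete, here is a rough sketch in Python. It is not how oshc deps works; Oil has a real shell parser, while this just splits lines with regular expressions. The names guess_deps, PREFIX_KEYWORDS, and NOT_EXTERNAL are hypothetical, and the keyword and builtin lists are deliberately incomplete.

```python
import re

# Assumption: a small, incomplete set of keywords that can precede a command.
PREFIX_KEYWORDS = frozenset(
    ['if', 'then', 'else', 'elif', 'while', 'until', 'do', '!', '{'])

# Assumption: an incomplete set of builtins and keywords that are not binaries.
NOT_EXTERNAL = frozenset(
    ['cd', 'echo', 'set', 'export', 'exit', 'read', 'local', 'return',
     'for', 'case', 'esac', 'fi', 'done'])

ASSIGNMENT = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*=')
PLAIN_WORD = re.compile(r'^[\w./-]+$')


def guess_deps(script_text):
    """Return a sorted list of words that look like external commands."""
    deps = set()
    for line in script_text.splitlines():
        # Naive comment stripping: wrong when '#' appears inside quotes.
        line = line.split('#', 1)[0]
        # Split on ||, &&, |, ; and & without regard to quoting.
        for command in re.split(r'\|\||&&|[|;&]', line):
            words = command.split()
            # Skip leading keywords and VAR=value assignments.
            while words and (words[0] in PREFIX_KEYWORDS
                             or ASSIGNMENT.match(words[0])):
                words.pop(0)
            if (words and words[0] not in NOT_EXTERNAL
                    and PLAIN_WORD.match(words[0])):
                deps.add(words[0])
    return sorted(deps)


SCRIPT = """\
#!/bin/sh
set -e
cd /tmp
curl -s http://example.com/data.tar.gz | tar xz
if grep -q error log.txt; then echo failed; fi
LANG=C rsync -a data/ backup/ && echo done
"""

print(guess_deps(SCRIPT))  # prints ['curl', 'grep', 'rsync', 'tar']
```

This sketch already shows why heuristics are unavoidable: it ignores quoting, here documents, shell functions, aliases, and commands invoked through variables or eval.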

Infrastructure

(3) Performance measurement and experiments. To help optimize OVM and the code generated by OPy, I started learning more about CPython performance.

I profiled CPython with the Linux perf tool, and made a flame graph. I hit the common -fno-omit-frame-pointer issue. I found that function-based profiling isn't good for interpreter loops, which are generally recursive functions with big switch statements. This flame graph isn't very useful. If you've encountered this problem before, please leave a comment.

I tried to use Python's builtin systemtap instrumentation, which was surprisingly frustrating.

I tried the ShedSkin compiler. I had heard of it many years ago, but had never used it. The demos are impressive, and I think it's closer to what I want than Cython. It's a relatively small piece of code, and it does whole-program type inference. In contrast, I think I want explicit type annotations and more localized type inference.

I read about the experimental FAT Python project.

(4) Refactoring the OPy compiler. Note that OPy is project-specific infrastructure. It won't be exposed to users.

I made its output deterministic. The fix was to add a __hash__() function to a class commonly inserted into a Python set(). I then created golden checksums for 222 .py → .pyc translations, comprising ~67,000 lines of code.

After this, I was free to wildly refactor the code, without fear of breaking anything. I'm really pleased with the result. Last year, the bytecode compiler was something I made minimal modifications to. Now I understand it and fully own it.

I ran into some interesting issues related to the "trusting trust" attack. Because of the way the Python VM works, changes to variable names in the compiler source "leaked" into the output .pyc files!

I wrote opy/callgraph.py, a module that walks the static callgraph of Oil at runtime, right before main() is executed. This will require another blog post to explain properly, but at a high level, I'm trying to resolve the tension between type checking and metaprogramming with a kind of multi-stage programming.

I'm learning about symbol tables, type inference, and type checking.
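The determinism fix and the golden checksums mentioned above can be illustrated with a small Python sketch. This is not OPy's code: the Block class, emit(), and checksum() are hypothetical stand-ins, and SHA-256 is an assumed choice of checksum. The point is only that a content-based __hash__() makes set iteration order repeatable, which in turn makes the output bytes comparable against a recorded checksum.

```python
import hashlib


class Block(object):
    """Hypothetical stand-in for a compiler object that is stored in sets."""

    def __init__(self, block_id):
        self.block_id = block_id

    def __hash__(self):
        # Hash on stable content. The default hash is derived from the
        # object's identity, which can change from one run to the next.
        return hash(self.block_id)


def emit(blocks):
    """Serialize blocks in set-iteration order, like a naive compiler pass."""
    return ','.join(str(b.block_id) for b in set(blocks)).encode('ascii')


def checksum(data):
    """Hex digest of the emitted bytes (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


# Record a "golden" checksum once, then compare later builds against it.
golden = checksum(emit([Block(i) for i in (3, 1, 2)]))
rebuilt = checksum(emit([Block(i) for i in (3, 1, 2)]))
print(golden == rebuilt)  # prints True
```

With checksums like this recorded for every translated file, any refactoring that accidentally changes the compiler's output shows up as a mismatch.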

(5) A source code browser. I started with a Clang-based C++ source browser, and I plan to add Python support. This relates strongly to the OPy compiler because source browsers also require statically resolving names and determining types.

This might be a fun thing to work on at Recurse Center, because code comprehension tools have an obvious relationship to learning.
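As a rough illustration of what "statically resolving names" means for Python source, here is a minimal sketch using the standard ast module. It is not the source browser or opy/callgraph.py; static_calls is a hypothetical helper that only handles direct calls to plain names inside top-level functions.

```python
import ast


def static_calls(source):
    """Map each top-level function name to the names it calls directly."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for sub in ast.walk(node):
                # Only calls like f(...); method calls such as obj.f(...)
                # would need type information to resolve.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees.add(sub.func.id)
            graph[node.name] = sorted(callees)
    return graph


SOURCE = """
def helper():
    return 1

def main():
    return helper() + len("x")
"""

print(static_calls(SOURCE))  # prints {'helper': [], 'main': ['helper', 'len']}
```

Even this toy version hints at the harder part: deciding what obj.method() refers to requires knowing the type of obj, which is where type inference comes in.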

A Small Commitment

I know there are readers excited about Oil, and I don't want to lose that support while I'm gone.

So, even though I'll be working on other things, I aim to respond quickly to questions about the development process, and to pull requests. I generally respond in a day or so, and I'd like to keep that up.

I'd love it if other people could work on OSH while I'm gone. I've tagged some issues #help-wanted on the issue tracker.

I realize that there's a learning curve to overcome when working on Oil. The project is written in a unique style, with several custom test frameworks and code generators. Feel free to ask me questions about them.

On the other hand, it's very possible that Oil will play a big role in my time at Recurse Center. In fact, that would be ideal.

But I'm not going to overplan or overthink it; I'll just let things happen. The goal is to do something I wouldn't do if I were at home. That might relate to Oil or it might not.

Conclusion

In summary, I have:

A small list of tasks to wrap up before leaving.

A large list of sub-projects in flight, related to Oil features and infrastructure.

A full summer of open-ended learning!