A Unix Shell in Ruby - Part 2: Builtins

Published on February 21, 2012 by Jesse Storimer

Welcome to the second article in this series. Last time we implemented a verrrry basic shell that could just run most basic commands.

This time around we're going to look at how it's broken and how we can fix it. First up, we'll need a few builtins.

A Note on Launching

Last time I explained the semantics of exec: a call to exec will replace the current process with another. This time, when we launch shirt, we're going to do so with the exec(1) utility. So we'll launch our shell like this:

$ exec ./shirt

When we do this from bash, our bash process becomes a shirt process. This way we can ensure that we're not running as a descendant of a bash shell and we can minimize the dependence on bash and its features.

Onward.

Even cd doesn't work!

First, we'll try the cd command:

$ exec ./shirt
$ pwd
/Users/jessestorimer/projects/shirt
$ cd /usr/local
$ pwd
/Users/jessestorimer/projects/shirt

:(

Even though cd appears to have worked, it didn't actually change our current directory. If we look back at the code from the last article, it becomes obvious why this is the case.

Backstory

When we tell shirt to run a command it does it in two steps: fork + exec. The call to fork creates a new child process. The call to exec transforms that new process into a cd process.

Although the child process knows the process ID of its parent, it can have no direct effect on the parent's state. So we shouldn't be surprised that the parent process (our shirt process) is unaffected when one of its children changes directories.

Just to make this crystal clear: the cd is taking effect on the child process, but that change is not reflected in the parent process. So once the child has finished and exited it's as though the cd never happened.
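The same isolation can be seen from plain Ruby. This is a minimal sketch, not part of shirt: it forks a child that changes its own directory (the target / is an arbitrary choice for illustration) and then compares the parent's directory before and after.

```ruby
# Minimal sketch: a directory change made in a child process does not
# affect the parent. The target directory '/' is an arbitrary choice.
before = Dir.pwd

pid = fork do
  Dir.chdir('/')   # changes the working directory of the child only
  exit!(0)         # leave the child right away, skipping at_exit handlers
end
Process.wait(pid)

after = Dir.pwd
puts "parent before: #{before}"
puts "parent after:  #{after}"
```

Both lines print the same path, because the chdir happened in a process that no longer exists.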

We'll get around this by making cd a 'builtin' command.

A Note About Builtins

Every shell has some built-in commands. These are commands which are inherently different from typical utilities in that they change the state of the shell and run in the shell process itself.

For instance, a utility like cat simply prints text to the terminal. It does not change the state of the shell itself. The same goes for common commands like grep, tail, and curl. But there are certain commands that need to change the state of the shell, and cd is one of them.

Implementing cd

OK, so we can't implement cd in the same way that our other commands are handled. We'll have to add special handling to our shell for builtin commands.

Before that, let's do a slight refactor of our shell from last time to have a more consistent run loop that we can build upon:

#!/usr/bin/env ruby
loop do
  $stdout.print '-> '
  line = $stdin.gets.strip
  pid = fork { exec line }
  Process.wait pid
end

In this implementation we've reduced a bit of duplication when printing the prompt. We start by entering an endless loop, printing the prompt, getting a line of input using gets, removing leading and trailing whitespace with strip, and then processing that input as before.
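One thing the loop above does not handle is end of input: gets returns nil at EOF (for example on Ctrl-D), and calling strip on nil raises. Here is a hedged sketch of the same prompt-and-read loop with that case covered. The each_command helper, its keyword arguments, and the choice to skip blank lines are all assumptions made for illustration, not part of shirt; the IO objects are injected so the loop can be exercised with StringIO.

```ruby
require 'stringio'

# Hypothetical helper (not part of shirt): prints a prompt, reads a line,
# strips it, and yields it to the caller until the input is exhausted.
def each_command(input: $stdin, output: $stdout, prompt: '-> ')
  loop do
    output.print prompt
    line = input.gets
    break if line.nil?      # EOF: gets returned nil, so stop the loop

    line = line.strip
    next if line.empty?     # assumption: blank lines are ignored

    yield line
  end
end

# Usage with in-memory IO instead of a terminal.
input  = StringIO.new("  pwd  \n\nls -l\n")
output = StringIO.new
seen   = []
each_command(input: input, output: output) { |line| seen << line }
```

In the real shell the block would hold the fork + exec step; here it only collects the lines so the behaviour is easy to inspect.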

Here's our first pass at implementing cd:

#!/usr/bin/env ruby
require 'shellwords'
BUILTINS = {
  'cd' => lambda { |dir| Dir.chdir(dir) }
}
loop do
  $stdout.print '-> '
  line = $stdin.gets.strip
  command, *arguments = Shellwords.shellsplit(line)
  if BUILTINS[command]
    BUILTINS[command].call(*arguments)
  else
    pid = fork { exec line }
    Process.wait pid
  end
end

Perfect, now cd works as expected. Here's the explanation.

The BUILTINS constant stores a hash denoting the builtin commands we support. The key specifies the builtin command and the value is a lambda which encapsulates the behaviour. For cd we use Ruby to change the directory without first creating a child process. In this way, the current directory of the shell process itself will be changed.
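To see why running the lambda in the shell process matters, here is a small sketch under stated assumptions: the hash is renamed SKETCH_BUILTINS so the snippet runs on its own, and the system temp directory is used as an arbitrary target. It calls the cd lambda directly, then forks a child to show that later children start in the new directory.

```ruby
require 'tmpdir'

# Same shape as the BUILTINS hash above; renamed so this sketch is
# self-contained.
SKETCH_BUILTINS = {
  'cd' => lambda { |dir| Dir.chdir(dir) }
}

original = Dir.pwd
target   = File.realpath(Dir.tmpdir)   # any existing directory would do

SKETCH_BUILTINS['cd'].call(target)     # runs in this process, no fork
shell_dir = Dir.pwd

# A child forked afterwards inherits the new working directory.
reader, writer = IO.pipe
pid = fork do
  reader.close
  writer.write(Dir.pwd)
  writer.close
  exit!(0)
end
writer.close
child_dir = reader.read
reader.close
Process.wait(pid)

Dir.chdir(original)   # restore, so the sketch has no lasting effect
```

Both shell_dir and child_dir end up equal to target, which is the behaviour we want from a shell: cd affects every command launched afterwards.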

After getting the input we split the input line into a command and its arguments. This is done with Shellwords, which is part of the Ruby standard library. Why is it used, you ask? Shellwords splits the string into words using the Bourne shell's word-parsing rules, much like bash does, so a quoted argument containing spaces stays together as one word. The documentation for Shellwords.shellsplit has a great example of this.
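As a quick illustration of the difference, here is a short sketch comparing Shellwords.shellsplit with a plain String#split. The input line is made up for the example.

```ruby
require 'shellwords'

# A made-up input line with a quoted argument that contains a space.
line = 'cd "/tmp/my dir"'

# Shell-style splitting keeps the quoted argument together and drops
# the quotes.
command, *arguments = Shellwords.shellsplit(line)

# Splitting on whitespace alone breaks the path in two and keeps the
# quote characters.
naive = line.split

puts command.inspect
puts arguments.inspect
puts naive.inspect
```

With shellsplit the command is "cd" and there is a single argument, "/tmp/my dir". The naive split produces three pieces, which would hand cd two broken arguments.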

The last addition is a conditional. If the command is one of the builtin functions then we should use that instead of the fork+exec method.

Just Try to Leave, I Dare You!

Let's implement another builtin, exit. exit doesn't exist as an executable anywhere on your system; it's always implemented as a builtin. Here's the relevant code for our implementation of exit:

BUILTINS = {
  'cd' => lambda { |dir| Dir.chdir(dir) },
  'exit' => lambda { |code = 0| exit(code.to_i) }
}

Now we can do a session like:

$ ls
shirt
$ pwd
/Users/jessestorimer/projects/shirt
$ cd /usr/local
$ pwd
/usr/local
$ exit

Most of the time you'll probably exit without passing an exit code, but it can be useful sometimes.
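The default argument and the to_i call in the exit lambda can be checked without actually leaving the process, because Ruby's exit works by raising SystemExit. The sketch below relies on that; status_of is a hypothetical helper introduced only for this example.

```ruby
# The same lambda as in the BUILTINS hash above.
exit_builtin = lambda { |code = 0| exit(code.to_i) }

# Hypothetical helper: runs a block and returns the status carried by the
# SystemExit it raises, instead of letting the process terminate.
def status_of
  yield
rescue SystemExit => e
  e.status
end

with_code    = status_of { exit_builtin.call('3') }  # argument arrives as a String
without_code = status_of { exit_builtin.call }       # falls back to the default 0

puts with_code
puts without_code
```

Arguments coming out of Shellwords are always strings, which is why the lambda converts the code with to_i before passing it to exit.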

Change me!

We'll implement one more builtin, exec. This will behave the same way that bash's exec does: it will transform the shell process into another process. This is useful for scripting, or when you want the shell to become another program, leaving no trace. Here's the implementation:

BUILTINS = {
  'cd' => lambda { |dir| Dir.chdir(dir) },
  'exit' => lambda { |code = 0| exit(code.to_i) },
  'exec' => lambda { |*command| exec(*command) }
}

You get the idea.
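If you'd like to watch the replacement happen without losing your own process, here is a hedged sketch that calls the exec lambda inside a forked child and captures the child's output through a pipe. It assumes an echo executable is available on the PATH; the pipe plumbing is only there for the demonstration.

```ruby
# The same lambda as in the BUILTINS hash above.
exec_builtin = lambda { |*command| exec(*command) }

reader, writer = IO.pipe
pid = fork do
  reader.close
  $stdout.reopen(writer)                 # send the child's stdout into the pipe
  exec_builtin.call('echo', 'replaced')  # the child becomes echo here
  puts 'never printed'                   # unreachable once exec succeeds
end
writer.close
output = reader.read
reader.close
Process.wait(pid)

puts output
```

Only the text from echo comes back. The puts after the exec call never runs, because by then the Ruby code in the child has been replaced by another program.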

Before You Go

We're implementing builtins by splitting a string and looking up its first word in a hash of predefined commands. In reality a shell like bash implements a real language and would do this work with a real parser. We're taking shortcuts for the purposes of moving quickly and learning.

OK, now you can leave

That's it for our first look at builtins. This shell is getting more and more functional :) You can get the latest version of shirt on GitHub.

If you want to see what other builtin commands are part of a shell like bash, use man builtin.

In part 3 we'll look at handling environment variables, specifically the PATH.

For a more thorough introduction to Unix programming basics, check out Working With Unix Processes.