...you must first understand Unix.

I gave another talk at my local LUG this week. The idea was to set the scene so I could then move on to more useful things like git, perl, javascript, etc.

I wanted to begin by getting everybody comfortable and familiar with the command line. The concept I wanted to put across was that using the CLI is like walking into the film 10 minutes before the end. It doesn't make sense because you don't know what led up to it.

Imagine knowing nothing about Lord Of The Rings and then only being shown the ending. You'd see Sam & Frodo slogging their way up a mountain with the single goal of throwing a ring into it, for some reason. After almost killing themselves to get there, Frodo announces he's not going to do it after all. Then he vanishes. Then a little hairless freak runs in, floats around excitedly, Frodo reappears, baldie has the ring. He falls into the lava and the ring melts. For some reason, at the exact same time, a massive black tower with a flaming cat's eye on the top falls down; and the ground collapses, swallowing up the huge army of orcs but very conveniently not harming the small army of humans they were facing.

How much of that makes any kind of sense? None, it's nonsense. You can't grasp the meaning of any of these events without knowing the backdrop. And you can't understand why the CLI is the way it is today for the same reason. It was built by some amazingly clever people, and they didn't try and make it as obtuse and hard-to-understand as possible. Quite the contrary, they did their best to make it as sensible and intuitive as possible.

So in order to understand how the CLI is sensible and intuitive, it helps to understand how it got to where it is today. Hence my talk.

We began with my phone. A Samsung Galaxy S2, running the Ice Cream Sandwich version of Android. Very intuitive, very easy to use. Somebody with no experience can be handed an Android phone and be up & running with it in seconds. No manual to read, nothing to explain, they can just figure it out. And it looks nice, and it has pretty buttons and nice backgrounds... It has a very gentle, shallow learning curve.

This is good, because it means that you can go from being a novice to an experienced user in no time.

This is bad, because it means that the difference between a master and a beginner is almost nothing. There is very little power or sophistication in the Android interface - what you see is what you get, and there's very little you don't see.

You can't do clever and powerful things with it. You can only make phone calls and run apps with it. And all the apps are stand-alones. Except for a few apps that are specifically written to work with specific other apps, they're all self-enclosed.

Conversely, there is the CLI. This isn't pretty and you can't just sit a novice down and let them get on with it. They will get nowhere.

Given a bit of instruction and a few useful commands, they'll at least be able to get a few things done. But they'll still be just rank beginners barely scratching the surface of what can be done.

I've been using the CLI for years, as a hacker and a programmer, compiling the whole of Linux From Scratch from it and even writing my own applications for it. Yet I am still aware that I know far, far less than half of what there is to know. There are people out there who can make me look like a clueless amateur. And even THEY don't know half of what there is to know.

The command line has a steep learning curve, and it just keeps going - few, if any, people can really claim to know just about everything worth knowing. And even when you know all the commands there are and how to use them, you'll still never stop coming up with new and useful ways of tying them together, because CLI apps talk to each other.

Unix was invented in the 60s. It's considered arcane, complicated, inconsistent and unfriendly by many. And yet it, and its derivatives, are absolutely everywhere today - BSD, Linux, OS X, iOS, Android; it's on PCs, iPhones, and servers - whereas many 'friendlier' and allegedly-better alternatives have appeared and died out in the meantime leaving barely a trace. Why is that? And why does it have such weird names and jargon?

Well, for one thing, there were the notes from the initial design discussion.

At the end of the discussion, Canaday picked up the phone, dialed into a Bell Labs dictation service, and read in his notes. "The next day these notes came back," Thompson said, "and all the acronyms were butchered, like 'inode' and 'eyen.'" Butchered or not, the notes became the basis for UNIX. Each researcher received a copy of the notes, "...and they became the working document for the file system," Thompson said.

So right from the start, there was weirdness, because the design document was full of typos and mistakes.

On the plus side, though, this did free the makers from the constraints of having to find a suitable English word for the completely new ideas they were implementing. This was liberating, in a way - it freed them to call their creations absolutely anything they liked.


So, why did they choose the names they did? Why do we have a CLI populated with

awk

cat

less

ed

sed

grep

vi

Well, awk, I grant you, is an unintuitive name. It's simply the initials of its creators. That is not helpful. Fair enough.

'cat' to show the contents of a file.. well, speaking as a cat owner, I can say that a command that rips the insides out of something and shows it to you can appropriately be named 'cat', right enough. But in fact, 'cat' is short for 'concatenate' - which was a perfectly fair description of its original use, legitimately shortened because it's way too long to type out in full.

So why do you use 'less' to show a file's contents more interactively? How does that make sense?

Well, if you just 'cat' a file, then if it has more lines in it than can fit on your screen, you'll lose the top and be unable to read it. So they invented the pager, an application that would show you a screenful of text at a time. And to allow you to know that output was being paged, and how far through you were, and how much left there was, the pager had a prompt: At the bottom of the screen, it would display the word "More", and a percentage of how far through the output you were.

So it was logical enough that the pager would itself be named more. You can see the link.

But 'more' was limited. If you realised you wanted to go back to a part of the file that had already scrolled past, you had to re-run the command. An upgrade was needed, and a better pager was created: One that could scroll up and down; and handle useful things like searches. And the name? Well, it was like 'more' but it did more.. and as everyone knows, less is more. So 'less' was born.

'ed'? Well, you need an editor, you want the shortest possible name for it.. ed is a pretty good choice.

'sed' - once you have the concept of text streams, a need to edit them, and an editor called 'ed', what else would you call a stream editor?

grep. Now here's an interesting one.

If you use ed, you'll know that it doesn't show you the file contents unless you ask it to - it was written in the days before interactive displays, when many people used ed with only a printer to output to.

And often you didn't want it to show you all lines - just a subset. So it could handle that. You could say "show me the first 5 lines" or "show me the next 8 lines" or "show me lines 100-110", and that was all fine.

But sometimes you wanted to say "Show me all lines containing x" where 'x' was a bit of text you were interested in. And ed could handle that, too.

To search in ed, you used /

So /foo would look for the next instance of 'foo'

But you didn't want the NEXT instance, you wanted EVERY instance. So you wanted a global search. That was g/foo

So far so logical?

Once you found all those lines, you wanted to see them. So you told ed to print them. So the command was g/foo/p - globally search for 'foo' and print the matching lines.

Superb. But ed was clever - it didn't just do literal string matching, it did proper regular-expression handling. So you could use g/^[0-9]/p to show you all lines beginning with a number. Or g/;$/p to show you all lines ending in a semi-colon.

So the "show me matching lines" functionality, which ed users made massive use of, could be summed up as "globally search for the regular expression and print it" - or, in ed format, g/regular expression/p. Or, to shorten it a bit more, g/re/p

Yup. Grep. That's where it came from - thank you, ed. When they needed an application to find lines that contained a regex within files, they used the familiar ed idiom. Any user at the time would have understood the application name being 'grep' because they'd have known ed.
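The idiom still works today, letter for letter. A minimal sketch (the filename and contents are just for illustration, and ed is usually, though no longer always, installed by default):

```shell
# A small sample file to search.
printf 'alpha\nbeta 42\ngamma\n' > sample.txt

# The ed idiom: Globally search for the Regular Expression, Print the matches.
# -s suppresses ed's byte-count chatter; commands arrive on stdin.
printf 'g/[0-9]/p\nq\n' | ed -s sample.txt
# prints: beta 42

# The standalone descendant does exactly the same job:
grep '[0-9]' sample.txt
# prints: beta 42
```

Same regex, same result - grep really is just g/re/p promoted to a program of its own.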

And 'vi' - well, contrary to what some would say, it wasn't the sixth attempt at an editor made by somebody who liked roman numerals. It was ed's successor, ex, that took advantage of the newfangled monitors some people had and offered a visual, interactive mode. The shortest unambiguous contraction of 'visual' being, of course, vi.

Seems like early text editors had a surprising amount of influence on the rest of the OS, doesn't it? Does that surprise you? After all, in any modern OS, the text editor is a pretty small deal. Windows and Notepad, Ubuntu and gedit.. meh.

But back then, people *lived* in the text editor. Early Unix users did nothing but hack and write code; the editor was vitally important to them. In fact, it was as important as the OS itself:

"I allocated a week each to the operating system, the shell, the editor, and the assembler to reproduce itself...", Thompson explained.

Yep, the editor got as much time as the OS, the shell, and the assembler. As somebody who spends half his life in a vim session, I'm bang alongside the idea that the editor is a very big deal.

Something else that had long-lasting influence over software we still use today was the ADM-3A terminal. If you ever wondered why vi uses 'hjkl' as left, down, up, and right - this is why.

It's also a good illustration of the distinction between 'easy' and 'efficient' - the modern cursor key layout, with 'up' above 'down', is obvious, intuitive, and easy to master. No argument. But it's not the most efficient layout - it's not possible to keep all four fingers on the four navigation keys - you have to settle for three fingers and move them around. And if you're typing and suddenly need to move, you have to move completely away from where you're typing and switch to the cursors.

hjkl answers both issues - the four keys in a line make it possible to keep one finger on each and so move about as quickly as possible, without ever having to move from the text-entry part of the keyboard to any other. It's not as easy, not as obvious, but it's quicker and more efficient when you're used to it. Even on my modern keyboard, I go for hjkl far more than I reach for cursors.

Look closely at the ADM-3A's keyboard and you'll see the arrows on the hjkl keys. You'll also see that the 'home' key doubles up as the ~ key. If you ever wondered why 'cd ~/' was how to get back to your home directory - this is why. On modern hardware it makes no sense at all. On that old machine, 'ls ~/' to list your home directory was completely intuitive and memorable.
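And that shorthand survives unchanged in every modern shell:

```shell
echo ~      # prints your home directory, e.g. /home/you
ls ~/       # list your home directory from anywhere
cd ~/       # and jump straight back to it
```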

Whilst we're at it, do you know why 'ls' shows fewer files than 'ls -a'?

Yes, because files with names beginning with a '.' don't get displayed. Dotfiles are hidden files.
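You can see it for yourself in thirty seconds in a throwaway directory (the filenames here are just for illustration):

```shell
cd "$(mktemp -d)"       # a scratch directory we can litter freely
touch .hidden visible   # one dotfile, one ordinary file
ls                      # prints: visible
ls -a                   # prints: .  ..  .hidden  visible (order may vary)
```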

Why? We set files to be readable, writable, or executable via file permissions. Why do we set their visibility with a leading period, instead of via "chmod"?

Well, according to Rob Pike, because:

Long ago, as the design of the Unix file system was being worked out, the entries . and .. appeared, to make navigation easier. I'm not sure but I believe .. went in during the Version 2 rewrite, when the file system became hierarchical (it had a very different structure early on). When one typed ls, however, these files appeared, so either Ken or Dennis added a simple test to the program. It was in assembler then, but the code in question was equivalent to something like this:

if (name[0] == '.') continue;

This statement was a little shorter than what it should have been, which is

if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) continue;

but hey, it was easy. Two things resulted. First, a bad precedent was set. A lot of other lazy programmers introduced bugs by making the same simplification. Actual files beginning with periods are often skipped when they should be counted. Second, and much worse, the idea of a "hidden" or "dot" file was created. As a consequence, more lazy programmers started dropping files into everyone's home directory. I don't have all that much stuff installed on the machine I'm using to type this, but my home directory has about a hundred dot files and I don't even know what most of them are or whether they're still needed. Every file name evaluation that goes through my home directory is slowed down by this accumulated sludge. I'm pretty sure the concept of a hidden file was an unintended consequence. It was certainly a mistake.

It was unplanned functionality added by mistake, because it was quick & easy. It's also a modern-day standard that we're stuck with. That's the law of unintended consequences for you.

So that's a whole lot of the apparent weirdness of Unix covered. But none of it really explains why Unix was invented in 1969 and is still in widespread use today. What's the big secret? What's the source of Unix's flexibility? The origin of the Unix philosophy? The key to unlimited power?

It's this: |

The pipe symbol.

The creators of Unix went home one evening, and came back the next morning to find Thompson had put pipes into everything.

"Thompson saw that file arguments weren't going to fit with this scheme of things and he went in and changed all those programs in the same night. I don't know how...and the next morning we had this orgy of one-liners." "He put pipes into UNIX, he put this notation into shell, all in one night," McIlroy said in wonder.

In Unix, all 'core' applications can take text in, and output it. Just like a real pipe, streams flow in and flow out - but in Unix, streams are text, not water.

So you take your raw data, and you can pass it to an application that could filter it, and to another that could transform it, and then onto another, and another...

This led to the idea of applications to perform common tasks, that you passed the stream through. And so was born the Unix philosophy, 'Write programs that do one thing and do it well. Write programs to work together. Write programs that handle text streams, because that is a universal interface.'
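A minimal sketch of the philosophy in action - a word-frequency counter built from nothing but small single-purpose tools chained with pipes (the input text is just for illustration):

```shell
printf 'the cat and the dog and the bird\n' |
  tr -cs 'A-Za-z' '\n' |   # split the stream into one word per line
  sort |                   # group identical words together...
  uniq -c |                # ...so uniq can count each one
  sort -rn |               # most frequent first
  head -n 3                # top three only
```

No single stage knows or cares what the others do; each one just reads text in and writes text out.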

For example, every web browser, every word processor, and many other applications we all use daily, have a 'find' functionality. You press Ctrl-F and a find dialogue box pops up. Then you can search for a bit of text. Great.

But that find functionality is a tiny and fairly trivial part of the big package. Very little development time is put into it. It's not that efficient, because, frankly, who cares if it takes ten milliseconds instead of two to find a bit of text on a page?

Conversely, grep does absolutely nothing but find matching text in a file. It does so in very clever ways, because many clever people have worked to make it better over the years. Grep is used by many people in many situations; it's plumbed into many Unix pipes; it's an important and widely-used application.

And the bonus is, if somebody makes grep better, then all the applications and commands that you've built it with get better too. Because the command-line is so modular, and so many applications are built out of the same building-blocks, it's worth making each module as good as it can be, and everybody benefits from this.

So without having to resort to writing a single line of code, you can create all kinds of useful applications. Examples?

To edit the most recently-modified file in the current directory:

"ls" - shows you the files

"ls -t" - shows you the files in order of modification

"ls -t | head -n 1" - shows only the first entry in the file list

"vi `ls -t | head -n 1`" - opens the first entry in the vi editor

To open every file that contains the string "fubar" in the current directory or its sub-directories:

"grep fubar . -r" - recursively grep for files containing 'fubar'

"grep fubar . -rl" - only output the filenames, not the matching lines

"vi `grep fubar . -rl`" - open the matching files in vi

And to make that a simpler command to type, in your .bashrc add:

function vig { vi `grep "$1" . -rl`; }

and you can now just run "vig fubar" to get the same effect.

So without writing a single line of C or any other coding language, you have created a new application, indistinguishable from any C-coded, binary-compiled application.

And as observed before, if somebody else makes "grep" better, your "vig" command will improve as well!

And that's the power inherent in Unix that leads to it being alive and well and powering phones, computers, and servers all around the world several decades after it was created.

Seems such a simple and obvious idea, doesn't it? But as Terry Pratchett observed, "This man had invented the ball-bearing, such an obvious device that no one had thought of it." - the best ideas are the ones that, after being discovered, everyone believes were totally obvious.

And I hope this post has in some way contributed to your belief that Unix and the CLI are, after all, an obvious invention.