A co-worker watched me type the other day and noticed that I use certain Unix commands for purposes other than the ones they were intended for. Yes, I abuse Unix commands.





1. grep dot (view the contents of files prefixed by their filename)

I want to view the contents of a few files but I want each line prepended with the file's name. My solution?

$ grep . *.txt
jack.txt:Once upon a time
jack.txt:there was a fellow named Jack.
lyingryan.txt:Now that "trickle down economics" has been
lyingryan.txt:tested for 30 years and the data shows it
lyingryan.txt:has been a total failure, candidates
lyingryan.txt:still claim that cutting taxes for
lyingryan.txt:billionaires will help the economy.
market.txt:Jack went to market to sell the family
market.txt:cow.
market.txt:He came back with a handful of magic beans.
$

grep is a search tool. Why am I using it like a weird version of cat? Because cat doesn't have an option to prepend the filename to each line of text. And it shouldn't.

Note that " . " matches lines with at least 1 character. That is, blank lines are not included. If we change " . " (matches any 1 character) to " ^ " (matches the beginning of a line) then every line will be matched because every line, no matter how short or long, has a beginning! However the period key is easier to type than the caret, at least on my keyboard. Therefore if I don't need the blanks, I don't request them.
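To see the difference for yourself, here is a small sketch. The scratch directory and the file name demo.txt are made up for the illustration; the file holds two lines of text separated by one blank line.

```shell
# Scratch directory and file name are invented for this demo.
dir=$(mktemp -d)
printf 'first line\n\nthird line\n' > "$dir/demo.txt"

# -c counts matching lines instead of printing them.
dot_count=$(grep -c . "$dir/demo.txt")      # lines with at least 1 character
caret_count=$(grep -c '^' "$dir/demo.txt")  # every line, blank ones included

echo "dot: $dot_count  caret: $caret_count"
rm -r "$dir"
```

The dot version should report 2 and the caret version 3, because only the caret matches the blank line.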

Example use: The other day I grabbed the /etc/network/interfaces file from 6 different Linux boxes. I needed to review them all. Each was copied to a filename that was the same as the hostname. " grep . * " let me view them all easily and each line was annotated where it came from.

2. "more star pipe cat" (cat files with a header between each one)

Let's look at another way to accomplish my example of comparing 6 files. In this case I want to print the contents of each file but separate the contents with the file name. Yes, I could do it in a loop:

$ for i in *.txt ; do echo "=== $i ===" ; cat "$i" ; done

However that takes a lot of typing.

This is where I abuse " more ". Are you familiar with more ? More prints the contents of files but pauses every screenful to ask "More?" Pressing SPACE shows one screenful more. Pressing RETURN shows one line more.

When more was new it was very dumb. It had no search functions, you could skip forward a file but not skip back, and it assumed your screen was 24 lines long. Heaven forbid you weren't on a hardware terminal fixed at 80x24; these newfangled graphic screens with windows that could be resized confused more. Resizing your window while using more made it even more confused. Another problem was that if you piped the output of more to another program things got totally confused because those prompts got sent down the pipe. Certainly the next program in the pipe didn't expect to see a "More?" every 24 lines.

Luckily someone came along and created a replacement for more that fixed all of those problems. Obviously these features and bugfixes were added to more and we all benefited. No, that's not what happened. Obviously they wrote a new program from scratch and called it "more 2.0" so we could keep typing "more" but have all those new features. No, that's not what happened. In the grand tradition of Unix having a sense of humor this new program was called " less ". Thus begat funny conversations like, "Do you use more ?" "No, I couldn't live without less ." Or the joke: So a pager walks into a bar, and the bartender says "What are you, more or less?"

Some versions of Unix have the old traditional more and less commands. However in many Unix and Unix-like systems both are the same program but the code detects that it was run as more and goes into " more emulation mode". Either way, more gets you the old behavior with a few bugs fixed and less gets you all the cool new stuff.
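I haven't read the source of any particular pager, so take this only as a sketch of the general trick of installing one program under two names. The function name pager_mode and the mode strings are made up; the idea is simply to look at the name the program was invoked as.

```shell
# pager_mode is a made-up helper: it picks a mode from the name
# the program was invoked as, the way a combined more/less binary might.
pager_mode() {
    case "${1##*/}" in          # strip any leading directory from the name
        more) echo "more emulation" ;;
        *)    echo "full less" ;;
    esac
}

# In a real script you would pass "$0", the name the script was run as.
pager_mode /usr/bin/more
pager_mode /usr/bin/less
```

A script using this would typically be installed once and symlinked under the second name.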

If you have been using Linux for fewer than 5 years there is a good chance that you didn't know that more existed and quite possibly you were confused why less is called less . Now you know.

Which brings us back to our story. Sometimes people get so used to typing "more" that they type it when they mean "cat". For example they type:

more * | command | command2

when they mean:

cat * | command | command2

For example:

more * | grep -v bar | sort

Old more would send the prompts to grep which would pass them to sort which would get very confused. You'd have to press SPACE a number of times and, since you didn't see any output, you would usually bang on the keyboard in frustration. It's all a big mess.

less is smart enough to detect that its output is going to a pipe, and when it is, it emulates "cat". This is very smart.
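You can make the same check from a shell script. This is only a sketch of the idea, not a claim about how less itself does it; describe_stdout is a hypothetical helper name.

```shell
# describe_stdout is a hypothetical helper for this illustration.
# "[ -t 1 ]" succeeds only when file descriptor 1 (stdout) is a terminal.
describe_stdout() {
    if [ -t 1 ]; then
        echo "terminal: a pager would prompt here"
    else
        echo "pipe or file: a pager should act like cat"
    fi
}

describe_stdout          # typed at an interactive prompt, stdout is the terminal
describe_stdout | cat    # here stdout is a pipe
```

Any script that prompts or paginates can use this test to stay quiet when it is part of a pipeline.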

Even smarter is that when less is emulating more instead of producing "the big mess" it acts like cat but outputs little headers for each file:

$ more * | cat
::::::::::::::
jack.txt
::::::::::::::
Once upon a time
there was a fellow named Jack.

::::::::::::::
lyingryan.txt
::::::::::::::
Now that "trickle down economics" has been
tested for 30 years and the data shows it
has been a total failure, candidates
still claim that cutting taxes for
billionaires will help the economy.
::::::::::::::
market.txt
::::::::::::::
Jack went to the free-market to sell the family
cow.

He came back with a handful of magic beans.
$

Isn't that pretty?

That works on Linux but not on *BSD. However there's a solution that works on both. We simply take advantage of the fact that if "head" is given more than one file name it prints a little header in front of each file. However we want to see the entire file, not just the first 10 lines that head normally shows. No worries. We assume the files are shorter than 99999 lines long and do this:

$ head -n 99999 *
==> jack.txt <==
Once upon a time
there was a fellow named Jack.

==> lyingryan.txt <==
Now that "trickle down economics" has been
tested for 30 years and the data shows it
has been a total failure, candidates
still claim that cutting taxes for
billionaires will help the economy.

==> market.txt <==
Jack went to market to sell the family
cow.

He came back with a handful of magic beans.
$

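Note: On Linux (GNU head) you can do "head -n -0" to mean "all lines" (literally, "all but the last 0 lines"); plain "head -n 0" prints only the headers. However the negative count doesn't work on FreeBSD and other Unixes. (Hey, BSD folks: can you fix that?) You can also use "tail -n +1" (the older spelling was "tail +0") but the header it draws is not as pretty.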
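If you would rather not depend on how a particular head or tail behaves, the loop from earlier can be wrapped in a small function and kept in your shell startup file. The name cath is made up, and the === header is just the style used in that loop.

```shell
# cath is a made-up name: "cat with headers".
# Prints each file named on the command line, preceded by a one-line header.
cath() {
    for f in "$@"; do
        printf '=== %s ===\n' "$f"
        cat -- "$f"      # "--" keeps odd file names from being read as options
    done
}
```

After that, "cath *.txt" is even less typing than the head trick, and it has no 99999-line limit.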

3. "egrep --color=always '^|foo|bar'" (highlight words but still show every line)

As you get older your eyesight gets worse. It becomes more difficult to find something in a field of text. Here's an eye test. Below is a list of recently run jobs on a Ganeti cluster.

$ gnt-job list
157486 success CLUSTER_VERIFY_CONFIG
157487 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
157488 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
157489 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
157712 success OS_DIAGNOSE
157779 success CLUSTER_VERIFY
157780 success CLUSTER_VERIFY_CONFIG
157781 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
157782 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
157783 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
157994 success OS_DIAGNOSE
158073 running CLUSTER_VERIFY
158074 success CLUSTER_VERIFY_CONFIG
158075 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
158076 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
158077 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
158156 success OS_DIAGNOSE
158367 success CLUSTER_VERIFY
158368 waiting CLUSTER_VERIFY_CONFIG
158371 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
158432 waiting OS_DIAGNOSE
$

How quickly can you find the job that is running? It's kind of buried in there. (The answer is job #158073.)

The most interesting jobs are the ones that are running and the ones that are waiting to run. It would be nice to have those highlighted. My first instinct was to simply use grep to remove the successful jobs:

$ gnt-job list | grep -v success
158073 running CLUSTER_VERIFY
158368 waiting CLUSTER_VERIFY_CONFIG
158432 waiting OS_DIAGNOSE
$

However it is useful to see those jobs in context with all the other jobs. What I really want is to have the running and waiting jobs highlighted. Ah! " egrep --color=always " would color the things it finds, right? Ah, but egrep only shows what is found. We get:

$ gnt-job list | egrep --color=always 'running|waiting'
158073 running CLUSTER_VERIFY
158368 waiting CLUSTER_VERIFY_CONFIG
158432 waiting OS_DIAGNOSE
$

So how can we output every line but also highlight certain words? Well " . " matches everything so we could use that, right? No, it matches every single character. We'd just get 100% red text (try it: egrep . file file2 ). What else does every line have? It has a beginning! We make a regular expression that matches lines with "a beginning" -or- lines with "running" -or- lines with "waiting". Every line will match and therefore be output. Since "the beginning of each line" has no length, nothing additional will be highlighted in red.

This regular expression matches any line that has a beginning or has the word "running" or has the word "waiting". The matched text will be colored red.

$ gnt-job list | egrep --color=always '^|running|waiting'
157486 success CLUSTER_VERIFY_CONFIG
157487 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
157488 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
157489 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
157712 success OS_DIAGNOSE
157779 success CLUSTER_VERIFY
157780 success CLUSTER_VERIFY_CONFIG
157781 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
157782 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
157783 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
157994 success OS_DIAGNOSE
158073 running CLUSTER_VERIFY
158074 success CLUSTER_VERIFY_CONFIG
158075 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
158076 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
158077 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
158156 success OS_DIAGNOSE
158367 success CLUSTER_VERIFY
158368 waiting CLUSTER_VERIFY_CONFIG
158371 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
158432 waiting OS_DIAGNOSE
$

Now you can easily see which jobs are running and waiting and still get the full context.

(Note: for some reason this doesn't work on Mac OS and *BSD. However "$" matches the end of a line and works the same way.)

I set up an alias so I can use this command all the time:

alias j="gnt-job list | egrep --color=always '^|running|waiting'"

Note the careful use of ' within " .
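If the quoting bothers you, or you want to highlight different words each time, a shell function is an alternative to the alias. This is a sketch: hl is a made-up name, it uses "grep -E" (the same thing as egrep), and each argument is treated as a regular expression, not a literal string.

```shell
# hl is a made-up name. Usage: some_command | hl word1 word2 ...
# Prints every input line and colors any of the given words.
hl() {
    pattern='^'                      # matches every line, highlights nothing
    for word in "$@"; do
        pattern="$pattern|$word"     # add each word as another alternative
    done
    grep -E --color=always "$pattern"
}
```

With that defined, the alias body becomes "gnt-job list | hl running waiting". The same Mac OS and *BSD caveat about "^" applies.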

If you would like more than just the words "running" and "waiting" highlighted slightly more complex regular expressions are required:

Highlight starting at the word, continuing to the end of the line:

egrep --color=always '^|running.*$|waiting.*$'

Highlight the entire darn line if it has either word in it:

egrep --color=always '^|^.* (running|waiting) .*$'

Of course, if you are typing these commands instead of using them in a script or alias, the least typing to highlight "foo" and "bar" is:

egrep '^|foo|bar'

Chances are "--color=auto" is the default and the right thing will happen. If not, add the "--color=always".

Note: A co-worker just pointed out that "" (the empty pattern) matches every line and doesn't result in all text being highlighted. He wins for making the regexes even smaller. Just remove the "^" at the front:

alias j="gnt-job list | egrep --color=always '|running|waiting'"

or

egrep --color=always '|^.* (running|waiting) .*$'

Note2: Someone pointed out that ack will do this with --passthru but ack isn't always on machines I use.

4. "fmt -1" (split lines into individual words)

If you are not familiar with the fmt command, that's probably because you use a modern text editor like vim or emacs which can do the formatting for you. In the old days we had to call an external command to do our formatting. Back then all Unix commands were small, single-function, tools that could be combined to do great things. Now every new Unix command seems to be trying to have more features than MS-Word. But I digress.

" fmt -n " takes text as input and reformats it into nicely shaped paragraphs with no line longer than n. That is, "fmt -65" formats text in nice paragraphs with no line longer than 65 characters.

But what if you have a word that is longer than 65 characters? Does it truncate it? No, then you get a line with just that word on it. (Ok, I lied about "no lines longer than n".)

So how can we abuse this program? Simple! Suppose we have a bunch of text and want to list out the individual words one per line. Well, words that are "too long" are printed on their own line and we want every word to be printed on its own line. Therefore why don't we tell "fmt" that all words are "too long" by saying we want the paragraphs to be formatted to be 1 character long!

$ fmt -1 <fraudulent.txt
Fraud
is
telling
a
lie
that
benefits
you
and
not
the
person
or
people
you
tell
it
to.
$

Why would you want to do that? There are plenty of situations where this is useful!
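One more situation, as a sketch: once every word is on its own line, the usual sort and uniq tools can count them. The function name word_freq and the sample sentence are made up for the illustration.

```shell
# word_freq is a made-up helper: reads text on stdin and lists
# each distinct word with its count, most frequent first.
word_freq() {
    fmt -1 |            # one word per line
        sort |          # put identical words next to each other
        uniq -c |       # prefix each distinct word with its count
        sort -rn        # biggest counts first
}

printf 'the cat saw the dog\nthe end\n' | word_freq | head -n 3
```

Here "the" should come out on top with a count of 3. Punctuation stays attached to words, so "to." and "to" count separately.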

Recently I found myself with long lines of text that mixed usernames and numbers. I wanted to extract the names. Sure, I could have figured something out with awk or put it into a text editor and copied out the names. Instead I did this:

$ fmt -1 <the_file.txt | egrep -v '^[0-9]'
fred
mary
jane
bob
$

Recently I was curious which IP addresses are mentioned on my wiki:

$ cat *.wiki | fmt -1 | egrep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort -u
192.168.1.4
192.168.1.7
255.255.255.0#
255.255.255.192
255.255.255.240
8.3.8.1
<code>172.16.240.1
<code>172.16.240.2

Ok, that's not a perfect list but I was able to do that in a few seconds rather than an hour of writing code.

A simple improvement: Transform < and > and a lot of other punctuation into spaces, then delete spaces at the end.

$ cat *.wiki | tr "#:@;()<>=,'\"-" ' ' | fmt -1 | egrep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | tr -d ' ' | sort -u
172.16.240.1
172.16.240.2
192.168.1.4
192.168.1.7
255.255.255.0
255.255.255.192
255.255.255.240
8.3.8.1

That's a lot cleaner. 8.3.8.1 is a version number, not an IP address, but this is good enough for a first pass through the list.





I hope you enjoyed my little tour of Unix commands that are useful to misuse. I'd be interested in hearing what commands you abuse. Please post to the comments!