Tutorials

Useful information mostly written by me, the conspicuous exception being the bash manpage ...

Why grep?

grep is one of the most useful commands in its own right, and mastering it also opens the gates to mastering other tools such as awk, sed and perl.

So what does it do?

grep searches. More precisely,

grep foo file

returns all the lines in the file "file" that contain a string matching the expression "foo".

Another way of using grep is to have it read its data from standard input (STDIN) instead of searching a file. For example,

ls | grep blah

lists all files in the current directory whose names contain the string "blah".
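Both forms can be tried safely with a short script. This is only a sketch: it assumes a POSIX shell with mktemp available, and every file name in it is made up for illustration.

```shell
# Scratch directory so the example does not touch real files.
dir=$(mktemp -d)

# Sample data; all names here are hypothetical.
printf 'foo bar\nbaz\nfood\n' > "$dir/file"
touch "$dir/blah.txt" "$dir/notes.txt" "$dir/myblah"

# Search a file: lines of "file" that contain "foo".
from_file=$(grep foo "$dir/file")
echo "$from_file"

# Search standard input: names printed by ls that contain "blah".
from_stdin=$(ls "$dir" | grep blah)
echo "$from_stdin"

rm -r "$dir"
```

The first command prints the lines "foo bar" and "food"; the second prints the names blah.txt and myblah.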

Compatibility Notes

This tutorial is based on the GNU version of grep. It is recommended that you use this version. To use it, firstly, it needs to be installed on your system. Secondly, your PATH needs to be set so that GNU grep is used in preference to the standard version.

The Basics: Wildcards for grep

The Wildcard Character

So the first question that probably comes to mind is something like "does this grep thing support wildcards?" And the answer is better than yes. In fact, saying that grep supports wildcards is a big understatement. grep uses regular expressions, which go a few steps beyond wildcards. But we will start with wildcards. The canonical wildcard character is the dot ".". Here is an example:

>cat file
big
bad bug
bag
bigger
boogy
>grep b.g file
big
bad bug
bag
bigger

Notice that boogy didn't match, since the "." matches exactly one character.

The repetition character

To match repetitions of a character, we use the star, which works in the following way:

The expression consisting of a character followed by a star matches any number (possibly zero) of repetitions of that character. In particular, the expression ".*" matches any string, and hence acts as a "wildcard".

Examples: Wildcards

The File for These Examples

>cat file
big
bad bug
bag
bigger
boogy

Wildcards #1

>grep "b.*g" file
big
bad bug
bag
bigger
boogy

Wildcards #2

>grep "b.*g." file
bigger
boogy

Repetition

>grep "ggg*" file
bigger
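The wildcard examples can be reproduced with the sketch below. It assumes that "bad bug" is a single line of the sample file, that mktemp is available, and it uses a temporary file in place of the file called "file".

```shell
# Recreate the sample file in a temporary location.
f=$(mktemp)
printf 'big\nbad bug\nbag\nbigger\nboogy\n' > "$f"

# "." matches exactly one character between b and g.
dot=$(grep 'b.g' "$f")

# ".*" matches any string (possibly empty) between b and g.
star=$(grep 'b.*g' "$f")

# Same, but at least one more character must follow the g.
star_dot=$(grep 'b.*g.' "$f")

# "gg" followed by zero or more further g characters.
repeat=$(grep 'ggg*' "$f")

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$dot" "$star" "$star_dot" "$repeat"
rm -f "$f"
```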

Taking it Further - Regular Expressions

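Wildcards are a start, but the idea can be taken further. For example, suppose we want an expression that matches both Frederic Smith and Fred Smith. In other words, the letters eric are optional.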

First, we introduce the concept of an "escaped" character.

An escaped character is a character preceded by a backslash. The preceding backslash does one of the following:

(a) removes an implied special meaning from a character

(b) adds special meaning to a "non-special" character

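Examples

Suppose we want to search for lines containing hello.gif. The correct command is

grep 'hello\.gif' file

since

grep 'hello.gif' file

also matches lines containing hello-gif, hello1gif, helloagif, and so on.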
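The difference between the escaped and the unescaped dot can be checked with a small script. This is a sketch that assumes mktemp is available; the sample lines are the strings mentioned above.

```shell
f=$(mktemp)
printf 'hello.gif\nhello-gif\nhello1gif\nhelloagif\n' > "$f"

# Escaped dot: matches only a literal "." between hello and gif.
literal_dot=$(grep 'hello\.gif' "$f")
echo "$literal_dot"

# Unescaped dot: matches any single character in that position.
any_char=$(grep 'hello.gif' "$f")
echo "$any_char"

rm -f "$f"
```

The first grep prints only hello.gif, while the second prints all four lines.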

Now we move on to grouping expressions, so that we can write an expression that matches either Fred or Frederic.

An expression consisting of a character followed by an escaped question mark matches one or zero instances of that character.

Example

bugg\?y matches bugy and buggy, but not bugggy.
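A quick way to confirm this behaviour, sketched under the assumption that GNU grep is in use (the \? operator is a GNU extension to basic syntax) and that mktemp is available:

```shell
f=$(mktemp)
printf 'bugy\nbuggy\nbugggy\n' > "$f"

# The second g is optional, so one or two g characters may precede the y.
optional_g=$(grep 'bugg\?y' "$f")
echo "$optional_g"

rm -f "$f"
```

This prints bugy and buggy; bugggy is left out because three g characters precede the y.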

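An expression surrounded by "escaped" parentheses is treated as a single character.

Examples

Fred\(eric\)\? Smith matches Fred Smith or Frederic Smith

\(abc\)* matches abc, abcabcabc, and so on; that is, any number of repetitions of the string abc, including the empty string.

On the command line, enclose such expressions in quotes so that the shell passes them to grep unchanged:

grep "Fred\(eric\)\? Smith" file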
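The grouping examples can be tried as follows. This is a sketch assuming GNU grep and mktemp; the sample lines are invented, and the -x option (match whole lines only) is used here, although the text above does not mention it, because \(abc\)* on its own also matches the empty string and therefore every line.

```shell
f=$(mktemp)
printf 'Fred Smith\nFrederic Smith\nFreddy Smith\n' > "$f"

# The group "eric" is optional as a whole.
fred=$(grep 'Fred\(eric\)\? Smith' "$f")
echo "$fred"

g=$(mktemp)
printf 'abc\nabcabcabc\nabcab\n' > "$g"

# -x: the whole line must consist of repetitions of "abc".
abc=$(grep -x '\(abc\)*' "$g")
echo "$abc"

rm -f "$f" "$g"
```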

More on Regular Expressions

Matching a list of characters

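A list of characters enclosed in square brackets matches any one character from the list.

Example

[Hh]ello matches lines containing hello or Hello

Ranges of characters are also permitted inside the brackets.

Example

[0-3] is the same as [0123]

[a-k] is the same as [abcdefghijk]

[A-C] is the same as [ABC]

[A-Ca-k] is the same as [ABCabcdefghijk]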



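There are also some alternate forms. In the default C locale:

[[:alpha:]] is the same as [a-zA-Z]

[[:upper:]] is the same as [A-Z]

[[:lower:]] is the same as [a-z]

[[:digit:]] is the same as [0-9]

[[:alnum:]] is the same as [0-9a-zA-Z]

[[:space:]] matches any white space including tabs

These alternate forms, such as [[:digit:]], are preferable to direct ranges such as [0-9], because the meaning of a range can depend on the locale's sort order while the named forms always cover the intended class.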
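A short sketch of lists and named classes in use, assuming mktemp is available; the sample lines are invented for illustration.

```shell
f=$(mktemp)
printf 'Hello\nhello\njello\nroom 42\n' > "$f"

# A list: either H or h, followed by "ello".
greeting=$(grep '[Hh]ello' "$f")
echo "$greeting"

# A named class: any line containing at least one digit.
has_digit=$(grep '[[:digit:]]' "$f")
echo "$has_digit"

rm -f "$f"
```

The first grep prints Hello and hello but not jello; the second prints only "room 42".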

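A list that begins with a caret ^ matches any one character that is not in the list.

Example

grep "([^()]*)a" file

returns any line containing an innermost pair of parentheses followed by the letter a. It matches the lines

(hello)a

(aksjdhaksj d ka)a

but not

x=(y+2(x+1))a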
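The expression "([^()]*)a" can be checked against these three sample lines with the sketch below (mktemp is assumed to be available). In basic syntax unescaped parentheses are ordinary characters, and [^()] matches any character that is not a parenthesis.

```shell
f=$(mktemp)
printf '(hello)a\n(aksjdhaksj d ka)a\nx=(y+2(x+1))a\n' > "$f"

# "(" , then any run of non-parenthesis characters, then ")a".
innermost=$(grep '([^()]*)a' "$f")
echo "$innermost"

rm -f "$f"
```

Only the first two lines are printed: in the third line the innermost pair (x+1) is followed by ")" rather than by "a".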

Matching a Specific Number Of Repetitions of a Pattern

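An expression followed by \{n\}, where n is a number, matches exactly n repetitions of that expression. For example,

grep "[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}" file

matches three digits followed by four digits, optionally separated by a space or a dash, such as a seven-digit phone number.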
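A sketch of this expression at work, assuming GNU grep and mktemp; the sample numbers are made up.

```shell
f=$(mktemp)
printf '555-1234\n555 1234\n5551234\n55-1234\ncall 555-12\n' > "$f"

# Three digits, an optional space or dash, then four digits.
numbers=$(grep '[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}' "$f")
echo "$numbers"

rm -f "$f"
```

The first three lines are printed. Because the expression is not tied to the start or end of the line, it would also match inside a longer run of digits.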

Nailing it Down to Start of the Line and End of the Line

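>cat file
hello
hello world
hhello
>grep hello file
hello
hello world
hhello

Suppose we want to match only the lines that consist of hello, possibly surrounded by white space. The ^ character matches the beginning of the line. The $ character matches the end of the line.

Examples

grep "^[[:space:]]*hello[[:space:]]*$" file

returns only the first of the three lines above.

grep "^From.*mscharmi" /var/spool/mail/elflord

searches a mail spool file for lines that begin with From and also contain mscharmi.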
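The effect of the anchors can be seen with the following sketch, which assumes mktemp is available and adds one invented line with leading spaces to the sample file.

```shell
f=$(mktemp)
printf 'hello\nhello world\nhhello\n  hello\n' > "$f"

# Without anchors, every line containing "hello" matches.
unanchored=$(grep -c 'hello' "$f")
echo "$unanchored"

# With anchors, only white space may surround "hello" on the line.
anchored=$(grep '^[[:space:]]*hello[[:space:]]*$' "$f")
echo "$anchored"

rm -f "$f"
```

The -c option, used here only to keep the output short, prints the number of matching lines (4) instead of the lines themselves. The anchored expression prints "hello" and the indented "  hello".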

This or That: matching one of two strings

The expression consisting of two expressions separated by the or operator \| matches lines containing a match for either of those two expressions.

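Examples

grep "cat\|dog" file

matches lines containing the string cat or the string dog.

grep "I am a \(cat\|dog\)" file

matches lines containing the string "I am a cat" or "I am a dog".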
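A sketch of both forms, assuming GNU grep (the \| operator is a GNU extension to basic syntax) and mktemp; the sample lines are invented.

```shell
f=$(mktemp)
printf 'I am a cat\nI am a dog\nI am a bird\nhot dog\n' > "$f"

# Either string anywhere on the line.
either=$(grep 'cat\|dog' "$f")
echo "$either"

# The alternation is confined to the group, so the prefix is required.
grouped=$(grep 'I am a \(cat\|dog\)' "$f")
echo "$grouped"

rm -f "$f"
```

The first grep prints three lines, including "hot dog"; the second prints only "I am a cat" and "I am a dog".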

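Backpedalling and Backreferences

Suppose we want to search for a string in which the same substring appears in more than one place. An HTML heading is a good example:

<H1>some string</H1>

That is easy enough. Now suppose we also want to allow H2 H3 H4 H5 H6 in place of H1. The expression

<H[1-6]>.*</H[1-6]>

is not good enough, since it also matches

<H1>Hello world</H3>

and we want the closing tag to match the opening tag. This is what backreferences are for.

The expression \n, where n is a number, matches the same text that was matched by the n'th set of escaped parentheses in the expression.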
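The difference a backreference makes can be sketched as follows, assuming mktemp is available; the heading lines are invented sample data.

```shell
f=$(mktemp)
printf '<H1>Hello world</H1>\n<H1>Hello world</H3>\n<H2>Title</H2>\n' > "$f"

# Two independent lists: the levels of the two tags may differ.
loose=$(grep '<H[1-6]>.*</H[1-6]>' "$f")
echo "$loose"

# \1 repeats whatever digit the first parenthesised group matched.
strict=$(grep '<H\([1-6]\).*</H\1>' "$f")
echo "$strict"

rm -f "$f"
```

The first grep prints all three lines; the second leaves out the mismatched <H1>...</H3> line.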

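Examples

<H\([1-6]\).*</H\1>

matches a heading whose closing tag has the same level as its opening tag.

"Mr \(dog\|cat\) came home to Mrs \1 and they went to visit Mr \(dog\|cat\) and Mrs \2 to discuss the meaning of life"

matches the sentence only if each Mrs has the same name as the Mr mentioned just before her.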

Some Crucial Details: Special Characters and Quotes

Special Characters

The following characters have a special meaning in basic grep syntax:

* \ . [ ] ^ $

A closing square bracket loses its special meaning if placed first in a list. For example, []12] matches ], 1, or 2.

A dash - loses its usual meaning inside a list if it is placed last.

A caret ^ loses its special meaning if it is not placed first.

Most special characters lose their meaning inside square brackets.
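The rules for square brackets can be checked one by one. This is a sketch assuming mktemp is available; the five sample lines are invented.

```shell
f=$(mktemp)
printf 'a]b\na-b\na^b\na1b\naxb\n' > "$f"

# "]" placed first in the list is an ordinary list member.
bracket=$(grep '[]12]' "$f")

# "-" placed last is an ordinary list member, not a range operator.
dash=$(grep '[1-]' "$f")

# "^" not placed first is an ordinary list member, not a negation.
caret=$(grep '[1^]' "$f")

printf '%s\n---\n%s\n---\n%s\n' "$bracket" "$dash" "$caret"
rm -f "$f"
```

Each grep prints two lines: a1b, plus the line containing the literal ], - or ^ respectively.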

Quotes

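Single quotes are the safest choice, because they protect the expression from the shell. For example, in some interactive shells

grep "!" file

fails because the shell treats the ! as a reference to its command history, whereas

grep '!' file

passes the ! to grep unchanged.

When should you use double quotes? The answer is this: if you want to use shell variables, you need double quotes. For example,

grep "$HOME" file

searches file for the name of your home directory, while

grep '$HOME' file

searches file for the string $HOME.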
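The effect of the two kinds of quotes can be sketched with a variable whose value is known in advance. The variable name word is hypothetical and stands in for $HOME so that the result does not depend on the machine; mktemp is assumed to be available.

```shell
f=$(mktemp)
# Single quotes keep $word literal in the sample data.
printf 'hot dog\nprice in $word\n' > "$f"

word=dog

# Double quotes: the shell expands $word, so grep searches for "dog".
expanded=$(grep "$word" "$f")
echo "$expanded"

# Single quotes: grep receives the five characters $word unchanged.
literal=$(grep '$word' "$f")
echo "$literal"

rm -f "$f"
```

The first grep prints "hot dog"; the second prints "price in $word", since a $ that is not at the end of the expression is an ordinary character.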

Extended Regular Expression Syntax

We now discuss egrep syntax as opposed to grep syntax. Ironically, despite the origin of its name (extended), egrep actually has less functionality, as it is designed for compatibility with the traditional egrep. A better way to do an extended "grep" is to use grep -E, which uses extended regular expression syntax without loss of functionality. Recent GNU grep releases also mark the egrep command as obsolescent in favor of grep -E.
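As a rough illustration of the difference in syntax, the sketch below runs the same search in both styles. It assumes GNU grep and mktemp, and the sample lines are invented. In extended syntax the grouping, optional and alternation operators are written without backslashes.

```shell
f=$(mktemp)
printf 'Fred Smith\nFrederic Smith\nFredd Smith\ncat\ndog\n' > "$f"

# Basic syntax: the operators are escaped.
basic=$(grep 'Fred\(eric\)\? Smith\|cat' "$f")

# Extended syntax (-E): the same operators, unescaped.
extended=$(grep -E 'Fred(eric)? Smith|cat' "$f")

echo "$basic"
echo "$extended"
rm -f "$f"
```

Both commands print the same three lines: Fred Smith, Frederic Smith and cat.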