The grep command is regarded as one of the most essential building blocks of command line automation. It is a search tool that can be used to perform basic text filtering and processing tasks on files and streams. Although it is deceptively simple, it can sometimes take the challenge out of finding the proverbial needle in a haystack. In this article, I'm going to show you several examples of how it can be used to perform real-world tasks.

The most common and basic scenario of grep usage is a text search. It can be used to find instances of a word or phrase in files and text streams. You can invoke it at the command line by typing the command name, the search query, and the target files in which to search. To find the word "needle" in the haystack.txt file, I use the following command:

$ grep needle haystack.txt

This will cause grep to display any line from haystack.txt that contains the text "needle". In this example, it's important to note that grep is matching the raw characters and not the word. It will, for example, also show lines that include "needless" and other words that contain "needle". You can instruct grep to search for the query text as a word by using the -w parameter. This will limit the output to lines in which the query text is bounded on each side by either the start or end of the line or a non-word character, such as a space or punctuation mark. Letters, digits, and the underscore count as word characters, so "pine_needle" would not match.

$ grep -w needle haystack.txt
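To make the difference concrete, here is a minimal sketch that runs both forms of the search against a small sample file. The file contents are made up for illustration, and the temporary directory is only there to keep the example self-contained.

```shell
# Hypothetical sample data, written to a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' \
  'a needle in the hay' \
  'a needless worry' \
  'needle, thread' \
  'pine_needle' > "$dir/haystack.txt"

# Raw character match: every line containing the characters "needle".
plain=$(grep needle "$dir/haystack.txt")

# Whole-word match: "needless" and "pine_needle" are filtered out.
words=$(grep -w needle "$dir/haystack.txt")

printf 'Without -w:\n%s\n\nWith -w:\n%s\n' "$plain" "$words"
rm -r "$dir"
```

The first search returns all four lines, while the second returns only the two in which "needle" stands alone as a word.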

Grep can also be used to find text in multiple files at once. If you specify a glob or multiple files as the target of the search, grep will look in all of them and the output will tell you where each search result was found.

$ grep -w needle haystacks/*.txt

When grep searches more than one file, the filename will appear at the beginning of each line of output, followed by a colon and the matched line. You can tell grep to hide the filenames by including the -h parameter. If you want only a list of the files that contain a match, without the matched text, use the -l parameter.
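The three output styles are easier to compare side by side. The following sketch builds a small haystacks directory with invented file names and contents, then runs the same search with no extra parameter, with -h, and with -l.

```shell
# Hypothetical files, created in a throwaway directory.
dir=$(mktemp -d)
mkdir "$dir/haystacks"
printf '%s\n' 'one needle here' 'just hay' > "$dir/haystacks/a.txt"
printf '%s\n' 'only hay'                   > "$dir/haystacks/b.txt"
printf '%s\n' 'another needle'             > "$dir/haystacks/c.txt"

old=$PWD
cd "$dir" || exit 1

# Default for multiple files: "filename:matched line".
with_names=$(grep -w needle haystacks/*.txt)

# -h hides the filename prefix.
without_names=$(grep -hw needle haystacks/*.txt)

# -l lists each matching file once and omits the matched text.
files_only=$(grep -lw needle haystacks/*.txt)

printf '%s\n' "$with_names" '--' "$without_names" '--' "$files_only"

cd "$old" || exit 1
rm -r "$dir"
```

Because b.txt contains no match, it does not appear in any of the three results.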

Taking grep to the next level with regular expressions

The real power of the grep command can be unlocked by using regular expression (regex) syntax, a simple language for describing the generalized structure of text content. It can be used for finding text that is not always the same, but has a predictable pattern. An introduction to regex syntax is beyond the scope of this article, but you can find handy reference guides in many places on the Internet.

In the following example, I will search my IRC logs and extract the 10 most recent links:

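$ grep -o "http://[^ ]*" channel.log | tail

The -o parameter tells grep to display only the text that matched. This means that the output will show just the URLs and not the entire line in which each one was found. The query is quoted so that the shell passes it to grep untouched, and it uses [^ ]* rather than .* so that each match stops at the first space instead of running to the end of the line. At the end of the grep command, we pipe the output into tail, a command that displays only the end of the text stream. By default, tail will show ten lines.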
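As a rough illustration of how -o behaves, the sketch below runs the search against a three-line log. The log format and URLs are invented, and the pattern assumes that a URL ends at the first space, which will not hold for every log. It also shows what happens when the pattern ends in .* instead.

```shell
# Hypothetical log lines in a "<nick> message" format.
dir=$(mktemp -d)
printf '%s\n' \
  '<segphault> see http://example.com/a for details' \
  '<other> no links here' \
  '<segphone> http://example.org/b and http://example.net/c' > "$dir/channel.log"

# One URL per output line; tail -n 2 keeps the two most recent.
links=$(grep -o 'http://[^ ]*' "$dir/channel.log" | tail -n 2)

# For comparison: .* is greedy and swallows the rest of the line.
greedy=$(grep -o 'http://.*' "$dir/channel.log" | tail -n 1)

printf 'Links:\n%s\n\nGreedy match:\n%s\n' "$links" "$greedy"
rm -r "$dir"
```

Note that -o prints each match on its own line, so a single log line with two URLs contributes two lines of output.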

Next, I want to see how many things I have said in IRC. My nick is "segphault," but I also use "segphone" when I'm connected from my smartphone. I want to find all lines that are sent by either of those nicks and then count the total:

$ grep -Ec "^<segph(ault|one)>" channel.log

When grep receives the -c parameter, it will not display the matches or the matched lines. Instead, it will display a number which indicates how many lines were matched. The -E parameter turns on extended regex syntax, which is what gives the parentheses and the vertical bar their special meaning. Without it, grep treats those characters as literal text and the query matches nothing. The query is enclosed in quotation marks because it contains special characters that would be misinterpreted by the command line shell. The quotation marks are not part of the search. If you want to perform a query that itself contains double quotation marks, you have to escape them or wrap the query in single quotes.
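Here is a small sketch of the counting search, run against made-up log lines. It uses the -E parameter, because grep's default (basic) regex syntax treats parentheses and the vertical bar as ordinary characters. The second command shows what the same query counts without -E.

```shell
# Hypothetical log lines; only two start with one of the two nicks.
dir=$(mktemp -d)
printf '%s\n' \
  '<segphault> hello' \
  '<other> hi <segphault>' \
  '<segphone> on my phone' \
  '<segphantom> not me' > "$dir/channel.log"

# Extended syntax: (ault|one) means "ault" or "one".
extended=$(grep -Ec '^<segph(ault|one)>' "$dir/channel.log")

# Basic syntax: ( | ) are literal characters, so nothing matches.
basic=$(grep -c '^<segph(ault|one)>' "$dir/channel.log")

echo "with -E: $extended, without -E: $basic"
rm -r "$dir"
```

The ^ anchor is what keeps the second line out of the count: it mentions the nick, but not at the start of the line.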

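In our last example, we will look for lines that contain shouting. To do this, we want to search the logs for any line that has a word that is entirely capitalized. To reduce the number of false positives from acronyms, we only want to match all-caps words that are five characters or more. The {5,} repetition count means "five or more of the preceding item" and needs the -E parameter to work in this form, while -w makes sure that the capitals form a complete word.

$ grep -Ew "[A-Z]{5,}" channel.log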
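The sketch below tries the shouting search on a few invented log lines. It uses -E with a {5,} repetition count, and it sets LC_ALL=C as a precaution, since the meaning of a range like [A-Z] can depend on the locale. That setting is an assumption added for this example rather than something the search strictly requires.

```shell
# Hypothetical log lines with all-caps words of different lengths.
dir=$(mktemp -d)
printf '%s\n' \
  '<a> STOP doing that' \
  '<b> PLEASE stop' \
  '<c> the NASA launch' \
  '<d> WHATEVER you say' > "$dir/channel.log"

# Whole words made of five or more capital letters.
shouting=$(LC_ALL=C grep -Ew '[A-Z]{5,}' "$dir/channel.log")

printf '%s\n' "$shouting"
rm -r "$dir"
```

"STOP" and "NASA" are only four letters long, so those lines are left out and only the "PLEASE" and "WHATEVER" lines are shown.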