Awk is a powerful text-parsing tool for Unix and Unix-like systems, but because it has built-in functions for common parsing tasks, it's also considered a programming language. You probably won't be developing your next GUI application with awk, and it likely won't take the place of your default scripting language, but it's a powerful utility for specific tasks.

The range of those tasks is surprisingly diverse. The best way to discover which of your problems awk might solve is to learn awk; you may be surprised at how much more it lets you get done with far less effort.

Awk's basic syntax is:

awk [ options ] 'pattern {action}' file

To get started, create this sample file and save it as colours.txt:

name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5

This data is separated into columns by one or more spaces. It's common for data that you are analyzing to be organized in some way. It may not always be columns separated by whitespace, or even a comma or semicolon, but especially in log files or data dumps, there's generally a predictable pattern. You can use patterns of data to help awk extract and process the data that you want to focus on.

Printing a column

In awk, the print statement displays whatever you specify. There are many predefined variables you can use, but some of the most common are the numbered field variables, which designate columns in a text file. Try it out:

$ awk '{print $2;}' colours.txt
color
red
yellow
red
purple
green
purple
brown
brown
yellow

In this case, awk displays the second column, denoted by $2. This is relatively intuitive, so you can probably guess that print $1 displays the first column, and print $3 displays the third, and so on.

To display the entire line (all columns), use $0.

The number after the dollar sign ($) is an expression, so $2 and $(1+1) mean the same thing.
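As a small sketch of the point above, the commands below recreate the sample file (assuming its contents match the listing earlier in this article) and print the second column twice, once as $2 and once as $(1+1). The final command, which prints two fields at once, is an extra illustration that does not come from the original text.

```shell
# Recreate the article's sample data so the commands below run on their own.
cat > colours.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# The field number can be a literal or any expression.
awk '{print $2}' colours.txt      # second column
awk '{print $(1+1)}' colours.txt  # same output: 1+1 evaluates to 2

# Several fields can be printed at once; the comma between them
# is replaced by a single space in the output by default.
awk '{print $1, $3}' colours.txt
```

The first two commands should print identical output, which is an easy way to confirm that the field number really is evaluated as an expression.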

Conditionally selecting columns

The example file you're using is very structured. It has a row that serves as a header, and the columns relate directly to one another. By defining conditional requirements, you can qualify what you want awk to return when looking at this data. For instance, to find the rows in which column 2 is exactly "yellow" and print the contents of column 1:

$ awk '$2=="yellow" {print $1}' colours.txt
banana
pineapple

Regular expressions work as well. This conditional checks $2 for the letter p followed by one or more characters of any kind, which are in turn followed by another p:

$ awk '$2 ~ /p.+p/ {print $0}' colours.txt
grape purple 10
plum purple 2
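Regular expressions can also be anchored to the start or end of a field. The sketch below is an illustration rather than part of the original walkthrough; it assumes the sample file has the contents listed earlier and recreates it so the commands run on their own.

```shell
# Recreate the article's sample data so the commands below run on their own.
cat > colours.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# ^ anchors the match to the start of the field: names beginning with p.
awk '$1 ~ /^p/ {print $1}' colours.txt

# $ (inside the slashes) anchors the match to the end of the field:
# colors ending in n.
awk '$2 ~ /n$/ {print $1, $2}' colours.txt
```

With this data, the first command should list plum, potato, and pineapple, and the second should list the green and brown rows.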

Awk compares a field numerically when the field looks like a number. For instance, to print the first two columns of any row whose third column contains a number greater than 5:

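$ awk '$3>5 {print $1, $2}' colours.txt
name color
banana yellow
grape purple
apple green
potato brown

The header row appears in the output as well: "amount" is not a number, so awk falls back to a string comparison, in which "amount" sorts after "5".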
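One way to keep the header row out of a numeric filter like the one above is to test the record number as well. This is a minimal sketch, not part of the original walkthrough; NR is awk's built-in count of records read so far, and the sample file is assumed to match the listing earlier in this article.

```shell
# Recreate the article's sample data so the command below runs on its own.
cat > colours.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# NR>1 is false for the first line, so the header is never tested against $3>5.
awk 'NR>1 && $3>5 {print $1, $2}' colours.txt
```

The output should contain only the four data rows with an amount above 5.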

Field separator

By default, awk uses whitespace as the field separator. Not all text files use whitespace to define fields, though. For example, create a file called colours.csv with this content:

name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5

Awk can treat the data in exactly the same way, as long as you specify which character it should use as the field separator in your command. Use the -F option to define the delimiter (GNU awk also accepts the long form, --field-separator):

$ awk -F "," '$2=="yellow" {print $1}' colours.csv
banana
pineapple
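The separator can also be set from inside the awk program through the built-in FS variable, and its counterpart OFS controls what print emits between fields. The sketch below goes slightly beyond the original text; it assumes colours.csv has the contents listed above and recreates the file so the commands run on their own.

```shell
# Recreate the article's CSV sample so the commands below run on their own.
cat > colours.csv <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
EOF

# -F on the command line and FS in a BEGIN block set the same input separator.
awk -F, '$2=="yellow" {print $1}' colours.csv
awk 'BEGIN {FS=","} $2=="yellow" {print $1}' colours.csv

# OFS is the output field separator; setting it keeps the result comma-separated.
awk 'BEGIN {FS=","; OFS=","} $2=="yellow" {print $1, $3}' colours.csv
```

The first two commands should print the same two names, and the third should print them together with their amounts, still separated by commas.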

Saving output

Using output redirection, you can write your results to a file. For example:

$ awk -F, '$3>5 {print $1, $2}' colours.csv > output.txt

This creates a file with the contents of your awk query.

You can also split a file into multiple files grouped by column data. For example, to split colours.txt into multiple files according to the color that appears in each row, include the redirection inside the awk statement itself:

$ awk '{print > $2".txt"}' colours.txt

This produces files named yellow.txt, red.txt, and so on (including color.txt, which holds the header row).
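A slightly more defensive variant of the split is sketched here under a few assumptions: the output directory name by_colour is hypothetical, the header row is skipped with NR>1 so that it does not get a file of its own, and the file name expression is wrapped in parentheses because some awk implementations parse an unparenthesized concatenation after > differently.

```shell
# Recreate the article's sample data so the commands below run on their own.
cat > colours.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# by_colour is a hypothetical directory name, used to keep the generated files together.
mkdir -p by_colour

# Each row is written to a file named after its second column.
# awk truncates a file the first time it opens it and appends afterwards,
# so all rows of one color end up in the same file.
awk 'NR>1 {print > ("by_colour/" $2 ".txt")}' colours.txt

# List the generated files.
ls by_colour
```

With the sample data, this should leave one file per color in by_colour, each holding the complete rows for that color.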

In the next article, you'll learn more about fields, records, and some powerful awk variables.

This article is adapted from an episode of Hacker Public Radio, a community technology podcast.