After covering sed in detail, it's also good to know awk (gawk), a programmable stream editor.

Awk helps with manipulating structured data and generating reports. It is actually a programming language with a syntax similar to C. An awk program uses three ‘blocks’ of instructions (BEGIN, the main loop and END), and it selects lines with an addressing scheme similar to the one used by sed.

awk features

The ability to look upon a text file as a series of records

Variables

Arithmetic (floating point too) and string operators

Loops and conditions

Generate formatted reports

Define functions

Execute UNIX commands directly from scripts

Process the output of UNIX commands directly

Process command line arguments

Work with multiple input streams
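Two of the features above, user-defined functions and reading the output of a UNIX command, can be sketched together as follows. This is only an illustration: the function name double and the echo command are made up for the example and stand in for real work.

```shell
# Sketch only: "double" is a hypothetical helper, "echo 21" a stand-in command.
result=$(awk '
function double(x) { return 2 * x }
BEGIN {
    # Read one line of output from a UNIX command into the variable n
    "echo 21" | getline n
    close("echo 21")
    print double(n)
}')
echo "$result"
```

The command string is run by the shell, and getline reads its output one line at a time.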

Programming model

Three ‘blocks’ of instructions are used in awk:

BEGIN, executed before the first input line is read

The main loop, executed for each line of input

END, executed after the last input line has been read

The BEGIN and END procedures are optional
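Because each block is optional, a program may consist of any one of them alone. A minimal sketch, using made-up input lines (a, b, c) fed through printf:

```shell
# BEGIN only: no input is read at all
sum=$(awk 'BEGIN { print 2 + 3 }')

# END only: the input is read but nothing happens until the end
count=$(printf 'a\nb\nc\n' | awk 'END { print NR }')

# Main loop only: every line is printed unchanged
copy=$(printf 'a\nb\n' | awk '{ print }')

echo "$sum"
echo "$count"
echo "$copy"
```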

Each input line is treated as a record, referred to as $0 and each word (delimited by spaces or tabs) is treated as a field. Fields are referenced by using a “$” ($1 – first field, $2 – second, and so on).
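As a small sketch of how records are split into fields, the following prints the number of fields (NF) and the last field ($NF) of each line. The two input lines are sample data in the style of the emp file used below; note that "tel aviv" counts as two fields because of the space.

```shell
# NF is the number of fields on the line, $NF is the last field
fields=$(printf 'avi 1200 haifa\ndani 2300 tel aviv\n' | awk '{ print NF, $NF }')
echo "$fields"
```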

Simple Example:

    # cat emp
    avi 1200 haifa
    dani 2300 tel aviv
    rina 3100 aco

    # awk '{ print $1,"-",$3 }' emp
    avi - haifa
    dani - tel
    rina - aco

Simply print the file:

    awk ' { print } ' filename

Use BEGIN and END:

    # awk 'BEGIN { print "Customers List:\n==="} { print } END { print "====\nnum:" NR }' emp
    Customers List:
    ===
    avi 1200 haifa
    dani 2300 tel aviv
    rina 3100 aco
    ====
    num:3

Writing a script:

    #!/usr/bin/awk -f
    { print $1,"-",$3 }

Run it:

    # simp emp
    avi - haifa
    dani - tel
    rina - aco

Script with blocks

    #! /usr/bin/awk -f

    BEGIN {
        print "Customers List:"
        print "==============="
    }

    { print NR , "-" ,$0 }

    END {
        print "========="
        print "num:" NR
    }

Run it:

    # ./simple emp
    Customers List:
    ===============
    1 - avi 1200 haifa
    2 - dani 2300 tel aviv
    3 - rina 3100 aco
    =========
    num:3

Note that the line numbers come from NR, which holds the number of the current record.

Line Addressing

Commands can be restricted to the lines that match a pattern:

    #!/usr/bin/awk -f
    BEGIN { print "Header" }
    /[0-9]+/ { print "Found Number" }
    /[A-Za-z]+/ { print "Found Word" }
    /^$/ { print "Found Blank line" }
    END { print "Footer" }

Use it:

    # cat ./uselinead
    hello 100
    233 bye

    hi

    20

    # ./linead uselinead
    Header
    Found Number
    Found Word
    Found Number
    Found Word
    Found Blank line
    Found Word
    Found Blank line
    Found Number
    Footer
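Regular expressions are not the only kind of address. As a sketch, a pattern can also be a comparison on a field, or a range of two patterns separated by a comma (much like a sed address range). The sample data below is the emp file from earlier, passed in through a shell variable for convenience.

```shell
emp=$(printf 'avi 1200 haifa\ndani 2300 tel aviv\nrina 3100 aco\n')

# Comparison pattern: only lines whose second field is greater than 2000
high=$(printf '%s\n' "$emp" | awk '$2 > 2000 { print $1 }')

# Range pattern: from record 2 through record 3
range=$(printf '%s\n' "$emp" | awk 'NR == 2, NR == 3 { print NR ": " $1 }')

echo "$high"
echo "$range"
```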

Predefined awk values

FS – Field separator – default spaces and tabs

OFS – Output field separator – default space

RS – Record separator – default newline

ORS – Output record separator – default newline

OFMT – Output format – default “%.6g”

Values set by awk while it runs:

NF – Number of Fields, i.e. the number of words on the current line

NR – Number of Records, i.e. the number of lines read so far

FILENAME – The name of the current file being processed

FNR – Current line number in the current file (not in the original awk; available in nawk, gawk and POSIX awk)
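The difference between NR and FNR only shows up with more than one input file. A minimal sketch, assuming two throwaway files named one and two created in a temporary directory (the names and contents are made up for the illustration):

```shell
# Create two small sample files in a temporary directory
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n' > "$tmp/two"

# NR keeps counting across files, FNR restarts at 1 for each file
counts=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, NF }' one two)
rm -rf "$tmp"
echo "$counts"
```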

Example:

useemp

    #!/usr/bin/awk -f
    BEGIN{ FS=","} {print $3,$2}

useemp2

    #!/usr/bin/awk -f
    BEGIN{ FS=","; OFS="*"} {print $3,$2}

Run it:

    # cat ./emp2
    avi,1200,haifa
    dani,2300,tel aviv
    rina,3100,aco

    # ./useemp emp2
    haifa 1200
    tel aviv 2300
    aco 3100

    # ./useemp2 emp2
    haifa*1200
    tel aviv*2300
    aco*3100
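For one-liners, the same separators can be set from the command line instead of a BEGIN block: the -F option sets FS, and -v assigns a variable such as OFS before the program starts. A sketch using two sample lines in the emp2 format:

```shell
# -F, sets the input field separator; -v OFS='*' sets the output separator
swapped=$(printf 'avi,1200,haifa\ndani,2300,tel aviv\n' | awk -F, -v OFS='*' '{ print $3, $2 }')
echo "$swapped"
```

With a comma as the separator, "tel aviv" stays together as a single field.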

Variables

Variables are not declared, just given names and values. Uninitialised variables evaluate to zero in a numeric context and to the empty string in a string context. The type (string or number) is determined by the value assigned.
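A short sketch of how an uninitialised variable behaves; the variable name x is arbitrary and is never assigned in the program:

```shell
# x is never assigned: it acts as "" in string context and 0 in numeric context
uninit=$(awk 'BEGIN { print "[" x "]", x + 0, (x == 0), (x == "") }')
echo "$uninit"
```

The two comparisons print 1 (true), since an unset variable compares equal to both 0 and the empty string.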

Example: calculate the total size of the files in a directory:

    #!/usr/bin/awk -f
    { print; numfiles=numfiles + 1; numbytes=numbytes + $5 }
    END { print numfiles, "files,", numbytes, "bytes" }

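Run it using a pipe. Note that ls -l also prints a "total" line, which the script counts as a file, so the reported count is one too high: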

    ls -l | ./calcsize
    total 56
    -rw-rw-r-- 1 developer developer 187 Aug 16  2017 avg
    -rwxrwxr-x 1 developer developer 310 Aug 16  2017 avg.awk
    -rwxrwxr-x 1 developer developer 117 Aug 16  2017 calcsize
    -rwxrwxr-x 1 developer developer 382 Aug 14  2017 checkops
    -rw-rw-r-- 1 developer developer  48 Aug 16  2017 emp
    -rw-rw-r-- 1 developer developer  48 Feb 16 09:39 emp2
    -rwxrwxr-x 1 developer developer 198 Feb 16 09:24 linead
    -rw-rw-r-- 1 developer developer 254 Aug 14  2017 oplist
    -rwxrwxr-x 1 developer developer 154 Feb 16 09:16 simple
    -rwxrwxr-x 1 developer developer 148 Aug 16  2017 simple2
    -rwxrwxr-x 1 developer developer  49 Feb 16 09:37 useemp
    -rwxrwxr-x 1 developer developer  41 Feb 16 08:47 useemp1
    -rwxrwxr-x 1 developer developer  57 Feb 16 09:37 useemp2
    -rw-rw-r-- 1 developer developer  26 Feb 16 09:25 uselinead
    15 files, 2019 bytes
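The listing above contains 14 files, yet the script reports 15, because the "total" line of ls -l is counted as well. One possible fix, sketched here, is to count only lines with at least 9 fields, which is an assumption about the ls -l layout rather than a general rule. The short listing in the variable is made-up sample input so that the result does not depend on a real directory.

```shell
# Hypothetical sample of ls -l output (not a real directory)
listing='total 8
-rw-rw-r-- 1 developer developer 48 Aug 16 2017 emp
-rw-rw-r-- 1 developer developer 26 Feb 16 09:25 uselinead'

# Only lines with at least 9 fields are treated as file entries,
# so the "total" line (2 fields) is skipped
summary=$(printf '%s\n' "$listing" | awk '
NF >= 9 { numfiles++; numbytes += $5 }
END { print numfiles, "files,", numbytes, "bytes" }')
echo "$summary"
```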

Another example

Given the following file:

    Name CM Ph Cmp Math
    avi levy 68 72 91 73
    eli cohen 31 59 73 87
    bibi netanyahu 83 80 89 61
    donald tramp 53 72 78 93
    Julia roberts 69 68 79 89

and the awk script:

    #!/usr/bin/awk -f

    BEGIN {
        print "grades report"
        print "============="
    }

    NR == 1 { next }

    {
        lines++;
        fullname = $1 " " $2
        print fullname, ($3 + $4 + $5 + $6) / 4
        sum1 += $3; sum2 += $4; sum3 += $5; sum4 += $6
    }

    END {
        print ""
        print "Totals"
        print "======"
        print sum1/lines, sum2/lines, sum3/lines, sum4/lines
    }

Run it:

    # ./avg.awk ./avg
    grades report
    =============
    avi levy 76
    eli cohen 62.5
    bibi netanyahu 78.25
    donald tramp 74
    Julia roberts 76.25

    Totals
    ======
    60.8 70.2 82 80.6

Conditions and Loops

The syntax is similar to C
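As a quick sketch of that C-like syntax, the following BEGIN-only program combines a for loop, an if/else and a while loop. The variable names i, kind and n are arbitrary.

```shell
loops=$(awk 'BEGIN {
    # C-style for loop with an if/else inside
    for (i = 1; i <= 3; i++) {
        if (i % 2 == 0)
            kind = "even"
        else
            kind = "odd"
        printf("%d is %s\n", i, kind)
    }

    # while loop counting down to zero
    n = 3
    while (n > 0)
        n--
    print "n is now", n
}')
echo "$loops"
```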

Given the following input file:

    # Year : Month : Day : Customer : D / W : Amount
    2015:11:9:Joe:W:5.00
    2015:11:12:Mary:W:5.50
    2015:12:10:Joe:W:10.00
    2015:12:15:Mary:W:10.00
    2016:1:2:Hank:W:35.00
    2016:1:31:David:D:100.00

Using loops and conditions:

    #! /usr/bin/awk -f
    # Year : Month : Day : Recipient : D / W : Amount

    BEGIN { FS = ":" }

    # skip lines started with #
    /^[#]/ { next }

    # simple conditions
    $5 == "W" { withdrawals[$4] += $6 }
    $5 == "D" { deposits[$4] += $6 }

    END {
        print "Deposit totals:"
        for (i in deposits)
            printf("\t%s: $%g\n", i, deposits[i])
        print ""
        print "Withdrawal totals:"
        for (i in withdrawals)
            if(withdrawals[i] > 15)
                printf("\t%s: $%g\n", i, withdrawals[i])
    }

Run it: