For a full list of BASHing data blog posts, see the index page.

Fancy numbering of records

In A Data Cleaner's Cookbook I demonstrate a few of the ways you can give records a serial number using the nl command — for example:

With nl alone you can't add a string before a number. To do that, or to embed the serial record numbering within a string, you can use seq with its -f option ("printf" formatting) and paste the result to the list of records:

paste <(seq -f "Record_%03g" 3) nl_file



paste <(seq -f "Record_%03g_of_3" 3) nl_file

Rather than serial record numbers, you can add a field with unique "random" numbers, like a UUID. A simple way to do this is with a while loop:

while read line; do echo -e "$(uuidgen)\t$line"; done < nl_file



while read; do echo -e "$(uuidgen)\t$REPLY"; done < nl_file

The loop trick can also be used to put the UUID at the end of the record:

while read line; do echo -e "$line\t$(uuidgen)"; done < nl_file

or in the middle, with a bit of cutting and paste-ing:

while read line; do paste <(cut -f1 <<<"$line") <(echo "$(uuidgen)") <(cut -f2 <<<"$line"); done < nl_file

I find UUIDs are overkill for unique record numbering. For a random string of 10 alphanumeric characters I would use

head -n3 /dev/urandom | tr -cd '[:alnum:]' | cut -c -10

like this:

while read line; do echo -e "$(head -n3 /dev/urandom | tr -cd '[:alnum:]' | cut -c -10)\t$line"; done < nl_file

What if your data table has a header line and you don't want the header numbered? See the "Ignore a header" section of the header page in A Data Cleaner's Cookbook.

Interesting numbering problems arise with irregular multi-line records, as in the file "records" shown below. The records are irregular in this case because they might occupy 2, 3 or 4 lines, although they are always separated with "---":

How to insert a serial number before each record? In this case I've added a 4-digit number with AWK:

awk 'NR==1 {$0="0001" RS $0} /^---/ {$0=$0 RS sprintf("%04d",(++c)+1)} 1' records

Last update: 2019-02-17

The blog posts on this website are licensed under a

Creative Commons Attribution-NonCommercial 4.0 International License