Today I realised that my customer's sed job had been running for more than an hour. The strange thing is that the input file is no more than a few MB, and the sed script is a straightforward global substitution. BTW, it is running on Solaris 10.

Is it the nature of the problem that makes sed run that long, or is sed inefficient in certain circumstances?

For this exercise, I created a file with 2000 lines. The first line has 12 characters, each subsequent line is 12 characters longer than the one before it, and the last line has 24000 characters (not counting newlines).
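As a sanity check, the size of this file follows directly from that description. Line n holds 12n characters plus a newline, so the total is 12 x (1 + 2 + ... + 2000) + 2000 bytes. A small bash sketch of the arithmetic (the variable names are my own):

```shell
#!/bin/bash
# Expected size of the test file described above:
# line n holds n copies of a 12-character field plus one newline.
lines=2000
field=12

longest=$(( field * lines ))                          # length of the last line
bytes=$(( field * lines * (lines + 1) / 2 + lines ))  # all fields plus newlines

echo "longest line: $longest characters"
echo "total size:   $bytes bytes"
```

The total, 24014000 bytes, matches the byte count that wc reports for run2000.txt later in this post.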

Running sed 's/\\\\/@/g;s/\\/@/g' over this 24 MB file took more than 35 minutes on my Sun Fire V440, almost all of it user CPU time. That's really inefficient. Okay, sed is definitely not the right tool for this job. Let's look at an alternative.
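To see what the two substitutions actually do, here is the same sed script applied to a single 12-character field of the test data. The first expression turns each pair of backslashes into @; the second would catch any lone backslash left over, and in this data there is none. The sample field is copied from what run.sh generates; the variable names are illustrative.

```shell
#!/bin/bash
# One field of the test data: c:\\a\\b\\c, (each \\ is two literal backslashes).
# Single quotes keep the backslashes as-is, and printf '%s' does not interpret them.
field='c:\\a\\b\\c,'

# s/\\\\/@/g replaces every pair of backslashes with @,
# then s/\\/@/g replaces any single backslash that remains.
result=$(printf '%s\n' "$field" | sed 's/\\\\/@/g;s/\\/@/g')

echo "$result"   # c:@a@b@c,
```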

Perl has a -p flag that wraps your in-line code in a while (<>) { ... # your script } loop and prints each line after your code has processed it, so you can write sed-style one-liners. Guess what: Perl took only about 5 seconds to do the same substitutions, and produced identical output. Hey, that's a lot of CPU cycles saved!

Here are the code and the run-time details:

$ cat run.sh
#! /bin/bash
comma()
{
    perl -e 'print "c:\\\\a\\\\b\\\\c,"x'${1:-1}
    echo ""
}
n=1
while [ $n -le $1 ]
do
    comma $n
    ((++n))
done
$ ./run.sh 2000 > run2000.txt
$ wc run2000.txt
2000 2000 24014000 run2000.txt
$ time sed 's/\\\\/@/g;s/\\/@/g' run2000.txt > run1.txt
real    35m6.692s
user    35m5.559s
sys     0m0.430s
$ time perl -pe 's/\\\\/@/g;s/\\/@/g' run2000.txt > run2.txt
real    0m4.948s
user    0m4.491s
sys     0m0.145s
$ digest -a md5 run1.txt run2.txt
(run1.txt) = 8820c914e0e038cec9da6f0883b6d964
(run2.txt) = 8820c914e0e038cec9da6f0883b6d964
$ uname -a
SunOS chihung 5.10 Generic_118822-11 sun4u sparc SUNW,Sun-Fire-V440
$ psrinfo -v
Status of virtual processor 0 as of: 02/25/2009 00:14:28
  on-line since 12/13/2008 00:37:43.
  The sparcv9 processor operates at 1281 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 02/25/2009 00:14:28
  on-line since 12/13/2008 00:37:43.
  The sparcv9 processor operates at 1281 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 2 as of: 02/25/2009 00:14:28
  on-line since 12/13/2008 00:37:43.
  The sparcv9 processor operates at 1281 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 3 as of: 02/25/2009 00:14:28
  on-line since 12/13/2008 00:37:41.
  The sparcv9 processor operates at 1281 MHz,
        and has a sparcv9 floating point processor.
