Now over 30 years old, the UNIX command line utilities sed and awk are useful tools for cleaning up and manipulating data. In their Taxonomy of Data Science, Hilary Mason and Chris Wiggins note that when cleaning data, "Sed, awk, grep are enough for most small tasks, and using either Perl or Python should be good enough for the rest." A little aptitude with command line tools can go a long way.

Sed is a stream editor: it operates on data serially, line by line, as it reads it. You can think of sed as a way to batch up a series of search-and-replace operations that you might otherwise perform in a text editor. For instance, this command replaces every instance of "foo" with "bar" in myfile.txt and writes the result to standard output, leaving the file itself unchanged:

sed -e 's/foo/bar/g' myfile.txt

Anybody who has used regular expressions within a text editor or programming language will find sed easy to grasp. Awk takes a little more getting used to. A record-oriented tool, awk is the right choice when your data contains delimited fields that you want to manipulate.

Consider this list of names, which we'll imagine lives in the file presidents.txt:

George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe

To extract just the first names, we can use the following command:

$ awk '{ print $1 }' presidents.txt
George
John
Thomas
James
James

Or, to find just those records with "James" as the first name:

$ awk '$1 ~ /James/ { print }' presidents.txt
James Madison
James Monroe

Awk can do a lot more, and offers programming constructs such as variables, conditionals and loops. But even a basic grasp of how to match and extract fields will get you far.
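To make the sed substitution `sed -e 's/foo/bar/g' myfile.txt` more concrete, here is a small sketch. It works in a scratch directory and builds a made-up three-line myfile.txt (the file contents are an assumption for the demo). It contrasts `s/foo/bar/` with `s/foo/bar/g`, and chains two `-e` expressions, which is how sed batches several search-and-replace operations into a single pass over the data.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input, three lines.
printf 'foo foo\nfood\nno match here\n' > myfile.txt

# Without the g flag, only the first "foo" on each line is replaced.
sed -e 's/foo/bar/' myfile.txt
# bar foo
# bard
# no match here

# With g, every "foo" on each line is replaced.
# The pattern also matches inside "food", since it is not anchored to whole words.
sed -e 's/foo/bar/g' myfile.txt
# bar bar
# bard
# no match here

# Several -e expressions are applied, in order, to each line.
sed -e 's/foo/bar/g' -e 's/match/hit/' myfile.txt
# bar bar
# bard
# no hit here

# sed wrote its results to standard output; the file itself is unchanged.
cat myfile.txt
# foo foo
# food
# no match here
```

Note the "food" line: a plain regular expression matches substrings, so it is worth checking substitutions against real data before trusting them.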
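The `$1` in the awk command `awk '{ print $1 }' presidents.txt` refers to the first field of each record. The sketch below, which assumes presidents.txt holds one "First Last" name per line as in the list of presidents, shows a few other field references: `$2`, literal strings mixed with fields, and the built-in `NF` (number of fields) with `$NF` (the last field). Awk splits records on runs of whitespace by default; the `-F` option sets a different delimiter, shown here on hypothetical colon-separated data.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Recreate the example file, one president per line.
printf '%s\n' 'George Washington' 'John Adams' 'Thomas Jefferson' \
  'James Madison' 'James Monroe' > presidents.txt

# Fields can be reordered and combined with literal strings.
awk '{ print $2 ", " $1 }' presidents.txt
# Washington, George
# Adams, John
# Jefferson, Thomas
# Madison, James
# Monroe, James

# NF is the number of fields in the current record; $NF is the last field.
awk '{ print NF, $NF }' presidents.txt

# -F sets the field separator; here, hypothetical colon-delimited records.
printf 'Washington:George\nAdams:John\n' | awk -F: '{ print $2 }'
# George
# John
```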
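The pattern `$1 ~ /James/` is a regular-expression match, so it would also accept a first name that merely contains "James", such as "Jameson". A sketch of some alternatives, again assuming the same presidents.txt: an exact string comparison, an anchored regex, a negated match with `!~`, and a compound pattern. When a pattern has no action, awk prints the whole record by default.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'George Washington' 'John Adams' 'Thomas Jefferson' \
  'James Madison' 'James Monroe' > presidents.txt

# Exact string comparison instead of a regex match; no action means "print".
awk '$1 == "James"' presidents.txt
# James Madison
# James Monroe

# Anchored regex: first names beginning with "J", printing only the last name.
awk '$1 ~ /^J/ { print $2 }' presidents.txt
# Adams
# Madison
# Monroe

# Negated match: records whose last name does not start with "M".
awk '$2 !~ /^M/' presidents.txt
# George Washington
# John Adams
# Thomas Jefferson

# Patterns can be combined with && and ||.
awk '$1 == "James" && $2 ~ /on$/' presidents.txt
# James Madison
```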
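To give a flavor of awk's variables, conditionals and loops, here is a small sketch, assuming the same presidents.txt. It counts how often each first name occurs using an associative array, then, in an `END` block (which runs after the last record has been read), loops over the array and reports only the names that occur more than once, followed by the total record count in the built-in variable `NR`.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'George Washington' 'John Adams' 'Thomas Jefferson' \
  'James Madison' 'James Monroe' > presidents.txt

awk '
  # Runs once per record: count each first name in an associative array.
  { count[$1]++ }

  # Runs once, after the last record has been read.
  END {
    for (name in count)          # loop over the array keys
      if (count[name] > 1)       # keep only repeated first names
        print name, count[name]
    print NR, "records read"     # NR still holds the total record count
  }
' presidents.txt
# James 2
# 5 records read
```

The iteration order of `for (name in count)` is unspecified, so when more than one key is printed, pipe the output through `sort` if order matters.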