
sed and its better-known sibling, grep, are stalwarts of the UNIX command line, and have been around nearly as long as UNIX itself. If you’ve spent any time at all as a software engineer working with UNIX, you are certain to have used them. Their story is interesting, not least because it can’t be told without mentioning many acknowledged giants of computer science. It’s especially interesting when you interpret it in the context of all the other emerging parts of the nascent UNIX ecosystem that were also in motion at the time.

So first, then, a little of that context — and I’d be remiss not to mention that I’ve sourced much of the historical sequence of events described here from Michael and Rhonda Hauben’s Netizens Netbook.

Of the key milestones in the early years of UNIX, the invention of pipes — where the output of one program becomes the input for the next — was seminal. The ability to use the command line interface directly to articulate the ideas of functional composition delighted the early UNIX developers. Pipes were first implemented in UNIX in 1972/3 and their usage has hardly changed since then — for example, the output of the ls command is often piped to make it easier to read:

ls -al | less

Pipes were not only a significant innovation in their own right, but they made possible a new way of thinking about software and arguably enabled the emergence of the notion of a ‘software tool’. At Bell Laboratories’ Computing Techniques Research Department (the birthplace of UNIX) Doug McIlroy was working on what would now be called an early text-to-speech program. While trying to manipulate large dictionaries, he became frustrated with the limited capabilities of the system’s ed editor, and asked its author Ken Thompson if he could abstract the editor’s regular expression recognizer into an implementation that could be used on its own on the command line with piped input and output.

Ken duly delivered a program called grep — which, he explained, stood for global regular expression print. grep (like sed) remains in everyday use — it’s a command-line utility that searches text for lines matching a given regular expression, and it is especially powerful in chains of piped commands. Doug McIlroy commented at the time that it gave substance to “the unarticulated notion of software tools brought home by the liberation of grep from within the editor.” grep’s success cemented one of the later UNIX hackers’ tenets for using software tools to create an effective programming environment — that each program should do one thing well.
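To make that concrete, here is a minimal, self-contained sketch (not from the original article) of grep doing its one thing well at the end of a pipe — reading lines from standard input and keeping only those that match the regular expression:

```shell
# Feed three lines to grep through a pipe; grep prints only the
# lines matching the pattern '^ap' (lines beginning with "ap").
printf 'apple\nbanana\napricot\n' | grep '^ap'
# prints:
# apple
# apricot
```

Because grep reads standard input when no file is given, it slots into a pipeline anywhere a filtering step is needed.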

It wasn’t long before developers wanted to extend the power of grep to not only search text files for lines that match a provided regular expression, but to substitute occurrences of strings that match said regular expression, or delete such occurrences, and so on. It might have led to global regular expression substitute, or global regular expression delete, were it not for another Bell Labs researcher, Lee E. McMahon, who turned this collection of related requirements into ‘a tool of remarkable utility, that is largely underappreciated today, because it capitalizes on the perfect familiarity with ed that was universal … but no more’. Thus sed (short for stream editor) was born.

sed is a stream-oriented editor whose input and output are typically files or pipelines. It works by interpreting a sequence of commands that specify the editing actions to be performed — much like writing a simple shell script. A key advantage of sed is that it applies the editing instructions in a single pass through the target file, which makes it possible to edit very large files that might defeat a conventional interactive editor.
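As a small illustration of that single pass (my own example, not from the article), each line of input flows through every command in turn — here, one command deletes blank lines and a second performs a substitution, all in one traversal of the stream:

```shell
# Two sed commands (-e) applied in a single pass over the input:
# '/^$/d' deletes empty lines, 's/two/2/' substitutes on what remains.
printf 'one\n\ntwo\n' | sed -e '/^$/d' -e 's/two/2/'
# prints:
# one
# 2
```

No temporary copy of the whole input is ever held in memory — which is exactly why sed copes with files too large for an interactive editor.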

The official manual explains that sed is typically invoked like this:

sed SCRIPT INPUTFILE…

For example, a script to add a date to filenames such that foo.txt becomes foo-20180104.txt:

for f in *.txt; do mv "$f" "$(echo "$f" | sed "s/\.txt\$/-$(date +%Y%m%d).txt/")"; done

A tutorial by Bruce Barnett offers a more gradual introduction.

Besides substitution with the s command (the most important sed command), some 25 commands are available in total — enough to make sed powerful for a wide range of tasks before it yields to languages like awk or Perl. sed is still useful today — particularly for its substitution capabilities — and finds application wherever large files need to be edited using minimal resources. Serendipitously, these are just the kind of conditions found when wrangling large datasets prior to doing data science on them.
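A few of those commands in action (a quick sketch of my own, not taken from the article) — s for substitution, d for deletion, and p for printing selected lines:

```shell
# s substitutes the first match on each line; the g flag makes it
# replace every match on the line.
echo 'aaa' | sed 's/a/b/'     # prints: baa
echo 'aaa' | sed 's/a/b/g'    # prints: bbb

# d deletes lines matching an address; p prints matching lines
# (paired with -n, which suppresses sed's default output).
printf 'keep\ndrop\n' | sed '/drop/d'     # prints: keep
printf 'keep\ndrop\n' | sed -n '/keep/p'  # prints: keep
```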

sed was invented in the early days of UNIX by the very people who invented UNIX itself. That same Bell Labs team reads like a hall of fame of world-renowned computer scientists — Ken Thompson, Dennis Ritchie, Doug McIlroy, Lee McMahon, Brian Kernighan, Steve Bourne, to name a few. Lee McMahon, sed’s author, is also renowned outside the computer science world for devising the McMahon system for organizing Go competitions. If, like me, you thought these folks to be long buried in the annals of computer science — think again. That very same Ken Thompson who designed and implemented the original UNIX operating system turned up at Google 35 years later to co-invent the Go programming language! And by the way, somehow, Ken also found time to co-create the UTF-8 character encoding.

That some of the giants of our industry are still alive and contributing is a sobering reminder of just how young the industry is.