LCA: Lessons from 30 years of Sendmail


The Sendmail mail transfer agent tends to be one of those programs that one either loves or hates. Both its supporters and its detractors will agree, though, that Sendmail played a crucial role in the development of electronic mail before, during, and after the explosion of the Internet. Sendmail creator Eric Allman took a trip to Brisbane to talk to the LCA 2011 audience about the history of this project. Sendmail is, he said, 30 years old now; in those three decades it has thrived without corporate support, changed the world, and prospered in a world which was changing rapidly around it.

The history

Sendmail had its start at the University of California, Berkeley, in 1980; it was initially something Eric did while he was supposed to be working on the Ingres relational database management system. In those days, the Computer Science department had a dozen machines, but the main system was "Ernie CoVAX," which was accessed via ASCII terminals. There was a limited number of ports, so users had to connect via a patch panel in the mail room; contention for available ports was often intense.

Things got more interesting when the Ingres project got an ARPAnet connection; a single PDP-11 machine, with two ports, was the only way to access the net at that time. There was no way the entire department was going to share those two ports without somebody getting hurt, so another solution was required. Eric looked at the problem, concluded that what everybody really wanted was the ability to send mail through the gateway machine, and decided that he would build a way to access email from other machines on campus. From this beginning delivermail was born.

There was a set of design principles that Eric adopted at that time. There was only one of him, so programming time was a truly finite resource. Redesigning user agents and mail stores was out of the question. Delivermail had to adapt to the world around it, not the other way around. The resulting program worked, but was not without its problems. The compiled-in configuration lacked flexibility, there was no address translation as messages moved between networks, and the parsing was simple and opaque. But it succeeded in moving mail around and giving the entire department access to the net.

Then the department got the BSD contract. Bill Joy needed a mail transfer agent to connect to the network, so he talked Eric into taking on the job. After all, how hard could it be? Among other things, the new MTA needed to support the SMTP mail protocol - which wasn't specified yet. Supporting SMTP also forced the addition of a mail queue, a job which turned out to be much harder than it looked. Eric hacked away, and Sendmail was shipped with 4.1BSD in 1982 with support for SMTP, header rewriting, queueing, and runtime configuration.

After that, Eric left Berkeley for a "lucrative" (heavy on the quotes) career in industry. Sendmail, meanwhile, was picked up by the Unix vendors. The Unix wars were in full force at that time; the inevitable result was a proliferation of different versions of Sendmail. The program became balkanized and incompatible across systems.

Eric returned to Berkeley in 1989 and started hacking on Sendmail again; the immediate need was support for the ".cs" subdomain at the university. That work snowballed into a major rewrite culminating in Sendmail 8; this version integrated a great deal of code from both the industry and the community. It added support for ESMTP, a number of new protocols, delivery status notifications, LDAP integration, eight-bit mail, and a new configuration package. Uptake increased after the Sendmail 8 release as a result of these features, but also as the result of the publication of the O'Reilly "bat" book. Documentation, it turns out, really matters.

Sendmail Inc. was created in 1998 with the fantasy that it would let Eric get back to coding. In reality, starting a company is more about marketing, sales, and money than about technology - a lesson many of us have learned. It was one of the first companies trying to mix open source and proprietary offerings; in those days, the prevailing wisdom was that a company needed proprietary lock-in to have any chance of success. Over time, though, functionality migrated to the free version; thus Sendmail gained support for encryption, authentication, milters (mail filters), virtual hosting, spam filtering, and more. And that's where things stand today.

Lessons learned

As one might expect, 30 years of experience have led to a number of lessons worth passing on. Eric shared a few of them.

One is that requirements change all the time. The original delivermail program had reliability as its primary focus - few things are more hazardous to one's academic career than losing a professor's grant proposal. Over time, the requirements shifted toward functionality and performance; Sendmail had to scale up in speed and features as the Internet took off. Then users were demanding protection from spam and malware; that shifted Sendmail development toward keeping mail out. We have, Eric noted, gone full circle toward unreliable mail service. After that came requirements around legal and regulatory compliance - that is where a great deal of Sendmail Inc.'s business lies. There is currently an increasing focus on controlling costs, mobility, and social network integration. Without the ability to adapt to meet these shifting requirements, Sendmail would not have thrived through all these years.

With regard to Sendmail's design decisions, Eric said that some turned out to be right, some were wrong, and some were right at the time but are wrong now. One criticism that has been made is that Sendmail is an overly general solution; it can route and rewrite messages in ways which are generally unneeded in these days of Internet monoculture. Eric defended that generality by saying that the world was in great flux when Sendmail was designed; there was no way to really know how things were going to turn out. And, he said, he would do it again: "the world is still ugly."

Rewriting rules for addresses are a part of that generality; even at the time, it seemed like overkill, but he couldn't come up with anything better. It was, he said, probably the right thing to do. That said, the decision to use tabs as active characters was the stupidest thing he has ever done. That's how makefiles did it, and it seemed cool at the time. As a whole, he said, the concept was right, but the syntax and flow control could have been a lot better. Even so, he's glad he did matching based on tokens; basing Sendmail configuration around regular expressions would have been far worse.
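To make the idea of token-based matching concrete, here is a toy sketch in Python. It is not Sendmail's implementation: the wildcard names ($-, $+, $*) are borrowed from Sendmail's rule syntax, but the set of separator characters, the shortest-match-first search order, and the function names tokenize() and match() are all assumptions made for illustration.

```python
import re


def tokenize(address):
    """Split an address into tokens; separator characters become tokens too."""
    # The separator set here is an assumption chosen for the example.
    return [t for t in re.split(r"([@.!<>:%])", address) if t]


def match(pattern, tokens):
    """Match a list of pattern elements against a list of tokens.

    Wildcards: "$-" binds exactly one token, "$+" one or more,
    "$*" zero or more. Anything else must equal the next token
    (compared case-insensitively).

    Returns the list of token groups bound by the wildcards,
    or None if the pattern does not match.
    """
    if not pattern:
        return [] if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if head in ("$-", "$+", "$*"):
        low = 0 if head == "$*" else 1
        high = 1 if head == "$-" else len(tokens)
        # Try the shortest binding first, backtracking on failure.
        for n in range(low, high + 1):
            if n > len(tokens):
                break
            tail = match(rest, tokens[n:])
            if tail is not None:
                return [tokens[:n]] + tail
        return None
    if tokens and tokens[0].lower() == head.lower():
        return match(rest, tokens[1:])
    return None
```

With this sketch, the pattern ["$+", "@", "$+"] splits "eric@cs.berkeley.edu" into a user part and a host part, while ["$-", "!", "$+"] picks apart a UUCP-style address such as "ucbvax!eric". The patterns operate on whole tokens, so nobody has to worry about escaping dots or anchoring expressions, which is roughly the advantage over regular expressions described above.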

If he were doing the configuration system now, it would look a lot more like the Apache scheme.

The message munging feature was needed for the rewriting of headers; it facilitated interoperability between different networks. It is still used a lot, he said, though it's arguably not necessary. Sendmail could benefit from a pass-through mode which shorts out the message munging, but that leaves open the question of what should be done with non-compliant messages. Should they be fixed, rejected, or just dropped? There is, he said, no obvious answer.

The embedding of SMTP and queueing in the mail daemon was the right thing to do; he does not agree with the Postfix approach of proliferating lots of small daemons. The queue structure itself involves two files for every message: one with the envelope, and one with the body. That forces the system to scan large numbers of small files on a busy system, which is not always optimal. At the time it was the right way to go; now he would probably use some sort of database for the envelopes. The decision to use plain text for all internal files was right, though; it makes debugging much easier.
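The two-files-per-message layout can be sketched in a few lines of Python. This is only an illustration of the idea, not Sendmail's on-disk format: the "qf" and "df" file name prefixes, the single-letter "S" and "R" line codes, and the function names enqueue() and scan_envelopes() are all assumptions made for the example.

```python
import os


def enqueue(queue_dir, msg_id, sender, recipients, body):
    """Store one message as two plain-text files: an envelope and a body."""
    # Envelope file: one "S" line for the sender, one "R" line per recipient.
    with open(os.path.join(queue_dir, "qf" + msg_id), "w") as env:
        env.write("S" + sender + "\n")
        for rcpt in recipients:
            env.write("R" + rcpt + "\n")
    # Body file: the message text, untouched.
    with open(os.path.join(queue_dir, "df" + msg_id), "w") as data:
        data.write(body)


def scan_envelopes(queue_dir):
    """Read every envelope file in the queue directory.

    Each queued message costs one open() and one read, which is the
    "large numbers of small files" overhead on a busy system.
    """
    envelopes = {}
    for name in sorted(os.listdir(queue_dir)):
        if not name.startswith("qf"):
            continue
        sender, recipients = None, []
        with open(os.path.join(queue_dir, name)) as env:
            for line in env:
                line = line.rstrip("\n")
                if line.startswith("S"):
                    sender = line[1:]
                elif line.startswith("R"):
                    recipients.append(line[1:])
        envelopes[name[2:]] = (sender, recipients)
    return envelopes
```

The sketch shows both sides of the tradeoff: every file can be inspected with cat or an editor, which is the debugging benefit of plain text, but a queue run has to open one envelope file per message, which is the cost that a database for the envelopes would avoid.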

With regard to the use of the m4 macro preprocessor for configuration, Eric admitted that the syntax is painful. But he needed a macro facility and didn't want to reinvent the wheel. The "damned dnl lines" for comments were a mistake, though, and completely unnecessary. In summary, some sort of tool was needed; m4 might not have been the best choice, but it's not clear what would have been.

With regard to extending or changing features: Sendmail has tended toward extending features and maintaining compatibility, and that has not always been the right thing to do. The hostname masquerading facility was one example; that feature was simply done wrong the first time around. Rather than fixing it, though, Eric papered over the problems with new features. It would have been better to inflict some short-term pain on users, perhaps aided by a migration tool, and be done with it. The unwillingness to replace mistaken features has a lot to do with why Sendmail is difficult to configure.

Sendmail goes out of its way to accept and fix bogus input; that was in compliance with the robustness principle ("be conservative in what you send but liberal in what you accept") that was widely accepted at the time. It increases interoperability, but at the cost of allowing broken software to persist indefinitely, leading to large costs down the road. Nonetheless, it was the right idea at the time for the simple reason that everything was broken then. But he should have tightened things up later on.

What would he have done differently? At the top of the list is trying to fix problems as soon as possible. These include tabs in the configuration file and the V7 mailbox format. He's really tired of seeing ">From " in messages; he said he could have fixed it and expressed his apologies for not having taken the opportunity. He would make more use of modern tools; Sendmail has its own build script, which is not something he would do today. He would use more privilege separation, though he would not go as far as Postfix. He would have made a proper string abstraction; strings are by far the weakest part of the C language.
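For readers who have never wondered where ">From " comes from: in the traditional mailbox format, each message begins with a separator line starting with "From ", so a body line that starts the same way has to be altered before the message is appended. The following Python sketch shows that quoting under simple assumptions; the function names quote_from_lines() and format_mbox_entry() are hypothetical, and real delivery agents differ in the details.

```python
def quote_from_lines(body):
    """Prefix '>' to any body line that starts with 'From '.

    Without this, a mailbox reader would take the line for the
    start of a new message.
    """
    quoted = []
    for line in body.split("\n"):
        if line.startswith("From "):
            line = ">" + line
        quoted.append(line)
    return "\n".join(quoted)


def format_mbox_entry(sender, date, body):
    """Build one mailbox entry: separator line, quoted body, blank line."""
    quoted = quote_from_lines(body)
    if not quoted.endswith("\n"):
        quoted += "\n"
    return f"From {sender} {date}\n{quoted}\n"
```

The quoting is lossy: a reader of the mailbox cannot tell whether a ">From " line was written that way by the sender or rewritten on delivery, which is why the altered text so often survives into the message people actually see.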

There are also a number of things he would do the same, starting with the use of C as the implementation language. It is, he said, a dangerous language, but the programmer always knows what is going on. Object-oriented programming, he said, is a mistake; it hides too much. Beyond that, he would continue to do things in small chunks. The creation of syslog (initially as a way of getting debugging information out) was obviously the right thing to do; he was surprised that there was no centralized way of dealing with logging data on Unix systems. He would still implement rewriting rules, albeit with a different syntax. And he would continue not to rely too heavily on outside tools. There is a cost to adding dependencies on tools; sometimes it's better to just build what you need. There are, he said, projects using lex when all they really need is strtok().

There were a number of "takeaways" to summarize the talk:

The KISS (keep it simple, stupid) principle works.

If you don't know what you are doing, advance designs will not help.

The world is messy; just plan on it.

Flexibility trumps performance when the world changes every day.

Fix things early; your installed base will only get larger if you succeed, and the pain of not fixing things will only get worse.

Use plain text for internal files and protocols.

Good documentation is the key to broad acceptance; most projects, he said, have not yet figured this out.

The talk was evidently based on a chapter from an upcoming book on the architecture of open-source applications.

One member of the audience asked Eric which MTA he would recommend for new installations today. His possibly surprising answer was Postfix. He talked a lot with Postfix author Wietse Venema during its creation, and was impressed. Postfix is, he said, nice work, even if he doesn't agree with all of the design decisions that were made.

