Book excerpt





Chapter 0:

SOFTWARE TIME

It was winter 1975. I hunched over the teletype in the terminal room, a hulking console that shook each time its typewriter head whammed leftward to begin a new line. I stared at smudgy lines of black code. I’d lost track of the time several hours before; it had to be after midnight. The janitors had switched off the hall lights. I didn’t have anyone’s permission to hang out there in the NYU physics building, using the computer account that the university was giving away, for free, to high school students. But then nobody was saying no, either.

I was fifteen years old and in love with a game called Sumer, which put me in charge of an ancient city-state in the Fertile Crescent. Today’s computer gamers might snicker at its crudity: Its progress consisted of all-capital type pecked out line by line on a paper scroll. You’d make decisions, allocating bushels of grain for feeding and planting, and then the program would tell you how your city was doing year by year. “Hamurabi,” it would announce, like an obsequious prime minister who feared beheading, “I beg to report . . .”

Within a couple of days of play I’d exhausted the game’s possibilities. But unlike most of the games that captivate teenagers today, Sumer invited tinkering. Anyone could inspect its insides: The game was a set of simple instructions to the computer, stored on a paper tape coded with little rows of eight holes. (The confetti that accumulated in a plastic tray on the teletype’s side provided nearly as much fun as the game.) It had somehow landed among my group of friends like a kind of frivolous samizdat, and we shared it without a second thought. Modifying it was almost as easy as playing it if you took the couple of hours required to learn simple Basic: You just loaded the tape’s instructions into the computer and started adding lines to the program.

Sumer was a blank canvas — history in the raw, ready to be molded by teenage imaginations. My friends and I took its simple structure and began building additions. Let’s have players choose different religions! What if we throw in an occasional bubonic plague? Barbarian invaders would be cool. Hey, what about catapults?

That night I was obsessed with rebellion. Sumer had always had a rough provision for popular uprising; if you botched your stewardship badly enough, the people would rise up and remove you from power. (Sumer’s original author was an optimist.) I thought there needed to be more variety in the game’s insurrections, so I started inventing additions — new subroutines that would plunge Sumer into civil war or introduce rival governments competing for legitimacy.

I didn’t care how late it was. The F train ran all night to take me home to Queens. The revolution had to be customized!

A quarter century later, in May 2000, I sat in an office in San Francisco and stared at a modern computer screen (high resolution, millions of colors). Wan ranks of half-guzzled paper coffee cups flanked my keyboard. It was 5:00 A.M.

I was forty years old, a founder and now managing editor of the online magazine Salon, and in charge of a software development project. It had taken us months of meticulous planning. It promised to revolutionize our Web site with dynamic features. And it was disintegrating in front of me.

The lead programmer, after working around the clock for weeks and finally announcing that his work was done, had taken off for Hawaii on a long-planned family vacation. That left his boss, Chad Dickerson, our company’s technology VP, to figure out why the database that stored all our Web site’s articles would not talk to the programs that were supposed to assemble that data into published pages. Chad had been up for two nights straight trying to fix the problem. Otherwise, our two million readers would find nothing but old news on our site come Monday morning.

Hadn’t we built software before? (Yes.) Didn’t we test everything? (Not well enough, apparently.) How could we have screwed up so badly? (No good answer.)

I ate the last bag of pretzels from our vending machine, paced, and waited. Nothing helped. There was too much time. Time to read the email from a hapless colleague who’d planned a champagne-and-cake celebration in honor of the new project, and to respond: “Maybe we should hold off.” Time to feel alienated and trapped and wonder whether it had been a smart idea to give our system’s central server the name Kafka.

We finally published the first edition of our new, “improved” site at around 9:00 A.M. As my coworkers showed up at their desks in their usual Monday morning routines, it took them a while to realize that a half dozen of us had never gone home the night before.

Within a few weeks the software calmed down as our developers fixed the most pressing problems. But every time I hear about a company preparing to “upgrade its platform” by introducing a big new software scheme, I cringe.

The 1990s technology-industry boom introduced us to the concept of “Internet time.” The phrase meant different things to different people, but mostly it meant fast. Under the digital age’s new temporal dispensation, everything would happen — technologies would emerge, companies would rise, fortunes would be made — at gasp-inducing speed. That meant you couldn’t afford the time to perfect anything — but no need to worry, because nobody else could, either.

Internet time proved fleeting and was quickly displaced by newer coinages untainted by association with a closed-out decade’s investment fads. But the buzzword-mongers were onto something. Time really does seem to behave differently around the act of making software. When things go well, you can lose track of passing hours in the state psychologists call “flow.” When things go badly, you get stuck, frozen between dimensions, unable to move or see a way forward. Either way, you’ve left the clock far behind. You’re on software time.

A novice programmer’s first assignment in a new language, typically, is to write a routine known as “Hello World” — a bit of code that successfully conjures the computer’s voice and orders it to greet its master by typing those words. In Basic, the simple language of my Sumer game, it looks like this:

10 PRINT "HELLO WORLD!"
20 STOP

“Hello World” programs are useless but cheerful exercises in ventriloquism; they encourage beginners and speak to the optimist in every programmer. If I can get it to talk to me, I can get it to do anything! The Association for Computing Machinery, which is the ABA or AMA of the computer profession, maintains a Web page that lists versions of “Hello World” in nearly two hundred different languages. It’s a Rosetta stone for program code.

“Hello World” looks more forbidding in Java, one of the workhorse programming languages in today’s business world:

class HelloWorld {
    public static void main (String args[]) {
        System.out.println("Hello World!");
    }
}

Public static void: gazillions of chunks of program code written in Java include that cryptic sequence. The words carry specific technical meaning. But I’ve always heard them as a bit of machine poetry, evoking the desolate limbo where software projects that begin with high spirits too often end up.

It’s difficult not to have a love/hate relationship with computer programming if you have any relationship with it at all. As a teenage gamer, I had tasted the consuming pleasure of coding. As a journalist, I would witness my share of the software world’s inexhaustible disaster stories — multinational corporations and government agencies and military-industrial behemoths, all foundering on the iceberg of code. And as a manager, I got to ride my very own desktop Titanic.

This discouraging trajectory of twenty-five years of software history may not be a representative experience, but it was mine. Things were supposed to be headed in the opposite direction, according to Silicon Valley’s digital utopianism. In the months following our train wreck of a site launch at Salon, that discrepancy began to eat at me.

Programming is no longer in its infancy. We now depend on unfathomably complex software to run our world. Why, after a half century of study and practice, is it still so difficult to produce computer software on time and under budget? To make it reliable and secure? To shape it so that people can learn it easily, and to render it flexible so people can bend it to their needs? Is it just a matter of time and experience? Could some radical breakthrough be right around the corner? Or is there something at the root of what software is, its abstractness and intricateness and malleability, that dooms its makers to a world of intractable delays and ineradicable bugs — some instability or fickleness that will always let us down?

“Software is hard,” wrote Donald Knuth, author of the programming field’s most respected textbooks. But why?

Maybe you noticed that I’ve called this Chapter 0. I did not mean to make an eccentric joke but, rather, to tip my hat to one small difference between computer programmers and the rest of us: Programmers count from zero, not from one. The full explanation for this habit lies in the esoteric realm of the design of the registers inside a computer’s central processing unit and the structure of data arrays. But the most forthright explanation I’ve found comes from a Web page that attempts to explain for the rest of us the ways of the hacker — hacker in the word’s original sense of “obsessive programming tinkerer” rather than the later, tabloid sense of “digital break-in artist.”

Why do programmers count from zero? Because computers count from zero! And so programmers train themselves to count that way, too, to avoid a misunderstanding between them and the machines they must instruct. Which is all fine, except for the distressing fact that most of the human beings who use those machines habitually count from one. And so, down in the guts of the system, where data is stored and manipulated — representations of our money and our work lives and our imaginative creations all translated into machine-readable symbols — computer programs and programming languages often include little offsets, translations of “+1” or “-1,” to make sure that the list of stuff the computer is counting from zero stays in sync with the list of stuff a human user is counting from one.
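The off-by-one bookkeeping described above can be made concrete with a minimal sketch (in Python, for illustration; the book's own examples use Basic and Java). The list and item names here are invented for the example; only the zero-versus-one offset is the point:

```python
# The machine stores a list of three items at positions 0, 1, 2.
items = ["grain", "land", "people"]

# The computer counts from zero: the first item lives at index 0.
first = items[0]

# A human-facing display counts from one, so the program applies
# the familiar "+1" offset when labeling items for the user:
for position, item in enumerate(items):
    print(f"Item {position + 1}: {item}")

# Going the other way, a user who asks for "item 2" needs a "-1"
# before the machine can look it up:
user_choice = 2
chosen = items[user_choice - 1]
```

Every one of those little `+1` and `-1` translations is a seam between the machine's way of counting and ours, and each one is a chance for the two lists to slip out of sync.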

In the binary digital world of computers, all information is reduced to sequences of zeros and ones. But there’s a space between zero and one, between the way the machine counts and thinks and the way we count and think. When you search for explanations for software’s bugs and delays and stubborn resistance to human desires, that space is where you’ll find them.

As this book began taking shape in my thoughts, I would drive to work every day over the San Francisco Bay Bridge. One morning, as my car labored up the lengthy incline that connects the Oakland shore to the elevated center of the bridge’s eastern span, I noticed, off to the right, a new object blocking the panorama of blue bay water and green Marin hill: The tip of a high red crane peeked just over the bridge’s deck level. It was there the next day and the next, and soon it was joined by a line of a dozen cranes, arrayed along the bridge’s north side like a rank of mechanical beasts at the trough, ready to feed on hapless commuters.

Work was beginning on a replacement for this half of the doubledecker bridge. A fifty-foot chunk of its upper level had collapsed onto the lower roadway during the 1989 Loma Prieta earthquake. Now a safer, more modern structure would rise next to the old.

In the weeks and months that followed, each of the 240-foot-tall cranes began pounding rust-caked steel tubes, 8 feet in diameter and 300 feet long, into the bay floor. In the early morning hours we could sometimes hear the thuds in my home in the Berkeley hills. One hundred sixty of these enormous piles would be filled with concrete to support the new bridge’s viaduct. The whole process was choreographed with precision and executed without a hitch; it felt inevitable, its unfolding infused with all the confidence we place in the word engineering.

If the subject of software’s flaws is discussed for more than a few minutes at a time, it is a certainty that someone will eventually pound a fist on the table and say, “Why can’t we build software the way we build bridges?”

Bridges are, with skyscrapers and dams and similar monumental structures, the visible representation of our technical mastery over the physical universe. In the past half century, software has emerged as an invisible yet pervasive counterpart to such world-shaping human artifacts. “Our civilization runs on software,” says Bjarne Stroustrup, who invented a widely used computer language called C++.

At first this sounds like an outlandish and self-serving claim. Civilization got along just fine without Microsoft Windows, right? But software is more than the program you use to send email or compose a report; it has seeped into every cranny of our lives, without many of us noticing. It is in our kitchen gadgets and cars, toys and buildings. Our businesses and banks, our elections and our news media, our movies and our transportation networks, our health care and national defense, our scientific research and basic utility services — the stuff of our daily existence hangs from fragile threads of computer code.

And we pay for their fragility. Software errors cost the U.S. economy about $59.5 billion annually, according to a 2002 study by the National Institute of Standards and Technology, as two out of three projects came in significantly late or over budget or had to be canceled outright.

Our civilization runs on software. Yet the art of creating software continues to be a dark mystery, even to the experts. Never in history have we depended so completely on a product that so few know how to make well. There is a big and sometimes frightening gap between our accelerating dependence on software systems and the steady but slow progress in our knowledge of how to make them soundly. The dependence has increased exponentially, while the skill — and the will to apply it — advances only along a plodding line.

If you talk with programmers about this, prepare for whiplash. On the one hand, you may hear that things have never looked brighter: We have better tools, better testing, better languages, and better methods than ever before! On the other hand, you will also hear that we haven’t really made much headway since the dawn of the computer era. In his memoirs, computing pioneer Maurice Wilkes wrote of the moment in 1949 when, hauling punch cards up the stairs to a primitive computer called EDSAC in Cambridge, England, he saw the future: “The realization came over me with full force that a good part of the remainder of my life was going to be spent in finding errors in my own programs.” From Wilkes’s epiphany to the present, despite a host of innovations, programmers have been stuck with the hard slog of debugging. Their work is one percent inspiration, the rest sweat-drenched detective work; their products are never finished or perfect, just varying degrees of “less broken.”

Software is a heap of trouble. And yet we can’t, and won’t, simply power down our computers and walk away. The software that frustrates and hogties us also captivates us with new capabilities and enthralls us with promises of faster, better ways to work and live. There’s no going back. We need the stuff more than we hate it.

So we dream of new and better things. The expert who in many ways founded the modern field of software studies, Frederick Brooks, wrote an influential essay in 1987 titled “No Silver Bullet,” declaring that, however frustrated we may be with the writing of computer programs, we will never find a magic, transformational breakthrough — we should only expect modest, incremental advances. Brooks’s message is hard to argue with but painful to accept, and you can’t attend a computer industry conference or browse a programmers’ Web site today without bumping into someone who is determined to prove him wrong.

Some dream of ripping down the entire edifice of today’s software and replacing it with something new and entirely different. Others simply yearn for programs that will respond less rigidly and more fluidly to the flow of human wishes and actions, for software that does what we want and then gets out of our way, for code that we can count on.

We dream of it, then we try to write it — and all hell breaks loose.

Chapter 1:

DOOMED

[July 2003]

Michael Toy places his palms on his cheeks, digs his chin into his wrists, squints into his PowerBook, and begins the litany. “John is doomed. He has five hundred hours of work scheduled between now and the next release. . . . Katie’s doomed. She has way more hours than there are in the universe. Brian is majorly doomed. Plus he’s only half time. Andy — Andy is the only one who doesn’t look doomed. There are no hundreds on his list.”

They don’t look doomed, these programmers sitting around a nondescript conference room table in Belmont, California, on a summer day. They listen quietly to their manager. Toy is a tall man with an impressive gut and a ponytail, but he seems to shrink into a space of dejection as he details how far behind schedule the programmers have fallen. It’s July 17, 2003, and he’s beginning to feel doomed himself about getting everything done in the less than two months before they are supposed to finish another working version of their project.

“Everybody who has a list with more time than there is in the universe needs to sit down with me and go over it.”

These lists are the bug lists — rosters of unsolved or “open” problems or flaws. Together they provide a full accounting of everything these software developers know must be fixed in their product. The bug lists live inside a program called Bugzilla. Toy’s programmers are also using Bugzilla to track all the programming tasks that must be finished in order to complete a release of the project; each one is responsible for entering his or her list into Bugzilla along with an estimate of how long each task will take to complete.

“Now let’s talk about why we’re behind. Does anyone have a story to tell?”

There’s silence for a minute. John Anderson, a lanky programming veteran whose title is systems architect and who is, in a de facto sort of way, the project’s lead coder, finally speaks up, in a soft voice. “There’s a bunch of reasons. In order to build something, you have to have a blueprint. And we don’t always have one. Then you hit unexpected problems. It’s hard to know how long something’s going to take until you know for sure you can build it.”

“But you can’t just throw up your hands and say, I quit.” Toy usually prefers to check things off his agenda fast, running his developers’ meetings with a brisk attitude of “let’s get out of here as fast as we can” that’s popular among programmers. But today he’s persistent. He won’t let the scheduling problems drop. “We need to make guesses and then figure out what went wrong with our guesses.”

Jed Burgess, one of the project’s younger programmers, speaks up. “There’s a compounding of uncertainty: Your estimates are based on someone else’s estimates.”

Toy begins reviewing Anderson’s bugs. “The famous flicker-free window resizing problem. What’s up with that?”

Officially, this was bug number 44 in Bugzilla, originally entered on January 19, 2003, and labeled “Flicker Free window display when resizing windows.” I had first heard of the flicker-free window resizing problem at a meeting in February 2003 when the Open Source Applications Foundation (OSAF), whose programmers Toy was managing, had completed the very earliest version of its project, Chandler — an internal release not for public unveiling that came even before the 0.1 edition. Ultimately, Chandler was supposed to grow up into a powerful “personal information manager” (PIM) for organizing and sharing calendars, email, to-do lists, and all the other stray information in our lives. Right now, the program remained barely embryonic.

At that February meeting, Anderson had briefly mentioned the flicker bug — when you changed the size of a window on the Chandler screen, everything flashed for a second — as a minor matter, something he wanted to investigate and resolve because, though it did not stop the program from working, it offended him aesthetically. Now, nearly six months later, he still hasn’t fixed it.

Today Anderson explains that the problem is thornier than he had realized. It isn’t simply a matter of fixing code that he or his colleagues have written; its roots lie in a body of software called wxWidgets that the Chandler team has adopted as one of the building blocks of their project. Anderson must either wait for the programmers who run wxWidgets to fix their own code or find a way to work around their flaw.

“So you originally estimated that this would take four hours of work,” Toy says. “That seems to have been off by an order of magnitude.”

“It’s like a treasure hunt,” Anderson, unflappable, responds. “You have to find the first thing. You have to get the first clue before you’re on your way, and you don’t know how long it will take.”

“So you originally estimated four hours on this bug. You now have eight hours.”

“Sometimes,” Anderson offers philosophically, “you just wake up in the morning, an idea pops into your head, and it’s done — like that.”

Mitchell Kapor has been sitting quietly during the exchange. Kapor is the founder and funder of the Open Source Applications Foundation, and Chandler is his baby. Now he looks up from his black Thinkpad. “Would it be useful to identify issues that have this treasure-hunt aspect? Is there a certain class of task that has this uncertainty?”

“Within the first hour of working on the bug,” Burgess volunteers, “you know which it’s going to be.”

So it is agreed: Bugs that have a black-hole-like quality — bugs whose fix you couldn’t even begin to estimate — will be tagged in Bugzilla with a special warning label.

Shortly after the meeting, Toy sits down at his desk, calls up the Bugzilla screen, and enters a new keyword for bug number 44, “Flicker Free window display when resizing windows”: scary.

Toy’s fatalistic language wasn’t just a quirk of personality: Gallows humor is a part of programming culture, and he picked up his particular vocabulary during his time at Netscape. Though today Netscape is remembered as the Web browser company whose software and stock touched off the Internet boom, its developers had always viewed themselves as a legion of the doomed, cursed with impossible deadlines and destined to fail.

There was, in truth, nothing especially doomed about OSAF’s programmers: Several of them had just returned from a conference where they presented their work to an enthusiastic crowd of their peers — who told them that their vision could be “crisper” but who mostly looked at the blueprint for Chandler and said, “I want it now!” Though the software industry had been slumping for three straight years, they were working for a nonprofit organization funded by $5 million from Kapor. Their project was ambitious, but their ranks included veteran programmers with estimable achievements under their belts. Andy Hertzfeld had written central chunks of the original Macintosh operating system. John Anderson had written one of the first word processors for the Macintosh and later managed the software team at Steve Jobs’s Next. Lou Montulli, another Chandler programmer who was not at the meeting, had written key parts of the Netscape browser. They’d all looked doom in the eye before.

Similarly, there was nothing especially scary about bug number 44. It was a routine sort of problem that programmers had accepted responsibility for ever since computer software had migrated from a text-only, one-line-at-a-time universe to today’s graphic windows-and-mouse landscape. What scared Toy was not so much the nature of Bug 44 but the impossibility of knowing how long it would take to fix. Take one such unknown, place it next to all the other similar unknowns in Chandler, multiply them by one another, and you have the development manager’s nightmare: a “black hole” in the schedule, a time chasm of indeterminate and perhaps unknowable dimensions.

Two months before, the entire Chandler team of a dozen programmers had met for a week of back-to-back meetings to try to solve a set of problems that they had dubbed “snakes” — another word Toy had salvaged from Netscape’s ruins. A snake wasn’t simply a difficult problem; it was an “important problem that we don’t have consensus on how to attack.” Snake superseded a looser usage at OSAF of the word dragon to describe the same phenomenon.

Black holes, snakes, dragons — the metaphors all daubed a layer of mythopoetic heroism over the most mundane of issues: how to schedule multiple programmers so that work actually got done. You could plug numbers into Bugzilla all day long; you could hold one meeting after another about improving the process. Software time remained a snake, and it seemed invincible.

This was hardly news to anyone in the room at OSAF. The peculiar resistance of software projects to routine scheduling is both notorious and widely accepted. In the software development world, lateness is so common that a new euphemism had to be invented for it: slippage.

Certainly, every field has its sagas of delay; the snail’s pace of lawsuits is legendary, and any building contractor who actually finishes a job on time is met with stares of disbelief. But there’s something stranger and more baffling about the way software time bends and twists back on itself like a Möbius strip. Progress seems to move in great spasms and then halt for no reason. You think you’re almost done, and then you turn around and six months have passed with no measurable progress.

This is what it feels like: A wire is loose somewhere deep inside the workings. When it’s connected, work moves quickly. When it’s not, work halts. Everyone on the inside tries painstakingly to figure out which wire is loose, where the outage is, while observers on the outside try to offer helpful suggestions and then, losing their patience, give the whole thing a sharp kick.

Every software project in history has had its loose wires. Every effort to improve the making of software is an effort to keep them tight.