Sound Decision
Inside the world of audio branding with Skype’s new pings, bounces, and pops
By Adi Robertson | Illustrations by Peter Steineck

The year that Skype launched its calling service, the world was in the midst of a sonic crisis: the ringtone.

Mobile phones — to which Skype was an indirect competitor — were becoming ubiquitous, and so were the personalized sounds that went with them. Shortly before the company put out the first of several betas in August of 2003, an analyst report predicted that ringtone sales would soon bring in more money than CD singles.

"In 2003, it seems that a person’s most valued and public expression of self seems to be embodied in the customized features of his cell phone," wrote one woman in a BBC opinion poll. "With priorities like these, it’s no wonder we have so many problems in the world today."

Ringtones weren’t just a signal that someone wanted to talk to you — they said something about who you were. And they were a sign of how profoundly a simple interface choice could change an entire environment.

Where cellphones had severed the link between telephones and landlines, Skype went a step further: it separated voice calling from the telephone entirely. Developed by two Scandinavians who’d previously worked on file-sharing service Kazaa, Skype wasn’t the first company to offer voice over internet protocol (or VoIP) services. But it was free, simple, and released at a time when internet speeds were climbing. By the time it introduced a "version 2.0" with video calls and a new design in 2005, it boasted 54 million registered users worldwide.

People went to Skype to hear someone’s voice, and the sounds that accompanied the application were central to the user’s experience. They were carefully designed to reflect a mix of pleasant, familiar noises: a call was marked by the traditional, pulsing ring of a telephone, while other actions triggered a combination of whimsical bounces, pops, whispers, and zooms. But each one was also supposed to say something about Skype — and to help the company’s name become synonymous with online calling.

Audio branding is as old as jingles or the MGM lion’s roar, but it’s only been recognized as a specific field more recently. The Audio Branding Academy, an industry group founded in 2009, says it was aware of 145 agencies worldwide in 2013, up from 126 in 2010. Companies might come to these agencies for everything from a handful of recordings to a sonic identity: a whole catalog of sounds that can be remixed for commercials, online videos, or user interfaces.

This year, Skype is revamping its sonic identity for the first time in 10 years, and it’s turning to a New York-based sonic branding agency called Listen. While reimagining noises like incoming chat pings, call sounds, and error notifications, the small team of audio engineers and designers needs to integrate new apps like Skype for Business, formerly known as work messaging app Lync. It needs to fit Skype into the larger scheme of Windows and Microsoft, another of its clients. And it needs to do so while preserving the identity of one of the most recognizable online communication tools in the world.

For the overwhelming majority of humanity’s existence, the tools we’ve used have come with their own set of audio signals, often unintentional ones. Hammers aren’t designed to give acoustic feedback, but we can hear when we hit a nail squarely. An axe is a rudimentary object, but its aural cues tell us not only whether it’s successfully biting into a tree, but how deep it’s going, and how close it is to striking through.

As our tools and machines have become increasingly digital, they’ve also become increasingly silent, and many of those natural cues and signals have disappeared. Instead, we rely on noises that have been selected or created to give a specific effect. Electric cars with silent motors mimic noisy gas-powered vehicles, for example, because a motor gives bystanders surprisingly complex warnings: how near a car is, how powerful it might be, and how fast it’s going. While physical keyboards opt for silent rubber buttons instead of clicky mechanical springs, we put time and energy into creating sounds for the digital keyboards on our touchscreen devices.

There’s no such thing as a "natural" computer-interface sound. But for decades, an entire industry of musicians, engineers, and advertisers has devoted itself to creating these acoustic signifiers, from the moment we boot up a machine to the moment we shut it down.

In the 1970s and 1980s, one of the most influential computing achievements was the graphical user interface — the switch from entering text commands to arranging tools and folders on a metaphorical "desktop." But there was no equivalent revolution in audio interfaces. Sound is ill-suited to the kinds of interactions we expect from computers. Unlike our eyes, our ears don’t let us shut out irrelevant input or save information until we’re ready to pay attention.

But William Gaver, who did some of the earliest work on sound and personal computing, thought that "auditory icons" could convey as much information as visual ones. As a graduate student in the early 1980s, Gaver studied under engineer and psychologist Donald Norman, whose work focused on how humans interacted with the objects around them. Gaver began his own work on audio design, which led to an internship with Apple in 1986. There, he began an ambitious project: creating an audio counterpart to the Macintosh computer’s recently introduced file manager, Finder.

Gaver argued that users were already accustomed to relying on a computer’s unintended noises: They estimated system activity by listening to the whir of a hard drive, diagnosed printer malfunctions based on the printer’s clicking, and used the sound of a modem to tell when they’d gotten online. He imagined extending that idea to complement the Finder interface with a system he called "Sonic Finder." With the help of the Finder team, Gaver went through the code and applied recordings of real-world actions like tapping a metal container and breaking dishes to digital actions like dragging files and opening documents. Gaver eschewed the easiest metaphors. The action of copying a file could make the sound of a photocopier, for example, but a computer file didn’t have separate "pages." So he decided it made more sense to represent progress with the sound of water pouring into a glass, the frequency changing as it got closer to finishing.

More ambitiously, the Sonic Finder differentiated between different types of files and elements of the desktop. The sound of moving a big file, for instance, would be lower-pitched than moving a small one, like dragging a heavy object compared to a light one.
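The size-to-pitch idea is easy to picture in a few lines of code. What follows is only a rough sketch of the principle, not Gaver's implementation: the function name `pitch_for_size`, the frequency range, and the size limits are all invented for illustration, and the logarithmic scale is an assumption rather than anything Sonic Finder is known to have used.

```python
import math

def pitch_for_size(size_bytes, low_hz=110.0, high_hz=880.0,
                   min_bytes=1_000, max_bytes=1_000_000_000):
    """Map a file size to a frequency: bigger files get lower pitches.

    All defaults are hypothetical values chosen for this sketch.
    """
    # Clamp so tiny or enormous files still land inside the pitch range.
    clamped = min(max(size_bytes, min_bytes), max_bytes)
    # Position of the size on a log scale: 0.0 (smallest) to 1.0 (largest).
    t = (math.log(clamped) - math.log(min_bytes)) / (
        math.log(max_bytes) - math.log(min_bytes))
    # Interpolate geometrically so equal size ratios give equal musical steps.
    return high_hz * (low_hz / high_hz) ** t

small_file_hz = pitch_for_size(20_000)       # a short text document
big_file_hz = pitch_for_size(700_000_000)    # a large video file
```

Under these assumptions the small file comes out as a noticeably higher tone than the big one, which is the "heavy object versus light object" effect described above.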

But Sonic Finder didn’t actually become part of Finder, and Gaver left Apple for one of the epicenters of computing research: the Xerox Palo Alto Research Center, or PARC. Programmers at PARC had created the original "desktop" interface, and while Gaver moved on from audio design to other forms of human-computer interaction, other PARC researchers were looking for new ways to let people interact with computers using sound.

A project called Audio Aura played with the audio equivalent of augmented-reality glasses. Relying on wireless headphones and infrared location-tracking badges, Audio Aura dropped sound clips around an office, subtly (ideally) alerting employees to new emails or to how long a coworker had been away from their desk. Its creators imagined a combination of voices, musical snippets, and "earcons" that sounded like waves and birds. The project was a rough prototype, and the sounds could trend towards the bizarre: The cry of seagulls, for example, might mean new emails, with a volume of birds proportionate to the number of unread messages.

As personal computers became ubiquitous in the 1990s, most people’s experience of audio interfaces would be far more mundane. But the decade also produced some of the most iconic sounds in computing history: things like the Apple chime, Windows 95’s start-up fanfare, and the five-note "Intel Inside" sequence.

One of these — the lush major chord that plays whenever a Mac is turned on — was a simple fix for a serious aural misstep. It was created by Apple sound designer Jim Reekes to replace the Macintosh II’s tritone boot sound — an unsettling chord sometimes known as the "devil’s interval." The new tone was supposed to act as a refreshing "palette cleanser" on startup, putting users at ease. ("I was thinking about, you know, you’re going to hear the sound every time it crashes," he later said.) Though Apple’s larger developer team apparently wouldn’t agree to add the sound, Reekes said he surreptitiously added it to the Mac OS firmware anyway. It’s been part of the operating system ever since, with only slight modifications.

Although Sonic Finder never got traction at Apple, the idea of unified soundscapes did. In the mid-’90s, another of Apple’s designers, Jim McKee, started developing audio palettes for one of the company’s new big ideas: multiple, customizable Mac OS themes. Touching or scrolling through just about anything would make a sound, mixed from over a hundred small office noises like binder clips snapping, pencils scraping, and drafting implements being dragged around.

The design was supposed to be subtle enough that users barely noticed. "It added texture to the UI, more than sound," says McKee. "It made it feel like you were actually touching things and moving things." It was even set to slightly vary the pitch and volume of each button click, in the same way that tapping a real-world object doesn’t produce the same sound every time.
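That kind of variation can be sketched as a small random nudge applied to each click before it plays. This is a guess at the general technique, not McKee's actual code: the function name `vary_click` and the spread values (3 percent for pitch, 10 percent for volume) are assumptions made up for the example.

```python
import random

def vary_click(base_pitch_hz, base_volume, rng,
               pitch_spread=0.03, volume_spread=0.10):
    """Return a (pitch, volume) pair nudged slightly away from the base values.

    The spreads are hypothetical; a real design would tune them by ear.
    """
    pitch = base_pitch_hz * (1.0 + rng.uniform(-pitch_spread, pitch_spread))
    volume = base_volume * (1.0 + rng.uniform(-volume_spread, volume_spread))
    # Keep volume inside the usual 0.0 to 1.0 gain range.
    return pitch, min(max(volume, 0.0), 1.0)

# A seeded generator makes the sketch repeatable; a real UI would not seed it.
rng = random.Random(42)
clicks = [vary_click(1200.0, 0.5, rng) for _ in range(5)]
```

Each entry in `clicks` stays close to the base sound but is rarely identical to the one before it, which is roughly what tapping a real object is like.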

Unfortunately, Steve Jobs had just retaken the helm at Apple, and the company was in chaos. McKee designed five palettes for Mac OS 8.5, which were supposed to include customizable visual and audio themes. Apple stripped all but one of those themes out at the last minute. "Steve returned, and pretty much shut it down," McKee says. "What I heard someone say was that he came and reviewed it and said ‘Nobody wants sound coming out of their computers.’"

But if there was sound, companies decided, it should convey exactly what they stood for. The perfect time for this was during the computer’s start-up, which essentially displayed a static advertisement already. When Microsoft approached ambient music pioneer Brian Eno during the development of Windows 95, for instance, it wanted an entire company manifesto packed into the space of a boot screen. "The thing from the agency said, ‘We want a piece of music that is inspiring, universal, blah-blah, da-da-da, optimistic, futuristic, sentimental, emotional,’ this whole list of adjectives," Eno said in a 1996 interview. "And then at the bottom it said, ‘And it must be 3 1/4 seconds long.’" He apparently liked the idea so much he created 84 of them.

Microsoft and Skype’s current sonic branding seems equally complicated. In July, I headed to the New York office of audio-branding studio Listen, where Steve Milton and a handful of other designers have spent six months defining the sounds of Skype’s next iteration.

Milton, who co-founded Listen in 2012, owes his career in part to an airplane trip. When the plane’s speakers dinged to tell passengers they’d reached cruising altitude, Milton noticed that the chord they were playing was a minor third. "I remember being scared and interpreting it as a negative thing, because everyone in Western culture understands a minor third to be sad," he says, as we sit in one of Listen’s modest conference rooms. With a half-step difference, he thought, he would have heard a friendly alert, instead of a sinister warning. "And I remember thinking, Why that sound? Who made that decision? Why is it that way?"
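The half step Milton describes is a small change in raw numbers. As a rough illustration, assuming standard equal temperament (where each half step multiplies frequency by the twelfth root of two) and an arbitrary root note of 440 Hz, the two intervals work out as follows. The actual pitches of the airplane chime are not known; the helper name `interval_hz` is made up for this sketch.

```python
def interval_hz(root_hz, semitones):
    """Frequency of the note a given number of half steps above the root,
    assuming 12-tone equal temperament."""
    return root_hz * 2 ** (semitones / 12)

ROOT_HZ = 440.0  # A4, chosen only as a familiar reference pitch

minor_third_hz = interval_hz(ROOT_HZ, 3)  # three half steps up, about 523.25 Hz
major_third_hz = interval_hz(ROOT_HZ, 4)  # four half steps up, about 554.37 Hz

print(f"minor third: {minor_third_hz:.2f} Hz")
print(f"major third: {major_third_hz:.2f} Hz")
```

With this root, the upper note moves by only about 31 Hz between the "sad" interval and the brighter one.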

Friendliness, in fact, was central to Skype’s original sound interface. The company’s current design director Steve "Buzz" Pearce remembers the pitch for the program as "the landline of the future," but the team wanted Skype to feel as natural as answering an old phone. When users got a notification that someone was calling, the team wanted it to be an intimate, unobtrusive extension of the person at the other end of the line.

Skype’s original sounds, recorded by outside studio Soundtree Music, eschewed the style and tone of the chintzy cellphone ring. "We didn’t want it to become like a brainworm," says Pearce, humming the "Grande Valse," Nokia’s famous and sometimes tooth-grinding ringtone. "Not like that."

Skype and Soundtree opted to use original sounds as basic building-block elements. "All the actual components [were] recorded organic sounds like wind, water, pops, people’s voices," says Pearce. Wind, he says, provided the white noise in a notification. A bubble pop could be recorded from a ketchup bottle, a glass, or a human gasp or gulp. "We don’t like technical things, even though we are a technical company," he adds.

Once recorded, the sounds were layered on top of each other, creating something abstract but acoustically natural. Skype’s most memorable element was the five-beat incoming call notice, mixed from recordings of a human breath, water, and voices. "If you actually ask people to hum or sing the Skype ringtone, they can’t, accurately," says Pearce. "We did that on purpose, because we don’t want it sticking."

When Listen took on the task of redesigning Skype’s sounds, Milton knew there was a high bar to meet. "The old brand director would talk about how whenever the Skype ringtone would occur, his kids would come running in, and they would anticipate seeing or hearing grandma," he says. "Having that sound and knowing an association is important, so we don’t want to lose the essence of that."

Like other brands, Skype has its own set of key "identity" words, which the interface is supposed to embody: the service is supposed to evoke terms like "clean" and "optimistic," compared to Microsoft’s "trustworthiness" and "security." ("That’s not to say Skype wasn’t trusted," clarifies Maria Ramos, who until recently managed Skype’s brand. "But Skype was seen very much as a quirky, fun brand that you use occasionally.") This identity unifies everything from the full musical ringtone to the short blips of an incoming text message, and provides a roadmap for Listen’s audio designers to follow.