If you ever connected to the Internet before the 2000s, you probably remember that it made a peculiar sound. But despite becoming so familiar, it remained a mystery for most of us. What do these sounds mean?

[HTML5 audio: In-line recording of the beginning of a telephone call made by a modem.]

(The audio was recorded by William Termini on his iMac G3.)

As many already know, what you're hearing is often called a handshake, the start of a telephone conversation between two modems. The modems are trying to find a common language and determine the weaknesses of a telephone channel that was originally meant for human speech.

Below is a spectrogram of the handshake audio. I've labeled some signals according to which party transmitted them, and also put a concise explanation below.

(You can order this poster as a high-res print via Redbubble!)

Hello, is this a modem?

The first thing we hear in this example is a dial tone, the same tone you would hear when picking up your landline phone. The modem now knows it's connected to a phone line and can dial a number. The number is signaled to the network using Dual-Tone Multi-Frequency signaling, or DTMF, the same sounds a telephone makes when dialing a number.
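
To make the "dual-tone" part concrete, here is a minimal Python sketch of how a DTMF digit can be synthesized. Each key is the sum of two sine waves, one from a low "row" group and one from a high "column" group. The frequency table is the standard DTMF keypad assignment. The function names, the 8 kHz sample rate and the 100 ms duration are my own illustrative choices, not something taken from the recording.

```python
import math

# Standard DTMF keypad: every key combines one row tone and one column tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair in Hz for one keypad symbol."""
    if len(key) != 1:
        raise ValueError("expected a single keypad symbol")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Synthesize one key press as a list of floats in the range [-1, 1].

    duration_s and sample_rate are assumed values chosen for illustration.
    """
    low, high = dtmf_frequencies(key)
    count = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * n / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * n / sample_rate)
        for n in range(count)
    ]
```

Calling `dtmf_samples` once per digit and joining the results with short silences would give the dialing sequence heard at the start of the recording.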

The remote modem answers with a distinct tone that our calling modem can recognize. They then exchange short bursts of binary data to assess what kind of protocol is appropriate. This is called a V.8 bis transaction.

Suppressing echoes

Now the modems must address the problem of echo suppression. When humans talk, usually only one of them is talking while the other listens. The telephone network exploits this fact and temporarily silences the return channel to suppress any confusing echoes of the talker's own voice.

Modems don't like this at all, as they can very well talk at the same time (it's called full-duplex). The answering modem now puts on a special answer tone that will disable any echo suppression circuits on the line. The tone also has periodic "snaps" (180° phase transitions) that aim to disable yet another type of circuit called an echo canceller.
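
The "snaps" are easier to picture in code. The sketch below generates a steady tone whose phase jumps by 180° at regular intervals. The 2100 Hz frequency and the 450 ms interval are the figures usually quoted for this kind of answer tone; they are not stated above, so treat them, along with the function name and sample rate, as assumptions.

```python
import math


def answer_tone(duration_s=1.0, sample_rate=8000,
                freq_hz=2100.0, reversal_interval_s=0.45):
    """Sine tone whose phase flips by 180 degrees every reversal interval.

    freq_hz and reversal_interval_s are assumed, commonly quoted values.
    """
    interval = round(reversal_interval_s * sample_rate)
    samples = []
    for n in range(round(duration_s * sample_rate)):
        # Every completed interval adds another pi radians (180 degrees).
        phase_offset = math.pi * (n // interval)
        angle = 2 * math.pi * freq_hz * n / sample_rate + phase_offset
        samples.append(math.sin(angle))
    return samples
```

Since sin(x + π) = −sin(x), each reversal simply inverts the waveform, which is what produces the audible click on top of the otherwise pure tone.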

Finding a suitable modulation

Now the modems will list their supported modulation modes and try to find one that both know. They also probe the line with test tones to see how it responds to tones of different frequencies, and how much it attenuates the signal. They exchange their test results and decide on a speed that is suitable for the line.
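
As a rough illustration of probing, the Python sketch below sends sine tones of a few frequencies through a stand-in "line" and reports how much each one is attenuated, in decibels. This is not how a real modem does it: `toy_channel` is a hypothetical one-pole low-pass filter that I made up to play the role of the phone line, and the probe frequencies are arbitrary.

```python
import math


def toy_channel(samples, smoothing=0.5):
    """Hypothetical stand-in for a phone line: a one-pole low-pass filter."""
    out, prev = [], 0.0
    for x in samples:
        prev = smoothing * x + (1.0 - smoothing) * prev
        out.append(prev)
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def probe_gain_db(freq_hz, channel=toy_channel,
                  sample_rate=8000, duration_s=0.5):
    """Send one test tone through the channel and return its gain in dB."""
    count = round(duration_s * sample_rate)
    sent = [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(count)]
    received = channel(sent)
    settle = count // 10  # skip the filter's start-up transient
    return 20 * math.log10(rms(received[settle:]) / rms(sent[settle:]))


# Arbitrary probe frequencies inside the voice band.
report = {freq: probe_gain_db(freq) for freq in (300, 1500, 3000)}
```

With this toy channel the higher tones come back weaker than the lower ones. A table like `report` is the kind of measurement the two modems could exchange before settling on a speed.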

Enough small talk!

After this, the modems will switch to scrambled data. They put their data through a special scrambling formula before transmission to make its power distribution more even and to make sure there are no patterns that are suboptimal for transfer. They listen to each other sending a series of binary 1's and adjust their equalizers to optimally shape the incoming signal.
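
A scrambling formula of this kind can be sketched as a self-synchronizing scrambler: each transmitted bit is the data bit XORed with two earlier transmitted bits. The tap positions 18 and 23 below are an assumption, picked to resemble the polynomials used in V-series modem standards; nothing above says which formula the modems in the recording use.

```python
def scramble(bits, taps=(18, 23)):
    """Self-synchronizing scrambler: out[n] = in[n] ^ out[n-18] ^ out[n-23].

    The tap positions are assumed for illustration.
    """
    history = [0] * max(taps)  # previously sent bits, most recent first
    out = []
    for bit in bits:
        scrambled = bit
        for tap in taps:
            scrambled ^= history[tap - 1]
        out.append(scrambled)
        history = [scrambled] + history[:-1]
    return out


def descramble(bits, taps=(18, 23)):
    """Inverse operation: out[n] = in[n] ^ in[n-18] ^ in[n-23]."""
    history = [0] * max(taps)  # previously received bits, most recent first
    out = []
    for bit in bits:
        plain = bit
        for tap in taps:
            plain ^= history[tap - 1]
        out.append(plain)
        history = [bit] + history[:-1]
    return out
```

Feeding a constant run of 1's into `scramble` yields a mix of 0's and 1's, and `descramble` recovers the original run. The descrambler only looks at received bits, so it needs no shared state beyond the tap positions.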

Soon after this, the modem speaker will go silent and data can be put through the connection.

But why?

Why was it audible? Why not, one could ask. Back in the day, telephone lines were used for audio. The first modems even used the telephone receiver like humans do, by talking into the mouthpiece, until newer modems were developed that could connect directly to the phone line. Even then, the idea of not hearing what's happening on a phone line you're calling on was quite new, and modems would default to exposing the user to the handshake audio. And in case you accidentally called a human, you would still have time to pick up the telephone and explain the situation.