When home recording first became mainstream…

It happened for one simple reason:

The analog gear of decades past was slowly, but surely, being replaced…

By a new generation of audio interfaces and other digital gear that was cheaper and easier to use than ever before.

And that trend has continued since.

Today…digital audio is the standard in nearly all studios, both pro and amateur.

Yet surprisingly few people really understand what it’s all about.

So for today’s post, what I have for you is a comprehensive introduction to the basics of Digital Audio for Music Recording.

These are the 9 topics we will cover:

1. The Rise of the Digital Era

2. Digital Converters Explained

3. Sample Rate

4. Bit Depth

5. Quantization Error

6. Dither

7. Latency

8. Master Clocks

9. Mp3 Encoding

Let’s begin…

1. The Rise of the Digital Era

While digital audio is the standard in music nowadays…

It wasn’t always that way.

Originally, musical information existed only as sound waves in the air.

Then as technology advanced, people discovered ways of converting it to other formats, including:

notes on a page

electrical signals in a cable

radio waves in the atmosphere

bumps on a vinyl record

But ultimately, with the rise of computers, digital audio became the dominant format for music recording because it allowed songs to be easily copied and transported for free.

And the device that makes it all possible is…the digital converter.

To understand how they work, up next…

2. Digital Converters Explained

In the recording studio, digital converters exist in 2 forms:

As a stand-alone device in high-end studios, or…

As part of the audio interface in home studios.

To convert audio into binary code, they take tens of thousands of snapshots (samples) per second to build an “approximate” picture of the analog waveform.

The picture is not exact, because in the moments between samples, the converter must essentially guess what’s going on.

As you can see in the above diagram where:

the red line is the analog signal, and…

the black line is the conversion…

The results aren’t perfect, but they’re good enough to produce excellent sound quality.

Exactly how excellent depends mostly upon…

3. Sample Rate

Take a look at this picture:

As you can see…

By taking more snapshots per second, higher sample rates:

Gather more real information,

Use less guesswork,

Build a far more accurate picture of the analog signal

And the end result is, of course…better sound quality.

Now let’s talk specific numbers:

Common sample rates in pro audio include:

44.1 kHz (CD Audio)

48 kHz

88.2 kHz

96 kHz

192 kHz

The 44.1 kHz minimum is due to a mathematical principle known as…

The Nyquist-Shannon Sampling Theorem

To accurately record digital audio, converters must capture the full spectrum of human hearing, between 20 Hz and 20 kHz.

According to the Nyquist-Shannon Sampling Theorem…

Capturing a specific frequency requires at least 2 samples for each cycle…to measure both the upper and lower points on the waveform.

That means, recording frequencies of up to 20 kHz requires a sample rate of 40 kHz or more. Which is why CD audio lies just above that, at 44.1 kHz.
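If you'd like to see the math in action, here's a quick Python sketch (the function names are just for illustration). It shows the Nyquist minimum, and also what happens to a frequency that exceeds it: the tone “folds back” and is recorded as a lower alias frequency.

```python
def min_sample_rate(max_freq_hz):
    """Nyquist: capturing a frequency needs at least 2 samples per cycle."""
    return 2 * max_freq_hz

def aliased_freq(signal_hz, sample_rate_hz):
    """The frequency actually captured when a tone is sampled at sample_rate_hz."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(min_sample_rate(20_000))       # 40000 -> why CD sits just above, at 44100
print(aliased_freq(18_000, 44_100))  # 18000 -> below Nyquist, captured correctly
print(aliased_freq(30_000, 44_100))  # 14100 -> above Nyquist, folds back down
```

In real converters, a filter removes frequencies above the Nyquist limit before sampling, precisely to prevent that folding.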

The Cost of High Sample Rates

While high sample rates DO produce better sound quality…the benefits aren’t free.

The costs include:

Higher processing loads

Lower track counts

Larger audio files

So there’s always a trade-off. Pro studios can more easily support the highest sample rates because they use better gear.
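To put rough numbers on that trade-off, here's a quick Python sketch (sizes are approximate, for uncompressed stereo audio, ignoring file headers):

```python
def wav_size_mb(sample_rate, bit_depth, channels, seconds):
    """Approximate uncompressed PCM size: rate x depth x channels x time, in MB."""
    return sample_rate * bit_depth * channels * seconds / 8 / 1_000_000

# A 4-minute stereo track at common settings:
print(round(wav_size_mb(44_100, 16, 2, 240), 1))  # 42.3 MB (CD quality)
print(round(wav_size_mb(48_000, 24, 2, 240), 1))  # 69.1 MB
print(round(wav_size_mb(96_000, 24, 2, 240), 1))  # 138.2 MB
```

Doubling the sample rate doubles the file size, and your processing load grows right along with it.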

For home studios though, most people find that a default setting of 48 kHz works best.

Up next…

4. Bit Depth

To understand bit depth, let’s first discuss bits.

Short for binary digit, a bit is a single unit of binary code, valued at either a 1 or 0.

The more bits used, the more combinations are possible. For example…

As you can see in the diagram below, 4 bits yields a total of 16 combinations.

When used to encode information, each of these numbers is assigned a specific value.

By increasing the bits, the number of possible values grows exponentially.

4 Bits = 16 possible values

8 Bits = 256 possible values

16 Bits = 65,536 possible values

24 Bits = 16,777,216 possible values
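That pattern is simply powers of 2, as this quick Python check shows:

```python
def possible_values(bits):
    """Each extra bit doubles the number of binary combinations: 2 ** bits."""
    return 2 ** bits

for bits in (4, 8, 16, 24):
    print(f"{bits} bits = {possible_values(bits):,} possible values")
# 4 bits = 16 possible values
# 8 bits = 256 possible values
# 16 bits = 65,536 possible values
# 24 bits = 16,777,216 possible values
```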

With bit depth in digital audio, each value is assigned a specific amplitude on the audio waveform.

The greater the bit depth, the more volume increments exist between loud and soft…and the greater the dynamic range of the recording.

A good rule of thumb to remember is: For every extra “bit”, dynamic range increases by 6dB.

For example:

4 Bits = 24 dB

8 Bits = 48 dB

16 Bits = 96 dB

24 Bits = 144 dB

Ultimately what this means is…more bit depth equals less noise…

Because by adding this extra headroom, the useful signal (on the loud end of the spectrum) can be recorded higher above the noise floor (on the soft end of the spectrum).
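If you're curious where those numbers come from, the “6 dB per bit” rule is shorthand for 20 × log10(2^bits). Here's a quick Python check of the exact figures behind the rounded values above:

```python
import math

def dynamic_range_db(bits):
    """Exact figure behind the 6 dB rule: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)

for bits in (4, 8, 16, 24):
    print(f"{bits} bits = {dynamic_range_db(bits):.1f} dB")
# 4 bits = 24.1 dB
# 8 bits = 48.2 dB
# 16 bits = 96.3 dB
# 24 bits = 144.5 dB
```

Each bit actually adds about 6.02 dB, which is why the rule of thumb rounds to 6.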

Up next…

5. Quantization Error

It sounds impressive, that a 24 bit recording yields almost 17 million possible values, right?

Yet that’s still far less than the infinite number of possible values that exist in an analog signal.

So with almost every sample, the actual value lies somewhere in between two possible values. The converter’s solution is to simply round it off, or “quantize” it, to the nearest value.

The resulting distortion, known as quantization error, happens at 2 phases of the recording process:

in the beginning, during A/D conversion

at the end, during mastering

With mastering, the sample rate/bit depth of the final track is often reduced upon conversion to its final digital format (CD, mp3, etc.).

When this happens, some information gets deleted and “re-quantized” resulting in further distortion of the sound.
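Here's a toy Python sketch of the idea (a simplified model, not real converter code): each sample gets rounded to the nearest available level, and the rounding error shrinks as bit depth grows.

```python
def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits levels."""
    levels = 2 ** (bits - 1)          # levels available on each side of zero
    return round(sample * levels) / levels

x = 0.3333                            # the "true" analog value
for bits in (4, 8, 16, 24):
    q = quantize(x, bits)
    print(f"{bits} bits: stored as {q}, error = {abs(x - q):.7f}")
```

At 4 bits the stored value is a coarse 0.375; at 24 bits the error is vanishingly small, but never quite zero.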

To deal with this problem, there’s a handy solution known as…

6. Dither

When reducing a 24 bit file down to a 16 bit file, dither is used to essentially mask a large portion of the resulting distortion…

By adding a low-level of “random noise” to the audio signal.

Since the concept is hard to visualize with audio, the popular analogy used to explain it is dithering with images.

Here’s how it works:

When a color photo is converted to black and white, mathematical guesswork is done to determine whether each colored pixel should be “quantized” to a black pixel, or a white pixel…

…Just like how guesswork is done to quantize digital audio samples.

As you can see in the figure below, the “before” picture looks pretty crappy, doesn’t it?

But with dither…

a small number of white pixels are randomized into the black regions…

a small number of black pixels are randomized into the white regions…

And by adding this “random noise” to the image, the “after” picture looks much better. With audio dithering, the concept is very similar.
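For the audio version, here's a toy Python sketch (a simplified model using triangular “TPDF” dither, one common choice in practice). Without dither, a signal quieter than half a quantization step is deleted outright; with dither, it survives, hidden inside the noise:

```python
import random

random.seed(1)

def requantize(sample, bits, dither):
    """Reduce a sample in [-1.0, 1.0] to 2**bits levels, optionally adding
    triangular (TPDF) dither of +/- one quantization step before rounding."""
    step = 1.0 / 2 ** (bits - 1)
    noise = (random.random() - random.random()) * step if dither else 0.0
    return round((sample + noise) / step) * step

# A signal quieter than half a step (~0.0039 at 8 bits) simply vanishes...
quiet = 0.0003
plain = [requantize(quiet, 8, dither=False) for _ in range(10_000)]
dithered = [requantize(quiet, 8, dither=True) for _ in range(10_000)]

print(sum(plain) / len(plain))        # 0.0 -- the quiet signal is gone
print(sum(dithered) / len(dithered))  # roughly 0.0003 -- preserved in the noise
```

Just like the randomized pixels in the image analogy, the random noise lets detail below one quantization step survive the rounding.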

Up next…

7. Latency

The ONE BIG FLAW with digital studios today is the amount of time-delay (latency) that accumulates in the signal chain, especially with DAWs.

With all the calculations that occur, it takes anywhere from a few milliseconds to a few DOZEN milliseconds for the audio signal to exit the system.

With 0-11 ms of delay – it’s short enough that the average person won’t notice anything.

With 11-22 ms – you hear an annoying slapback effect that takes some getting used to.

With 22 ms+ – the delay makes it impossible to play or sing in time with the track.

In a typical digital signal chain, there are 4 stages that add to the total delay time:

A/D Conversion

DAW Buffering

Plugin Delay

D/A Conversion

A/D and D/A conversion are the 2 smallest offenders, contributing less than 5 ms of total delay.

However…

Your DAW buffer, and certain plugins (including “look-ahead” compressors and virtual instruments), can add up to 20, 30, 40 ms or more.

To keep it at a minimum:

Deactivate all unnecessary plugins while you’re recording.

Adjust your DAW buffer settings to find the shortest time your computer can handle without freezing.

As you’ll notice, buffer times are measured in samples, NOT milliseconds. To convert it:

Divide the number of samples by the session’s sample rate (in kHz) to find the latency time in milliseconds.

For example: 1024 samples ÷ 44.1 kHz = 23 ms

If you hate doing math, here’s an easier way to remember it at 44.1 kHz:

256 samples = 6 ms

512 samples = 12 ms

1024 samples = 23 ms
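If you'd rather not memorize it at all, the conversion is a one-liner in Python:

```python
def buffer_latency_ms(samples, sample_rate_hz):
    """Buffer latency: samples divided by sample rate, in milliseconds."""
    return samples / sample_rate_hz * 1000

for buf in (256, 512, 1024):
    print(f"{buf} samples @ 44.1 kHz = {buffer_latency_ms(buf, 44_100):.1f} ms")
# 256 samples @ 44.1 kHz = 5.8 ms
# 512 samples @ 44.1 kHz = 11.6 ms
# 1024 samples @ 44.1 kHz = 23.2 ms
```

Note that the same buffer size gives you less latency at higher sample rates, since each sample passes by faster.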

In MOST cases, these steps should bring the latency down to a manageable level…

But sometimes, if your gear is either too old or too cheap, it may NOT.

In that case…

The Last Resort

Many budget interfaces have a “mix” or “blend” knob, which allows you to combine the session playback with the “live” signal being recorded.

By splitting your live mic/guitar signal and sending half to the computer to be recorded, and half directly to your studio headphones, you avoid latency by side-stepping the signal chain entirely.

The downside to this technique is…you hear the live signal completely dry, with zero effects.

Hopefully though, since computers keep getting faster, this won’t be an issue in the near future.

Up next…

8. Master Clocks

Whenever two or more devices exchange digital information in real-time…

Their internal clocks must be synced so the samples stay aligned…

Preventing those annoying clicks and pops in the audio that otherwise occur.

To sync them, one device functions as the “master”, and the rest as “slaves”.

In simple home studios, the audio interface clock usually leads by default.

In pro studios, which require premium digital conversion and complex signal routings…

A special stand-alone device known as a digital master clock (aka word clock) can be used instead. As many owners claim, the sound benefits of these high-end clocks can be far less subtle than you might imagine.

Up next…

9. Mp3/AAC Encoding

In today’s world, compressed audio files are the norm in digital audio.

Because with the limited storage space of iPods and smartphones, and the bandwidth limits of internet streaming, all files must be as small as possible.

Using a method of “lossy data compression”, mp3, AAC, and other similar formats can shrink audio files down to 1/10th their original size.

The encoding process works using a principle of human hearing known as “auditory masking”…

Which makes it possible to delete tons of musical information, while still maintaining acceptable levels of sound quality to most listeners.

Experienced audio engineers might hear a difference, but average consumers will not.

Exactly how much information gets deleted, depends on the bitrate of the file.

With higher bitrates, less information is removed, and more detail is preserved.

For example, with mp3:

320 kbit/s is the maximum possible bit rate

128 kbit/s is the recommended minimum

256 kbit/s is the sweet spot that most people prefer
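To see what those bitrates mean in practice, here's a quick Python sketch of constant-bitrate file sizes for a hypothetical 4-minute track:

```python
def mp3_size_mb(bitrate_kbps, seconds):
    """Approximate constant-bitrate file size: bitrate x duration / 8 bits per byte."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000

for kbps in (128, 256, 320):
    print(f"{kbps} kbit/s x 4 minutes = {mp3_size_mb(kbps, 240):.1f} MB")
# 128 kbit/s x 4 minutes = 3.8 MB
# 256 kbit/s x 4 minutes = 7.7 MB
# 320 kbit/s x 4 minutes = 9.6 MB
```

Compare that to roughly 42 MB for the same track as an uncompressed CD-quality WAV, and you can see where the “1/10th the size” figure comes from.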

To find the ideal format and bitrate for YOUR music, always double-check the recommendations of its destination (iTunes, YouTube, Soundcloud, etc.)