Introduction

Codec 2 is an open source speech codec designed for communications quality speech between 700 and 3200 bit/s. The main application is low bandwidth HF/VHF digital radio. It fills a gap in open source voice codecs beneath 5000 bit/s and is released under the GNU Lesser General Public License (LGPL).

Informal listening tests indicate that Codec 2 at 700 bit/s has better speech quality than MELP and is comparable to TWELP at 600 bit/s.

The Codec 2 project also contains several modems (OFDM, FDMDV, COHPSK and mFSK) carefully designed for digital voice over HF radio; GNU Octave simulation code to support the codec and modem development; and FreeDV – an open source digital voice protocol that integrates the modems, codecs, and FEC. FreeDV is available as a GUI application, an open source library (FreeDV API), and in hardware (the SM1000 FreeDV adaptor).

The motivations behind the Codec 2 project are summarised in this blog post.

Individuals can support Codec 2 development by helping out with coding, testing, and documentation, buying an SM1000, or donating via PayPal or Patreon. Companies can support Codec 2 through paid contract development.

Here are some samples:

Codec                 Male    Female
Original              male    female
Codec 2 3200 bit/s    male    female
Codec 2 2400 bit/s    male    female
Codec 2 1300 bit/s    male    female
Codec 2 700C bit/s    male    female

Here is Codec 2 operating at 2400 bit/s compared to some other low bit rate codecs:

Codec                 Male    Female
Original              male    female
Codec 2 2400 bit/s    male    female
MELPe 2400 bit/s      male    female
AMBE 2000 bit/s       male    female
LPC-10 2400 bit/s     male    female

Notes: Thank you very much Armin for providing the MELPe samples. The AMBE samples were generated using a DV-Dongle, a USB device containing the DVSI AMBE2000 chip. The LPC-10 samples were generated using the Spandsp library.

Here is Codec 2 operating at 700 bit/s compared to MELPe at 600 bit/s:

Codec                 Male    Female
Original              male    female
Codec 2 700 bit/s     male    female
MELPe 600 bit/s       male    female

Here is Codec 2 operating at 3200 bit/s compared to some higher bit rate CELP codecs, typically used for VoIP and mobile phone work:

Codec                 Male    Female
Original              male    female
Codec 2 3200 bit/s    male    female
AMR 4750 bit/s        male    female
g.729a 8000 bit/s     male    female

Here are some samples with acoustic background noise, similar to what would be experienced when driving a truck. As you can see (well, hear) background noise is a tough test for low bit rate vocoders. They achieve high compression rates by being highly optimised for human speech, at the expense of performance with non-speech signals like background noise and music. Note that Codec 2 has just one voicing bit, unlike mixed excitation algorithms like AMBE and MELP. The MELPe sample has the noise suppression option enabled.

Codec                 Male with truck noise
Original              male
Codec 2 2400 bit/s    male
AMBE 2000 bit/s       male
MELPe 2400 bit/s      male
LPC-10 2400 bit/s     male

Source Code

Browse development code:

https://github.com/drowe67/codec2

https://github.com/drowe67/freedv-gui

See freedv.org for a list of release repositories.

Mailing List

For questions, comments, or support (including SM1000 support), please post to the Codec 2 Mailing List.

Chat

#freedv IRC channel on freenode.net

How it Works

What follows is a basic introduction to the core Codec 2 algorithms, using maths in ‘C code’ to make it more familiar.

Also see:

A presentation on Codec 2 in PowerPoint or OpenOffice form.

At linux.conf.au 2012 I presented a graphical description of how Codec 2 works, see the Links section below. This is a really gentle introduction.

Codec 2 uses “harmonic sinusoidal speech coding”. Sinusoidal coding was developed at MIT Lincoln Laboratory in the mid-1980s, starting with some gentlemen called R.J. McAulay and T.F. Quatieri. I worked on these codec algorithms for my PhD during the 1990s. Sinusoidal coders are close relatives of the xMBE codec family, and they often use mixed voicing models similar to those used in MELP.

Speech is modelled as a sum of sinusoids:

for(m=1; m<=L; m++) s[n] += A[m]*cos(Wo*m*n + phi[m]);

The sinusoids are multiples of the fundamental frequency Wo (omega-naught), hence the name “harmonic sinusoidal coding”. For each frame, we analyse the speech signal and extract a set of parameters:

Wo, {A}, {phi}

Where Wo is the fundamental frequency (also known as the pitch), { A } is a set of L amplitudes and { phi } is a set of L phases. L is chosen to be equal to the number of harmonics that can fit in a 4 kHz bandwidth:

L = floor(pi/Wo)

Wo is specified in radians normalised to 4 kHz, such that pi radians = 4 kHz. The fundamental frequency in Hz is:

F0 = (8000/(2*pi))*Wo

We then need to encode (quantise) Wo, { A }, { phi } and transmit them to a decoder which reconstructs the speech. A frame might be 10-20ms in length so we update the parameters every 10-20ms (100 to 50 Hz update rate).

The speech quality of the basic harmonic sinusoidal model is pretty good, close to transparent. It is also relatively robust to Wo estimation errors. Unvoiced speech (e.g. consonants) is well modelled by a bunch of harmonics with random phases. Speech corrupted with background noise also sounds OK; the background noise doesn’t introduce any grossly unpleasant artifacts.

As the parameters are quantised to a low bit rate and sent over the channel, the speech quality drops. The challenge is to achieve a reasonable trade-off between speech quality and bit rate.

Codec 2 Block Diagrams

Here are some block diagrams that illustrate the major signal processing elements for a fully quantised configuration of Codec 2. This example includes the LPC correction bit which was a feature of the 2550 bit/s version.

The encoder:

The decoder:

These figures were explained in a presentation I gave at the DCC 2011 conference; for more information, see the video of that talk.

Example Bit Allocation

Parameter                       bits/frame
Spectral magnitudes (LSPs)      36
Joint Pitch and Energy          8
Voicing (updated each 10ms)     2
Spare                           2
Total                           48

At a 20ms update rate, 48 bits/frame is 2400 bit/s.
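The arithmetic behind that figure is simple enough to check in a few lines of C. The helper names bit_rate and example_bits_per_frame are made up for this sketch; the numbers are the ones from the example allocation above.

```c
#include <assert.h>

/* Bit rate in bit/s for a given frame payload and frame period in ms. */
int bit_rate(int bits_per_frame, int frame_ms) {
    assert(bits_per_frame >= 0 && frame_ms > 0);
    return bits_per_frame * 1000 / frame_ms;
}

/* Sum of the example allocation: LSPs + joint pitch/energy + voicing + spare. */
int example_bits_per_frame(void) {
    return 36 + 8 + 2 + 2;
}
```

With these, bit_rate(example_bits_per_frame(), 20) gives 2400, and the 36 bits spent on spectral magnitudes alone account for bit_rate(36, 20) = 1800 bit/s of that.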

Challenges

The tough bits of this project are:

1. Parameter estimation, in particular voicing estimation.

2. Reduction of a time-varying number of parameters (L changes with Wo each frame) to a fixed number of parameters required for a fixed bit rate. The trick here is that { A } tend to vary slowly with frequency, so we can “fit” a curve to the set of { A } and send parameters that describe that curve.

3. Discarding the phases { phi }. In most low bit rate codecs phases are discarded, and synthesised at the decoder using a rule-based approach. This also implies the need for a “voicing” model as voiced speech (vowels) tends to have a different phase structure to unvoiced (consonants). The voicing model needs to be accurate (not introduce distortion), and relatively low bit rate.

4. Quantisation of the amplitudes { A } to a small number of bits while maintaining speech quality. For example 30 bits/frame at a 20ms frame rate is 30/0.02 = 1500 bit/s, a large part of our 2400 bit/s “budget”.

5. Performance with different speakers and background noise conditions.

Is it Patent Free?

I think so – much of the work is based on old papers from the 1960s, 70s and 80s, and the PhD thesis work used as a baseline for this codec was original. A nice little mini project would be to audit the patents used by proprietary 2400 bit/s codecs (MELP and xMBE) and compare.

Proprietary codecs typically have small, novel parts of the algorithm protected by patents. However, proprietary codecs also rely heavily on large bodies of public domain work. The patents cover perhaps 5% of the codec algorithms. Proprietary codec designers did not invent most of the algorithms they use in their codecs. Typically, the patents just cover enough to make designing an interoperable codec very difficult. These also tend to be the parts that make their codecs sound good.

However, there are many ways to make a codec sound good, so we simply need to choose and develop other methods.

Is Codec 2 compatible with xMBE or MELP?

Nope – I don’t think it’s possible to build a compatible codec without infringing patents or having access to commercial-in-confidence information. We are pushing new boundaries where closed source can’t follow, such as innovative integration between codecs and modems.

Links