For most of us, communication comes naturally: it binds living creatures together so we can connect, work, and share ideas with people all over the world. Because modern communication systems are so convenient and widely accessible, we often take them for granted. The ability to select a message at one point and reproduce an exact replica of it at another point does not intrigue us. Yet this was the fundamental problem of communication, and an area demanding intense research, in the 20th century¹. To appreciate how remarkable the evolution of communication and information theory has been, let’s go back in time and consider telecommunications in the 1940s.

Back then, the telephone system was growing rapidly, but it faced a major problem: the received signal was never exactly equal to the transmitted signal; it was the transmitted signal plus some noise (Figure 1). When the signal was amplified, the noise got amplified too, so the signal reaching the receiver’s end was degraded⁵ (Figure 2).

Figure 1: The receiver received both the source signal and the noise.

Figure 2: The amplifier amplified both signal and noise.

“Before 1948, there was only the fuzziest idea of what a message was. There was some rudimentary understanding of how to transmit a waveform and process a received waveform, but there was essentially no understanding of how to turn a message into a transmitted waveform.”

[Gallager, “Claude Shannon: A Retrospective”, 2001, p. 2683]²

In 1948, Shannon published the paper “A Mathematical Theory of Communication” in the Bell System Technical Journal and showed how information can be quantified with absolute precision. He believed that all information media (telephone signals, text, radio waves, images, essentially every mode of communication) can be encoded in bits¹. Now, instead of simply amplifying the message, one can read the digitized message of the transmitted signal (a sequence of 0s and 1s) and repeat it exactly at the receiver’s side⁵.

In his paper, Shannon introduced four major concepts²:

Channel Capacity & The Noisy Channel Coding Theorem

Digital Representation

Source Coding

Entropy & Information Content

Channel Capacity & The Noisy Channel Coding Theorem

To carry any message to the receiver, a medium or channel is required. For instance, students (the receiver) are able to listen to the teacher (the source) in a classroom through the air (the channel). For longer-distance communication, the electromagnetic field can be used as a channel. The amount of information a channel can carry is its capacity, which is measured in bits per second, though nowadays we typically use units like megabits per second (Mbit/s) or megabytes per second (MB/s).

According to Shannon, every communication channel has a speed limit (also known as the Shannon limit) below which we can transmit information with an arbitrarily small probability of error, even over a noisy channel with faint signals. Above the limit, however, error-free communication is impossible: the channel cannot carry information faster than its capacity without losing some of it, no matter how sophisticated the error-correction scheme or how much the data is compressed.
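For a channel corrupted by Gaussian noise, this limit is given by the Shannon-Hartley theorem: C = B log₂(1 + S/N), where B is the bandwidth and S/N the signal-to-noise ratio. A minimal sketch in Python (the bandwidth and SNR figures below are illustrative, not taken from any real system):

    import math

    def shannon_capacity(bandwidth_hz, snr):
        """Shannon-Hartley limit in bits per second."""
        return bandwidth_hz * math.log2(1 + snr)

    # Illustrative values: a 3 kHz telephone line with an SNR of 1000 (30 dB).
    print(shannon_capacity(3_000, 1_000))  # ~29,902 bits/s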

In particular, suppose we want to download images from Google. There is a certain speed limit of the internet connection (the channel) below which we can download the images without any loss of quality. The average rate of transfer can be deduced from the average size of the encoded images and the channel’s capacity⁵:

Rate of transfer = Capacity / Average size
                 = (10 Mbit/s) / (1 Mbit/file)
                 = 10 files/s

We can download the images much faster by improving the encoding of the images. With optimal encoding, the best achievable rate of transfer is determined as below⁵:

Maximum rate of transfer = Capacity / Shannon entropy
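For illustration (the entropy figure here is assumed, not taken from the source): if an ideal encoder compressed each image down to an entropy of 0.5 Mbit per file, the same 10 Mbit/s channel would deliver images twice as fast:

    capacity = 10.0   # channel capacity, Mbit/s
    entropy = 0.5     # assumed entropy per image, Mbit/file
    print(capacity / entropy)  # 20.0 files/s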

Shannon’s noisy channel coding theorem states that, as long as we transmit below the channel capacity, we can reconstruct the information perfectly (with a probability arbitrarily close to 1) even from a noisy signal by adding enough redundancy. However, there is a minimum quantity of redundancy required to retrieve the original message as it was. Example: if we take a CD and lightly scratch it, it will still play back perfectly, because the data on it is stored redundantly. This is similar to the redundancy in English words: whenever we read she l*v*es dogs, we can easily fill in the missing letters.
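A minimal sketch of the idea, using a 3× repetition code (a deliberately crude form of redundancy, not the codes Shannon’s proof relies on): every bit is sent three times, and the receiver takes a majority vote, so any single flipped bit per triple is corrected.

    def encode(bits):
        """Add redundancy: repeat every bit three times."""
        return [b for bit in bits for b in (bit, bit, bit)]

    def decode(received):
        """Majority-vote each group of three received bits."""
        return [int(sum(received[i:i + 3]) >= 2) for i in range(0, len(received), 3)]

    message = [1, 0, 1, 1]
    sent = encode(message)          # [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    sent[4] ^= 1                    # the channel flips one bit (noise)
    assert decode(sent) == message  # the original message is still recovered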

Digital Representation

Shannon realized that any information source (text, sound, image, video) can be represented as 0s and 1s before being passed to the channel. Thus, the content of the message is irrelevant to its transmission.
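A quick sketch of this in Python (using UTF-8, a modern text encoding, purely for illustration): any text can be turned into a string of 0s and 1s and recovered exactly.

    text = "pizza"

    # Encode the text into a string of bits (8 bits per byte).
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    print(bits)  # 0111000001101001011110100111101001100001

    # The channel only ever sees 0s and 1s; decoding reverses the process.
    decoded = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("utf-8")
    assert decoded == text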

Source Coding

Basically, source coding removes redundancy from the information to make its representation optimal. The most widely used optimal source code today, Huffman coding, has an interesting story of invention². Shannon introduced the Shannon-Fano code in 1948, but it was not always optimal. Later, as an alternative to the final examination, Fano gave his class the option to write a term paper on finding the most efficient coding method. Just when Huffman, a student of Fano, was about to give up, he realized that an optimal code can be obtained by representing the most frequent symbols with the shortest codewords.

Suppose a person X loves “pizza”, mostly orders it in a restaurant, and never orders other foods except “mo:mo”, “spaghetti” and “sandwich”. Traditionally, a fixed-length codeword would be assigned to each food:

Food        Codeword
pizza       00
mo:mo       01
spaghetti   10
sandwich    11

All the codewords are 2 bits long regardless of how common each food is, so the average length of a codeword we send is 2 bits.

Let’s use a Huffman code now: we assign “pizza” the shortest codeword and give the longest codewords to the uncommon foods. Suppose X orders pizza half of the time, mo:mo a quarter of the time, and spaghetti and sandwich an eighth of the time each:

Food        Probability   Codeword
pizza       1/2           0
mo:mo       1/4           10
spaghetti   1/8           110
sandwich    1/8           111

Now, the average length of a codeword is 1/2 × 1 + 1/4 × 2 + 1/8 × 3 + 1/8 × 3 = 1.75 bits, and since this equals the entropy of the distribution, we have reached the optimal length.
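A compact sketch of Huffman’s algorithm in Python (the food frequencies are the ones assumed above): repeatedly merge the two least frequent symbols, prefixing their codewords with 0 and 1, until a single tree remains.

    import heapq

    def huffman_codes(freqs):
        """Build Huffman codewords from a {symbol: frequency} mapping."""
        # Each heap entry: (frequency, tiebreaker, {symbol: partial codeword}).
        heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)  # least frequent subtree
            f2, _, c2 = heapq.heappop(heap)  # second least frequent subtree
            # Merge the two subtrees, extending their codewords by one bit.
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (f1 + f2, count, merged))
            count += 1
        return heap[0][2]

    codes = huffman_codes({"pizza": 0.5, "mo:mo": 0.25, "spaghetti": 0.125, "sandwich": 0.125})
    print(codes)  # e.g. {'pizza': '0', 'mo:mo': '10', 'spaghetti': '110', 'sandwich': '111'}

The exact bit patterns depend on how ties are broken, but the codeword lengths (1, 2, 3 and 3 bits) and hence the 1.75-bit average are always the same.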

Entropy & Information Content

The amount of information that can be sent down a noisy channel can be defined in terms of transmit power and bandwidth¹. We can send information using either high power and low bandwidth, or high bandwidth and low power. Traditionally, narrow-band radios were used, which focused all their power into a small range of frequencies and were highly susceptible to interference: extreme power was confined to small portions of the spectrum². Shannon offered a solution to this by quantifying the amount of information in a signal, stating that information is the amount of unexpected data contained in a message¹. He termed the information content of a message ‘entropy’. Thus, the more a transmission resembles random noise (which is unexpected bits), the more information it holds².

Identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than identifying the outcome of a roll of a fair die (with six equally likely outcomes). Similarly, the entropy of a deck of cards increases as we shuffle it randomly more and more.
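A short sketch making these numbers concrete, using Shannon’s entropy formula H = −Σ p log₂ p:

    import math

    def entropy(probs):
        """Shannon entropy, in bits, of a discrete distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([1/2, 1/2]))            # fair coin: 1.0 bit
    print(entropy([1/6] * 6))             # fair die:  ~2.585 bits
    print(entropy([1/2, 1/4, 1/8, 1/8]))  # the food orders above: 1.75 bits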