That is a perfect normal distribution of noise in our difference frame. However, to generate entropy we need to somehow convert this normal distribution into a uniform one. The first thought would be to use the inverse form of the elegant Box-Muller transform… which, unfortunately, wouldn’t work for us at all. The inverse Box-Muller transform requires a continuous distribution, and our raw data consists entirely of discrete values.

In fact, the problem is even harder than it appears at first glance. Look at the numbers: despite the range of noise values running from -33 to +33 , over 96% of our data points fall in the narrow range from -10 to +10 . The size of that range depends on camera conditions: if a picture has too many oversaturated spots or too many dark spots [3], there will be less noise overall and the range will shrink even further. For “lens covered” blackness the vast majority of data points can shrink to ±3 : a total of just 7 different values. We need to generate perfect, uniform entropy from a limited set of values (around 21 values in perfect conditions, down to 7 values in the worst conditions). And the distribution of these values is highly biased — in our distribution the value zero occurs most often, with a frequency of 8.69%.

How much entropy can we hope to extract, theoretically? For this purpose there is a well-known concept of “minimal entropy” (usually referred to as min-entropy). Unlike Shannon’s entropy, which characterizes the distribution as a whole, min-entropy describes the worst-case scenario: the value that occurs most often and is thus the most predictable. In this sample, zero occurs 8.69% of the time. The simple calculation -log2(8.69%) gives us a theoretical min-entropy of 3.52 bits — the maximum number of bits we can expect to extract from a single data point. Another way of stating this: every time we read a pixel value of noise, no matter whether the value is -30 or +10 or 0 , reading it reduces our uncertainty by at most 3.52 bits. Despite the fact that 33 would require more than 5 bits to encode ( 2^5 = 32 ), it doesn’t matter — the only entropy (uncertainty) hidden inside each value is 3.52 bits at best. If we have to work with worse camera conditions (i.e. less noise variance), our entropy per data point goes down even more. The good news is that our 12-megapixel camera produces lots and lots of data points every second!
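To make the arithmetic concrete, here is a small Python sketch of the min-entropy calculation (the function name is ours, not from the codebase; 8.69% is the frequency from our sample above):

```python
import math
from collections import Counter

def min_entropy_per_sample(samples):
    """Min-entropy: -log2(p_max), where p_max is the frequency
    of the most common value in the sample."""
    p_max = max(Counter(samples).values()) / len(samples)
    return -math.log2(p_max)

# With the most frequent value (zero) occurring 8.69% of the time:
print(-math.log2(0.0869))   # ~3.52 bits per data point
```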

So, depending on camera conditions, we encode each data point into 1, 2 or 3 bits: quantized_value = raw_value % 2^bits . However, even after encoding, our bits are still heavily biased, with zero bits far more prevalent since our most common value is zero. How can we go from a zero-biased bitstream to a bitstream with a uniform distribution? To solve this challenge, we are going to use the granddaddy of all randomness extractors — the famous Von Neumann extractor.
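But first, the quantization step itself, as a minimal Python sketch (the function name is ours):

```python
def quantize(raw_value, bits):
    """Keep only the low `bits` bits of a raw noise value;
    e.g. bits=3 maps every value into the range 0..7."""
    return raw_value % (1 << bits)

print(quantize(-33, 3))  # 7 (Python's % is non-negative for a positive modulus)
print(quantize(10, 2))   # 2
```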

Von Neumann extractor

John Von Neumann figured out how to get perfect entropy even if your source is a highly biased coin: a coin that lands on one side far more often than the other. Let’s say we have a coin that turns up heads ( 0 ) 80% of the time, and tails ( 1 ) 20% of the time. Despite having such a defective coin, we can still extract perfectly uniform entropy from it!

Here is the key insight of Von Neumann’s extractor. Let’s think about each pair of coin tosses. Obviously, a pair of 0,0 is far more likely than a pair of 1,1 , since zero is far more probable. But what is the probability of a 0,1 pair compared with a 1,0 pair? The first has a probability of 80% x 20% = 16% . The second has a probability of 20% x 80% ... wait, it is the same number! We just changed the order of the factors, and the result is exactly the same.

Von Neumann understood that even if your source is biased, the probabilities of the combinations 1,0 and 0,1 are equal, so you can simply call the first combination the new extracted ‘zero’ and the second combination an extracted ‘one’. No matter how bad your source is, the stream of ‘new’ zeroes and ones will be perfectly uniform. Here is the classical Von Neumann extractor of biased random bits (a code sketch follows the list):

If you have a pair of 0,0 or 1,1 just toss them away: they do not produce anything.

If you have a 1,0 or 0,1 pair, use the final bit as your result; it will have a uniform random distribution: the probabilities of these two pairs are exactly the same!
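A minimal Python sketch of that rule (our own naming, not the app’s code):

```python
def von_neumann(bits):
    """Classic Von Neumann: walk the stream in pairs, emit the
    second bit of each unequal pair, discard 0,0 and 1,1."""
    return [b for a, b in zip(bits[0::2], bits[1::2]) if a != b]

# A heavily zero-biased stream still yields uniform output bits:
print(von_neumann([0, 0, 0, 1, 1, 0, 0, 0, 0, 1]))  # [1, 0, 1]
```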

The Von Neumann extractor works amazingly well and has the added benefit that we do not even need to know how biased our incoming bits are. However, we are throwing away much of our raw data (all those 0,0 and 1,1 pairs). Can we extract more entropy? Elias (’72) and Peres (’92) came up with two different methods to improve the Von Neumann extractor. We will explain Peres’ method since it is a bit simpler to explain and implement.

We already know that 1,0 and 0,1 have the same probability — there is no difference between 80x20 and 20x80 if a binary source is biased 80%/20%. But what about longer sequences? Let’s think about the “throw away” sequences 1100 and 0011 . With the straight Von Neumann method we get nothing from these: both 11 and 00 are tossed away. But let’s calculate the probabilities: 1100 will be 20x20x80x80 , while 0011 will be 80x80x20x20 — the same numbers again.

This is clearly similar to what we have already seen: two sequences with the same probability. Again we can say that the sequence ending with 00 is a new “zero”, while the sequence ending with 11 is a new “one”. This is a case for recursion: we run the same algorithm on the doubled-up bits instead of the originals. Here is the Von Neumann extractor with simple recursion:

If you have a 1,0 or 0,1 pair, use the final bit as your result (classic Von Neumann).

If you have a 1,1 or 0,0 pair, convert it into 1 or 0 respectively and add it to a temporary buffer.

After you are done processing the initial data, run the Von Neumann extractor again on the buffer you collected.

Since all 1,1 and 0,0 pairs become single bits, the classic Von Neumann algorithm will by definition extract more uniform bits if the initial data contained any 1,1,0,0 or 0,0,1,1 . And since this algorithm is recursive, we can keep running it for as long as something collects in our temp buffer, extracting every last bit of entropy from longer sequences. Finally, the Peres algorithm has one more abstraction that allows us to extract even more entropy by also using the positions of the bit-producing sequences in the initial stream; this is implemented in our codebase.
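Here is a minimal Python sketch of the recursive variant described above (the full Peres extractor additionally recurses on which pairs produced output, which we leave out):

```python
def vn_recursive(bits):
    """Von Neumann with the simple recursion described above:
    unequal pairs emit a bit; equal pairs collapse into a single
    bit that is fed through the extractor again."""
    if len(bits) < 2:
        return []
    out, buffer = [], []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(b)       # classic Von Neumann output
        else:
            buffer.append(a)    # 0,0 -> 0 and 1,1 -> 1
    return out + vn_recursive(buffer)
```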

Now we have a powerful extractor that can deal with an input of biased bits, and we are almost ready to extract some entropy. But there is one more step we need to do first.

Quick Test: Chi-Squared

There is no test for “true” entropy. What might look like perfectly random entropy coming from, say, quantum physics can in fact be completely deterministic — created by a software generator and fully controlled by an attacker who selected the initial seed. There is no test one can run on the data that distinguishes a “true quantum random process” from a “software generated sequence”. The very success of modern cryptography rests on the fact that we can make a “software generated sequence” (the output of an encryption algorithm) look indistinguishable from “true entropy”. Yet at the same time, there are a number of software tests for randomness: FIPS-140, SP800–90B. What is going on?

To understand the purpose of such “randomness tests”, we are going to review one of the simplest: the chi-squared (pronounced ‘kai-squared’) goodness-of-fit test. Khan Academy has an excellent video explaining step by step how to use the chi-squared test — check it out to learn more. The chi-squared test is fairly simple and consists of the following steps:

Formulate a theory about whatever you want to test and then calculate what you expect to observe if your idea is correct.

Observe real-world events and record the observed values.

Calculate the chi-squared statistic by summing the following quantity across all events: (ObservedValue - ExpectedValue)²/ExpectedValue

Use a lookup table or chi-squared calculator to answer the key question: what is the probability that the observed events are the result of your theory?

As you can see, the formula is just saying in math language: “I will collect the square of all differences between what you thought was true and what is happening in the real world, in proportion to what you were expecting. If any of your expectations are too far from reality, the squared difference will grow very fast! The bigger it becomes, the more wrong you are.”

Let’s work it out for our case — building a uniform entropy extractor. We want to produce random bytes that can take values from 0 to 255, and we want the distribution of our bytes to be uniform, so that the probability of every byte is exactly the same. Observing each byte is an event, so we have 256 events total. Let’s say our software produced 256,000 bytes and we are ready to run the chi-squared test on them. Obviously, if the bytes are uniform (our theory), we would expect to see each byte with a probability of 1/256 . Therefore we expect to see each byte 256,000/256 = 1000 times. That is our expected value for each event. Then we record the observed values (how many times we actually see each byte in our data) and sum all 256 values of (ObservedValue - 1000)²/1000 . That number is our chi-squared statistic. What do we do with it?
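The whole calculation fits in a few lines of Python (a sketch; the function name is ours):

```python
import os

def chi_squared_uniform_bytes(data):
    """Chi-squared statistic for the hypothesis that `data`
    (a bytes object) was drawn from a uniform byte source."""
    expected = len(data) / 256
    counts = [0] * 256
    for byte in data:
        counts[byte] += 1
    return sum((obs - expected) ** 2 / expected for obs in counts)

print(chi_squared_uniform_bytes(os.urandom(256_000)))  # typically near 255
```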

Chi-squared tables are organized by two parameters. The first is the so-called degrees of freedom — a fancy way of saying “the number of independent events you have, minus one”. In our case, the degrees of freedom are 256-1 = 255 . The more interesting part is the second parameter, P: a probability ranging from 99.9% (extremely likely) to 0.1% (extremely unlikely).

The only printed table we could find tantalizingly cuts off at 250 degrees of freedom — just 5 short of what we need! However, we can use a chi-squared calculator to get the exact values we need.
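If you would rather compute than look up, scipy’s chi-squared survival function gives the tail probability directly (a sketch, assuming scipy is installed):

```python
from scipy.stats import chi2

df = 255  # 256 possible byte values minus one
for stat in (190, 284, 310, 330, 400):
    # sf(x, df) = P(chi-squared >= x): roughly 0.999, 0.10, 0.01, 0.001, tiny
    print(stat, chi2.sf(stat, df))
```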

What this table tells us is the probability that our observation fits our hypothesis — the hypothesis we used to formulate all the expected values. In other words, if we get a chi-squared around 190, it virtually guarantees that the observed events match our theory — the theory that they all come from a uniform distribution — but there is still a 0.1% chance that we are wrong! For a chi-squared of 284, there is a 10% chance that our hypothesis is correct. That is actually not bad: if you run the chi-squared test on 10 perfectly random samples, you will likely get one sample around 284. If you run hundreds of random samples, one might have a chi-squared value of 310, i.e. a sample with a low probability of 1%. If you are willing to test a thousand samples, one might have a chi-squared value of 330 (0.1%).

As you can see, the chi-squared test, like all other randomness tests, does NOT tell us anything about the nature of our randomness! It does not tell us whether the randomness is “true” or not, i.e. derived from a “true” physical source or generated by software. The test answers a much simpler question: given this sample of random values, what is the chance that it came from a source with a uniform distribution?

Since the tests are probabilistic in nature, care must be taken in interpreting the results. A chi-squared score of, say, 310 (a low 1% probability) does not mean that something is obviously “wrong”. It takes just a few runs of the script below to produce chi-squared scores over 300 (as measured by ent) from the state-of-the-art /dev/urandom on Mac OS X. And if you keep repeating the test dozens or even hundreds of times, you should expect less and less likely values to show up.

```
for i in {1..10}; do dd if=/dev/urandom bs=1k count=1k 2> /dev/null | ent -t | cut -d, -f4; done
194.700684
308.307617
260.475586
236.611328
316.724609
306.418945
262.301270
240.013184
205.627930
257.049805
```

However, above 330 the probability goes down fast. The probability of a chi-squared of 400 is just 0.000001% — it might happen just once per 100 million samples!

The chi-squared test will be our early-warning radar for when something breaks down in our extractor. A rising chi-squared score tells us: “hey, it looks less and less likely that your bits are coming from a uniform source!” It states nothing about the nature, validity or source of our entropy — only how likely it is that the bits were produced by a uniform random source.

We will use the chi-squared statistic of each entropy block as a real-time quality score, and we will run a more comprehensive randomness test suite later when we evaluate the final results.

All Together At Once

As we have seen, a good entropy extractor is quite a complicated piece of crypto machinery! Let’s review how we put all the pieces together (a condensed code sketch follows the list):

Take two video frames as big arrays of RGB values.

Subtract one frame from the other; that leaves us with samples of raw thermal noise values.

Calculate the mean of the noise frame. If it is outside of the ±0.1 range, we assume the camera has moved between frames, and reject this noise frame.

Delete improbably long sequences of zeroes produced by oversaturated areas. For our 1920x1080 ≈ 2M samples and a natural zero probability of 8.69% , any sequence longer than 7 zeros is removed from the raw data.

Quantize raw values from the ±40 range into 1, 2 or 3 bits: raw_value % 2^bits .

Group quantized values into batches sampled from different R,G,B channels, at large pixel distances from each other and in different frames, to minimize the impact of spatial and temporal correlations within a batch.

Process a batch of 6–8 values with the Von Neumann algorithm to generate a few uniform bits of output.

Collect the uniform bits into a new entropy block.

Check the new block with a chi-squared test. Reject blocks that score too high and are therefore too improbable to have come from a uniform entropy source.
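Condensed into code, the loop might look roughly like this (a sketch under our own naming, reusing vn_recursive from earlier; the R,G,B batching and the final chi-squared check are left out):

```python
import numpy as np

def extract_block(frame_a, frame_b, bits=3, mean_limit=0.1, max_zero_run=7):
    """Rough sketch of the pipeline above. Names and details (e.g.
    how zero runs are trimmed) are our assumptions, not the codebase's."""
    # Steps 1-2: difference of two frames = raw thermal noise
    noise = frame_a.astype(np.int16) - frame_b.astype(np.int16)
    # Step 3: reject the frame if the camera probably moved
    if abs(noise.mean()) > mean_limit:
        return None
    # Step 4: cap improbably long runs of zeroes (oversaturation)
    samples, run = [], 0
    for s in noise.flatten():
        run = run + 1 if s == 0 else 0
        if run <= max_zero_run:
            samples.append(int(s))
    # Step 5: quantize to the low `bits` bits and unpack into a bitstream
    stream = []
    for v in samples:
        q = v % (1 << bits)
        stream.extend((q >> i) & 1 for i in range(bits - 1, -1, -1))
    # Steps 6-7: run the extractor (vn_recursive defined earlier)
    return vn_recursive(stream)
```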

Whew! That was simple.

Generating Entropy

All right, let’s generate some hardcore “true” entropy! The iOS app is here, or you can compile it from the source code and trace every bit as it travels through the extractor. There are two ways to get entropy from the app: AirDrop it to your MacBook (we also included the option to export the entropy block as a CSV file for manual analysis), or use a Zax relay for secure device-to-device upload, which we will cover later.

If the camera view is good, simply run the extractor at the maximum bandwidth by extracting all 3 bits from each noise value.

However, things break down fast if the picture quality goes bad through oversaturation or too many dark areas, where the noise variance drops. With less variance, taking 3 bits from each data point feeds a lot of garbage into the extractor. The Von Neumann algorithm is not magic: it will not create entropy out of thin air if we are oversampling the raw data points[4]. The chi-squared statistic captures that immediately.

3 Bit Oversaturated

The fix is easy — if picture quality is poor, just reduce the number of bits extracted from each data point to 2 bits or even 1 bit. For this view, reducing it to 2 bits returns our extractor to normal scores.

2 Bit Oversaturated
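One way such a fallback could be wired (our own sketch, not the app’s actual logic): lower the bit depth until blocks start passing the chi-squared check.

```python
def choose_bits(make_block, threshold=330.0):
    """Try 3, then 2, then 1 bit per data point; keep the highest
    bit depth whose output passes the chi-squared test (~0.1% at 330).
    `make_block(bits)` is a hypothetical callable that runs the
    pipeline above and returns an entropy block as bytes."""
    for bits in (3, 2, 1):
        block = make_block(bits)
        if block and chi_squared_uniform_bytes(block) < threshold:
            return bits, block
    return None  # even 1 bit per point fails: reject this capture
```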

In truth, it is far simpler to run this extractor with excellent picture quality and maximum bit extraction, but that doesn’t make for an exciting video. Here is one of the best setups we found for a perfect noise capture.

3 Bit MacBook Grey

All this required was the high-tech equipment of a pencil, which lifts the camera just a bit above a MacBook’s dull grey surface. It is the perfect scene for capturing our full range of thermal noise[5].