Theory of Unspoofable Device Identification Using NAND Flash Memory

Markus Jakobsson and Karl-Anders Johansson



In 1998, Intel announced the introduction of processor identities. Anti-fraud practitioners celebrated, security experts busied themselves thinking of the research implications, and privacy advocates were terrified.

In the end, Intel cancelled the processor identity plans. Unfortunately, I would say, given how fraud has mushroomed. As a result, machines are identified in other ways – but not so well.

Cookies are used to identify repeat visitors, but as cookies are often erased and often stolen, their value is limited. Many companies identify machines by objects in their browser cache, and publicly readable machine configurations – such as what browser and operating system you use, and what your screen resolution is. Good guys are profiled. But crooks dodge the checks or steal and use other people’s machine identities.

We desperately need a reliable way of identifying devices.

Fortunately, this is possible. And all from software to boot.

Let me explain how. Many laptops and virtually all cell phones and tablets use so-called NAND flash memory for system-specific or general storage. NAND flash is a tricky thing: it is quite error-prone, and as the memory is used, some good cells turn bad. But bad cells never turn good. They are broken. NAND flash can actually lose data integrity just by having its contents read, but such errors can be corrected using error-correcting codes. When a block gets permanent bit errors, it is simply marked as bad and avoided from then on. There are actually several of these bad blocks when the chip leaves the factory!

Broken stuff?!? That is great!

Imagine that we select a particular block to store an identity in. Say, block 1024. When a device is first introduced, it may not have any errors in this block, but no problem. We will write and erase it thousands and thousands of times, which takes a few seconds. This creates errors. If we do not get enough, we continue a bit longer. (We then put this block on the Bad Blocks list to make sure it is not used by mistake by the file system or other processes. We will always be able to access the block even though it is on the list.)
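
The imprinting loop can be sketched in a few lines. This is only a toy simulation: real raw access to a flash block is device-specific, so the class SimulatedBlock, the function imprint, and the endurance numbers are all assumptions made up for illustration.

```python
import random


class SimulatedBlock:
    """Toy stand-in for one NAND block (hypothetical, not a real driver API)."""

    def __init__(self, n_bits, seed):
        rng = random.Random(seed)
        # Each cell gets a hidden endurance limit, standing in for the
        # microscopic manufacturing variation that nobody controls.
        # The range below is invented for the simulation.
        self.endurance = [rng.randint(1_000, 50_000) for _ in range(n_bits)]
        self.cycles = 0

    def program_erase(self, count=1):
        """Wear the block by `count` write/erase cycles."""
        self.cycles += count

    def bad_cells(self):
        """Positions of worn-out cells. Once broken, a cell stays broken."""
        return {i for i, limit in enumerate(self.endurance) if self.cycles >= limit}


def imprint(block, min_errors, batch=1_000, max_cycles=100_000):
    """Cycle the block in batches until it shows at least `min_errors` bad cells."""
    while len(block.bad_cells()) < min_errors:
        if block.cycles >= max_cycles:
            raise RuntimeError("block did not develop enough errors")
        block.program_erase(batch)  # not enough yet: continue a bit longer
    return block.bad_cells()


block = SimulatedBlock(n_bits=4096, seed=1)
errors = imprint(block, min_errors=64)
print(len(errors), "bad cells after", block.cycles, "cycles")
```

How many errors are "enough" is a design choice that the sketch leaves open; 64 is just a placeholder.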

We can easily check what the errors are. We set all the bits in the block to zero, then read the block. Some cells will be broken and will result in 1s when we read them. We then set them all to ones and read the block again. Some will still come out as 0s. We have now found the errors! (No need for error-correcting codes; in fact, we will read and write “raw”, which is possible since all of this will be done at the OS level.)
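
Here is a minimal sketch of that two-pass check, run against a simulated block rather than real hardware. The names SimulatedBlock, write_all, read_raw and read_identity are hypothetical; on a real device the raw reads and writes would go through whatever low-level interface the OS offers.

```python
class SimulatedBlock:
    """Toy block whose broken cells ignore what is written (hypothetical model)."""

    def __init__(self, n_bits, stuck_at_one, stuck_at_zero):
        self.n_bits = n_bits
        self.stuck_at_one = set(stuck_at_one)    # cells that always read 1
        self.stuck_at_zero = set(stuck_at_zero)  # cells that always read 0
        self.cells = [1] * n_bits

    def write_all(self, bit):
        """Try to set every cell to `bit`."""
        self.cells = [bit] * self.n_bits

    def read_raw(self):
        """Read without error correction, so the defects show through."""
        out = list(self.cells)
        for i in self.stuck_at_one:
            out[i] = 1
        for i in self.stuck_at_zero:
            out[i] = 0
        return out


def read_identity(block):
    """Return (positions reading 1 after writing 0s, positions reading 0 after writing 1s)."""
    block.write_all(0)
    reads_one = frozenset(i for i, b in enumerate(block.read_raw()) if b == 1)
    block.write_all(1)
    reads_zero = frozenset(i for i, b in enumerate(block.read_raw()) if b == 0)
    return reads_one, reads_zero


block = SimulatedBlock(32, stuck_at_one={3, 17}, stuck_at_zero={8})
identity = read_identity(block)
print(sorted(identity[0]), sorted(identity[1]))  # prints [3, 17] [8]
```

The pair of position sets is the identity: it records which cells are broken and in which direction.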

When a machine comes back, what do we do? Same again. Set all bits in block 1024 to zero, read them back. Set them all to one, read them back. That’s the identity. Sure, it won’t encode your name or your phone number, but it will be unique.
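
On the server side, recognizing a returning machine could look roughly like this. The DeviceRegistry class and its methods are invented for illustration, and the sketch assumes that a reading is perfectly repeatable, which real flash may not guarantee.

```python
class DeviceRegistry:
    """Hypothetical lookup table from error patterns to device labels."""

    def __init__(self):
        self._known = {}

    @staticmethod
    def _key(stuck_at_one, stuck_at_zero):
        # Frozen sets are hashable, so the error pattern itself is the key.
        return (frozenset(stuck_at_one), frozenset(stuck_at_zero))

    def enroll(self, label, stuck_at_one, stuck_at_zero):
        """Remember a device the first time we see its error pattern."""
        self._known[self._key(stuck_at_one, stuck_at_zero)] = label

    def identify(self, stuck_at_one, stuck_at_zero):
        """Return the label for this exact pattern, or None if it is unknown."""
        return self._known.get(self._key(stuck_at_one, stuck_at_zero))


registry = DeviceRegistry()
registry.enroll("device-A", stuck_at_one={3, 17}, stuck_at_zero={8})

# The machine comes back and we repeat the same two-pass reading.
print(registry.identify(stuck_at_one={17, 3}, stuck_at_zero={8}))  # prints device-A
print(registry.identify(stuck_at_one={4, 17}, stuck_at_zero={8}))  # prints None
```

Note that the label is assigned by us at enrollment; the block itself only supplies the pattern.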

In other words, we recognize devices (or rather: their flash memory) by their defects. Much as humans recognize faces: by their defects (or deviations from the “norm”) … a bigger nose, slightly too bushy eyebrows, bigger cheeks.

The nice twist is that if an attacker manages to read your device identity, he cannot inscribe it into his own device. Yes, he can create errors – like we did. But he cannot control where in the block they occur as this relies solely on microscopic manufacturing defects in the silicon. Nor can the attacker overwrite the device identity of your device even if he runs his software on it. Sure, he can imprint more errors on the block, much like a burglar placing a new fingerprint on top of an existing one (good forensics software will still be able to match both fingerprints). Worst case: The attacker places a thousand “fingerprints” on top of each other leaving the block completely trashed. Tough luck. The identity is gone, but at least the machine has not been imprinted with a false identity.
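
One way to make the matching survive extra errors is to ask whether every enrolled defect is still present, rather than demanding an exact match. The sketch below is an assumption about how such a check might look, not a description of any actual forensics software; the function verify and the max_error_fraction threshold are hypothetical.

```python
def verify(enrolled, observed, block_bits, max_error_fraction=0.25):
    """Compare an enrolled set of bad-cell positions with a fresh reading.

    Returns "match", "mismatch" or "unusable". The threshold is an arbitrary
    placeholder: a block with too many errors would contain every possible
    pattern, so it must be rejected instead of matched.
    """
    if len(observed) > max_error_fraction * block_bits:
        return "unusable"
    # Broken cells never heal, so the old defects must all still be there.
    return "match" if enrolled <= observed else "mismatch"


BLOCK_BITS = 128
enrolled = frozenset({3, 17, 40, 99})

after_tampering = enrolled | {5, 61}        # attacker added errors on top
attacker_device = frozenset({2, 17, 58, 77})  # errors landed elsewhere
trashed = frozenset(range(BLOCK_BITS))      # a thousand "fingerprints"

print(verify(enrolled, after_tampering, BLOCK_BITS))  # prints match
print(verify(enrolled, attacker_device, BLOCK_BITS))  # prints mismatch
print(verify(enrolled, trashed, BLOCK_BITS))          # prints unusable
```

The "unusable" case is the important one: without it, a completely trashed block would trivially contain everybody's identity.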

If we run a secure boot or a reliable software-based attestation scheme before we ID a device, we know that there is no active malware that may modify the report that results from reading the machine identity. So we know that the reading actually comes from the intended block, and that it was done correctly.

We know the identity. No guessing.