An interesting and disturbing paper titled “The Bleak Future of NAND Flash Memory,” written by two researchers at the Department of Computer Science and Engineering at the University of California, San Diego, and one Microsoft employee, uses current trends in NAND Flash memories and SSD (Solid-State Disk) designs to cast the future of these technologies in a rather dim light. First, the bad news: there’s a definite endpoint where the diminishing returns from existing NAND Flash technology become so small that SSD design based on NAND Flash memories essentially halts at a predicted SSD capacity of 4.6Tbytes for MLC (multi-level cell, two bits per cell) NAND Flash memories and 14Tbytes for TLC (triple-level cell, three bits per cell) NAND Flash memories. It takes a lot of empirical measurement, math, and graphing in the paper to derive those numbers, and if you want to see it all in gory detail, then read the paper. Meanwhile, I’ll summarize the arguments here if you want the short(er) version.

NAND Flash memory makers optimize for volume and the volume driver for NAND Flash at the moment is USB memory drives. These USB drives completely replaced floppy disks about a decade ago and, consequently, NAND Flash memory chips have some very specific characteristics peculiar to removable media.

The important figures of merit for Flash-based USB drives are capacity and price/bit. Further, these solid-state USB drives are not written often. They serve as backup and data-transfer devices. Because of these figures of merit, NAND Flash vendors have been pushing capacity and cost/bit at the expense of other parameters. NAND Flash vendors ride the scaling curve of Moore’s Law as hard as they can, with 25nm and “20nm-class” commercial devices now in production. NAND Flash manufacturing is way ahead of production DRAM and logic lithographies. At the same time, NAND Flash vendors have started to pack two or more bits into each memory cell, using the inherent analog nature of the NAND Flash charge-storage mechanism to correlate the amount of stored charge with a multibit value.

As a result of these optimizations, several important characteristics of NAND Flash memories degrade as bit density and cost/bit improve. In particular, performance (read/write speed) and reliability suffer. Taking these trends to an extrapolated conclusion, the authors write “…it will be extremely difficult to design SSDs that reduce cost per bit without becoming too slow or too unreliable (or both) as to be unusable in enterprise settings. We conclude that the cost per bit for enterprise-class SSDs targeting general-purpose applications will stagnate.”

Grim news indeed.

Or is it? After all, it’s only a prediction, so far.

Popular online summaries of this paper dwell on the inevitable winding down of SSDs in the future—perhaps over the next ten or twelve years. But before we start to run around like Chicken Little and proclaim that the sky is falling, let’s take a look at the assumptions the authors have made because they exert a significant bias to the conclusions. Perhaps things are not as bleak as the paper’s title might have us believe.

One of the key elements used to create the paper’s dire forecast is the creation of a formal model of an SSD called the “SSD-CDC” or “SSD with a constant die count.” Here’s the reasoning: Current commercial SSD controller chip architectures implement 24 NAND Flash control channels with each channel handling a maximum of four NAND Flash die. So the SSD-CDC can accommodate no more than 96 NAND Flash memory die. The “SSD-CDC’s architecture is representative of high-end SSDs from companies such as FusionIO, OCZ and Virident,” write the authors.

The 96-die limitation is an important limiting assumption in this paper. Given a physical limitation with respect to die count per SSD, the only way to increase drive capacity is to increase die capacity. You can ride Moore’s Law just so far and in this paper the authors ride Moore’s Law all the way to 6.5nm process geometries from a 34nm baseline.

Another capacity dimension is the number of bits stored in each NAND Flash memory cell. The authors ride the bit/cell dimension to three (triple-level cell, or TLC, NAND Flash die). With the number of die limited to 96 and the number of bits/cell limited to three, the authors hit a 14Tbyte ceiling for SSD capacity when the geometries hit 6.5nm.
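The arithmetic behind that ceiling is easy to check. What follows is a back-of-the-envelope sketch, not the paper's model: the function names are mine, it assumes decimal units (1 Tbyte = 1000 Gbytes), and it assumes bit density scales ideally with the square of the feature size, which real devices do not achieve.

```python
# Figures taken from the article's description of the SSD-CDC model.
CHANNELS = 24          # NAND Flash control channels on the controller
DIE_PER_CHANNEL = 4    # maximum die handled by each channel


def die_count(channels=CHANNELS, die_per_channel=DIE_PER_CHANNEL):
    """Total die the drive can hold under the constant-die-count model."""
    return channels * die_per_channel


def implied_gbytes_per_die(drive_tbytes, dies):
    """Per-die capacity needed to reach a drive capacity (decimal units assumed)."""
    return drive_tbytes * 1000 / dies


def ideal_density_gain(from_nm, to_nm):
    """Idealized density gain from a linear shrink (assumes area scales as size squared)."""
    return (from_nm / to_nm) ** 2


print(die_count())                                   # 96
print(round(implied_gbytes_per_die(14, 96), 1))      # 145.8 Gbytes per TLC die
print(round(implied_gbytes_per_die(4.6, 96), 1))     # 47.9 Gbytes per MLC die
print(round(ideal_density_gain(34, 6.5), 1))         # 27.4x from 34nm to 6.5nm
```

With the die count pinned at 96, every gain in drive capacity has to come from the per-die figure, which is why the die-count assumption matters so much to the paper's conclusion.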

Well, 14Tbytes isn’t a bad capacity, at least not today, but it’s the latency and bandwidth limitations that worry the authors more. They write, “Reaching beyond 4.6 TB pushes write latency to 1 ms for MLC-2 and over 2.1 ms for TLC. Read latency rises to at least 70 μs for MLC-2 and 100 μs for TLC… Either SSD-CDC’s capacity stops scaling at ~4.6 TB or its read and write latency increases sharply because increasing drive capacity with fixed die area would necessitate switching cell technology from SLC-1 or MLC-1 to MLC-2 or TLC-3… SSDs offer moderate gains in bandwidth relative to disks, but very large improvements in random IOP performance. However, increases in operation latency will drive down IOPs and bandwidth.”
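To see why rising latency drags down IOPs, here is a deliberately crude sketch. It is my own simplification, not the paper's model: it assumes each of the 96 die completes exactly one operation per latency period, that all die work in parallel, and that the controller, channels, and host interface add no overhead. The function name `peak_iops` is hypothetical. Treat the results as optimistic upper bounds only.

```python
DIES = 96  # die count in the SSD-CDC model, per the article


def peak_iops(latency_us, dies=DIES):
    """Optimistic upper bound: every die finishes one operation per latency period."""
    return dies * 1_000_000 / latency_us


# Latencies quoted from the paper in the paragraph above.
print(round(peak_iops(1000)))   # MLC-2 write at 1 ms    -> 96000
print(round(peak_iops(2100)))   # TLC write at 2.1 ms    -> 45714
print(round(peak_iops(70)))     # MLC-2 read at 70 us    -> 1371429
print(round(peak_iops(100)))    # TLC read at 100 us     -> 960000
```

Because the die count is fixed, this bound falls in direct proportion to any increase in per-operation latency, which is the trade-off the authors are pointing at.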

Do you believe this? Is it true?

Given the authors’ assumptions, it might well be true, if all factors stay constant.

However, not all factors stay constant, no matter what kind of technology you’re talking about.

Perhaps the biggest factors to discuss here—although not the only ones—are 3D IC assembly and 3D IC manufacturing. First, the NAND Flash vendors know they are in a situation of diminishing returns and have started to seriously consider 3D IC manufacturing to allow a Z dimension in the construction of individual NAND Flash memory cells. (See “3D Thursday: A look at some genuine 3D NAND cells, courtesy of Micron” and “The End of NAND Flash as we Know It: Micron’s Dean Klein and Samsung’s Tony Kim Look at Life After Flash”)

Multi-level and triple-level NAND Flash cells are not the only way to skin the cat, so to speak, and 3D IC manufacturing could revolutionize NAND Flash manufacture as early as next year. Micron, Samsung, and Toshiba have all discussed 3D NAND Flash cell architectures at public events. The onset of 3D NAND Flash memory cells will certainly affect some of the latency and bandwidth assumptions in the “Bleak Future” paper.

Next, consider 3D IC assembly. The notion of limiting each NAND Flash controller channel to four die is based, in part, on a PCB real-estate calculation that doesn’t take 3D IC assembly into account. You can save a lot of real estate using 3D IC assembly techniques, so the assumed 4-die/channel limitation might well fall by the wayside.

You could also call into question the 24-channel limitation itself. Again, more reliance on 3D IC assembly techniques might well throw the channel limitation into limbo as well. There’s nothing inherently “right” about 24 channels. It’s not even a power of two.

Finally, NAND Flash might not be the technology of the future at all for SSD storage. There are other technologies at various stages of production readiness set to challenge NAND Flash semiconductor storage. MRAM (magnetic RAM) is in early-stage production at Everspin and a handful of players including Everspin seem poised to introduce STT (spin-torque transfer) MRAM, which could well prove to be a real challenger to NAND Flash memory. (See “The return of magnetic memory? A review of the MRAM panel at the Flash Memory Summit”) In addition, there’s lots of news lately about memristor and memristor-like memory, although nothing’s yet reached production. (See “HP’s memristor finds a commercial semiconductor vendor: Hynix”.) The characteristics of these new memory technologies will also alter the landscape with respect to the assumptions discussed above. There’s no law that says SSDs must be built with NAND Flash memory.

So, in short, the “Bleak Future” paper is effective in pointing out some imminent pitfalls but we don’t know how long SSDs will really last because technological disruptions prevent the smooth evolution of storage devices and stymie this sort of analysis.

Let me leave you with three lessons from memory and storage history:

Magnetic core memory reigned as the random-access memory of choice for 20 years, from about 1953, when the MIT Whirlwind computer became the first electronic computer to use magnetic cores, through the early 1970s. Within two years of the Intel 1103 DRAM introduction in 1970, magnetic core memory production dropped off the cliff. That’s how fast a memory revolution can take place if the new replacement technology is sufficiently compelling. (Note: magnetic memory may make a comeback if commercial MRAM succeeds.)

The first commercially available 8-inch floppy disk drive, from IBM, appeared in 1971. For decades, floppy disks regularly grew in capacity and shrank in physical volume, only to be far outpaced by the size of media files, the growth in software program footprint, and the capacity increases enjoyed by hard disk drives. After three decades, the gap between floppy capacity and the immediate needs for removable storage caused the mighty floppy disk drive to fall by the wayside, and solid-state USB drives got their chance to shine as NAND Flash memory technology finally came into its own after 15 years of development.

The demise of hard disk drives has long been predicted. Analysts said that the rate of capacity increase for a hard disk drive was unsustainable. They were wrong. Often. Consistently. The magneticians continued to find and exploit amazing new magnetic properties so that 3.5-inch hard drives now have Tbyte capacities. Some of these amazing developments included giant magnetoresistance and PRML (Partial Response Maximum Likelihood) coding.

The lesson from these three examples is that technological revolutions are impossible to predict accurately, as are their effects.
