When Bad Things Happen To Good Flash

In November of last year, I wrote Disaster Strikes: How Is Data Recovered From A Dead Hard Drive?, chronicling the process as some of my personal storage was brought back from the dead by Seagate Recovery Solutions. Of course, these days we have to worry about more than the loss of important files from mechanical disks. Whether it’s the USB drive in your pocket, the eMMC in your phone, or the SSD in your notebook, flash storage is now just as critical to personal and professional interests as hard drives are, and just as prone to failure.

As a follow-up to our coverage last year, we connected with Flashback Data, an Austin-based rescue lab that handles all types of storage devices but carries special expertise in flash media. We even got a special kick when calling the company’s main office and finding Queen’s “Flash” as the on-hold music. Flashback agreed to take us on a tour behind the scenes and show what it takes for a top-level recovery lab to salvage your precious NAND-based bits from the jaws of ruin.

A Range Of Reading

When Flashback first got into the recovery business, most of its activities focused on swapping out faulty chips. Over time, this became increasingly difficult as vendors started sourcing different components for different manufacturing runs of the same drive model. Encryption also began to appear on some drives, making matters even more complex. This required Flashback to be able to read the drive’s memory directly, which in turn meant that the lab needed a dizzying host of ways to read chips from across the breadth of the flash industry.

Note that the “encryption” Flashback refers to is generally invisible to the user. Since about 2006, for example, SanDisk has been encrypting data on all of its flash drives, according to Russell Chozick, Flashback’s co-founder and vice president. As with self-encrypting hard drives, the controller encrypts all data on its way to flash memory. However, since no password locks the encryption, data is decrypted transparently as it is read back from the media. So in the case of a broken PCB, Flashback attempts to move the controller and memory chips to a new board. “If the controller chip fries, though, it’s going to be almost impossible to get that data back. The controller keeps all of the information about how to decrypt the data. Lose that and you’re pretty much...well, you’ve got issues, big time.”

Flash Types

These dark gray chips are of the TSOP48 (48-pin) variety. They were fairly standard on USB flash drives, SSDs, SD cards, and CF cards for years, although they have started to give way to other formats recently. The bottom specimen is the underside of a TLGA chip. Notice how, instead of pins on the sides, the TLGA has pads on its underside. TLGAs are common in all types of flash and appear in newer iPhones as well.

During recovery, Flashback used to plant TSOP48 chips into reader sockets, but TLGAs had to be soldered onto a board. Obviously, this made analysis and data retrieval much harder. Life hasn’t gotten any easier as smartphones have pushed flash memory into newer, smaller package types that make these older “monolithic” formats look simple in comparison.

Flash Types, Continued

These SD cards and the LaCie USB thumb drive are also monolith chips. Whereas most flash cards have a separate controller chip and memory chip, a monolith contains both components in one tiny package. Obviously, such devices can break at any of many different points. If the controller itself fails, technicians can still access the data by means other than the regular access pins, where the device would connect to a card reader or camera/phone. To illustrate, this photo shows the LaCie drive with its traces partially exposed. Recovery technicians have to remove the black solder mask over the traces to find the points they need to connect to a logic analyzer. Once all of the points are identified, the card can be wired up similarly to images later in this article.

To remove the mask, Flashback uses a surprisingly mundane approach: rubbing compound and a buffing wheel. It is possible to use chemicals for the same purpose, but Chozick says that Flashback has better luck with slow, careful buffing. With sanding, it’s too easy to damage the flash product’s very fine traces. We asked Chozick if Flashback could wire up the LaCie drive to illustrate, but we changed our minds upon learning that such work can take a technician an entire day.

Typical Flash Drive Failures

We’ve all seen pictures of hard drive damage, most of which tends to involve head crashes with circular grooves plowed into the magnetic media. With SSDs and flash products, nearly all of the damage that Flashback sees is invisible. In rare cases, there might be a burn mark on a PCB, but by and large, broken controllers or burned fuses leave no visible evidence. As a result, technicians have to go through the drive, testing each resistor, in a long, laborious process of trial and discovery. In comparison, a clean connector break like the one shown here is a cakewalk for repair techs.

What About Wear-Out?

We’ve written previously about the tug-o’-war endurance battle between improving leveling algorithms and higher capacities versus shrinking fab processes. In particular, we’ve worried that flash drives and smaller SSDs that have been in service for several years now might start to exhibit wear-out.

Fortunately, Chozick says that most of the SSDs Flashback receives are less than a year old and haven’t had time to show NAND wear-out. In fact, actual wear-out cases are extremely rare. With USB flash drives, though, especially older ones with lower-grade wear-leveling algorithms, wear-out is a bit more common. Technicians can read the chips just fine, but when they check the data, there are so many ECC bit errors that no data can be extracted. The four red dots in a later image show ECC problems. Major wear problems would look like the opposite: mostly red, with maybe four green dots.

Chozick says he has seen cases in which techs would do one analysis, take the chip out to, say, clean the solder pads, bring it back, and the data would be even worse because of the additional reading. So yes, wear-out is a real danger, but it’s not the ever-looming crisis some might fear.
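One simple way to picture a chip getting "worse" between reads is to count how many bits differ between two dumps of the same chip. The Python sketch below does only that. The function name `count_bit_differences` is hypothetical, and it illustrates the idea rather than describing Flashback's tooling, which works from the ECC information stored alongside the data.

```python
def count_bit_differences(first_read: bytes, second_read: bytes) -> int:
    """Count the bits that differ between two dumps of the same chip.

    Illustrative assumption: both dumps cover the same address range,
    so they must be the same length.
    """
    if len(first_read) != len(second_read):
        raise ValueError("reads must be the same length")
    # XOR leaves a 1 wherever the two reads disagree; count those 1 bits.
    return sum(bin(a ^ b).count("1") for a, b in zip(first_read, second_read))


if __name__ == "__main__":
    first = bytes([0b11110000, 0b00001111])
    second = bytes([0b11110001, 0b00001111])  # one bit flipped in byte 0
    print(count_bit_differences(first, second))  # prints 1
```

A count that keeps climbing across successive reads of the same chip would be consistent with the degradation Chozick describes.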

Get It Hot

Many times, chips will need to be removed from PCBs with the help of a solder rework station. One of the first tools in this process is hot air. In this image, technicians are removing a TLGA chip from a USB drive. Technicians control the temperature and air pressure, heating the device just enough to melt the solder points so that the chip can be removed. These reworking stations also contain soldering irons, flux, ohmmeters, and other diagnostic equipment. Several of these stations occupy Flashback’s main lab, which spans roughly 5000 square feet.

Removing Memory

This SSD’s controller is toast, so Flashback technicians begin gently tearing down the drive. Each memory chip is hand-numbered for tracking and easier reassembly of the data.

“Sometimes we won’t know exactly which components are bad,” says Chozick. “We just know that this is the type of drive where we see this or that firmware failure. Or this type of drive often has this kind of failure, so we need to remove the chips to start working on it. Obviously, our customers are often in a hurry, so many times you don’t get to know the exact reason why something fried or what is fried. But you do know that you’re not going to get it to read through this controller, and it’s not encrypted, so we can just start pulling chips, get them read, and do a rebuild.”

Pulling Chips

Flash drives and SSDs aren’t the only devices to get the heat treatment. Flashback sees a steady stream of cell phones come through its labs, such as this HTC Evo Android-based phone that drew its last breath in a swimming pool. Flash data recovery services run from the hundreds into the thousands of dollars, so it’s a safe bet that this phone’s contents weren’t your typical kid and kitty videos. Chozick says that it’s not uncommon to see phones come in containing the last known images of a departed friend or loved one. They also receive phones regularly as part of criminal investigations. A perp might try to crush his smoking gun underfoot, so to speak, but if the flash memory remains intact, the data can usually be retrieved for judge and jury.

The Evo is a couple of years old now. Newer phones, such as the Samsung Galaxy series and several others from HTC, often contain eMMC technology, which features the controller built into the memory module, as on an SD card. This can make retrieval considerably easier.

Hard Drive Vs. Flash Memory

The service area of a hard disk contains information that lets the drive communicate with itself. For the heads to be able to translate data into the read/write channels, the device must have information about where bad sectors are, how many heads there are, which are turned on or off, and so on. This information resides on the platters in a special service area separate from the user-addressable space.

With flash, manufacturers also leave room for a service area. This contains all the information about error correction codes, whether there’s a bit error in any given sector, where those sectors are located, etc.

Whereas a hard drive consists mostly of 512-byte sectors, flash memory typically uses 528-byte sectors, of which 512 bytes are data and 16 bytes are service area. SSDs end up translating down to that user-accessible 512-byte size. But when Flashback reads the raw data, technicians get both pieces. The data area gets mixed in with the service area, so the resulting dump looks like data, service area, data, service area, alternating over and over. When technicians reassemble the image into workable information, all of the service area parts have to be removed.
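To make the alternating layout concrete, here is a minimal Python sketch of stripping the service area out of a raw dump. It assumes the simplest possible arrangement, in which every 528-byte raw sector is 512 data bytes followed immediately by 16 service-area bytes. Real chips and controllers can arrange these pieces differently, and the function name `strip_service_area` is hypothetical, so treat this as an illustration of the idea rather than a working recovery tool.

```python
DATA_BYTES = 512       # user data per sector (figure from the text)
SERVICE_BYTES = 16     # service-area bytes per sector (figure from the text)
RAW_SECTOR = DATA_BYTES + SERVICE_BYTES  # 528 bytes per raw sector


def strip_service_area(raw_dump: bytes) -> bytes:
    """Return only the data portions of a raw dump.

    Assumes, for illustration only, that each raw sector is laid out as
    512 data bytes immediately followed by 16 service-area bytes.
    """
    if len(raw_dump) % RAW_SECTOR != 0:
        raise ValueError("dump length is not a multiple of 528 bytes")
    data = bytearray()
    for offset in range(0, len(raw_dump), RAW_SECTOR):
        # Keep the first 512 bytes of each raw sector, skip the last 16.
        data += raw_dump[offset:offset + DATA_BYTES]
    return bytes(data)


if __name__ == "__main__":
    # Fake two-sector dump: data bytes are 0xAA / 0xBB, service bytes 0xFF.
    fake_dump = (b"\xAA" * 512 + b"\xFF" * 16) + (b"\xBB" * 512 + b"\xFF" * 16)
    image = strip_service_area(fake_dump)
    print(len(image))  # prints 1024
```

In this toy case, the 1,056-byte dump shrinks to the 1,024 bytes a user would actually see.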

Image: http://commons.wikimedia.org/wiki/File%3ADisassembled_HDD_and_SSD.JPG. By Rochellesinger at en.wikipedia [CC-BY-SA-2.5], from Wikimedia Commons.

Getting Up Close

Sometimes, technicians need to conduct a close visual examination of flash chips and their fragile innards. The best tool for this job is the Mantis microscope from Vision Engineering. Each unit costs $2000 or so, but they allow recovery workers to go hands-free and examine circuitry in 3D (via twin light paths projecting through a single viewing lens) at up to 20x magnification. The more natural experience and comfort of the Mantis help technicians discover problems they might otherwise miss with conventional eyepiece microscopes. It also greatly helps with solder work, both in disassembly and repair.