In between work on other projects, I’ve been running exploratory tests on PCIe SSDs to see how their performance characteristics differ from the SATA drives we’ve been reviewing for the past few years. This research is part of a larger effort to come up with a new collection of tests for our next-gen storage suite.

Our old suite dates back to 2011, and though it’s been tweaked here and there, it’s long overdue for a major overhaul. The tests we conceived three years ago aren’t ideal for the latest SATA drives, let alone the coming wave of PCIe SSDs. So, I’ve been testing some newer drives to see what it takes to exploit their full potential.

SSDs have been bottlenecked by the Serial ATA interface for quite some time. The 6Gbps host connection is the obvious limitation, but it’s not the only one. SATA is also constrained by the AHCI protocol, which was conceived in an era of much slower mechanical drives.

Fortunately, the table is set for a PCI Express revolution. Windows 8.1 offers native support for NVM Express, a newer protocol designed specifically for PCIe SSDs. Intel’s 9 Series chipsets have their own provisions for PCIe drives, and compatible M.2 slots can be found on most enthusiast-oriented motherboards from that camp.

The DC P3700 (top), M6e (right), and 850 Pro (left)

There aren’t a lot of purebred PCIe SSDs on the market right now, but we can get a sense of what’s possible from Intel’s DC P3700. This datacenter drive comes on a half-height expansion card with a beefy heatsink. It has a four-lane Gen3 interface, and it’s based on the NVM Express protocol. It’s also extremely expensive; our 800GB sample sells for a whopping $2,600 at Newegg.

Although the P3700 works in standard desktop systems, we haven’t been able to boot Win8.1 on the thing. The motherboards we’ve tried don’t even show the drive as a boot option in the firmware. That’s not entirely surprising given the P3700’s target market, but it does highlight the fact that not all PCIe SSDs are fully supported in the PC arena.

Plextor’s M6e is a whole other story. This drive has a tiny M.2 2280 form factor, and it’s only $220 for 256GB. The dual-lane Gen2 interface is a good fit for 9-series motherboards, and we haven’t had any issues booting Windows. However, the M6e uses AHCI instead of NVMe, so it’s not a truly next-gen product.

For comparative reference, we’ve also been running preliminary tests on Samsung’s 850 Pro 512GB. It’s the fastest SATA SSD we’ve encountered to date, making it a good control of sorts.

Our first batch of results comes from Iometer, which lets us tweak the queue depth and the number of workers (threads, basically) hammering the drive with I/O. The number of concurrent I/O requests is the product of the number of workers and the queue depth. For example, one worker at QD32 produces 32 concurrent requests—the same as for a four-worker config at QD8.
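That workers-times-queue-depth arithmetic is worth internalizing, since it’s how we match loads across configurations. A quick sketch (the specific worker/QD pairs are just illustrative examples, not our full test matrix):

```python
# Concurrent I/O requests = workers x queue depth, so different
# worker/QD combos can produce the same total load on the drive.
def concurrent_requests(workers: int, queue_depth: int) -> int:
    return workers * queue_depth

# Three configs that all keep 32 requests in flight at once.
configs = [(1, 32), (2, 16), (4, 8)]
for workers, qd in configs:
    total = concurrent_requests(workers, qd)
    print(f"{workers} worker(s) at QD{qd} -> {total} concurrent requests")
```

As the results below show, drives don’t always treat equal-concurrency configs equally; how the load is split across workers matters.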

Even with the lightest load, the P3700 more than doubles the sequential speeds of the other SSDs. Its read performance scales up aggressively as the queue depth rises, but there’s less improvement with writes. Interestingly, the P3700’s sequential speeds drop in our four-worker tests, at least versus single-worker configs with the same number of simultaneous requests.

In the write speed test, the four-worker setups produce similar slowdowns on the M6e and 850 Pro. There’s less of an impact in the read speed test, where the performance of those drives is fairly consistent across our six load configurations. Neither the M6e nor the 850 Pro hits substantially higher speeds under heavier loads.

Additional workers do help with random I/O, at least for some of the SSDs. Check out the peak 4KB random write rates:

The P3700 and M6e both get a boost from additional workers. The gains are bigger with heavier loads, especially on the Intel SSD. Check out the 50% jump in IOps from one worker at QD32 to four workers at QD8.

Curiously, the 850 Pro doesn’t respond well to loads spread across multiple workers. Its random write rate drops substantially when we switch from one worker to four, even when the total number of concurrent requests remains the same. That’s a shame, because the 850 Pro actually outperforms the M6e with a single worker.

Those random write peaks are much higher than the sustained rates that each SSD achieves. Here’s a closer look at how the drives compare across a 30-minute test. Click the buttons below the graph to switch between the various worker-and-queue combos.

All the drives peak early before trailing off as the clock ticks. The speed and shape of the decline are different for each one, in part because of the large differences in overprovisioned area.

The M6e 256GB and 850 Pro 512GB allocate roughly the same percentage of flash to overprovisioned area, but since the Samsung has a higher total capacity, it has more of this “spare” area to devote to accelerating incoming writes. The P3700 800GB has an even higher total capacity, but that’s not all. Like most server-grade gear, it also sets aside a much larger percentage of its flash as overprovisioned area.
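To make the overprovisioning math concrete, here’s a rough sketch. The numbers are illustrative assumptions, not manufacturer specs: consumer drives often get a ~7% baseline of spare area just from the gap between binary GiB of raw NAND and decimal GB of user capacity, while datacenter drives reserve a much larger slice on top of that.

```python
# Overprovisioning = flash the controller reserves beyond user-visible capacity.
# Raw NAND is typically counted in binary GiB; user capacity in decimal GB.
# The capacities below are illustrative, not confirmed specs for these drives.
def op_fraction(raw_gib: float, user_gb: float) -> float:
    raw_gb = raw_gib * 2**30 / 1e9  # convert GiB of NAND to decimal GB
    return (raw_gb - user_gb) / user_gb

print(f"256 GiB NAND, 256 GB user:  {op_fraction(256, 256):.1%} OP")   # ~7%
print(f"1024 GiB NAND, 800 GB user: {op_fraction(1024, 800):.1%} OP")  # ~37%
```

The same percentage of spare area also buys more absolute headroom on a bigger drive, which is why the 512GB 850 Pro has an edge over the 256GB M6e here.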

The P3700 is a beast, and so is this test. I can’t think of a client application that generates an uninterrupted stream of random I/O for any considerable length of time. One of the biggest challenges with developing this new suite is balancing our desire to push drives to their limits with the need to present performance data that’s actually relevant to desktop workloads.

Sequential speeds don’t waver over longer tests, so there’s no need to draw those results out over time. The same goes for random read rates. IOps is the most commonly used metric for random I/O, but we think response times can be more instructive.

All the SSDs are in the same ballpark up to four simultaneous requests. The M6e and 850 Pro slow down considerably after that, and they really struggle under our heaviest load. The P3700’s response times get slower, as well, but not by nearly as much.

Thanks to our resident developer, Bruno “morphine” Ferreira, we have another storage benchmark with a configurable load. RoboBench is based on Windows’ robocopy command, which can be run with up to 128 simultaneous threads. With the aid of a RAM drive, we can use RoboBench to test read, write, and copy speeds with real-world files. Here’s a taste of how RoboBench scales when reading files:
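RoboBench itself wraps robocopy, whose /MT:n switch sets the thread count (from 1 up to 128). As a rough, cross-platform stand-in for what the read test does, here’s a sketch that times a thread pool reading a batch of temporary files at several thread counts; the file counts and sizes are arbitrary assumptions:

```python
# A rough stand-in for RoboBench's read test: time a thread pool reading a
# batch of files at different thread counts. RoboBench proper drives robocopy
# (via its /MT:n multithreading switch); this just illustrates the scaling idea.
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def make_files(directory: str, count: int, size: int) -> list[str]:
    """Create `count` files of `size` random bytes; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"file{i}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths

def read_all(paths: list[str], threads: int) -> float:
    """Read every file using `threads` workers; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(lambda p: open(p, "rb").read(), paths))
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as tmp:
    files = make_files(tmp, 64, 256 * 1024)  # 64 files of 256 KB each
    for threads in (1, 2, 4, 8):
        print(f"{threads} thread(s): {read_all(files, threads):.3f} s")
```

Against a fast SSD (or the RAM drive RoboBench uses), extra threads help until the drive, rather than the host, becomes the bottleneck.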

The work test uses tens of thousands of relatively small spreadsheets, documents, web-optimized images, HTML files, and the like. Read speeds increase dramatically to start, but the gains peter out as the thread count rises.

The media test comprises much larger movie, RAW, and MP3 files. Four threads are sufficient to reach top speed even on the P3700.

Robocopy defaults to eight threads, so that’s probably a good test to use along with the single-threaded config. It’s more difficult to make a case for testing additional configurations, in part because of the time required to secure-erase and pre-condition SSDs before any test that writes to them.

The above results provide a small taste of what we’re working on for future SSD reviews. I have a tendency to go a bit overboard with testing, but I’m trying to exercise more restraint this time around. We’ll see how that works out. Stay tuned.