Memblaze differentiates itself from the competition with excellent performance and customizable designs. Let's take a closer look at the PBlaze3L PCIe SSD.

Introduction


The enterprise PCIe SSD market is primarily composed of established players, and enjoys steady growth. The winds of consolidation have blown several PCIe pioneers into larger corporations that either fabricate NAND flash themselves or, at the very least, have a guaranteed supply of the most costly component of any solid state device. Several of the NAND fabs also already have homegrown programs that have fielded a number of very competitive offerings. The ability to create their own NAND yields time-to-market and cost benefits, but also makes it difficult for emerging players to carve out a spot in the lucrative PCIe SSD market.

Memblaze has gained traction with their innovative products. Their PCIe 2.0 x8 PBlaze3L and PBlaze3H products offer impressive performance specifications that rival competing offerings, and a unique architecture allows their customers to design custom-capacity units. The Memblaze PBlaze3L we are covering today comes in the HHHL form factor, and the higher-performance, higher-capacity PBlaze3H variant comes in the FHHL form factor. Both models offer up to 25 different capacity points, and are available with SLC or MLC NAND.

The PBlaze3L provides 615,000 random read IOPS, and 130,000 random write IOPS. Sequential performance is stout at 2.4 GB/s read, and 1.1 GB/s write. It is important to note that these performance specifications apply at 80% capacity utilization. The management utility allows easy configuration for various levels of overprovisioning, which boosts performance and endurance while reducing performance variability. We test the PBlaze3L in full capacity mode, and with the recommended 80% utilization high-performance mode.
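As a rough illustration of the capacity trade-off, the arithmetic behind an 80% utilization setting can be sketched as follows. The function name and the use of 1,200GB (the nominal size of our 1.2TB sample) as the full-capacity figure are illustrative assumptions on our part, not values read from the Memblaze utility.

```python
def overprovisioning(full_capacity_gb: float, utilization: float) -> dict:
    """Return usable capacity and extra spare area for a utilization level.

    full_capacity_gb: addressable capacity in full capacity mode (assumed input).
    utilization: fraction of that capacity exposed to the host, e.g. 0.80.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    usable_gb = full_capacity_gb * utilization
    spare_gb = full_capacity_gb - usable_gb
    return {
        "usable_gb": usable_gb,
        "extra_spare_gb": spare_gb,
        # Extra spare area expressed relative to the capacity the host can use.
        "extra_op_percent": 100 * spare_gb / usable_gb,
    }


# Hypothetical example: a nominal 1,200GB drive configured for 80% utilization.
print(overprovisioning(1200, 0.80))
```

Giving up 20% of the addressable capacity works out to 25% extra spare area relative to the space that remains usable, on top of whatever spare area the drive already reserves internally.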

One key weakness of PCIe SSDs can be inflexible capacity configurations. Spare capacity can lead to an underutilized and isolated pool of storage, which is far from ideal considering the price premium. Memblaze allows custom capacity configuration to maximize ROI, and tailor the SSD to specific requirements.

The key to the varying capacity points lies in Memblaze's Pianokey Technology. This unique approach uses a modular design with four different modules of varying NAND capacity. Memblaze integrates modules into the final product according to customer requirements. This allows users to select up to 25 different capacity points for both MLC and SLC models, and capacity is adjustable in either 25GB or 50GB increments. The Pianokey architecture manages to keep performance specifications static, regardless of capacity, which is an advantage at smaller capacity points.

Both models offer plenty of capacity - up to 2.4TB on the PBlaze3L, and 4.8TB on the PBlaze3H. The PBlaze3L utilizes one FPGA controller, and the PBlaze3H is a dual-FPGA solution. Both cards utilize DRAM in varying capacities, and polymer capacitors protect against host power-loss events.

Memblaze also touts their device-based architecture as a key advantage over competing solutions. Many host-based architectures, such as Virident and Fusion-io products, utilize the host for FTL, wear leveling, garbage collection, and flash management tasks. This leads to computational overhead and RAM usage, but also expands functionality when utilizing software-based enhancements. Memblaze offloads all of these features onto PBlaze SSDs. Others, such as Micron and Intel, also favor this approach. Both approaches have their pros and cons, and Memblaze favors leaving all of the host's compute and memory capacity available for application performance. This lowers system RAM usage to 1MB.

Memblaze also offers a suite of proprietary enhancements for internal management. MemWA minimizes Write Amplification through refined GC and static/dynamic data rotation. MemSmooth reduces latency variation through efficient wear leveling, and MemSecure provides full end-to-end data path protection, cross-die device-level RAID, power loss protection, and randomization for data protection. MemCare adaptively manages NAND settings to reduce wear and extend endurance, and all of these features combine to provide enhanced endurance, and a two-million hour MTBF. The PBlaze3L is covered by a three-year warranty.

The Memblaze PBlaze3L checks all of the boxes on enterprise features, and offers compelling differentiators. Let's take a closer look.

Internals and Specifications

Memblaze 1.2TB PBlaze3L Internals and Specifications

The PCIe Gen 2.0 x8 Memblaze PBlaze3L comes in the familiar HHHL (Half-height Half-Length) form factor. The rear of the SSD includes several DRAM packages.

The bracket houses three status lights for monitoring and beacon purposes.

The top plate covers a unique NAND arrangement that is the cornerstone of the Pianokey architecture. There are thermal pads to cool the NAND packages, and two metal blocks for NAND placement.

Four individual PCBs, connected via a ribbon cable and connector, snap into the main PCB of the PBlaze3L. These NAND packages are customizable to varying capacities for granular capacity control. The sample utilizes Toshiba NAND, but Memblaze can deploy several types of NAND in their products.

The large Xilinx Kintex-7 XC7K325T FPGA resides under the large heat sink. Polymer capacitors provide power-loss protection, and more DRAM rings the FPGA. Memblaze refers to the FPGA with customized firmware as the MemBrain.

Specifications

MemSphere Management Utility


Memblaze offers GUI, CLI, telnet, and SSH management. The Memblaze MemSphere Manager is a fully functional GUI management utility that provides a tremendous amount of easily accessible monitoring and management features.

The Information panel lists the minimum and maximum addressable capacity, along with the current configuration capacity. A real-time endurance monitor measures remaining NAND endurance with incredible granularity. The utility also lists the amount of data written and read from the SSD. Write amplification, and several latency and throughput metrics are also included. Thermal monitoring also indicates if the device is within the safe zone.

The Performance tab provides easy monitoring of bandwidth and IOPS performance. The panel also includes a real-time latency QoS breakdown in the I/O Delay Behaviors panel.

The Endurance tab includes more specific endurance information, including the amount of data used and remaining.

The Ecological Data tab provides in-depth temperature monitoring statistics, including minimum and maximum values.

The Control tab allows users to signal drive location through the beacon on the rear of the SSD. Users can also secure erase the drive, and update the firmware from this panel.

Test System and Methodology

We designed our approach to storage testing to target long-term performance with a high level of granularity. Many testing methods record peak and average measurements during the test period. These average values give a basic understanding of performance, but fall short in providing the clearest view possible of I/O QoS (Quality of Service).

While under load, all storage solutions deliver variable levels of performance. 'Average' results do little to indicate performance variability experienced during actual deployment. The degree of variability is especially pertinent, as many applications can hang or lag as they wait for I/O requests to complete. While this fluctuation is normal, the degree of variability is what separates enterprise storage solutions from typical client-side hardware.

Providing ongoing measurements from our workloads with one-second reporting intervals illustrates product differentiation in relation to I/O QoS. Scatter charts give readers a basic understanding of I/O latency distribution without directly observing numerous graphs. This testing methodology illustrates performance variability, and includes average measurements during the measurement window.

IOPS data that ignores latency is useless. Consistent latency is the goal of every storage solution, and measurements such as Maximum Latency only illuminate the single longest I/O received during testing. This can be misleading, as a single 'outlying I/O' can skew the view of an otherwise superb solution. Standard Deviation measurements consider latency distribution, but do not always effectively illustrate I/O distribution with enough granularity to provide a clear picture of system performance. We utilize high-granularity I/O latency charts to illuminate performance during our test runs.
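To make the point about averages concrete, here is a minimal sketch that summarizes two hypothetical five-minute latency traces (300 one-second samples each, in microseconds). The sample values are invented for illustration and are not measurements from any drive in this review; the `summarize` helper is likewise our own.

```python
import statistics


def summarize(samples):
    """Summarize per-second samples (IOPS or latency) for one test interval."""
    ordered = sorted(samples)
    # Nearest-rank 99th percentile: ceil(0.99 * n) computed with integer math.
    rank = max(1, -(-99 * len(ordered) // 100))
    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.pstdev(ordered),
        "p99": ordered[rank - 1],
        "max": ordered[-1],
    }


# Two invented traces with the same average latency (1,000 microseconds).
steady = [1000] * 300
spiky = [950] * 294 + [3450] * 6

for name, trace in (("steady", steady), ("spiky", spiky)):
    print(name, summarize(trace))
```

Both traces report the same mean, yet only the second one stalls for several seconds out of every five minutes. The summary numbers hint at that difference; plotting every sample, as our scatter charts do, shows it directly.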

Our testing regimen follows SNIA principles to ensure consistent, repeatable testing, and utilizes multi-threaded workloads found in typical production environments. We tested the 1.2TB Memblaze PBlaze3L with 80% of available capacity (High Performance Mode), and with a full LBA span, against the 1.6TB Intel DC P3700, the 1.4TB Micron P420m, and the 2.2TB HGST FlashMAX II.

The first page of results provides the 'key' to understanding and interpreting our test methodology.

Benchmarks - 4k Random Read/Write

4k Random Read/Write

We precondition the 1.2TB Memblaze PBlaze3L for 15,000 seconds, or just over four hours, receiving performance reports every second. We plot this data to illustrate the drive's descent into steady state.

This dual-axis chart consists of 30,000 data points, with the IOPS on the left, and the latency on the right. The red dots signify IOPS, and the grey dots are latency measurements during the test. We place latency data in a logarithmic scale to bring it into comparison range. The lines through the data scatter are the average during the test. This type of testing presents standard deviation and maximum/minimum I/O in a visual manner.

Note that the IOPS and Latency figures are nearly mirror images of each other. This illustrates that high-granularity testing gives our readers a good feel for latency distribution by viewing IOPS at one-second intervals. Keep this in mind when viewing our test results below. This downward slope of performance only happens during the first few hours of use, and we present precondition results only to confirm steady state convergence.
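For readers who want to check convergence on their own traces, the sketch below shows one simplified way to decide whether a run of one-second IOPS samples has settled. It is loosely inspired by SNIA-style steady state criteria but is not the exact test we run; the window size, round count, tolerance, and the synthetic trace are all illustrative assumptions.

```python
def window_means(samples, window):
    """Average consecutive, non-overlapping windows of per-second samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def is_steady(samples, window=300, rounds=5, tolerance=0.20):
    """True if the last `rounds` window averages differ from one another
    by no more than `tolerance` (as a fraction of their overall mean)."""
    means = window_means(samples, window)
    if len(means) < rounds:
        return False
    recent = means[-rounds:]
    average = sum(recent) / rounds
    return average > 0 and (max(recent) - min(recent)) <= tolerance * average


# Synthetic trace (not measured data): IOPS decays linearly for 3,000 seconds,
# then holds flat for the remainder of a 15,000-second precondition run.
trace = [max(150_000, 300_000 - 50 * t) for t in range(15_000)]

print(is_steady(trace[:3000]))  # still sliding toward steady state
print(is_steady(trace))         # flat tail at the end of the run
```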

Each level tested includes 300 data points (five minutes of one-second reports) to illustrate performance variability. The line for each OIO (Outstanding I/O) depth represents the average speed reported during the five-minute interval. 4k random speed measurements are an important metric when comparing drive performance, as the hardest type of file access for any storage solution to master is small-file random. 4k random performance is a heavily marketed figure, and one of the most sought-after performance specifications.

The Memblaze PBlaze3L at 80% utilization averages 579,522 IOPS at 256 OIO (Outstanding I/O). With 100% utilization, it scores nearly identically at 578,387 IOPS. The Intel DC P3700 averages 467,055 IOPS, the HGST FlashMAX II averages 350,532 IOPS, and the Micron P420m averages 724,958 IOPS. A key consideration is the PBlaze3L's superb scaling at 32 - 128 OIO, where it easily leads the rest of the pack. Many workloads hover in this range, and the excellent low-load scaling is impressive.

Our Latency vs IOPS charts compare the amount of performance attained from each solution at specific latency measurements. Many applications have specific latency requirements. These charts present relevant metrics in an easy-to-read manner for readers who are familiar with their application's requirements. The arrays that are lowest and furthest to the right exhibit the most desirable latency characteristics.
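IOPS and latency are tied together through queue depth. As a back-of-the-envelope sketch, Little's Law says that the average latency equals the number of outstanding I/Os divided by throughput, provided the queue is kept full. The snippet below applies that to the 256 OIO random read averages quoted above; treating 256 as the total number of requests in flight is our simplifying assumption, and the output is an estimate rather than a measured latency.

```python
def mean_latency_ms(outstanding_io: int, iops: float) -> float:
    """Little's Law estimate: average latency = outstanding I/O / throughput."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return 1000.0 * outstanding_io / iops


# 4k random read averages at 256 OIO, as reported in this review.
results_256_oio = {
    "PBlaze3L (80%)": 579_522,
    "PBlaze3L (100%)": 578_387,
    "Intel DC P3700": 467_055,
    "HGST FlashMAX II": 350_532,
    "Micron P420m": 724_958,
}

for drive, iops in results_256_oio.items():
    print(f"{drive}: ~{mean_latency_ms(256, iops):.3f} ms average")
```

The same relationship explains why a drive that reaches high IOPS at lower queue depths lands low and to the right on these charts: fewer requests are waiting in line for the same amount of work.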

The PBlaze3L scores 559,623 IOPS at 0.2ms, and the nearest competitor is the P3700 at 462,000 IOPS. The swing low and to the far right highlights the impressive latency profile with random read workloads, and the results are so similar between 80% and 100% utilization modes that the 100% results hide behind the red line.

The Memblaze PBlaze3L at 80% utilization averages 148,593 IOPS at 256 OIO, and the difference between the two modes crops up in random write workloads. At 100% utilization, the PBlaze3L averages 70,358 IOPS. The Intel DC P3700 averages 147,846 IOPS, the FlashMAX II averages 119,662 IOPS, and the P420m averages 98,304 IOPS. One impressive aspect is the clearly defined performance envelope in both modes.

The Memblaze PBlaze3L is very competitive in latency-v-IOPS measurements with a 4k write workload.

Our write percentage testing illustrates the varying performance of each solution with mixed workloads. The 100% column to the right is a pure 4k write workload, and 0% represents a pure 4k read workload. Mixed I/O is a constant reality in VDI and other intensive applications, resulting in the I/O blender effect. The PBlaze3L cuts an impressive swath across the mixed workload spectrum, and leverages its tremendous read performance to take an easy win in this test.

The 80% utilization mode delivers incredibly consistent latency performance.

Benchmarks - 8k Random Read/Write

8k Random Read/Write

Many server workloads rely heavily upon 8k performance, and we include this as a standard with each evaluation. Many of our server workloads also test 8k performance with various mixed read/write distributions.

At 80%, the PBlaze3L averages 285,564 IOPS at 256 OIO, and scores slightly higher with 100% utilization at 288,563 IOPS. The Intel averages 275,367 IOPS, the FlashMAX II averages 207,859 IOPS, and the Micron P420m tops the chart at 387,785 IOPS. The PBlaze3L rivals the excellent scaling from the P3700 in the middle ranges.

The Micron P420m takes the performance crown under the heaviest workloads, but the PBlaze3L distinguishes itself with great performance in our Latency-v-IOPS chart.

The Memblaze PBlaze3L averages 62,023 IOPS at 256 OIO with 80% capacity utilization, and 32,717 IOPS with 100% utilization. The Intel DC P3700 averages 66,082 IOPS, the P420m hits 79,633 IOPS, and the FlashMAX II tops out at 59,279 IOPS. The PBlaze3L offers very consistent performance in both performance modes.

The PBlaze3L offers solid performance in this test.

The PBlaze3L takes a big lead in mixed 8k random workloads.

The PBlaze3L provides excellent latency and performance in mixed random workloads.

Benchmarks - 128k Sequential Read/Write

128k Sequential Read/Write

128k sequential speed reflects the maximum sequential throughput of the SSD, and is indicative of performance in OLAP, batch processing, streaming, content delivery applications, and backup scenarios.

The Memblaze PBlaze3L isn't quite as fast at sequential workloads, with a read speed of 2,235 MB/s at 256 OIO with 80% utilization, and a near mirror image 2,254 MB/s at 100% utilization. The Intel DC P3700 averages 2,392 MB/s, the HGST FlashMAX II averages 2,664 MB/s, and the P420m leads with 3,186 MB/s.

The PBlaze3L falls behind competing drives in the sequential read testing.

Sequential write performance is important in tasks such as caching, replication, HPC, content delivery applications, and database logging. The Memblaze PBlaze3L averages 997 MB/s at 80%, and 982 MB/s at 100% utilization. The Intel DC P3700 scores 1,818 MB/s, the HGST FlashMAX II weighs in with 1,091 MB/s, and the P420m brings up the rear with 647 MB/s.

The PBlaze3L redeems itself with a second-place showing in sequential write performance, though the Intel P3700 is hard to beat in this test and the PBlaze3L trails it by a wide margin.

The PBlaze3L comes roaring back in our mixed sequential workload testing, leading the majority of the write mixtures.

Database/OLTP and Webserver

Database/OLTP

This test consists of Database and On-Line Transaction Processing (OLTP) workloads. OLTP is the processing of transactions such as credit cards and high frequency trading in the financial sector. Databases are the bread and butter of many enterprise deployments. These demanding 8k random workloads with a 66 percent read and 33 percent write distribution bring even the best solutions down to earth.

The Memblaze PBlaze3L with 80% capacity utilization averages an amazing 149,624 IOPS at 256 OIO, and with full capacity, averages 87,338 IOPS. The Intel DC P3700 averages 132,238 IOPS, the HGST FlashMAX II averages 121,240 IOPS, and the Micron P420m averages 102,265 IOPS. The PBlaze3L offers two distinct performance profiles, with full capacity utilization posting the lowest score, and the high-performance mode dominating the chart.

The PBlaze3L at 80% utilization delivers the best performance-to-latency ratio of the test pool.

Webserver

The Web Server workload is read-only with a wide range of file sizes. Web servers are responsible for generating content users view over the Internet, much like the very page you are reading. The speed of the underlying storage system has a massive impact on the speed and responsiveness of the server hosting the website.

The PBlaze3L at 80% averages 134,020 IOPS at 256 OIO, and posts an identical score with full 100% utilization. The P420m takes the lead at 256 OIO with 184,845 IOPS, the Intel DC P3700 averages 138,073 IOPS, and the FlashMAX II averages 110,747 IOPS.

The Intel P3700 and Micron P420m take the lead in this test.

File Server

The File Server workload tests a wide variety of file sizes simultaneously with an 80% read and 20% write distribution. The wide variety of simultaneous file size requests is very taxing on storage subsystems.

The PBlaze3L at 80% averages 140,420 IOPS at 256 OIO, and 91,648 IOPS at full capacity usage. The Intel averages 109,585 IOPS, the HGST FlashMAX II averages 91,820 IOPS, and the P420m averages 70,574 IOPS. The PBlaze3L at 80% capacity utilization easily leads this test.

At 80% capacity, the PBlaze3L distances itself from the competition.

Final Thoughts

Memblaze is working on several new products, including the new NVMe-powered PBlaze4 series of PCIe SSDs. Memblaze also has a new FlashRAID controller family in the works. This PCIe 3.0 controller will aggregate the performance of up to 12 NVMe SSDs into a single storage device with 4TB, 8TB, and 16TB configurations. This flexible configuration will provide global wear-leveling capability to maximize endurance and data protection. Memblaze has already enjoyed tremendous success in overseas markets, and their expansion into the U.S. market will expose them to more competition.

The first key to being competitive lies in product differentiation. The PBlaze3L offers a unique Pianokey architecture that provides almost unlimited capacity customization. This flexible platform minimizes wasted capacity by allowing customers to tailor the size of the device to their needs. Memblaze also provides incredibly granular control and monitoring with their MemSphere management utility. The easy-to-use GUI features real-time endurance statistics and performance monitoring, and the CLI and other management features round out expansive management options.

Memblaze specs their SSD at 80% utilization, and users can configure the PBlaze3L for any level of overprovisioning during the secure erase process. This adds another layer of customization to help tailor performance and endurance for applications. The high-performance mode provides enhanced performance, but comes at the expense of capacity.
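To put that trade-off in perspective, the short sketch below computes how much faster the 80% utilization mode was than full capacity mode in our 256 OIO averages. The figures come from the results pages of this review; the `speedup` helper and the choice of workloads to list are ours.

```python
# (80% utilization IOPS, 100% utilization IOPS) at 256 OIO, from this review.
results = {
    "4k random read": (579_522, 578_387),
    "4k random write": (148_593, 70_358),
    "8k random write": (62_023, 32_717),
    "OLTP": (149_624, 87_338),
    "File server": (140_420, 91_648),
}


def speedup(high_performance: float, full_capacity: float) -> float:
    """Ratio of high-performance mode throughput to full capacity throughput."""
    return high_performance / full_capacity


for workload, (hp, full) in results.items():
    print(f"{workload}: {speedup(hp, full):.2f}x")
```

Read-only results barely move, while write-heavy and mixed workloads gain the most from the extra spare area.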

In our testing, the 80% utilization (high performance) mode provided stellar performance that rivals the leaders in the PCIe SSD market. In random read testing, the PBlaze3L trailed only the Micron P420m, and only at the highest 256 OIO load. In the middle range from 32-128 OIO, where most real-world workloads reside, the PBlaze3L delivered the highest performance, and an excellent IOPS-to-latency ratio. We did not observe notable differences in 100% random read workloads with extra overprovisioning.

The extra overprovisioning kicked in for random write workloads, and the PBlaze3L scored a solid second place. The PBlaze3L also stood out with its stunningly consistent random write performance in both full capacity and high-performance modes. The clearest advantage came in mixed random workloads, where the PBlaze3L demonstrated remarkable performance that easily beat the rest of the test pool.

Pure sequential read/write workloads were not as impressive. The PBlaze3L trailed competing SSDs in sequential read performance, but mustered second place in sequential write performance. The real advantage came in mixed sequential workloads, where the PBlaze3L commanded the majority of read/write mixtures. In server workload testing, the PBlaze3L scored a resounding win in OLTP and fileserver tests, but fell to third in the read-only category.

Memblaze utilizes a proprietary suite of optimizations to deliver static performance specifications over a large range of capacities. In many cases, lower capacity results in lower performance, but Memblaze guarantees performance at all capacities. Power-loss protection and a cross-die NAND parity scheme protect data, but the three-year warranty is a bit lacking in a space where five-year warranty periods are the standard. The Memblaze PBlaze3L left us impressed with its range of differentiating features and excellent performance, meriting the TweakTown Best Features Award.