Opinion NVMe-over-Fabrics (NVMeF) shared storage access could kill the legacy storage array business – unless vendors get inventive and somehow continue to supply charged-for data management services alongside NVMeF data access.

How do we work this out?

In an NVMeF setup, an application in a server requests a storage IO, and the server and target storage system use a remote direct memory access (RDMA) transfer to move data directly between server memory and the storage drive – inevitably a fast-access solid state drive.

This is worth bothering with because virtualised, multicore, multithreaded servers find themselves waiting for IO to complete: the networked SANs and filers they use can't respond fast enough. Replacing disk drives in these storage systems with SATA and then SAS flash drives (SSDs) sped things up, but the two networks involved – the SATA or SAS one inside the array, and the block-access Fibre Channel/iSCSI or file-protocol one between the array and the accessing servers – still take too much time for IO requests and data to transit them.

The internal array network problem can be fixed by using NVMe drives, which are faster than SAS and SATA, and an NVMeF network. Then data to/from the drives is RDMA-transferred into the storage array controller's memory. There it is processed through the controller software stack and comes in/goes out of the array across the external network.

NVMeF scheme

Both of those steps take time as well. The NVMeF scheme is to replace traditional block-access networks with a quasi-extended PCIe bus and run an end-to-end NVMe protocol, with vastly increased parallelism over SCSI, as an RDMA transfer between the accessing servers and the target storage array. This cuts the physical network transit time and, with direct access to the drives, cuts the storage array controller's software stack out of the equation.
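That parallelism gap is easy to illustrate with a back-of-the-envelope sketch. The figures below are the theoretical maximums from the NVMe and AHCI specifications – real controllers expose far fewer queues – but they show the scale of the difference:

```python
# Back-of-the-envelope comparison of command-level parallelism.
# Figures are spec maximums (NVMe spec vs AHCI/SATA NCQ), not what
# a real controller necessarily exposes.
nvme_queues = 65_535          # max I/O submission queues per NVMe controller
nvme_queue_depth = 65_536     # max commands per queue
ahci_queues = 1               # AHCI exposes a single command queue
ahci_queue_depth = 32         # NCQ depth for a SATA drive

nvme_outstanding = nvme_queues * nvme_queue_depth
ahci_outstanding = ahci_queues * ahci_queue_depth

print(f"NVMe max outstanding commands:      {nvme_outstanding:,}")
print(f"SATA/AHCI max outstanding commands: {ahci_outstanding}")
```

Billions of outstanding commands versus 32 – that is the headroom multicore, multithreaded servers can exploit.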

OK, some of the array controller software stack is inherent in block access protocols, such as dealing with LUNs and mapping them to drives. But other parts, such as RAID schemes, are not, and these are in the data path. No array controller means no array-controller data management services.

Flash drives are getting large enough that networked access to shared storage – to reach data sets larger than a physical drive – is becoming less important. Seagate has a 64TB SSD and Samsung is talking about a 128TB one.

NVMeF access and much-increased server direct-attached storage (DAS) capacity both point to no array controller being needed, which could mean the death of the all-flash dual-controller and monolithic arrays we see today. Instead the array becomes, essentially, just a bunch of flash drives (JBOF) forming a remote DAS structure, with an NVMe front-end where some skeletal shared access is needed – or it goes away altogether in hyperconverged systems with large DAS capacities.

The array suppliers – meaning Dell, HDS, HPE, IBM, NetApp, Tegile, Tintri and others – could do... well, what?

Migrate controller data management into application stack

One possibility is to migrate some array controller functionality into the accessing servers and have it work alongside, somehow, the NVMe data access process. If that were feasible, they could charge for it.

Data management services have been provided at the server application stack level in the past. For example:

Veritas Volume Manager – VxVM and VxFS

Veritas Volume Replicator

OS with built-in logical volume manager

Oracle Data Guard

That means, though, that the NVMe drives are not accessed directly but through, for example, a volume manager – and that access takes time.

Some of that time could be clawed back by doing data management in hardware. RAID has already been implemented in hardware, as has compression, and erasure coding is another relatively low-level activity that could move into an ASIC or FPGA.

But higher-level services, like deduplication, need CPU cycles and memory, and their cost can't be cancelled out with hardware.

Consider an array controller with an internal NVMe fabric and drives that responds to data requests in 200 microseconds, while a direct NVMe access to a drive takes, say, 10 microseconds. By making the data management stack more efficient and doing low-level things in hardware, that 200 microseconds could be cut to, say, under 100 microseconds, giving us some NVMeF acceleration without compromising data management services.
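The arithmetic behind that hope is simple. Using the round numbers quoted above – assumptions, not measurements – and supposing that hardware offload and a leaner stack can roughly halve the controller overhead:

```python
# Illustrative latency budget in microseconds, using the round numbers
# quoted in the text. These are assumptions, not measurements.
array_latency_us = 200      # full array controller stack, NVMe fabric inside
raw_nvme_latency_us = 10    # direct NVMeF access to a drive

# Overhead added by the controller's software stack and external network.
controller_overhead = array_latency_us - raw_nvme_latency_us   # 190 µs

# Suppose offload and a leaner stack halve that overhead.
improved = raw_nvme_latency_us + controller_overhead / 2       # 105 µs

print(f"controller overhead today: {controller_overhead} µs")
print(f"improved array latency:    {improved:.0f} µs")
```

Halving the overhead only gets to about 105 microseconds; landing under 100 needs a slightly better-than-half cut – which shows how aggressive the stack-slimming would have to be.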

And those data management services could be done either in the array controllers or in the application servers.

Dual-access arrays

Another approach could be to build dual-access arrays by bolting a JBOF for primary data on top of, or alongside, the current array and, somehow, bleeding data to/from the JBOF into a data-managed storage domain for secondary data, where data is protected, replicated, deduplicated, whatever.

That would help customers transition to the coming NVMeF era by running NVMeF data access and legacy block data access streams in parallel.
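The "bleeding" step could look something like a tiering loop: primary writes land on the fast JBOF tier, and a background drain moves aged data into the data-managed tier, where services such as deduplication apply. This is a toy sketch with invented names, purely to make the shape of the idea concrete:

```python
# Toy sketch of a dual-access array: a fast JBOF tier for primary data
# and a data-managed tier for secondary data. All names are invented
# for illustration; real tiering logic is far more involved.
import hashlib
import time

jbof_tier = {}        # key -> (data, write_timestamp): fast, unmanaged
managed_tier = {}     # content hash -> data: deduplicated store
managed_index = {}    # key -> content hash

def write_primary(key, data):
    """Primary writes hit the JBOF tier directly - no controller stack."""
    jbof_tier[key] = (data, time.time())

def drain(max_age_s):
    """Background step: bleed aged data into the managed, deduped tier."""
    now = time.time()
    for key, (data, ts) in list(jbof_tier.items()):
        if now - ts >= max_age_s:
            digest = hashlib.sha256(data).hexdigest()
            managed_tier.setdefault(digest, data)  # dedupe on content
            managed_index[key] = digest
            del jbof_tier[key]

write_primary("a", b"hot data")
write_primary("b", b"hot data")
drain(0)                    # age threshold 0: drain everything now
print(len(managed_tier))    # identical blocks deduplicated into one copy
```

The hard part the sketch glosses over is the "somehow": deciding when data is cold, and keeping both tiers consistent while servers still have NVMeF access to the JBOF.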

Providing data management services in an NVMeF era

We need to point out that data management services are not simply nice to have. Data protection, replication, deduplication and so on are all good ways of dealing with failed drives, failed server systems and costly storage. Some data management functionality needs to run in the accessing servers to protect against (DAS) drive failures and server failures.
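What server-side protection means in practice can be as simple as computing parity in the host before writes go out over the fabric. This is a minimal RAID-4-style XOR sketch, purely illustrative – real implementations (Linux md, ZFS, or vendor plug-ins) do far more:

```python
# Minimal sketch of server-side data protection: RAID-4-style XOR parity
# computed in the accessing server before writes go out over NVMeF.
# Illustrative only; real host-side RAID handles striping, rebuild
# scheduling, write holes, and much more.
def xor_parity(stripes):
    """XOR equal-length stripes together to produce a parity stripe."""
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving, parity):
    """XOR the surviving stripes with parity to recover the lost stripe."""
    return xor_parity(surviving + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]   # stripes on three NVMe drives
p = xor_parity(data)                 # parity stripe on a fourth drive

lost = data[1]                       # drive 1 fails
recovered = rebuild([data[0], data[2]], p)
assert recovered == lost             # the host can rebuild the lost stripe
```

Whether this logic lives in an OS extension or a vendor plug-in is exactly the question posed below.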

Who will provide this? Firstly, server suppliers could, with OS extensions, and, secondly, array suppliers might, with array software components transformed into server plug-ins.

This whole area is, currently, problematic for array suppliers who have, somehow, either to plot a path to an NVMeF future that potentially renders their current kit useless for storing primary data, or find a way of accessing and storing data that is equal to or better than NVMeF. That looks like a tall order.

It's a potential opportunity for server and server system software suppliers rather than a looming problem. Could Veritas Volume Manager and similar products get a new lease of life?

Server system software and array controller software engineers can both be fiendishly clever. Let's see what they come up with. ®