Copy offload


At the 2014 LSFMM Summit, held in Napa, California March 24-25, Martin Petersen and Zach Brown gave an update on the status of copy offload, a mechanism that lets the storage server or array perform file copies without involving the host CPU or the network. In addition, Hannes Reinecke and Doug Gilbert took the second half of the slot to discuss an additional copy offload option.

The Petersen/Brown talk was titled "Copy Offload: Are We There Yet?" and Petersen tried, unsuccessfully, to short-circuit the whole talk by simply answering the question: "Yes, thank you", he said and started to head back to his seat. But there was clearly more to say about a feature that allows storage devices to handle file copies without any involvement of either the server or the network—at least once the copy has been initiated.

Petersen said that he had been working on the feature for some time. He rewrote it a few times and had to rebase it on top of Hannes Reinecke's vital product data (VPD) work. That last step got rid of most of his code, he said, and led to a working copy offload. The interface is straightforward, consisting of just the source and destination devices, source and destination logical block addresses (LBAs), and a number of blocks. Under the covers, it uses the SCSI XCOPY (extended copy) command because that is "supported by everyone". It does not preclude adding more complicated copy offload options later, Petersen said, but he just wanted something "simple that would work".

Depending on the storage device, copy offload can do really large copies instantly, by just updating some references to the data, Ric Wheeler said. Someone asked what Samba's interface would look like. To that, Brown said that a new interface using file descriptors and byte ranges is the next step. It will be a single-buffer-at-a-time system call that handles descriptors rather than devices. It can return partial success, so user space needs to be prepared for that, he said. While he didn't commit to a date, Brown said that the interface would be much simpler now that Petersen had added XCOPY support.

LID1 and LID4

Moving on to token-based copying, Gilbert noted that there are two big players in the copy offload world: VMware, which uses XCOPY (with a one-byte length ID, aka LID1), and Microsoft, which uses ODX (aka LID4 because it has a four-byte length ID). Storage vendors all support XCOPY, but ODX support is growing.

LID4 added a number of improvements to LID1, but it adds lots of complexity and ugly hacks too, Gilbert said. ODX is a Microsoft name for the "lite" portion of the original T10 (SCSI standardization group) document "XCOPYv2: Extended Copy Plus & Lite". ODX is a two-part disk-to-disk token-based copy, he said. It uses a storage-based gather list to populate a "representation of data" (ROD), which can be thought of as a snapshot ID. It also generates a ROD token that can be used to access the data assembled.

Wheeler noted that anyone who has the token value (and access to the storage) can copy the data without any security checks. "If you have the token, you have the data" is the model, Fred Knight said. That bypasses the usual operating system security model, though, which is something to be aware of, Wheeler said.

The lifetimes of the tokens (typically 30-60 seconds) will help reduce problems, Reinecke said. But Knight cautioned that lifetimes vary between implementations. In addition, Reinecke noted that the token is not guaranteed to work throughout the entire lifetime.

Gilbert said that ODX is a "point in time" copy, which sounds something like snapshots, but the 30-60 second lifetime makes them not particularly useful as snapshots. He then gave a demo that created a gather list, wrote a token to a file, used scp to copy the token file to another host, then used the token with his ddpt utility to retrieve the data. As Reinecke summed up, the main idea is to avoid data transfer via the CPU whenever possible. If that can be done efficiently, then Linux should look at supporting it.

[ Thanks to the Linux Foundation for travel support to attend LSFMM. ]

