The Infrastruggle

The Rise, Fall, and Rise of Virtual Tape Libraries

VTLs have a storied history. Are they about to make a comeback?

Readers of my work know that I am something of a tape evangelist. Simply put, the zettabytes of data amassing in the digital universe over the next few years will require tape storage: there is just not enough combined manufacturing capacity in either the disk or solid state industries to handle more than a ZB-and-a-half per year, and analysts are projecting between 10 and 60 ZB of new data requiring storage over the next five years.
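A back-of-the-envelope sketch of that gap, using only the figures above (roughly 1.5 ZB per year of combined disk and flash manufacturing capacity against 10 to 60 ZB of projected new data over five years):

```python
# Capacity-gap arithmetic using the figures cited above.
YEARS = 5
annual_capacity_zb = 1.5                         # disk + flash output, ZB/year
total_capacity_zb = annual_capacity_zb * YEARS   # 7.5 ZB over five years

for projected_zb in (10, 60):                    # low and high analyst estimates
    shortfall_zb = projected_zb - total_capacity_zb
    unmet = shortfall_zb / projected_zb
    print(f"{projected_zb} ZB projected: {shortfall_zb} ZB shortfall "
          f"({unmet:.0%} unmet without tape)")
```

Even the low-end projection leaves a 2.5 ZB shortfall on these numbers; the high end leaves most of the demand unmet.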

So there's that.

My research has also found tape to be the only storage medium suited to an economically valid archive strategy. This position has made me keep the Active Archive Alliance at arm's length and made me reluctant to add my full-throated support for some of the disk-based archive kits entering the market recently.

And, to be honest, my bias toward tape has also made me suspicious of Virtual Tape Libraries (VTLs) that use (typically) a disk-based repository as a surrogate for tape in a backup role. Let me back into this one, since I am striving to formulate a more balanced view.

Evolution Of The Virtual Tape Library

VTLs have been around a long time, as shown in Figure 1. When tape was slow and backup windows were shrinking in mainframe datacenters three decades ago, somebody smart came up with the idea of "front-ending" the tape system with a cache of disk drives. You could use this cache as a "virtual" tape library, enabling backup data to be written to the cache very rapidly and allowing the production system to get back to work.

Another advantage was that tape jobs could be "stacked" on the VTL prior to writing the data to tape. This addressed a mainframe problem of tape media being used very inefficiently, enabling write jobs to fill an entire reel or cartridge rather than scattering data in dribs and drabs across many pieces of media.

Once data was copied to the VTL disk, the cached data would then be written, as an "off-line" or "near-line" process, to the tape system, where it could be used to make local and off-site copies of the backup data. This multi-stage process leveraged VTLs to eliminate what we now call "data friction": production latencies brought about by slow data copies.

It is worth noting that the original VTLs were software-based. That is, software running on the mainframe was used to designate some of the disk storage (DASD) for use as the VTL cache. Later, vendors introduced disk arrays explicitly designed to be used as VTLs, but the first VTLs, like storage infrastructure generally, were software-defined.

Figure 1. Virtual Tape Libraries (VTLs) have undergone at least three generations of evolution and innovation. Source: Data Management Institute, 2016.

The second wave of VTLs was targeted at the needs of the distributed computing environment, where it sought to address several challenges. For one, the bandwidth of early distributed LANs was too constrained to handle backup data traffic from many servers simultaneously; for another, scheduling multiple backups was problematic, even with the best backup software.

Reinventing VTL

Thus the concept of the VTL was reinvented. VTL appliances could be installed in the same subnetwork as the servers whose data they would back up. In addition to serving as a write cache for a tape library, each VTL appliance became a full-fledged emulation of a physical tape environment, capable of presenting multiple "virtual" robots and tape drives to expedite data capture, whether or not the corresponding hardware existed in the physical tape environment.
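As a conceptual illustration only (not any vendor's actual implementation, and every name here is hypothetical), the core trick of tape emulation can be sketched in a few lines: backup software sees tape-drive semantics, while writes actually land on disk files.

```python
# Conceptual sketch of tape emulation: each "cartridge" is a disk file,
# so "mounts" are instant and writes complete at disk speed.
import os
import tempfile

class VirtualTapeDrive:
    """Emulates one tape drive; a loaded cartridge is a disk file."""
    def __init__(self, repository: str):
        self.repository = repository   # disk directory backing the VTL
        self.loaded = None             # file handle for mounted cartridge

    def load(self, barcode: str):
        # "Mounting" a virtual cartridge is just opening a file --
        # no robot arm, no rewind delay.
        path = os.path.join(self.repository, f"{barcode}.vtape")
        self.loaded = open(path, "ab")

    def write(self, block: bytes):
        self.loaded.write(block)       # lands on disk, not tape

    def unload(self):
        self.loaded.close()
        self.loaded = None

# A VTL can advertise many such drives, whether or not the physical
# library behind it has that many.
repo = tempfile.mkdtemp()
drives = [VirtualTapeDrive(repo) for _ in range(8)]
drives[0].load("A00001")
drives[0].write(b"backup data stream")
drives[0].unload()
```

The point of the sketch is the asymmetry: the interface presented upstream is tape-shaped, while the capacity behind it is ordinary disk that can later be drained to real cartridges.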

Behind the scenes, the idea was to do something with a VTL appliance that could not be done simply with backup software. By the late 1990s, tape had fallen into disfavor, both as a function of disinformation from disk vendors and of a mismatch between tape capacities and data growth rates. Smart backup software vendors, following the lead of server operating system vendors, began to support "backup-to-disk" functionality in their products. To differentiate VTLs from this generic implementation of disk-based backup caching, vendors added functionality like drive emulation and, later, deduplication to their appliances.

Even then, the ultimate strategy was to unload data cached on the VTL onto tape, to capitalize on tape's portability; the "air gap" it provided between production data (prone to corruption, deletion or unauthorized disclosure) and a tape-based safety copy; and the ability of tape-based data to be restored to any target, regardless of the vendor brand name on the kit (a key limitation of proprietary disk and flash storage arrays). A VTL was a stop-gap, resolving certain transport issues, but not a replacement for tape in the final analysis (Figure 2).

Figure 2. Features of tape remain highly valued. Source: Data Management Institute, 2016.

It is worth noting that, in the early 2000s, the chief reason that companies were giving to explain their interest in SAN technology was to "enable the sharing of enterprise tape libraries" with distributed systems in their environments. VTLs had not eliminated tape; they had become complementary.

Deduplicating VTLs Disrupt the Traditional Storage Model

However, the rise of de-duplicating arrays in the mid-2000s seemed to change the outlook. For a long time, the industry had been fairly adept at characterizing storage platforms in three basic flavors (Figure 3). Tier 1 or "primary storage" arrays were expensive, proprietary, low-capacity, high-performance platforms supposedly targeted at data that was frequently accessed and modified. Tier 2 or "secondary storage" arrays were somewhat less costly, combining high capacity with fairly nimble accessibility. Tier 3 or "tertiary" storage encompassed VTLs, tape systems, optical disc and other platforms aimed at inexpensively hosting data that did not require fast access or frequent updates. By the mid-2000s, however, vendors were blurring the lines between these archetypes.

Some primary storage vendors began offering array products that combined fast storage and capacity storage, with tiering software built into array controllers. But the real assault on the storage order came from vendors of de-duplicating arrays and of arrays featuring drive spin-down (for lower energy costs; see Figure 4), who sought to substitute their products, built mainly from Tier 2 components, for tertiary storage, eliminating the third storage category altogether.

Figure 3. Traditional categories of storage defined by performance, capacity and price. Source: Data Management Institute, 2016.

The curious thing about this model was the huge increase in acquisition cost for de-duplicating VTLs in particular. One vendor proffered a de-duplicating VTL comprising a dedicated disk subsystem with VTL and de-duplication software running on a proprietary controller, for an MSRP of about $410,000.

Peeling back the onion, one could see that the chassis, disk drives and other hardware components of the system, if purchased directly from their vendors, totaled about $3K. That meant that the vendor was charging $407K for the software, claiming that de-duplication made every drive in the system deliver storage capacity equivalent to 70 disk drives, and that this 70:1 reduction ratio justified the steep sticker price.
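The arithmetic behind that pitch, and its sensitivity to the actual reduction ratio, is easy to sketch. The MSRP, hardware cost and 70:1 claim are from the figures above; the raw capacity below is a hypothetical assumption for illustration only.

```python
# Pricing logic of the de-duplicating VTL described above.
msrp = 410_000           # appliance MSRP, USD (from the text)
hardware_cost = 3_000    # commodity hardware total, USD (from the text)
raw_tb = 10              # hypothetical raw capacity, TB (assumption)

print(f"software premium: ${msrp - hardware_cost:,}")   # software premium: $407,000

# At the claimed 70:1 ratio, the cost per "effective" TB looks modest;
# at lower ratios, the economics fall apart.
for ratio in (70, 10, 2):
    effective_tb = raw_tb * ratio
    print(f"{ratio:>2}:1 -> ${msrp / effective_tb:,.0f} per effective TB")
```

The sticker price only makes sense if every terabyte really behaves like 70; miss the ratio by an order of magnitude and the effective cost per terabyte rises by the same factor.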

Moreover, while the initial de-duplicating VTL appliances (Figure 5) supported a "tape back-store" for customers who wanted a safety copy on tape of the data cached on the VTL, later iterations dropped support for this feature and encouraged consumers to decommission tape altogether. A tchotchke from the vendor's booth at trade shows was a bumper sticker that exclaimed, "Tape Sucks! Move on."

The fact that this vendor (and many competitors) proved unable to deliver anything like the promised data reduction ratios, combined with the economic downturn of the late 2000s and dramatic increases in the capacity and performance of tape technology, helped to quash this trend. However, the concept of repurposing generic low-cost disk to provide a backup buffer remained a valid one, especially with the advent of so-called software-defined storage.

'Software-Defined' Meets VTL

Software-defined storage (SDS) has many roots, with one clear path tracing directly back to System Managed Storage (SMS) on IBM mainframes. Beginning in the late 1970s and early 1980s, direct-attached storage devices (DASD) were connected to the backplane of the mainframe and controlled mainly by SMS software running in the mainframe OS environment. Eventually, part of SMS control was ceded to on-array controllers, but the approach of separating the management and provisioning of storage from the underlying hardware was on full display in the mainframe datacenter.