Linux and the storage ecosystem

An open platform for flexible storage

Linux is many things, and its power lies in its ability to flexibly support vastly different usage models. But one of Linux's most important strengths is serving as the workhorse of the storage domain. Thinking about Linux and storage commonly conjures an image of direct-attached disks or the latest file system, but there's much more to storage and Linux than meets the eye. The storage elements in Linux are not only stable but also cutting-edge.

This article explores the various storage technologies that keep Linux at the center of the storage universe. Let's start at the bottom—namely, storage architectures—and then work up the stack to features, file systems, and futures (see Figure 1).

Figure 1. Storage stack for exploration in this article

Storage architecture

How the storage attaches to the platform is key to the overall storage architecture. Three general architectures cover the vast majority of models:

Direct-attached storage (DAS)

Storage area networks (SAN)

Network-attached storage (NAS)

Of course, Linux supports all three and has evolved with the changes that are occurring with these models.

Other storage media: This article focuses mainly on disk storage, but it's difficult to ignore the breadth of other devices that Linux supports. From quickly disappearing floppy drives to CD-ROM and DVD readers and writers to enterprise tape systems, Linux offers broad support for mass storage devices.

Figure 2 illustrates the models, with a focus on the location of the file system and storage. The DAS model covers the direct attachment of storage to the platform and represents the vast majority of storage use. The SAN separates the storage from the platform and makes it accessible over one of a number of block storage protocols. Finally, NAS provides a similar architecture to the SAN but operates at the file level.

Figure 2. Major storage architectures

Direct-attached storage

Linux supports a large variety of DAS interfaces, including older standards such as parallel Advanced Technology Attachment (ATA, also known as Integrated Drive Electronics [IDE]), parallel SCSI, and Fibre Channel, as well as newer storage interfaces such as serial attached SCSI (SAS), serial ATA (SATA), and external SATA (eSATA). You'll also find newer interconnects used for storage, such as USB 3.0 (through the Extensible Host Controller Interface [xHCI]) and FireWire (Institute of Electrical and Electronics Engineers [IEEE] 1394).

Storage area network

The SAN consolidates block-level storage so that it can be shared among a number of servers. The storage appears local to each server, while the endpoint storage device may implement additional services for its clients (such as backup and replication).

Protocols and interfaces for SANs are many and varied. Linux includes typical SAN protocols such as Fibre Channel as well as its extension over IP, the Internet Fibre Channel Protocol (iFCP). Newer protocols, such as SAS, Fibre Channel over Ethernet (FCoE), and Internet SCSI (iSCSI), are also present, as are more domain-specific protocols like iSCSI Extensions for RDMA (iSER), where RDMA is remote direct memory access, and the SCSI RDMA Protocol (SRP), which extends SCSI over RDMA for InfiniBand.

Linux has fully embraced the emergence of Ethernet as a storage transport, which illustrates the power and flexibility of these approaches. Further, 10-gigabit Ethernet (10GbE) is fully supported in Linux, permitting construction of high-performance SANs. You can also find protocols like ATA over Ethernet (AoE), which carries ATA commands directly in frames on the ubiquitous Ethernet.

Network-attached storage

Last but not least is NAS. NAS is a consolidation of storage over a network for access by heterogeneous clients at the file level. Two of the most popular protocols, which are fully supported in Linux, are Network File System (NFS) and Server Message Block/Common Internet File System (SMB/CIFS).

Although the original SMB implementation was proprietary, it was reverse-engineered to be supported in Linux. The later SMB revisions were openly documented to allow simpler development in Linux.

Linux has continued to evolve with the various enhancements and extensions made to NFS. As of version 4, NFS is a stateful protocol, and version 4.1 adds optimizations for data and metadata separation as well as data access parallelism. You can read more about the evolution of NFS using the links in Related topics. As with Ethernet-based SANs, 10GbE support in Linux enables high-performance NAS repositories.

Other storage architectures

Not all storage architectures fit cleanly in the DAS, SAN, and NAS buckets. Because Linux is open, it's easy to develop new technologies within it, which is why you can find many bleeding-edge storage technologies in Linux.

One interesting storage architecture, which is not new but worth mentioning, is the object storage architecture. Object storage architectures split a file from its metadata and store them independently (on their respective data and metadata servers). This split provides certain advantages, such as minimizing the metadata bottleneck (because interactions with the metadata server are required only to locate and open a file). Performance can also be enhanced by striping the data over multiple data servers for parallel access. Object storage is implemented in a variety of ways within Linux, including support for the Object Storage Device (OSD) specification as well as within Lustre (a name derived from Linux and cluster) and the Extended Object File System (exofs).
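
The split between metadata and data can be illustrated with a minimal Python sketch. It is a toy model only: the MetadataServer and DataServer classes, the write_file and read_file helpers, and the 4-byte stripe size are all hypothetical names and values chosen for illustration, not part of OSD, Lustre, or exofs.

```python
from dataclasses import dataclass, field

STRIPE_SIZE = 4  # bytes per stripe unit; tiny so the example is easy to follow


@dataclass
class DataServer:
    """Hypothetical data server: stores stripe units keyed by (file name, index)."""
    chunks: dict = field(default_factory=dict)


@dataclass
class MetadataServer:
    """Hypothetical metadata server: records only each file's size and layout."""
    layouts: dict = field(default_factory=dict)


def write_file(mds, servers, name, data):
    # Stripe the data round-robin over the data servers.
    for offset in range(0, len(data), STRIPE_SIZE):
        index = offset // STRIPE_SIZE
        target = servers[index % len(servers)]
        target.chunks[(name, index)] = data[offset:offset + STRIPE_SIZE]
    # The metadata server is touched once, to record the layout.
    mds.layouts[name] = {"size": len(data), "stripe_size": STRIPE_SIZE}


def read_file(mds, servers, name):
    # One metadata lookup to locate and "open" the file...
    layout = mds.layouts[name]
    count = -(-layout["size"] // layout["stripe_size"])  # ceiling division
    # ...then every stripe unit comes directly from a data server.
    return b"".join(
        servers[i % len(servers)].chunks[(name, i)] for i in range(count)
    )
```

In a real system the stripe units on different data servers would be fetched in parallel; the sketch reads them sequentially to stay short.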

A similar technology, called content-addressable storage (CAS), uses a hash of the data as its name and address. This technology, also known as fixed-content storage (FCS), is advantageous because it's easy to identify duplicate data: The hash (if strong enough) will be the same and permit simple de-duplication. The Venti architecture supports this approach and exists within Linux (in addition to Plan 9 from Bell Labs, where it originated).
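
As a rough illustration of content addressing, the following Python sketch stores each block under the hash of its contents, so duplicate blocks collapse into a single entry. The ContentStore class is hypothetical, and the choice of SHA-256 is an assumption made for this example rather than a description of what Venti uses.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: a block's SHA-256 digest is its address."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same address, so it is stored once.
        self._blocks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

    def __len__(self):
        return len(self._blocks)
```

Writing the same bytes twice returns the same address both times and leaves a single stored block, which is the de-duplication property described above.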

Storage services: logical volume management

Storage virtualization was once a feature unique to high-end storage systems, but it is now a standard feature of Linux. One of the most important services available in Linux is the Logical Volume Manager (LVM). The LVM is a thin layer (with accompanying user-space tools) that sits above the physical storage available in the underlying storage architecture and abstracts it into one or more logical volumes that are simpler to manage. For example, while a physical disk cannot be resized, a logical volume can be resized to add or remove space.

With the ability to abstract physical devices into logical devices, LVM creates a number of other storage capabilities, such as read-only and read-write snapshots of volumes, data striping across volumes for performance (redundant array of independent disks [RAID]-0), data mirroring across volumes (RAID-1), and migration of volumes (even while online) between physical devices.
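
To make striping and mirroring concrete, here is a small Python sketch of how a logical block number might map onto physical devices. The function names and the blocks_per_stripe parameter are illustrative assumptions; this is not LVM's actual allocation code.

```python
def stripe_location(logical_block: int, num_devices: int, blocks_per_stripe: int = 1):
    """Map a logical block to (device index, block on that device) for RAID-0."""
    stripe_unit, offset = divmod(logical_block, blocks_per_stripe)
    device = stripe_unit % num_devices
    block_on_device = (stripe_unit // num_devices) * blocks_per_stripe + offset
    return device, block_on_device


def mirror_locations(logical_block: int, num_devices: int):
    """RAID-1 mirroring: every device holds a copy at the same block number."""
    return [(device, logical_block) for device in range(num_devices)]
```

With two devices and one block per stripe unit, consecutive logical blocks alternate between the devices, which is what allows sequential I/O to proceed on both at once.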

For data protection beyond mirroring, Linux includes md (which stands for multiple devices), which provides a rich set of software RAID functionality. It supports RAID-4 (striped data with a dedicated parity disk), RAID-5 (striped data with distributed parity), RAID-6 (striped data with distributed, dual-redundant parity), and RAID-10 (striped and mirrored data).
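
The parity used by RAID-4 and RAID-5 is a byte-wise XOR across the data blocks of a stripe, which is what makes a single lost block recoverable. The following Python sketch shows the idea under simplifying assumptions (equal-length blocks, one missing block); the helper names are hypothetical, and it does not cover the second parity block that RAID-6 adds.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

Because XOR is its own inverse, combining the parity block with the surviving data blocks yields exactly the block that was lost.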

The LVM relies on another storage component called the Device-mapper, which provides (among other features) the ability to multipath. For example, in a SAN environment, there are commonly multiple storage interfaces into the SAN fabric. Multipathing is a feature that protects against the failure of a given path, ensuring that storage remains available as long as a path exists to communicate with the endpoint.
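
Conceptually, multipathing amounts to retrying an I/O request on another path when one fails. The Python sketch below models that idea in user space; the PathFailed exception and the read_with_failover function are hypothetical and bear no relation to the Device-mapper's real interfaces.

```python
class PathFailed(Exception):
    """Raised by a path that cannot reach the storage endpoint."""


def read_with_failover(paths, block):
    """Try each path in turn and return the first successful read."""
    errors = []
    for path in paths:
        try:
            return path(block)
        except PathFailed as exc:
            errors.append(exc)  # remember the failure and try the next path
    raise PathFailed(f"all {len(paths)} paths failed: {errors}")
```

Here each path is simply a callable that either returns the data or raises PathFailed; the read succeeds as long as at least one path still works.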

Storage features

In the past few years, two relatively simple features have been added to the storage stack that address the evolution of the storage ecosystem:

Data integrity

Support for solid-state disks (SSDs)

Data integrity

The first change addresses the use of commodity drives in enterprise storage settings. Although enterprise-class drives (such as SAS drives) are reliable, SATA drives are built with different requirements and with cost as a major factor. For this reason, SATA drives can suffer from a problem known as silent data corruption, where errors are introduced and then go undetected when the data is read from the disk. To solve this problem and support SATA drives in enterprise settings, data integrity codes are added to blocks on the disk (where the disk uses 520-byte sectors instead of the traditional 512-byte sectors). In addition, the drive itself can validate data as it is written by checking that the integrity code matches the data. In this way, errors can be caught as they're written to the disk, instead of being detected later when nothing can be done about them.

This mechanism is called the data integrity field (DIF). As shown in Figure 3, it is an 8-byte trailer that includes a cyclic redundancy check (CRC) over the block of data, a reference tag (typically a portion of the logical block address [LBA]), and an application tag that the application defines. The reference tag is useful for catching mis-writes of data to an incorrect block, whereas the application tag can be used to catch other errors in the software stack. For example, if a PDF document is written, the application tag could be set to a value indicating a special PDF tag. When the PDF is read, each block's application tag can be inspected to ensure that all specify the PDF tag. DIF is supported within Linux as of kernel version 2.6.27.
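
The following Python sketch builds and checks such an 8-byte trailer for one sector. It assumes the commonly documented T10 layout (a 2-byte guard CRC with polynomial 0x8BB7, a 2-byte application tag, and a 4-byte reference tag holding the low 32 bits of the LBA); the function names and the sample application tag are hypothetical, and real DIF checking happens in the controller and drive rather than in application code like this.

```python
import struct

T10_DIF_POLY = 0x8BB7  # assumed generator polynomial of the 16-bit guard CRC


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 (polynomial 0x8BB7, initial value 0, no reflection)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ T10_DIF_POLY) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def make_dif(sector: bytes, lba: int, app_tag: int) -> bytes:
    """Build the 8-byte trailer: guard CRC, application tag, reference tag."""
    ref_tag = lba & 0xFFFFFFFF  # low 32 bits of the LBA
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, ref_tag)


def verify_dif(sector: bytes, dif: bytes, expected_lba: int, expected_app_tag: int) -> bool:
    """Return True only if the CRC, reference tag, and application tag all match."""
    guard, app_tag, ref_tag = struct.unpack(">HHI", dif)
    return (
        guard == crc16_t10dif(sector)
        and ref_tag == (expected_lba & 0xFFFFFFFF)
        and app_tag == expected_app_tag
    )
```

A corrupted sector fails the CRC comparison, a block that landed at the wrong address fails the reference tag comparison, and a block written by a different application fails the application tag comparison.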

Figure 3. DIF structure for a 512-byte sector

Growing support for SSDs

The introduction of SSDs is changing the storage ecosystem in a number of ways. These disks remove some of the large latencies found in spinning disks and therefore provide a way to maintain data flow to and from the CPU. But SSDs are different from hard disk drives (HDDs) in that they are consumable. The storage within an SSD can be written a finite number of times (depending on the technology); therefore, it's important to be as efficient as possible when writing data. To make matters worse, the SSD must internally shift data, in a process called garbage collection or wear-leveling, to minimize the introduction of errors. This process results in writes to the consumable storage and should therefore be minimized.

Another issue with SSDs and traditional storage is that an HDD doesn't care whether data on disk is valid. If the file system invalidates the data, the data can remain on disk without any downside. This is not the case with SSDs because of the wear-leveling requirement: the drive continues to shift any data it believes is still valid. For this reason, Linux now supports the ability of the file system to communicate discarded blocks to the SSD (as of kernel version 2.6.29). This ability allows the SSD to remove these blocks from wear-leveling processes and helps to increase the endurance of the drive.
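
A toy model shows why discards help. In the Python sketch below, a hypothetical ToySSD relocates every page it still believes is valid during garbage collection, so pages the file system has deleted but never discarded cost extra writes. The class, its methods, and the one-write-per-page cost are simplifying assumptions, not a model of any real drive.

```python
class ToySSD:
    """Toy model: the drive relocates every page it still believes is valid."""

    def __init__(self):
        self.valid_pages = set()
        self.relocation_writes = 0

    def write(self, page: int):
        self.valid_pages.add(page)

    def discard(self, page: int):
        # The file system reports that this page no longer holds live data.
        self.valid_pages.discard(page)

    def garbage_collect(self):
        # Each page still marked valid is copied, costing one extra flash write.
        self.relocation_writes += len(self.valid_pages)


def relocations_after_delete(pages_written: int, pages_deleted: int, send_discard: bool) -> int:
    """Write pages, delete some, and count the writes one collection pass costs."""
    ssd = ToySSD()
    for page in range(pages_written):
        ssd.write(page)
    if send_discard:
        for page in range(pages_deleted):
            ssd.discard(page)
    ssd.garbage_collect()
    return ssd.relocation_writes
```

In this model, deleting 60 of 100 pages without discards still forces 100 relocation writes, while sending discards reduces that to the 40 pages that are actually live.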

File systems

What truly sets Linux apart from other operating systems is its vast library of file systems. In Linux, you can find traditional client file systems like the third extended file system (ext3) and the fourth extended file system (ext4), but you'll also find the state of the art in distributed file systems, cluster file systems, and parallel file systems. You can find new, cutting-edge file systems based around new ideas and addressing new problems in the storage domain, as well.

In terms of cutting-edge file systems today, Linux supports both ZFS and Btrfs (the B-tree file system, often pronounced "Butter FS"). These two file systems compete with one another and share the distinction of copy-on-write semantics (modified blocks are never overwritten in place). In addition, both file systems support data de-duplication, internal data protection (RAID-like protection), data and metadata checksums, and other storage features (like snapshots).
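
Copy-on-write semantics, and the cheap snapshots they enable, can be sketched in a few lines of Python. The CowVolume class below is a hypothetical toy that keeps an append-only block store and a mapping table; it is meant only to illustrate the principle and does not reflect the on-disk trees that ZFS or Btrfs actually use.

```python
class CowVolume:
    """Toy copy-on-write volume: blocks are never overwritten in place."""

    def __init__(self):
        self._store = []  # append-only list of block contents
        self._map = {}    # logical block number -> index into _store

    def write(self, block: int, data: bytes):
        # New data always goes to a fresh location; only the mapping changes.
        self._store.append(data)
        self._map[block] = len(self._store) - 1

    def read(self, block: int, snapshot=None) -> bytes:
        mapping = self._map if snapshot is None else snapshot
        return self._store[mapping[block]]

    def snapshot(self) -> dict:
        # A snapshot is a frozen copy of the mapping; no block data is copied.
        return dict(self._map)
```

Because an old block is left untouched when its logical address is rewritten, a snapshot taken earlier continues to see the old contents at no extra copying cost.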

Linux is home to many distributed file systems, as well. One example is Lustre, which is a massively parallel distributed file system that supports tens of thousands of nodes and scales to petabytes of storage capacity. Ceph provides similar functionality and, in the past year, was introduced into the Linux kernel. Other examples in Linux include GlusterFS and the General Parallel File System (GPFS).

You can find specialized file systems in Linux, as well, including log-structured file systems like the New Implementation of a Log-structured File System (NILFS2) and object-based file systems like exofs. Because Linux finds itself in many use models, you'll also find file systems for resource-constrained uses (such as embedded systems) as well as low-latency applications such as high-performance computing (HPC). File systems in the embedded area include Yet Another Flash File System version 2 (YAFFS2), the Journaling Flash File System version 2 (JFFS2), and the Unsorted Block Image File System (UBIFS). File systems in the HPC space include Parallel NFS (pNFS), Lustre, and GPFS.

Linux storage ahead

Linux is and will continue to be the target for file systems and general storage research because of its openness and large community of developers.

One of the latest changes in storage is the use of remote services for cost-efficient storage of archive data. Known today as cloud storage, this model is offered by numerous vendors, who provide efficient and transparent access to remote, centralized storage with varying service level agreements (covering capabilities like protection and bandwidth). Two examples are Ubuntu One and Dropbox. Another service, called SpiderOak, can be used to back up your local user directories to the cloud for a small fee.

What other features might be on the horizon for Linux? Candidates include support for large sector sizes (moving beyond 512-byte sectors), thin provisioning to avoid reserved but unused capacity (where advertised storage exceeds the physical capacity), storage de-duplication (to maximize storage availability), and an even more efficient storage stack to exploit the new speeds and efficiencies of drives like SSDs. Whatever comes next in the evolution of the storage ecosystem, Linux is likely to be there first.

Related topics