I went to the Tech Field Day in Boston a few weeks ago. I had a great time, hung out with some really smart people, and got to tour Fenway Park. The whole experience was incredible.

It was not a shopping trip, though. The equipment and technologies that I saw are bleeding edge. They’re not what we’re using this year, or next year, and for a lot of us, probably not within the next five years. That being said, the overarching views on the way that storage and enterprise networks will operate for the foreseeable future were right there in front of me.

If you want to become familiar with the next 5–10 years of IT, get used to the term “unified” and the newer buzzword, “federated”. Everything, from a connectivity standpoint, is going to be unified. From a management standpoint, it’s going to be federated.

Unification means that all of the connections to our machines will happen over one single fabric. In other words, your networking (currently Cat5(e)/6) and your storage connectivity (maybe fiber, maybe Cat5, maybe coax/SFP) will all run over the same (probably 10Gb) cables. And if Cisco’s UCS (Unified Computing System; blog entry forthcoming) is any indicator, we won’t have one cable per physical computer. We’ll have several, but they’ll run to the enclosure, and the enclosure itself will deal with presenting things to the machines.

To say that devices will be federated is to say that devices which are physically distinct will be unified through procedural or administrative functions. For instance, if you’ve got an Active Directory domain, you could conceivably say that your member machines have been federated, since you can administer them through the same panel, and they can be subject to rules, groupings, and policies. It’s not a new concept, but it is a new term for it. Expect to hear it a lot; plenty of people think it’s going to be the next big buzzword, the way “cloud” was (and is).

To really see the kind of change that all of this entails in the storage world, we’ve got to examine the way things are right now. I apologize if this is remedial for anyone, but it’s important to establish the current technologies, so we can fully appreciate what’s going to happen. Your patience will be rewarded.

The type of storage that we are all most familiar with is probably Direct Attached Storage (DAS). This is the storage that is connected to your servers directly via one of a number of buses. It might be SATA cables to internal hard drives, USB cables to external drives, or maybe even several SCSI Ultra320 cables attached to an external array. The main consideration for this storage configuration is that there is no network fabric between your servers and your storage. This storage is always (to my knowledge, anyway) block-level. In other words, your host sees the storage as a block device and can run fdisk and fsck against it (or fdisk and format, for the Windows users out there).
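To see what block-level access means in practice, here is a minimal sketch on a Linux box. Real DAS would be a raw device like /dev/sdb; to keep the demo harmless, this version uses a file-backed disk image instead (the filename and size are my own choices):

```shell
# Create a sparse 64 MB file standing in for a locally attached disk
truncate -s 64M disk.img

# Lay an ext4 filesystem directly onto it, just as you would with mkfs
# on real direct attached storage (-F skips the "not a block device" prompt)
mkfs.ext4 -q -F disk.img

# Inspect the result; file(1) now identifies it as an ext4 filesystem image
file disk.img
```

On an actual server you’d run the same mkfs (after partitioning with fdisk) against the raw device, then mount the result.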

Next up is Network Attached (or sometimes Addressable or Accessible) Storage. The defining factor of this storage type is that it uses the pre-existing network (typically TCP/IP based) to present storage to the host. The access is nearly always file level; that is to say, the machine addressing the storage is unaware of the actual filesystem that the data resides on. Only the files and their metadata are presented. CIFS (the protocol behind Windows file sharing, implemented on Unix-like systems by Samba) and NFS are NAS technologies. That means that your Samba server technically acts as a NAS server.
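To make the file-level distinction concrete, this is roughly what consuming NAS storage looks like in /etc/fstab on a Linux client; the server name, export paths, and mount points here are hypothetical:

```
# NFS: the client sees files and metadata, never the underlying block device
filer01:/export/home    /mnt/home    nfs     defaults                     0 0

# CIFS: the same idea, speaking the Windows file sharing protocol
//filer01/shared        /mnt/shared  cifs    credentials=/etc/cifs.creds  0 0
```

In both cases, running fdisk against /mnt/home would be meaningless: the filesystem lives on the filer, and the client only ever sees files.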

The next level of technical sophistication is the Storage Area Network (SAN). This technology utilizes a network to present block-level devices to hosts. If your host is connected to a SAN, the portions that it can see can be utilized with fdisk and fsck (or, again, format). Typically, specialized hardware known as a host bus adapter (HBA) is used to present the remote storage as a local device, but many modern operating systems can do the same job in software if they have an appropriate pre-existing network fabric, such as an ordinary Ethernet card in the case of iSCSI.
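For the iSCSI case specifically, the software-initiator workflow on Linux (using the open-iscsi tools) looks roughly like this. The array address and IQN below are hypothetical, and these commands obviously need a real target to talk to, so treat this as a sketch:

```
# Ask the array which targets it offers
iscsiadm -m discovery -t sendtargets -p 192.168.10.50

# Log in to one of the discovered targets
iscsiadm -m node -T iqn.2001-04.com.example:array01 -p 192.168.10.50 --login

# A new block device (say, /dev/sdb) now appears on the host, and you
# partition and format it exactly as if it were direct attached storage
fdisk /dev/sdb
mkfs.ext4 /dev/sdb1
```

The point is the last step: once the session is up, the SAN storage is indistinguishable from a local disk as far as the host’s tools are concerned.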

Above and beyond a SAN, you can have multiple SANs, either in close proximity or separated by some distance, with varying levels of replication between them. Many SAN storage arrays include the ability to replicate block-level information between themselves and another storage array. Without this technology, the hosts themselves would be responsible for transmitting the data between storage environments. An inter-SAN relationship such as this improves the reliability of the overall storage network by reducing the margin for error in configuring hosts to replicate the data, and can sometimes take advantage of in-array technologies like data deduplication which are transparent to hosts.

Once the size of a SAN grows beyond a small number of arrays, it becomes unwieldy to administer the storage. Keeping track of what data exists where is troublesome and wastes time. A technology known as storage virtualization has been developed which constructs a layer of abstraction above the storage arrays. When configuring storage for a particular server, the administrator interfaces with this virtualization solution, and the product itself manages the underlying storage arrays. This provides a large degree of freedom from decision making for the administrator, who no longer worries (or cares) where the data actually resides. Because of performance requirements, storage virtualization is typically limited to a close geographical area.

The most recent, and most advanced, technology is federated storage. Rather than virtualizing storage in a layer of abstraction, federated storage arrays are all administered through the same interface. Individual storage arrays can be considered nodes within the federation, and an arbitrarily large number of nodes can be added. Through this method, storage networks can reach unprecedented sizes over an astonishingly large geographic distance.