Presenting Torus: A modern distributed storage system by CoreOS

• By Barak Michener

Persistent storage in container cluster infrastructure is one of the most interesting current problems in computing. Where do we store the voluminous stream of data that microservices produce and consume, especially when immutable, discrete application deployments are such a powerful pattern? As containers gain critical mass in enterprise deployments, how do we store all of this information in a way developers can depend on in any environment? How are the consistency and durability of that data assured in a world of dynamic, rapidly iterated application containers?

Today CoreOS introduces Torus, a new open source distributed storage system designed to provide reliable, scalable storage to container clusters orchestrated by Kubernetes, the open source container management system. Because we believe open source software must be released early and often to elicit the expertise of a community of developers, testers, and contributors, a prototype version of Torus is now available on GitHub, and we encourage everyone to test it with their data sets and cluster deployments, and help develop the next generation of distributed storage.

Distributed systems: Past, present, and future

At CoreOS we believe distributed systems provide the foundation for a more secure and reliable Internet. Building modular foundations that expand to handle growing workloads, yet remain easy to use and to assemble with other components, is essential for tackling the challenges of computing at web scale. We know this from three years of experience building etcd to solve the problem of distributed consensus: how small but critical pieces of information are democratically agreed upon and kept consistent as a group of machines rapidly and asynchronously updates and accesses them. Today etcd is the fastest and most stable open source distributed key-value store available. It is used by hundreds of leading distributed systems software projects, including Kubernetes, to coordinate configuration among massive groups of nodes and the applications they execute.

The problem of reliable distributed storage is arguably even more historically challenging than distributed consensus. In the algorithms required to implement distributed storage correctly, mistakes can have serious consequences. Data sets in distributed storage systems are often extremely large, and storage errors may propagate alarmingly while remaining difficult to detect. The burgeoning size of this data is also changing the way we create backups, archives, and other fail-safe measures to protect against application errors higher up the stack.

Why we built Torus

Torus provides storage primitives that are extremely reliable, distributed, and simple. It's designed to solve some major problems common for teams running distributed applications today. While it is possible to connect legacy storage to container infrastructure, the mismatch between these two models convinced us that the new problems of providing storage to container clusters warranted a new solution. Consensus algorithms are notoriously hard. Torus uses etcd, proven in thousands of production deployments, to shepherd metadata and maintain consensus. This frees Torus itself to focus on novel solutions to the storage part of the equation.

Existing storage solutions weren't designed to be cloud-native

Deploying, managing, and operating existing storage solutions while trying to shoehorn them into a modern container cluster infrastructure is difficult and expensive. These distributed storage systems were mostly designed for a regime of small clusters of large machines, rather than the GIFEE (Google's Infrastructure for Everyone Else) approach that focuses on large clusters of inexpensive, "small" machines. Worse, commercial distributed storage often involves pricey and even custom hardware and software that is not only expensive to acquire, but difficult to integrate with emerging tools and patterns, and costly to upgrade, license, and maintain over time.

Containers need persistent storage

Container cluster infrastructure is more dynamic than ever before, changing quickly in the face of automatic scaling, continuous delivery, and the failure and replacement of components. Ensuring persistent storage for these container microservices as they are started, stopped, upgraded, and migrated between nodes in the cluster is not as simple as providing a backing store for a single server running a group of monolithic applications, or even a number of virtual machines.

Storage for modern clusters must be uniformly available network-wide, and must govern access and consistency as data processing shifts from container to container, even within one application as it increments through versions. Torus exists to address these cases by applying these principles to its architecture:

Extensibility: Like etcd, Torus is a building block, and it enables various types of storage including distributed block devices, or large object storage. Torus is written in Go, and speaks the gRPC protocol to make it easy to create Torus clients in any language.

Ease of use: Designed for containers and cluster orchestration platforms such as Kubernetes, Torus is simple to deploy and operate, and ready to scale.

Correctness: Torus uses the etcd distributed key-value database to store and retrieve file or object metadata. etcd provides a solid, battle-tested base for core distributed systems operations that must execute rapidly and reliably.

Scalability: Torus can currently scale to hundreds of nodes while treating disks collectively as a single storage pool.

"We have seen a clear need from the market for a storage solution that addresses the dynamic nature of containerized applications and can take advantage of the rapidly evolving storage hardware landscape," said Zachary Smith, CEO of Packet, a New York-based bare metal cloud provider. "We're excited to see CoreOS lead the community in releasing Torus as the first truly distributed storage solution for cloud-native applications."

How Torus works

At its core, Torus is a library with an interface that appears as a traditional file, allowing for storage manipulation through well-understood basic file operations. Coordinated and checkpointed through etcd's consensus process, this distributed file can be exposed to user applications in multiple ways. Today, Torus supports exposing this file as block-oriented storage via a Network Block Device (NBD). We also expect that in the future other storage systems, such as object storage, will be built on top of Torus as collections of these distributed files, coordinated by etcd.
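To make the idea of a file-like interface over block storage concrete, here is a minimal Go sketch. It is not taken from the Torus code base: the `BlockFile` type, the `blockSize` value, and the in-memory map standing in for distributed block storage are all assumptions made for illustration. It shows only how byte-offset reads and writes can be mapped onto fixed-size blocks behind the standard `io.ReaderAt` and `io.WriterAt` method shapes.

```go
package main

import (
	"fmt"
	"io"
)

// blockSize is an arbitrary, tiny value chosen for this sketch (an assumption,
// not a Torus setting).
const blockSize = 4

// BlockFile is a hypothetical type that presents a sparse set of fixed-size
// blocks as one flat file. A real system would fetch blocks from storage
// peers; here they live in a local map.
type BlockFile struct {
	blocks map[int64][]byte // block index -> block contents
	size   int64            // logical file length in bytes
}

func NewBlockFile() *BlockFile {
	return &BlockFile{blocks: make(map[int64][]byte)}
}

// WriteAt splits p across the blocks covering [off, off+len(p)).
func (f *BlockFile) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, fmt.Errorf("negative offset %d", off)
	}
	written := 0
	for written < len(p) {
		pos := off + int64(written)
		idx, start := pos/blockSize, pos%blockSize
		blk, ok := f.blocks[idx]
		if !ok {
			blk = make([]byte, blockSize)
			f.blocks[idx] = blk
		}
		written += copy(blk[start:], p[written:])
	}
	if end := off + int64(written); written > 0 && end > f.size {
		f.size = end
	}
	return written, nil
}

// ReadAt fills p from the blocks covering the requested range.
// Blocks that were never written read back as zeros.
func (f *BlockFile) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, fmt.Errorf("negative offset %d", off)
	}
	if off >= f.size {
		return 0, io.EOF
	}
	n := len(p)
	if remaining := f.size - off; int64(n) > remaining {
		n = int(remaining)
	}
	read := 0
	for read < n {
		pos := off + int64(read)
		idx, start := pos/blockSize, pos%blockSize
		chunk := int(blockSize - start)
		if chunk > n-read {
			chunk = n - read
		}
		if blk, ok := f.blocks[idx]; ok {
			copy(p[read:read+chunk], blk[start:])
		} else {
			for i := read; i < read+chunk; i++ {
				p[i] = 0
			}
		}
		read += chunk
	}
	if read < len(p) {
		return read, io.EOF // short read at end of file
	}
	return read, nil
}

func main() {
	f := NewBlockFile()
	if _, err := f.WriteAt([]byte("hello torus"), 2); err != nil {
		panic(err)
	}
	buf := make([]byte, 5)
	if _, err := f.ReadAt(buf, 8); err != nil {
		panic(err)
	}
	fmt.Printf("read %q from a file spanning %d blocks\n", buf, len(f.blocks))
}
```

Because callers see only offsets and lengths, the same interface could in principle sit underneath a block device or an object store, which is the layering the paragraph above describes.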

Torus includes support for consistent hashing, replication, garbage collection, and pool rebalancing through the internal peer-to-peer API. The design includes the ability to support both encryption and efficient Reed-Solomon error correction in the near future, providing greater assurance of data validity and confidentiality throughout the system.
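As an illustration of how consistent hashing and replication can work together, the following Go sketch places storage peers on a hash ring and picks replicas for a block by walking the ring. It is a generic textbook construction, not the Torus implementation: the `Ring` type, the `Replicas` method, the virtual-node count, and the choice of FNV-1a hashing are all assumptions for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a hypothetical consistent-hash ring of storage peers.
type Ring struct {
	points []uint32          // sorted positions on the ring
	owner  map[uint32]string // ring position -> peer name
}

// hashKey maps a string to a ring position using FNV-1a (an arbitrary choice).
func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places each peer at vnodes positions so load spreads more evenly.
func NewRing(peers []string, vnodes int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, p := range peers {
		for v := 0; v < vnodes; v++ {
			pt := hashKey(fmt.Sprintf("%s#%d", p, v))
			if _, taken := r.owner[pt]; taken {
				continue // rare hash collision: keep the earlier peer
			}
			r.owner[pt] = p
			r.points = append(r.points, pt)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Replicas returns up to n distinct peers for key, starting at the first ring
// position at or after the key's hash and walking clockwise with wraparound.
func (r *Ring) Replicas(key string, n int) []string {
	if len(r.points) == 0 || n <= 0 {
		return nil
	}
	target := hashKey(key)
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= target })
	seen := make(map[string]bool)
	var out []string
	for i := 0; i < len(r.points) && len(out) < n; i++ {
		p := r.owner[r.points[(start+i)%len(r.points)]]
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	return out
}

func main() {
	ring := NewRing([]string{"peer-a", "peer-b", "peer-c"}, 64)
	fmt.Println("replicas for block-42:", ring.Replicas("block-42", 2))
}
```

The property that matters for pool rebalancing is that adding a peer only moves the blocks that now hash to the new peer; every other block keeps its existing primary, so relatively little data has to be copied when the pool grows.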

Deploying Torus

Torus can be easily deployed and managed with Kubernetes. This initial release includes Kubernetes manifests to configure and run Torus as an application on any Kubernetes cluster. This makes installing, managing, and upgrading Torus a simple and cloud-native affair. Once spun up as a cluster application, Torus combines with the flex volume plugin in Kubernetes to dynamically attach volumes to pods as they are deployed. To an app running in a pod, Torus appears as a traditional filesystem. Today's Torus release includes manifests using this feature to demonstrate running the PostgreSQL database server atop Kubernetes flex volumes, backed by Torus storage. Today's release also documents a simple standalone deployment of Torus with etcd, outside of a Kubernetes cluster, for other testing and development.

What's next for Torus? Community feedback

Releasing today's initial version of Torus is just the beginning of our effort to build a world-class cloud-native distributed storage system, and we need your help. Guide and contribute to the project at the Torus repo on GitHub by testing the software, filing issues, and joining our discussions. If you're in the San Francisco area, join us for the next CoreOS meetup on June 16 at 6 p.m. PT for a deep dive into the implementation and operational details of Torus.

"Distributed storage has historically been an elusive problem for cloud-native applications," said Peter Bourgon, distributed systems engineer and creator of Go kit. "I'm really happy with what I've seen so far from Torus, and quite excited to see where CoreOS and the community take it from here!"

Torus is simple, reliable, distributed storage for modern application containers, and a keystone for wider enterprise Kubernetes adoption.

CoreOS is hiring

If you're interested in helping develop Torus, or solving other difficult and rewarding problems in distributed systems at CoreOS, join us! We're hiring distributed storage engineers.