When K&C’s DevOps engineers build a Docker cluster on physical (bare-metal) servers, the question arises: where should persistent data be stored so that it is available to all of the cluster’s servers? Without such storage, much of the point of running Docker containers in a cluster is lost, because only with shared persistent storage can the cluster function in high-availability mode. Moreover, an application running on one of the cluster’s worker nodes should keep its access to the data storage and continue working if one of the servers in the data storage cluster drops out, is lost, or becomes unavailable. It is therefore clear that we need persistent storage. One of the most widespread data storage systems that meets our minimum requirements is NFS – Network File System. It provides transparent access to a server’s files and file systems, so any client application that can work with a local file can also work with an NFS file without modification.

NFS overview

From the diagram above, you can see that the NFS server holds data that is available to every server in the cluster. This scheme works well for projects with a small data volume and without high-speed input/output requirements. What issues can you face when working with NFS?

Problem #1: The whole load falls on the hard drive of the NFS server, which every other server in the cluster accesses to perform read/write operations.

Problem #2: There is a single endpoint to the server holding the data. If the data server drops out, our application is very likely to fail as well.

What is CEPH?

CEPH is one of the most advanced and popular distributed file and object storage systems. It is open-source, software-defined storage backed by Red Hat. The main features of CEPH are:
- no single point of entry;
- easily scalable to petabytes;
- stores and replicates data;
- handles load balancing itself;
- guarantees accessibility and system robustness;
- free of charge (although its developers may offer paid support);
- no need for special equipment (the system can be deployed at any data center).


CEPH keeps and provides data for clients in the following ways:
1) RADOS – as an object.
2) RBD – as a block device.
3) CephFS – as a file, through a POSIX-compliant file system.

Access to the distributed storage of RADOS objects is provided through the following interfaces:
1) RADOS Gateway – a RESTful interface compatible with Swift and Amazon S3.
2) librados and the related C/C++ bindings.
3) rbd and QEMU-RBD – Linux kernel and QEMU block devices.

Here you can see how data placement is implemented in a CEPH cluster with x2 replication:

CEPH cluster with replication x2

And here you can see how data is restored inside the cluster if one of the CEPH cluster’s nodes is lost:

how data is being restored inside cluster in case of loss of the node in CEPH cluster
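The two diagrams above can be mimicked with a toy model. The sketch below is not CEPH’s actual placement algorithm (CEPH uses CRUSH); it swaps in simple rendezvous hashing to illustrate the same idea: every object gets two replicas on different nodes, and when a node is lost only the objects that had a copy on it are re-replicated to a surviving node. The node and object names are hypothetical.

```python
import hashlib


def score(obj: str, node: str) -> int:
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{obj}:{node}".encode()).hexdigest()
    return int(digest, 16)


def place(obj: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Pick the `replicas` nodes with the highest score for this object."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    return sorted(nodes, key=lambda n: score(obj, n), reverse=True)[:replicas]


# Hypothetical four-node cluster with x2 replication.
nodes = ["node-a", "node-b", "node-c", "node-d"]
objects = [f"object-{i}" for i in range(8)]
before = {o: place(o, nodes) for o in objects}

# Simulate the loss of one node: placement is recomputed on the survivors.
survivors = [n for n in nodes if n != "node-b"]
after = {o: place(o, survivors) for o in objects}

for o in objects:
    moved = sorted(set(after[o]) - set(before[o]))
    note = f"re-replicated to {moved}" if moved else "unchanged"
    print(o, before[o], "->", after[o], note)
```

Objects that never had a replica on the lost node keep exactly the same placement, which is the property that keeps recovery traffic limited to the data that actually lost a copy.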

Famous Use Cases

CEPH has gained a wide audience and among its users are some renowned companies.

CEPH corporate users

How It Works

CEPH’s primary infrastructure requirement is a stable network connection between the cluster’s servers. The minimum requirement is a 1 Gb/s link between servers, and network interfaces with 10 Gb/s bandwidth are recommended. From K&C’s experience of building CEPH clusters, it is worth mentioning that the network infrastructure is what most often becomes the bottleneck. Any problem in the network can delay the delivery of data to clients, slow down the cluster, and trigger rebalancing of data within it. We recommend placing the CEPH cluster’s servers in one server rack and connecting them to each other through additional internal network interfaces. However, K&C’s expertise also includes clusters built on 1 Gb/s network channels, without internal interfaces, and with servers placed in different server racks that are in turn located in distinct data centers. Even in such a scheme, the cluster’s work can be regarded as satisfactory, as it meets an SLA of 99.9% data accessibility.

Let’s consider building a minimal cluster. In this example, we’ll use a 1 Gb/s network interface between the servers of the CEPH cluster. Clients are connected through the same network interface. The primary goal in this case is to resolve the problems mentioned above, which occur when the data storage scheme is built around an NFS server. For clients, the data will be provided as a file system.
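To put the 99.9% accessibility figure mentioned above into perspective, the short calculation below converts an availability percentage into the downtime it permits. It is plain arithmetic rather than anything CEPH-specific, and the 365-day year and 30-day month are simplifying assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of unavailability permitted by an SLA over the given period."""
    return period_hours * (1 - availability_percent / 100)


yearly = allowed_downtime_hours(99.9)
monthly = allowed_downtime_hours(99.9, period_hours=30 * 24)
print(f"99.9% allows about {yearly:.2f} h per year "
      f"or {monthly * 60:.1f} min per 30-day month")
```

In other words, a 99.9% SLA still leaves room for several hours of unavailability per year, which is worth keeping in mind when the servers are spread across data centers.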

CEPH data store

In this kind of scheme, we have three physical servers with three hard drives allotted to the CEPH cluster’s data. The drives are HDDs (not SSDs) of 6 TB each, and the replication factor is x3. As a result, the total raw data volume amounts to 18 TB; since every object is stored three times, the usable capacity is roughly a third of that. Each of the CEPH cluster’s servers is, in turn, an entry point to the cluster for end clients. This allows us to “lose” (server down / server maintenance /…) one of the CEPH cluster’s servers at a time without harming the end clients’ data, which remains available. With this scheme, we solve the NFS problem of a single entry point to our data storage and also speed up data operations. Let’s test the throughput of our cluster by writing a 500 GB file to the CEPH cluster from a client server.
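As a quick check of the capacity figures above before turning to the throughput test, the sketch below computes raw and usable capacity for a replicated cluster. It assumes one 6 TB drive per server (three drives in total) and ignores file system overhead and the free-space headroom a real cluster needs, so the usable figure is an upper bound.

```python
def capacity_tb(drives: int, drive_size_tb: float,
                replicas: int) -> tuple[float, float]:
    """Return (raw, usable) capacity in TB for a replicated cluster."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    raw = drives * drive_size_tb
    # Every object is stored `replicas` times, so usable space shrinks
    # by the same factor.
    return raw, raw / replicas


# Assumed layout: three servers, one 6 TB HDD each, replication x3.
raw, usable = capacity_tb(drives=3, drive_size_tb=6, replicas=3)
print(f"raw: {raw} TB, usable at x3 replication: {usable} TB")
```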

Graph: loading the file into CEPH cluster

The graph shows that loading the file into the CEPH cluster takes a little over five hours. You should also pay attention to the utilization of the network interface: it is loaded at about 30% (300 Mbps), not 100% as you might have assumed. The reason is the limited write speed of HDDs. You can achieve faster read/write operations by building the CEPH cluster on SSD drives, but the total cost of the cluster then increases significantly.
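The relationship between file size, network rate, and upload time can be checked with a few lines of arithmetic. The sketch below assumes the file is 500 decimal gigabytes and that rates are in megabits per second; both helper functions are illustrative and not part of any CEPH tooling.

```python
def transfer_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to move `size_gb` gigabytes at a sustained `rate_mbps`."""
    bits = size_gb * 8e9               # decimal gigabytes to bits
    seconds = bits / (rate_mbps * 1e6)
    return seconds / 3600


def average_rate_mbps(size_gb: float, hours: float) -> float:
    """Average rate in Mbps implied by moving `size_gb` in `hours`."""
    return size_gb * 8e9 / (hours * 3600) / 1e6


print(f"at 300 Mbps sustained: {transfer_hours(500, 300):.1f} h")
print(f"at a full 1 Gb/s link: {transfer_hours(500, 1000):.1f} h")
print(f"average rate for a 5 h upload: {average_rate_mbps(500, 5):.0f} Mbps")
```

Under these assumptions, a sustained 300 Mbps would finish in under four hours, so an upload of just over five hours corresponds to an average closer to 220 Mbps. The 300 Mbps read from the graph is therefore probably a typical or peak rate rather than the average over the whole run.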

Summary