In many cases, developer productivity really depends on how quickly and efficiently an application can get access to the pre-production data set. What if a dataset is terabytes in size and contains millions of records? What if the data changes fluidly? And what if the network gets disconnected, you are on a plane, or you just need to work offline?

Discover EdgeFS data layer Geo-Transparency!

What is EdgeFS? It is a new storage provider addition to the CNCF Rook project, and you can read more about it at https://rook.io/docs/rook/master/edgefs-storage.html. While it is a scale-out storage cluster, it can still operate in a so-called “solo” mode: a single-node Docker container with the ability to scale out your deployment as it grows, by simply connecting more nodes and/or geographically distributed cluster segments to it.

EdgeFS’s stronghold is its ability to virtualize the underlying infrastructure as a scalable, highly available and geographically distributed storage system. It works similarly to “git”, where all modifications are globally immutable, fully versioned, self-validated, distributed and therefore fault-tolerant. As a result, it enables cross-cloud and geographically transparent high-performance distributed access to commonly used storage protocols for Object, File, Block, and NoSQL databases.

When configured on a Mac, it can utilize a local directory and present it as a remote access point to a much larger data set. With floating snapshots and on-demand data fetch backed by an E-LRU local cache, you should be able to set up a pretty awesome mini-lab locally on your laptop, minimizing networking I/O dramatically. In some of my experiments, I’ve seen up to 50x savings in data fetches. This is due to global data deduplication, in-transfer compression and immutability across multiple use cases: File, Block, Object (S3, S3X or SWIFT) and NoSQL.

Let’s set it up!

The use case is to set up remote access to a dataset, accessible via EdgeFS-provided interfaces even in offline mode. For simplicity, I’ll be focusing on the creation of a local S3 service with a bi-directional ISGW (Inter-Segment Gateway) link and a local transparent NFS service.

I will be using Docker Desktop for Mac, but please be aware of this bug:

It seems that the “edge” version has this fixed. Be sure to switch to the “edge” version!

However, be advised that under heavy load the vpnkit-forwarder process may still die under memory pressure, so I recommend increasing the VM’s memory to 4GB.

Create an /edgefs directory and download the docker-compose YAML file from this gist: docker-compose-embedded.yml, then rename it to docker-compose.yml. Create a directory structure that looks like the one below, with the directories shared via Docker Desktop Preferences:
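The layout you should end up with is minimal (this sketch is my reconstruction from the paths used below):

```
/edgefs
├── docker-compose.yml
└── data
    └── store1
```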

Create a directory which will hold EdgeFS local data:

mkdir -p /edgefs/data/store1

And run configuration tool:

cd /edgefs

docker-compose run --rm -e CCOW_LOG_STDOUT=1 target \
  config node -n localhost -i eth0 -D /data/store1 -r 1 \
  -o '{"MaxSizeGB":10,"RtPlevelOverride":1,"DisableVerifyChid":true,"Sync":2}'

The command above will do the following:

map local /data/store1 directory into EdgeFS target container (-D);

enable usage of up to 10GB of disk space (MaxSizeGB)

limit directory partitioning to 1 (each additional partition allocates extra memory, so it is important to keep it at the minimum) (RtPlevelOverride);

disable extra network verification of a data chunk prior to it being written to stable storage, to save some CPU cycles (DisableVerifyChid);

since we only have 1 partition configured, set the default local site replication count to 1 (-r);

enable journal sync to provide consistency in case of linuxkit crash (Sync).

If you want to adjust the suggested configuration, pass the “--help” flag.

Now that you are satisfied with the configuration, boot it up:

# docker-compose up -d

Creating edgefs_target_1

Creating edgefs_mgmt_1

Creating edgefs_s301_1

Creating edgefs_ui_1

Creating edgefs_nfs01_1

Creating edgefs_isgw01_1

# docker-compose logs -f

Initialize cluster segment “myspace”

And after the status is online, initialize the cluster and create the system objects:

### initialize local cluster segment

efscli system init

### initialize myspace/work/shared1

efscli cluster create myspace

efscli tenant create myspace/work

efscli bucket create myspace/work/shared1 -s 4M -t 1

EdgeFS uses a globally unique system path in the format NAMESPACE/TENANT/BUCKET/OBJECT. In the above, we’ve created a “myspace/work/shared1” bucket with the default chunk size set to 4MB and NFS/S3 object transparency enabled.

We can also connect to the GUI and create/monitor/manage services via a nice UI by pointing a web browser to http://IPADDR:3000; the default user is admin, and the default password is edgefs.

Now that the cluster namespace segment, tenant and bucket are created, we can set up the NFS, S3 and ISGW service definitions:

### NFS service with myspace/work/shared1 export

efscli service create nfs nfs01

efscli service config nfs01 X-MH-ImmDir 1

efscli service serve nfs01 myspace/work/shared1

### S3 service serving myspace/work tenant

efscli service create s3 s301

efscli service serve s301 myspace/work

### ISGW endpoint link

efscli service create isgw isgw01

efscli service serve isgw01 myspace/work/shared1

efscli service config isgw01 X-ISGW-Remote ccow://REMOTE_IP:14000

efscli service config isgw01 X-Status enabled

You’ll need to know REMOTE_IP in order to configure the bi-directional ISGW link. There is no limit on how many links can be configured; it is a full mesh, so your local setup can synchronize with geo-distributed EdgeFS installations.
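For reference, the matching service on the remote segment would point back at this machine. This is a sketch, assuming the remote side runs the same efscli tooling and already has the myspace/work/shared1 bucket; LOCAL_IP stands for the address the remote side uses to reach this machine:

```shell
### on the remote segment: link back to this installation (sketch)
efscli service create isgw isgw01
efscli service serve isgw01 myspace/work/shared1
efscli service config isgw01 X-ISGW-Remote ccow://LOCAL_IP:14000
efscli service config isgw01 X-Status enabled
```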

At this point, the docker-compose script will pick up the changes on the next policy restart and the services should become available. If for some reason a service isn’t picking them up, speed up the restart with the following commands:

docker-compose logs -f nfs01 # monitor nfs service logs

docker-compose logs -f s301 # monitor s3 service logs

docker-compose logs -f isgw01 # monitor isgw01 service logs

docker-compose restart nfs01

docker-compose restart s301

docker-compose restart isgw01

Verification

Let’s start with verification of the S3 endpoint. The easiest way, of course, is curl:



# curl -k https://localhost:9443/shared1
<?xml version="1.0"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>shared1</Name><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>

You can now point your application at it and start using it locally. The ISGW link will take care of synchronizing the dataset for you in a transparent way. If for your dataset you only need to synchronize the global namespace metadata, set MDOnly=true in the isgw01 service configuration. This enables efficient on-demand data chunk fetch and keeps cached chunks for 24 hours by default (configurable on a per-bucket basis).
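A standard S3 client works against the same endpoint. A sketch with the aws CLI, assuming you have already configured S3 access keys for the tenant (the key setup itself is out of scope here):

```shell
# list the bucket through the local EdgeFS S3 endpoint;
# -k equivalent: skip TLS verification for the self-signed cert
aws s3 ls s3://shared1 --endpoint-url https://localhost:9443 --no-verify-ssl
```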

Let’s now verify the NFS function. On a Mac, there is no easy way to directly mount an NFS export from an NFS server exposed via a Docker container. There is a way to set up a SOCKS proxy, but really, what you’ll likely need is to expose it to your application within Docker itself, and that works perfectly fine:

# docker run -it --rm -d --privileged=true --link nfs01 \
  --net=edgefs_default -e MOUNTPOINT=/mnt -e SERVER=nfs01 \
  -e SHARE=/work/shared1 outstand/nfs-client

b6b97c464e4b98899bfb575472a6aed61cb3f71cee80c3c6119be9a6f91a8043

# docker exec -it b6b ls /mnt

Advanced usage

Geo-Transparency with globally enabled deduplication across the whole variety of commonly used storage protocols: File, Block, Object and NoSQL. It is really a data storage layer that was designed to scale across a set of distributed sites, clouds, on-premises data centers, and Edge IoT devices.

Go on... continue reading!

Connect all your sites with guaranteed consistency

With ISGW links you can enable complex schemas of data I/O flow synchronization. A global namespace can consist of hundreds of small and large EdgeFS installations, all connected together. In my example here, /work/shared1 is the bucket I want to synchronize between REMOTE_IP and this setup. But I can also easily add another isgw02 service and connect it to REMOTE_IP2, and so on, so that my changes go out in two directions.
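Adding that second link is just a repeat of the isgw01 definition. A sketch, assuming a second remote segment reachable at REMOTE_IP2:

```shell
### second ISGW link, fanning out to another site
efscli service create isgw isgw02
efscli service serve isgw02 myspace/work/shared1
efscli service config isgw02 X-ISGW-Remote ccow://REMOTE_IP2:14000
efscli service config isgw02 X-Status enabled
```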

In cases where I do not need to keep replicated content locally, I have the option to enable Metadata-Only synchronization, with data chunks fetched on demand under E-LRU caching semantics:

efscli service config isgw01 X-ISGW-MDOnly true

If I need to provide some sort of consistency across a dataset, I can use a snapshot consistency group:

# docker-compose exec mgmt toolbox
Welcome to EdgeFS Mgmt Toolbox.
Hint: type neadm or efscli to begin

root@55fce9350b63:/opt/nedge# efscli object
Objects operations, e.g. create, delete, list

Usage:
  efscli object [command]

Aliases:
  object, o

Available Commands:
  create          create a new object
  delete          delete an existing object
  get             get a new object
  list            list objects
  put             put a new object
  show            show object
  snapshot-add    add a new object's snapshot to snapview
  snapshot-clone  clone snapshot to object
  snapshot-list   list snapshots of specified snapview object
  snapshot-rm     remove snapshot from snapview
  snapview-create create a new snapview section
  snapview-delete delete a snapview object

Snapshotting groups (e.g. snapview) will be “floating” across all synchronized locations, and I won’t need to worry about the states of my work.
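As a rough sketch of the flow, using the subcommands listed above (the object names and the exact argument order are my assumptions; check efscli object snapview-create --help for the authoritative syntax):

```shell
### create a snapview, add an object's snapshot to it, list it (sketch)
efscli object snapview-create myspace/work/shared1/.snapviews/sv1
efscli object snapshot-add myspace/work/shared1/.snapviews/sv1 \
    myspace/work/shared1/file1@snap1
efscli object snapshot-list myspace/work/shared1/.snapviews/sv1
```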

Build your SAN setups with EdgeFS iSCSI Targets

A block device is another option that can be interesting to explore. If I need an iSCSI device, it is easy to create a service and subscribe it to a bucket with iSCSI devices. The ISGW link will then start syncing the contents of the block devices.

efscli service create iscsi isc01

efscli object create myspace/work/shared1/lun1 -s 128K \
  -o volsize=10g,blocksize=4096

efscli service serve isc01 myspace/work/shared1/lun1

Now, restart the service with docker-compose restart isc01 and try to find your LUN (notice that I changed the isc01 service port to 3261 to avoid conflicting with globalSAN, as it allocates 3260).
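From a Linux initiator, discovery would look roughly like this (a sketch; on a Mac you would use globalSAN’s own UI instead, and the 3261 port matches the change mentioned above):

```shell
# discover the EdgeFS target and log in (Linux open-iscsi)
iscsiadm -m discovery -t sendtargets -p localhost:3261
iscsiadm -m node --login
```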

EdgeFS iSCSI LUNs are really just objects, accessible via the NFS, S3, S3X or SWIFT interfaces:

curl -i http://localhost:9982/shared1/lun1

HTTP/1.1 200 OK

X-Powered-By: Express

x-amz-id-2: 90fc4c79c9b21ba4

x-amz-request-id: e9e301fcdfebb694

Content-Length: 0

Last-Modified: Sat, 16 Mar 2019 16:41:44 GMT

Accept-Ranges: bytes

X-volsize: 10737418240

X-blocksize: 4096

ETag: "BED78C44556C9B268C85B3DC1C2E12979CF264F81C06517A705E5137C531FBD70000000000000000000000000000000000000000000000000000000000000000"

Date: Sun, 17 Mar 2019 21:02:52 GMT

Connection: keep-alive
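Since the response advertises Accept-Ranges: bytes, you can, for example, pull an arbitrary slice of the LUN as a plain object. A sketch:

```shell
# read the first 4096-byte block of the LUN over the HTTP endpoint
curl -r 0-4095 -o block0.bin http://localhost:9982/shared1/lun1
```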

A forest of NoSQL Databases

And finally, the most interesting option to explore is the NoSQL style database. There is an S3X extended API which can be used with applications directly:

The API provides access to advanced EdgeFS Object interfaces, such as access to Key-Value store, S3 Object Append mode, S3 Object RW mode, and S3 Object Stream Session (POSIX compatible) mode.

A Stream Session encompasses a series of edits to one object made by one source that is saved as one or more versions during a specific finite time duration. A Stream Session must be isolated while it is open. That is, users working through this session will not see updates to this object from other sessions. After the session is finalized, a new version of the object is created and eventually replicated via directed ISGW links.

A Stream Session allows high-performance POSIX-style access to an object, and thus it is beneficial for client applications to use the HTTP/1.1 or HTTP/2 Persistent Connection extensions to minimize latency between updates or reads.

To enable it, uncomment the s3x01 service in the compose script and execute the following commands in the toolbox:

efscli service create s3x s3x01

efscli service serve s3x01 myspace/work

The above will create service definition s3x01 serving tenant myspace/work. Here are a few examples of how to use it:



# create JSON Key-Value database mydb.json in bucket bk1
curl -X POST -H "Content-Type: application/json" \
  --data '{"key1":"value1"}' \
  "http://localhost:4000/bk1/mydb.json?comp=kv&finalize"

# create CSV Key-Value database mydb.csv
curl -X POST --data "value1" \
  "http://localhost:4000/shared1/mydb.csv?comp=kv&finalize"

# list keys and values
curl "http://localhost:4000/shared1/mydb.json?comp=kv&values=1"
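As noted earlier, a persistent connection helps; curl, for instance, reuses a single connection when given multiple URLs in one invocation, which is a quick way to observe this (a sketch; the database names match the examples above):

```shell
# two reads over one reused connection to the S3X endpoint
curl "http://localhost:4000/shared1/mydb.json?comp=kv&values=1" \
     "http://localhost:4000/shared1/mydb.csv?comp=kv&values=1"
```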

Databases are alphabetically indexed, and it is possible to select one or more matching keys and values using the key, maxresults and values query parameters:
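For example (a sketch; the key name is illustrative):

```shell
# up to 10 keys starting from "key1", with their values
curl "http://localhost:4000/shared1/mydb.json?comp=kv&key=key1&maxresults=10&values=1"
```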

Output can be in CSV or JSON format. It is also possible to insert values in binary format (a binary value cannot exceed 1MB in size):



# insert binary data
curl -X POST -F "file=@file.txt" \
  "http://localhost:4000/shared1/mydb.blob?comp=kv&finalize&key=file.txt"

# read it back
curl "http://localhost:4000/shared1/mydb.blob?comp=kv&values=1&maxresults=1&key=file.txt"

It is also possible to create transactional-style key-value streaming behavior: open, modify, modify, modify, commit. This approach can be used for efficient log or record collection: stream data directly into EdgeFS with S3X! Read the API documentation for more on how this can be done.

Summary

EdgeFS can be a very powerful tool to improve CI/CD or the day-to-day developer workflow. With true data geo-transparency and guaranteed consistency for the File, Block, Object and NoSQL protocols, the EdgeFS data layer can enable use cases that you previously wouldn’t even have thought possible. Data of any type and kind now floats across distributed locations without the need for any centralized metadata orchestration!

Explore! Look at the commented-out sections of the docker-compose file if you’d like to enable extra features! Or better yet, join our growing community at http://edgefs.io, give us feedback, and help us improve the developer experience!