Introducing KubeFuse: the file system for Kubernetes!

Why would you do that?

I’ve always had a soft spot for the “everything is a file” philosophy. Files are easy. We can put them in directories to denote hierarchies and give context, we can use our favourite UNIX-y tools to find, browse and manipulate them and we can copy them around. It’s a world of fun.



KubeFuse was born when I was working with a number of replication controllers and pods in different namespaces. Browsing these resources became increasingly arduous and my sessions looked a bit like this:

    kubectl get rc --namespace=dev
    kubectl describe rc postgres --namespace=dev
    kubectl get pods --namespace=dev
    kubectl describe pod postgres-azaa1 --namespace=dev
    kubectl logs postgres-azaa1 --namespace=dev

You can actually configure the default namespace, which would have alleviated the situation a bit (see the Kubernetes documentation). However, in development I (might) like to move between different namespaces and even ignoring that we can still improve on this.

So it’s not the end of the world, but that’s a lot of typing. Having played around a bit with FUSE in the past, I thought it would be worth trying to implement a Kubernetes view in the file system, which sounds crazy but is actually pretty useful. The session above then started looking like this:

    cd ~/kubernetes/dev/rc/
    cat postgres/describe
    cd ../pods/
    ls
    cd postgres-azaa1
    cat describe
    cat logs

By building up the context in the file system we can get away with a lot less typing and can even tab-complete the commands. This saved me a lot of time (if I discount writing KubeFuse itself) and is all around quite pleasant to work with.

FUSE

FUSE has been around for a while and is probably best known for things like sshfs and smbfs (now unmaintained), among others. In a nutshell, FUSE is an interface that allows you to implement file systems in user space. The API it exposes for this is a bit simpler than the kernel’s, so implementing a noddy read-only* file system is actually pretty straightforward. Once implemented, we can run our program and have our mount magically appear in our file system, indistinguishable from a regular file system.

KubeFuse was written in Python, because Python is still cool and has a solid FUSE library available for it. The fusepy library has some simple examples showing how to implement loopback and memory file systems.

*KubeFuse also supports writes, but we’ll cover that later.

The KubeFuse model

The file system KubeFuse implements currently looks like this:

/namespace/resource_type/object_id/action

All supported resource types (pods, replication controllers, services, volumes, etc.) have been implemented, and action can be one of describe, json or yaml; for pods we also expose logs. We’ll probably add more specific bits in the future though (for instance replicas in replication controllers, and/or things like meminfo).

Here are some examples of valid paths:

    /default/svc/nginx/json
    /default/volumes/data/describe
    /dev/pod/my-cheeky-app-buu43/logs
    etc.
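To make the path layout concrete, here is a minimal sketch of how such a path could be split into its four components. The names KubePath and parse_path are made up for this illustration and are not taken from the KubeFuse source.

```python
from typing import NamedTuple, Optional


class KubePath(NamedTuple):
    """Components of a /namespace/resource_type/object_id/action path.

    Hypothetical helper type, not part of KubeFuse itself.
    """
    namespace: Optional[str] = None
    resource_type: Optional[str] = None
    object_id: Optional[str] = None
    action: Optional[str] = None


def parse_path(path: str) -> KubePath:
    """Split a path (relative to the mount point) into its components.

    Shorter paths, i.e. directories, leave the trailing fields as None.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) > 4:
        raise ValueError(f"path has too many components: {path!r}")
    return KubePath(*parts)


print(parse_path("/default/svc/nginx/json"))
print(parse_path("/dev/pod"))
```

A directory such as /dev/pod simply parses to a KubePath with object_id and action left unset, which is enough to tell a listing apart from a file read.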

Behind the scenes all actions map to kubectl commands. For instance, when we read the describe file we’re actually getting the output of kubectl describe [resource_type] [object_id] --namespace=[namespace]. For example, this:

cat /default/rc/postgres/describe

Gets translated into this:

kubectl describe rc postgres --namespace=default

A similar thing happens to logs (kubectl logs), and json and yaml are translated to kubectl get [resource_type] [object_id] --namespace=[namespace] -o json/yaml.
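The translation described above can be sketched as a small function that builds the kubectl argument list for an action. The function name kubectl_command is an assumption for this example; only the kubectl invocations themselves come from the text above.

```python
from typing import List


def kubectl_command(namespace: str, resource_type: str,
                    object_id: str, action: str) -> List[str]:
    """Build the kubectl argument list for reading one action file.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    ns_flag = f"--namespace={namespace}"
    if action == "describe":
        return ["kubectl", "describe", resource_type, object_id, ns_flag]
    if action == "logs":
        # logs only exist for pods, so the resource type is not passed along
        return ["kubectl", "logs", object_id, ns_flag]
    if action in ("json", "yaml"):
        return ["kubectl", "get", resource_type, object_id, ns_flag, "-o", action]
    raise ValueError(f"unsupported action: {action!r}")


print(" ".join(kubectl_command("default", "rc", "postgres", "describe")))
print(" ".join(kubectl_command("default", "svc", "nginx", "yaml")))
```

Returning a list rather than a single string means the result can be handed straight to something like subprocess.check_output without going through a shell.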

Not only are the actions translated to kubectl commands; we also execute kubectl behind the scenes to get the available namespaces and resources for browsing. For example, listing the “files” in /default/svc/ with ls /default/svc/ shows the results of kubectl get svc --namespace=default.
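As a rough sketch of how a directory listing could be derived from kubectl, the example below picks entries based on the depth of the path. Everything named here (readdir, names_from_table, RESOURCE_TYPES, the injected run callable) is hypothetical, and it assumes kubectl's default table output with a header row and the object name in the first column. RESOURCE_TYPES is only an illustrative subset.

```python
from typing import Callable, List

# Illustrative subset only; the real list of supported types is longer.
RESOURCE_TYPES = ["pods", "rc", "svc", "volumes"]
ACTIONS = ["describe", "json", "yaml"]


def names_from_table(output: str) -> List[str]:
    """Return the first column of a kubectl table, skipping the header row."""
    lines = [line for line in output.splitlines() if line.strip()]
    return [line.split()[0] for line in lines[1:]]


def readdir(parts: List[str], run: Callable[[List[str]], str]) -> List[str]:
    """List the entries of a directory given its path components.

    `run` executes a command and returns its standard output as text.
    """
    if len(parts) == 0:  # mount root: one directory per namespace
        return names_from_table(run(["kubectl", "get", "ns"]))
    if len(parts) == 1:  # namespace: one directory per resource type
        return list(RESOURCE_TYPES)
    if len(parts) == 2:  # resource type: one directory per object
        namespace, resource_type = parts
        return names_from_table(
            run(["kubectl", "get", resource_type, f"--namespace={namespace}"]))
    if len(parts) == 3:  # object: one file per action
        actions = list(ACTIONS)
        if parts[1] in ("pod", "pods"):
            actions.append("logs")
        return actions
    raise ValueError("action files are not directories")


def fake_run(argv: List[str]) -> str:
    """Stand-in for a real kubectl call so the example runs anywhere."""
    return "NAME      AGE\nnginx     3d\npostgres  5d\n"


print(readdir(["default", "svc"], fake_run))
print(readdir(["dev", "pods", "postgres-azaa1"], fake_run))
```

Passing the command runner in as an argument keeps the listing logic testable without a cluster.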

Caching

Translating read()s and readdir()s to kubectl commands is pretty neat, but can get quite expensive, especially with proper error handling (i.e. checking that a path exists). For example, running cat /dev/svc/nginx/describe will run three kubectl commands:

    # verify that 'dev' is a real namespace
    kubectl get ns
    # verify that 'nginx' is a real service
    kubectl get svc nginx --namespace=dev
    # run the actual action
    kubectl describe svc nginx --namespace=dev

We can avoid the first two calls by assuming that the namespace and service exist, which is what happened in earlier versions of KubeFuse, but this opens us up to unexpected behaviour and even security vulnerabilities. This was one reason to implement caching and error checking.

Another reason is that I wanted to show the proper file sizes and modification dates of objects. I ran into bugs with programs that use the file size to figure out how much they can read, or use the modification date to see whether they need to reload a file, which is reasonable enough. This is a problem for us, because we don’t know how big a file will be without reading it ourselves first. In early versions of KubeFuse I set all sizes to something high and padded the end of the file with zero bytes if the content turned out to be smaller. This sort of worked, except that some editors and tools still showed the padding, which obviously doesn’t fly.

What does this mean in practice? Running ls /default/pods/postgres-azaa1/ is pretty expensive because, aside from having to verify the namespace and pod identifier, it also has to run all the supported actions to figure out the file sizes. Luckily, Kubernetes is pretty performant and, depending on the network link to your cluster, this will run in about a second, which is still faster than you can type the command. KubeFuse also caches the output of each command for thirty seconds, so that reading an action after an ls is basically free.
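The thirty-second cache can be illustrated with a small time-based cache. This is a sketch under my own assumptions, not KubeFuse's actual implementation: the class name TTLCache, its methods and the injectable clock are all invented for the example.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache command output for a fixed number of seconds (hypothetical helper)."""

    def __init__(self, ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, bytes]] = {}

    def get(self, key: Hashable, compute: Callable[[], bytes]) -> bytes:
        """Return the cached value for key, calling compute() if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = compute()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Forget a cached value, e.g. after the resource has been changed."""
        self._entries.pop(key, None)


calls = []


def slow_describe() -> bytes:
    """Stand-in for an expensive kubectl call."""
    calls.append(1)
    return b"Name: nginx\n"


now = [0.0]  # fake clock so the example does not have to sleep
cache = TTLCache(ttl=30.0, clock=lambda: now[0])
key = ("dev", "svc", "nginx", "describe")

size = len(cache.get(key, slow_describe))  # what ls needs for the file size
data = cache.get(key, slow_describe)       # a following cat hits the cache
now[0] = 31.0
cache.get(key, slow_describe)              # expired, so the command runs again
print(size, len(calls))
```

The same cached bytes give both the file size reported to ls and the content returned to a later read, which is what makes the second access cheap.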

Writing Files

KubeFuse also allows you to edit the json and yaml files with your favourite text editors (vim) and tools. The idea here is that the changes you make are applied as soon as you save the file, which is useful for tweaks to the number of replicas, resource settings, versions, tags, etc.

If you have ever straced processes before or implemented low-level file handling then you probably know that writing a file can’t be done in a single syscall. For example, overwriting a file looks something like this:

    open()
    truncate()
    sync()
    write()
    sync()
    close()

In FUSE we get callbacks for all of these syscalls. This is nice, but in our case also a bit annoying because e.g. open doesn’t really mean anything in our world. Instead we’d like to have a higher-level API that says “this resource now contains this data”, but unfortunately that doesn’t exist so we have to piece this information together ourselves.

To complicate matters further we also have to think about concurrency. What if one process just truncated a file, whilst another is writing to it? What if one process is reading a file, whilst another is updating it? Has it been synced yet? Oh and by the way: what happens to the cache described above?

We have to solve these issues in regular file systems too, but it’s slightly worse in KubeFuse, because the “files” we’re manipulating aren’t real files and we also need to think about our caches. At the time of writing KubeFuse handles most cases properly: if we read a file that’s in the process of being written (synced but not closed) we read the new content, and when we save a file we apply the changes and invalidate the cache. The one thing that’s not done yet is acquiring a write lock for concurrent writes. This will be added soon, but in practice you probably won’t run into it unless multiple processes try to write the same Kubernetes resource at the same time, which should be pretty rare.
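To illustrate what piecing the writes together might look like, here is a minimal sketch of a per-file buffer that collects truncate and write calls and hands the final content to a callback when the file is closed. The class PendingWrite, its method names and the two callbacks are assumptions made for this example. How KubeFuse really applies the content to the cluster is not shown, and like the current KubeFuse this sketch takes no write lock.

```python
from typing import Callable


class PendingWrite:
    """Collects low-level writes to one file until it is closed (hypothetical)."""

    def __init__(self, original: bytes,
                 apply: Callable[[bytes], None],
                 invalidate: Callable[[], None]) -> None:
        self._buf = bytearray(original)
        self._dirty = False
        self._apply = apply            # pushes the new content to the cluster
        self._invalidate = invalidate  # drops cached output for this resource

    def truncate(self, length: int) -> None:
        """Shrink the buffer, or grow it with zero bytes, to exactly length."""
        if length < len(self._buf):
            del self._buf[length:]
        else:
            self._buf.extend(b"\0" * (length - len(self._buf)))
        self._dirty = True

    def write(self, data: bytes, offset: int) -> int:
        """Overwrite the buffer at offset, growing it with zero bytes if needed."""
        end = offset + len(data)
        if end > len(self._buf):
            self._buf.extend(b"\0" * (end - len(self._buf)))
        self._buf[offset:end] = data
        self._dirty = True
        return len(data)

    def read(self, size: int, offset: int) -> bytes:
        """Readers see the new content even before the file is closed."""
        return bytes(self._buf[offset:offset + size])

    def release(self) -> None:
        """On close: apply the change once and invalidate the cache."""
        if self._dirty:
            self._apply(bytes(self._buf))
            self._invalidate()
            self._dirty = False


applied = []
invalidated = []
f = PendingWrite(b'{"replicas": 1}', applied.append,
                 lambda: invalidated.append(True))
f.truncate(0)                    # the editor truncates the file first
f.write(b'{"replicas": 3}', 0)   # then writes the new content
print(f.read(4096, 0))           # a concurrent reader already sees the update
f.release()                      # closing the file triggers a single apply
print(applied, invalidated)
```

Applying only on close is one possible design choice; it means that a truncate followed by several writes results in a single update to the resource instead of one per syscall.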

Try KubeFuse!

KubeFuse is easy to install, extensively tested, works on Linux and Mac and doesn’t break your environment. All good reasons to take KubeFuse for a spin!