Last year around this time, we announced the availability of cgmanager, a daemon allowing users and programs to easily administer and delegate cgroups over a dbus interface. It was key to supporting nested containers and unprivileged users.

While its dbus interface turned out to have tremendous benefits (I wasn’t sold at first), there are programs which want to continue using the cgroup file interface. To support these in a container with the same delegation benefits as cgmanager, there is now lxcfs.

Lxcfs is a fuse filesystem mainly designed for use by lxc containers. On an Ubuntu 15.04 system, it will be used by default to provide two things: first, a virtualized view of some /proc files; and second, filtered access to the host’s cgroup filesystems.

The proc files filtered by lxcfs are cpuinfo, meminfo, stat, and uptime. These are filtered using cgroup information to show only the cpus and memory which are available to the reading task. They can be seen on the host under /var/lib/lxcfs/proc, and by default a container will bind-mount them over its own /proc files. There have been several attempts to push this virtualization into /proc itself, but those have been rejected. The proposed alternative was to write a library which all userspace would use to get filtered /proc information. Unfortunately no such effort seems to be taking off, and even if one took off now it wouldn’t help with legacy containers. In contrast, lxcfs works perfectly with 12.04 and 14.04 containers.
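As a rough illustration of the filtering, the sketch below prints MemTotal as reported by the kernel's /proc/meminfo next to the value from the lxcfs copy. The helper name mem_total_kb is made up for this example; the lxcfs path is the one given above. The two values should differ only when the reading task sits in a memory-limited cgroup, so on an unconstrained host shell they would be expected to match.

```shell
#!/bin/sh
# Print the MemTotal value (in kB) from a meminfo-style file.
# $1: path to the file, e.g. /proc/meminfo
# (mem_total_kb is a hypothetical helper, not part of lxcfs.)
mem_total_kb() {
    awk '$1 == "MemTotal:" { print $2; exit }' "$1"
}

# Path taken from the post; it exists only where lxcfs is running.
LXCFS_PROC=/var/lib/lxcfs/proc

if [ -r "$LXCFS_PROC/meminfo" ]; then
    echo "kernel view: $(mem_total_kb /proc/meminfo) kB"
    echo "lxcfs view:  $(mem_total_kb "$LXCFS_PROC/meminfo") kB"
else
    echo "lxcfs does not appear to be mounted at $LXCFS_PROC" >&2
fi
```

Inside a container the same function can be pointed at /proc/meminfo directly, since that file is the bind-mounted lxcfs version.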

The cgroups are mounted per-host-mounted-hierarchy under /var/lib/lxcfs/cgroup/. When a container is started, each filtered hierarchy will be bind-mounted under /sys/fs/cgroup/* in the container. The container cannot see any information for ancestor cgroups, so for instance /var/lib/lxcfs/cgroup/freezer will contain only a directory called ‘lxc’ or ‘user.slice’.
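To make the mapping concrete, here is a small sketch that prints, without executing anything, the bind mount that conceptually corresponds to each hierarchy. It is only an illustration: the function name print_cgroup_binds and the rootfs path are assumptions for this example, and the real mounts are set up by lxc at container start, not by a script like this.

```shell
#!/bin/sh
# Print (do not run) one bind-mount command per hierarchy directory.
# $1: directory holding the filtered hierarchies (path from the post)
# $2: container root filesystem (hypothetical path, for illustration)
print_cgroup_binds() {
    src_root=$1
    rootfs=$2
    for dir in "$src_root"/*/; do
        [ -d "$dir" ] || continue    # the glob matched nothing
        name=$(basename "$dir")
        echo "mount --bind $src_root/$name $rootfs/sys/fs/cgroup/$name"
    done
}

# Prints nothing on a system without lxcfs.
print_cgroup_binds /var/lib/lxcfs/cgroup /path/to/rootfs
```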

Lxcfs was instrumental in allowing us to boot systemd containers, both privileged and unprivileged. Through its proc filtering, it also answers a frequent, years-old request. We do hope that kernel support for cgroup namespaces will eventually allow us to drop the cgroup part of lxcfs. Since we’ll need to support LTS containers for some time, that will definitely require cgroup namespace support for non-unified hierarchies, but that’s not outside the realm of possibility.

Lxcfs is packaged in Ubuntu 15.04, the source is hosted at github.com/lxc/lxcfs, and news can be tracked at linuxcontainers.org/lxcfs.

In summary, on a 15.04 host, you can now create a container the usual way:

lxc-create -t download -n v1 -- -d ubuntu -r vivid -a amd64

The resulting container will have “correct” results for uptime, top, etc.

root@v1:~# uptime
03:09:08 up 0 min, 0 users, load average: 0.02, 0.13, 0.12

It will get cgroup hierarchies under /sys/fs/cgroup:

root@v1:~# find /sys/fs/cgroup/freezer/
/sys/fs/cgroup/freezer/
/sys/fs/cgroup/freezer/user.slice
/sys/fs/cgroup/freezer/user.slice/user-1000.slice
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/tasks
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/cgroup.procs
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/freezer.state
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/cgroup.clone_children
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/freezer.parent_freezing
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/notify_on_release
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/freezer.self_freezing

And it can run systemd as init.
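One quick way to confirm this from inside the container is to look at the command name of PID 1 in procfs. This is a minimal sketch; init_name is a helper invented for this example, and the check assumes a kernel that provides /proc/1/comm.

```shell
#!/bin/sh
# Print the command name of PID 1 as reported by procfs.
# $1: optional procfs mount point (defaults to /proc)
# (init_name is a hypothetical helper for this example.)
init_name() {
    cat "${1:-/proc}/1/comm"
}

if [ "$(init_name)" = "systemd" ]; then
    echo "PID 1 is systemd"
else
    echo "PID 1 is $(init_name)"
fi
```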

by Serge Hallyn, Software Engineer at Canonical

[this post was first published on 23rd February 2015 on S3hh’s blog]