From: Christian Brauner
Subject: [PATCH 0/8] loopfs
Date: Wed, 8 Apr 2020 17:21:43 +0200

Hey everyone,



After having been pinged about this by various people recently, here's loopfs.



This implements loopfs, a loop device filesystem. It takes inspiration
from the binderfs filesystem I implemented about two years ago and with
which we have had good experiences overall so far. Parts of it are also
based on [3] but it's mostly a new, imho cleaner and more complete
approach.



To experiment, the patchset can be found in the following locations:
https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=loopfs
https://gitlab.com/brauner/linux/-/commits/loopfs
https://github.com/brauner/linux/tree/loopfs



One of the use-cases for loopfs is to allow loop devices to be
dynamically allocated in sandboxed workloads without exposing /dev or
/dev/loop-control to the workload in question and without having to
implement a complex and also racy protocol to send around file
descriptors for loop devices. With loopfs each mount is a new instance,
i.e. loop devices created in one loopfs instance are independent of any
loop devices created in another loopfs instance. This allows
sufficiently privileged tools to have their own private stash of loop
device instances. In a private discussion Dmitry has expressed his
desire to use this for syzkaller. Various parties that want to use it
are Cc'ed here too.
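
The steps shown in [4] for pulling a device out of a private loopfs
instance could be wrapped roughly as follows. This is only a sketch: the
helper names loopfs_node and loopfs_alloc are made up for illustration and
are not part of the patchset, and it assumes the instance is already
mounted, that losetup is installed and that the caller is allowed to create
symlinks under /dev.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the patchset.

# Map a device path as printed by losetup (e.g. /dev/loop9) to the node
# of the same name inside the loopfs instance mounted at $1.
loopfs_node() {
    mnt=${1%/}        # drop one trailing slash from the mount point
    name=${2##*/}     # keep only the last path component
    printf '%s/%s\n' "$mnt" "$name"
}

# Allocate a free loop device from the loopfs instance mounted at $1 and
# expose it under /dev, mirroring the manual steps in [4].
# Assumption: needs enough privilege to create symlinks in /dev.
loopfs_alloc() {
    mnt=${1%/}
    # Point the well-known control node at this instance's loop-control.
    ln -sf "$mnt/loop-control" /dev/loop-control || return 1
    # Ask for a free device; losetup prints a path such as /dev/loop9.
    dev=$(losetup -f) || return 1
    # Make that path resolve to the node inside the loopfs instance.
    ln -sf "$(loopfs_node "$mnt" "$dev")" "$dev" || return 1
    printf '%s\n' "$dev"
}
```

With an instance mounted at /dev/loopfs, "loopfs_alloc /dev/loopfs" would
print the path of the allocated device, which can then be handed to
mount -o loop or losetup as usual.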



In addition, the loopfs filesystem can be mounted by user namespace root
and is thus suitable for use in containers. Combined with syscall
interception this makes it possible to securely delegate mounting of
images on loop devices, i.e. when a user calls mount -o loop <image>
<mountpoint> it will be possible to completely set up the loop device.
The final mount syscall to actually perform the mount will be handled
through syscall interception and be performed by a sufficiently
privileged process. Syscall interception is already supported through a
new seccomp feature we implemented in [1] and extended in [2] and is
actively used in production workloads. The additional loopfs work will
be used there and in various other workloads too. You'll find a short
illustration of how this works with syscall interception below in [4].



The number of loop devices available to a loopfs instance can be limited
by setting the "max" mount option to a positive integer. This allows,
e.g., sufficiently privileged processes to dynamically enforce a limit
on the number of devices. In contrast to the max_loop module option,
this limit is dynamic in that a sufficiently privileged process can
update it with a simple remount operation.
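
Setting and later changing the limit might look roughly like the sketch
below. The filesystem type "loop" is taken from the mount command in [4]
and the "max" option from the description above; the helper names, the
input validation and the exact -o syntax are assumptions made for
illustration and are not taken from the patches themselves.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the patchset.

# Build the mount option string for a device limit.
# Rejects anything that is not a positive decimal integer.
loopfs_max_opt() {
    case "$1" in
        ''|*[!0-9]*)
            echo "max must be a positive integer" >&2
            return 1
            ;;
    esac
    [ "$1" -gt 0 ] || { echo "max must be a positive integer" >&2; return 1; }
    printf 'max=%s\n' "$1"
}

# Mount a new loopfs instance at $1 limited to $2 loop devices.
loopfs_mount() {
    opts=$(loopfs_max_opt "$2") || return 1
    mkdir -p "$1" && mount -t loop -o "$opts" loop "$1"
}

# Change the limit of the instance mounted at $1 to $2 via remount.
loopfs_set_max() {
    opts=$(loopfs_max_opt "$2") || return 1
    mount -o "remount,$opts" "$1"
}
```

For example, "loopfs_mount /dev/loopfs 8" followed later by
"loopfs_set_max /dev/loopfs 16" would raise the limit without touching the
max_loop module option.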



The loopfs filesystem is placed under a new config option and special
care has been taken not to introduce any new code when users do not
select this config option.



Thanks!

Christian



[1]: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
[2]: fb3c5386b382 ("seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE")
[3]: https://lore.kernel.org/lkml/1401227936-15698-1-git-send-email-seth.forshee@canonical.com

[4]:
    root@f1:~# cat /proc/self/uid_map
    0 100000 1000000000
    root@f1:~# cat /proc/self/gid_map
    0 100000 1000000000
    root@f1:~# mkdir /dev/loopfs
    root@f1:~# mount -t loop loop /dev/loopfs/
    root@f1:~# ln -sf /dev/loopfs/loop-control /dev/loop-control
    root@f1:~# losetup -f
    /dev/loop9
    root@f1:~# ln -sf /dev/loopfs/loop9 /dev/loop9
    root@f1:~# ls -al /sys/class/block/loop9
    lrwxrwxrwx 1 root root 0 Apr 8 14:53 /sys/class/block/loop9 -> ../../devices/virtual/block/loop9
    root@f1:~# ls -al /sys/class/block/loop9/
    total 0
    drwxr-xr-x  9 root   root       0 Apr 8 14:53 .
    drwxr-xr-x 13 nobody nogroup    0 Apr 8 14:53 ..
    -r--r--r--  1 root   root    4096 Apr 8 14:53 alignment_offset
    lrwxrwxrwx  1 nobody nogroup    0 Apr 8 14:53 bdi -> ../../bdi/7:9
    -r--r--r--  1 root   root    4096 Apr 8 14:53 capability
    -r--r--r--  1 root   root    4096 Apr 8 14:53 dev
    -r--r--r--  1 root   root    4096 Apr 8 14:53 discard_alignment
    -r--r--r--  1 root   root    4096 Apr 8 14:53 events
    -r--r--r--  1 root   root    4096 Apr 8 14:53 events_async
    -rw-r--r--  1 root   root    4096 Apr 8 14:53 events_poll_msecs
    -r--r--r--  1 root   root    4096 Apr 8 14:53 ext_range
    -r--r--r--  1 root   root    4096 Apr 8 14:53 hidden
    drwxr-xr-x  2 nobody nogroup    0 Apr 8 14:53 holders
    -r--r--r--  1 root   root    4096 Apr 8 14:53 inflight
    drwxr-xr-x  2 nobody nogroup    0 Apr 8 14:53 integrity
    drwxr-xr-x  3 nobody nogroup    0 Apr 8 14:53 mq
    drwxr-xr-x  2 root   root       0 Apr 8 14:53 power
    drwxr-xr-x  3 nobody nogroup    0 Apr 8 14:53 queue
    -r--r--r--  1 root   root    4096 Apr 8 14:53 range
    -r--r--r--  1 root   root    4096 Apr 8 14:53 removable
    -r--r--r--  1 root   root    4096 Apr 8 14:53 ro
    -r--r--r--  1 root   root    4096 Apr 8 14:53 size
    drwxr-xr-x  2 nobody nogroup    0 Apr 8 14:53 slaves
    -r--r--r--  1 root   root    4096 Apr 8 14:53 stat
    lrwxrwxrwx  1 nobody nogroup    0 Apr 8 14:53 subsystem -> ../../../../class/block
    drwxr-xr-x  2 root   root       0 Apr 8 14:53 trace
    -rw-r--r--  1 root   root    4096 Apr 8 14:53 uevent
    root@f1:~#
    root@f1:~# stat --file-system /bla.img
    File: "/bla.img"
    ID: 4396dc4f5f3ffe1b Namelen: 255 Type: btrfs
    Block size: 4096 Fundamental block size: 4096
    Blocks: Total: 11230468 Free: 10851929 Available: 10738585
    Inodes: Total: 0 Free: 0
    root@f1:~# mount -o loop /bla.img /opt
    root@f1:~# findmnt | grep opt
    └─/opt /dev/loop9 btrfs rw,relatime,ssd,space_cache,subvolid=5,subvol=/



Christian Brauner (8):
  kobject_uevent: remove unneeded netlink_ns check
  loopfs: implement loopfs
  loop: use ns_capable for some loop operations
  kernfs: handle multiple namespace tags
  kernfs: let objects opt-in to propagating from the initial namespace
  genhd: add minimal namespace infrastructure
  loopfs: start attaching correct namespace during loop_add()
  loopfs: only show devices in their correct instance



 Documentation/filesystems/sysfs-tagging.txt |   1 -
 MAINTAINERS                                 |   5 +
 block/genhd.c                               |  79 ++++
 drivers/base/devtmpfs.c                     |   4 +-
 drivers/block/Kconfig                       |   4 +
 drivers/block/Makefile                      |   1 +
 drivers/block/loop.c                        | 186 +++++++--
 drivers/block/loop.h                        |   8 +-
 drivers/block/loopfs/Makefile               |   3 +
 drivers/block/loopfs/loopfs.c               | 429 ++++++++++++++++++++
 drivers/block/loopfs/loopfs.h               |  35 ++
 fs/kernfs/dir.c                             |  38 +-
 fs/kernfs/kernfs-internal.h                 |  26 +-
 fs/kernfs/mount.c                           |  11 +-
 fs/sysfs/mount.c                            |  14 +-
 include/linux/device.h                      |   3 +
 include/linux/genhd.h                       |   3 +
 include/linux/kernfs.h                      |  44 +-
 include/linux/kobject_ns.h                  |   7 +-
 include/linux/sysfs.h                       |   8 +-
 include/uapi/linux/magic.h                  |   1 +
 lib/kobject.c                               |  17 +-
 lib/kobject_uevent.c                        |   2 +-
 net/core/net-sysfs.c                        |   6 -
 24 files changed, 834 insertions(+), 101 deletions(-)
 create mode 100644 drivers/block/loopfs/Makefile
 create mode 100644 drivers/block/loopfs/loopfs.c
 create mode 100644 drivers/block/loopfs/loopfs.h





base-commit: 7111951b8d4973bda27ff663f2cf18b663d15b48

--

2.26.0



