If you are a User looking to know more then hit up the FAQ section, if you are a programmer then the Implementations section and Links section are going to be the most useful. It is recommended that Sys admins read up on the Security and Networking part and perhaps take a look at the different Implementations available.

This page is an attempt to document the ins and outs of containers on Linux. This is not just restricted to programmers looking to implement containers or use container like features in their own code but also Sysadmins and Users who want to get more of a handle on how containers work 'under the hood'.

Rather than take an 'All or nothing' approach to containers (eg FreeBSD / Solaris / OpenVZ ), native Linux Containers support allows you to unshare Specific resources from the host. These can be mixed and matched in various ways to produce interesting combinations for things such as testing network setups, preventing information leakage (eg for shared hosting webservers) or testing out OS builds (eg from debootstrap). It can even be used to provide a more complete fakeroot replacement

Security

Containers are commonly thought of as a security mechanism, much in the same way that chroot is also mentioned. This is the wrong way to think about containers and will not only lead you astray but also to potential compromise. Namespaces (a part of containers) are an isolation mechanism That can be used to prevent information from one namespace leaking into another inadvertently. They do not however prevent intentional leakage (eg same filesystem mounted in multiple containers at once)

One thing to consider with containers is that the Linux kernel is shared between multiple containers, a namespace aware rootkit that compromise one container will be able to infect other containers and do what it wants to them. Containers will not reduce the attack surface in this case but instead give you multiple instances of that attack surface.

Unfortunately there is no 'One size fits all' security solution when implementing containers and as such you will need to mix and match the features from multiple Security subsystems in order to secure containers against attack. The main ones that when combined cover all bases are listed below:

Seccomp mode 2: This can be used to filter out individual syscalls or filter based on the arguments passed to the syscall. the mknod syscall is a good example here as there is no dynamic device support yet in a kernel, a static /dev can be populated and udev disabled. another good example is mount as this can be used in some situations to escape the container.

can be populated and udev disabled. another good example is mount as this can be used in some situations to escape the container. SELINUX: After evaluating multiple options in this space we found that selinux seems to be the clear winner here. The multi-category security can be used to prevent a compromised container from accessing the resources of another container should information from one container leak into another (This is surprisingly simple to set up).

capabilities: While CAP_SYS_ADMIN does tend to be overpowered many of the other capabilities do not make sense in the context of containers and can be disabled. a perfect example of this is CAP_MAC_ADMIN and CAP_MAC_OVERRIDE , as these will allow you to disable the SELINUX or equivalent protections above.

does tend to be overpowered many of the other capabilities do not make sense in the context of containers and can be disabled. a perfect example of this is and , as these will allow you to disable the SELINUX or equivalent protections above. cgroups: While the resource limiting cgroups are handy the device cgroup is handy for restricting what a container can do to a device node for those times where you may want to only hand out read only access to a dev node but root int he container could possibly change the permissions/ this cgroup enforces a second level of checks that can be overridden.

With these security subsystems in Linux you should be able to implement an overlapping security model that is still resilient should one of these features be unavailable (eg seccomp, selinux and cgroups can all be used to limit what device nodes can be created)

Namespace under Linux have not been without their fair share of security vulnerabilities, most notably around the User namespaces feature which can be used to get around some of the restrictions placed on users. Below is a list of most of the CVE's that have been reported against the Linux kernel as well as some preemptive patches which highlight the potential security complications that namespaces can introduce.