What Kubernetes users should know about the rkt container engine

By Jonathan Boulle

One year of rkt 1.x: pods, Kubernetes, and OCI

Since the release of rkt 1.0 at the beginning of this year, the project has powered ahead with over 20 new stable versions on a regular release cycle. The goal of rkt has always been to provide a container engine that is not only reliable but also composable and standards-driven, allowing easy operation and integration with other best-in-class tools in the container ecosystem. Today we want to provide an update on the ongoing work to integrate rkt with two such projects, the Kubernetes cluster orchestration system and the Open Container Initiative (OCI) container standards, and chart the course for rkt's future in the year ahead.

rkt and pods: why is rkt’s design different?

From the start, rkt was built as a pod-native container engine. This means that the basic unit of execution is a pod, linking together resources and user applications in a self-contained environment. rkt's pods naturally follow the same pod concept popularised by Kubernetes.

To provide a smooth experience, rkt takes care to set up the application context so that it resembles the usual Linux environment as closely as possible. Because of this, and the facilities that a pod-native container engine provides, an application developer doesn’t have to worry about:

Bundling multiple applications in a single container image

Supervising and orchestrating additional helper processes

Running apps as PID 1, with its unexpected signal semantics and child-reaping duties

At the same time, rkt offers a comfortable tool for SREs and systems administrators by supporting daily operational needs:

Logging, cgroups, and service integration with existing tools (e.g. runit or systemd)

Isolating unrelated pods and workloads, segregating them as decoupled Linux processes

Customizable, modular networking configuration, decoupled from the container runtime

The technical implementation of rkt is even internally aligned with this concept, as the engine harnesses modular components, like the Container Network Interface (CNI), systemd for service management, and systemd-journald for logging. These modules are independently developed by dedicated communities to handle specific tasks.

A kubelet-first runtime with CRI

Kubernetes is a cluster orchestration system, and upstream Kubernetes is a central component of CoreOS Tectonic. Kubernetes runs all applications in pods of containers, and it achieves this by delegating runtime tasks to a container engine. As Kubernetes has matured, users have requested the ability to use different execution engines like rkt or Hyper in a Kubernetes cluster. Earlier this year we introduced the initial version of rktnetes, a project to add support for rkt as the first alternative container engine. This process involved a considerable amount of work to make the Kubernetes Kubelet codebase less Docker-specific, removing assumptions and special cases from the source to form a truly modular abstraction.

This work, combined with community effort and discussion, led to the creation of the Container Runtime Interface (CRI), an API specification for low-level interaction between the kubelet and container runtimes. For more details on the history of CRI, see the original proposal. For more about what this means for rkt and Kubernetes, read on.

Introducing pod sandboxes

The introduction of the CRI into Kubernetes brings interesting new possibilities for pods, with granular control over the lifecycle of individual applications. This increased flexibility enables a variety of new use cases, like updating single applications within running pods, or dynamically injecting debugging capabilities. Notably, it implies that pods are now mutable, and that empty pods can be created and can continue to exist after their applications exit. In CRI terms, this concept is called a “pod sandbox”.

rkt has already introduced support for pod sandboxes. A new, experimental subcommand (currently called rkt app and enabled by an RKT_EXPERIMENT_APP environment variable) allows the creation and manipulation of mutable pod sandboxes. It can start a new empty environment, and then allow users to add, start, stop, and remove applications within a running pod sandbox. This was first introduced in rkt 1.19.0 and it is currently on its way to stabilization, with more documentation following soon.
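
To make the lifecycle above concrete, here is a minimal Go sketch of a mutable pod sandbox. It is only an in-memory model: the Sandbox type, its methods, and the error values are hypothetical names chosen for illustration, and they are not rkt's or the CRI's actual API. The point it shows is that a sandbox is valid while empty, and that applications can be added, started, stopped, and removed independently of the sandbox itself.

```go
package main

import (
	"errors"
	"fmt"
)

// AppState is the lifecycle state of one application inside a sandbox.
type AppState int

const (
	AppAdded   AppState = iota // added to the sandbox, not yet started
	AppRunning                 // started and running
	AppStopped                 // stopped, still present in the sandbox
)

// Hypothetical errors for this sketch; they do not come from rkt.
var (
	ErrExists   = errors.New("app already exists")
	ErrNotFound = errors.New("app not found")
	ErrRunning  = errors.New("app is still running")
)

// Sandbox is a hypothetical in-memory model of a mutable pod sandbox.
type Sandbox struct {
	apps map[string]AppState
}

// NewSandbox creates an empty sandbox. An empty sandbox is a valid state.
func NewSandbox() *Sandbox {
	return &Sandbox{apps: make(map[string]AppState)}
}

// Add registers an application without starting it.
func (s *Sandbox) Add(name string) error {
	if _, ok := s.apps[name]; ok {
		return ErrExists
	}
	s.apps[name] = AppAdded
	return nil
}

// Start marks an existing application as running.
func (s *Sandbox) Start(name string) error {
	if _, ok := s.apps[name]; !ok {
		return ErrNotFound
	}
	s.apps[name] = AppRunning
	return nil
}

// Stop marks an existing application as stopped; the sandbox lives on.
func (s *Sandbox) Stop(name string) error {
	if _, ok := s.apps[name]; !ok {
		return ErrNotFound
	}
	s.apps[name] = AppStopped
	return nil
}

// Remove deletes an application that is not currently running.
func (s *Sandbox) Remove(name string) error {
	state, ok := s.apps[name]
	if !ok {
		return ErrNotFound
	}
	if state == AppRunning {
		return ErrRunning
	}
	delete(s.apps, name)
	return nil
}

// Len reports how many applications the sandbox currently holds.
func (s *Sandbox) Len() int { return len(s.apps) }

func main() {
	sb := NewSandbox()
	fmt.Println("apps in new sandbox:", sb.Len())

	_ = sb.Add("redis")
	_ = sb.Start("redis")
	fmt.Println("remove while running:", sb.Remove("redis"))

	_ = sb.Stop("redis")
	_ = sb.Remove("redis")
	fmt.Println("apps after removal:", sb.Len())
}
```

In this model the sandbox outlives every application it has held, which is the property that distinguishes a pod sandbox from an immutable pod.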

Interactivity and attach functionality

Multiple applications are typically run side-by-side in a pod, with each application’s output and error stream (stdout/stderr) used for logging purposes. Historically, rkt has taken care of multiplexing the pod’s I/O to the outside world by using systemd-journald. Because of this, there was only limited support for attaching to applications directly, or redirecting their I/O.

The Kubernetes CRI allows for more sophisticated scenarios, like piping input to applications and attaching to running processes. To satisfy these requirements we contributed streaming support to systemd itself and are in the process of adding the following selectable I/O modes to rkt:

interactive: the application runs under the TTY of the invoking parent process, i.e. an interactive user terminal. This mode is limited to at most one app per pod, and it allows the user to interact with the running container.

TTY: the application runs with a newly allocated TTY, with full terminal capabilities. This allows attaching to already-running applications.

streaming: the application’s output or input is supervised by a separate multiplexer running in the pod context. This allows for attaching/detaching/piping, even without a dedicated TTY. The TTY demystified post covers terminal-related topics in more detail.

logging: the application’s output is supervised by a separate logging process running in the pod, and its output lines are processed as individual log entries. This is the original default mode for applications which don’t require interactivity.

null: the application stream will simply be closed, and any output discarded.

These modes are configurable per application, and individually for each app’s stdin/stdout/stderr streams, to offer the most flexibility. Look for this new feature in an upcoming rkt release.
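
As a rough illustration of per-stream configuration, the Go sketch below models the five modes and checks one rule stated above: interactive mode is limited to at most one app per pod. The IOMode constants, the AppIO struct, and ValidatePod are hypothetical names for this example and do not reflect rkt's flags or internal types.

```go
package main

import (
	"errors"
	"fmt"
)

// IOMode names one of the selectable I/O modes described above.
// The string values are assumptions for this sketch, not rkt flag values.
type IOMode string

const (
	ModeInteractive IOMode = "interactive"
	ModeTTY         IOMode = "tty"
	ModeStreaming   IOMode = "streaming"
	ModeLogging     IOMode = "logging"
	ModeNull        IOMode = "null"
)

// AppIO holds an independent mode for each of an app's standard streams.
type AppIO struct {
	Stdin, Stdout, Stderr IOMode
}

// ErrTooManyInteractive is returned when more than one app is interactive.
var ErrTooManyInteractive = errors.New("at most one app per pod may use interactive mode")

// validMode reports whether m is one of the five known modes.
func validMode(m IOMode) bool {
	switch m {
	case ModeInteractive, ModeTTY, ModeStreaming, ModeLogging, ModeNull:
		return true
	}
	return false
}

// ValidatePod rejects unknown modes and pods in which more than one
// application uses interactive mode on any of its streams.
func ValidatePod(apps map[string]AppIO) error {
	interactiveApps := 0
	for name, cfg := range apps {
		usesInteractive := false
		for _, m := range []IOMode{cfg.Stdin, cfg.Stdout, cfg.Stderr} {
			if !validMode(m) {
				return fmt.Errorf("app %q: unknown I/O mode %q", name, m)
			}
			if m == ModeInteractive {
				usesInteractive = true
			}
		}
		if usesInteractive {
			interactiveApps++
		}
	}
	if interactiveApps > 1 {
		return ErrTooManyInteractive
	}
	return nil
}

func main() {
	pod := map[string]AppIO{
		// A debugging shell attached to the user's terminal.
		"shell": {Stdin: ModeInteractive, Stdout: ModeInteractive, Stderr: ModeInteractive},
		// A service with no input whose output goes to the log.
		"web": {Stdin: ModeNull, Stdout: ModeLogging, Stderr: ModeLogging},
	}
	fmt.Println("validation error:", ValidatePod(pod))
}
```

Keeping the mode on each stream, instead of on the app as a whole, is what lets a single app combine, for example, a closed stdin with logged output.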

An OCI-first runtime

At CoreOS we believe properly designed and employed standards are key to unlocking the power of open source. Software standards mean developers and teams can write software tools to compose and interoperate in predictable, consistent ways, without being beholden to particular implementations.

The Open Container Initiative (OCI) is a Linux Foundation effort to create a truly portable software container. In the last twelve months we have seen the two key OCI specifications, "image-spec" and "runtime-spec", march towards their important 1.0 releases.

OCI image spec: the new standard image format

Pursuing our commitment to open specifications, rkt is currently transitioning its internal architecture to the new OCI standard, starting with the OCI Image Specification. rkt developers are ramping up efforts for native support of OCI images, including fetching, storing, and running. The upstream roadmap details technical adjustments that will happen in the coming weeks, and a tracker project maps the status of the ongoing effort.

However, as we work through this transition internally, users can rest assured that our compatibility guarantees will still apply. Moving forward, we recommend the building and usage of OCI images, but the rkt 1.x series will continue to fully support the retrieval and execution of ACIs the same as today.

The overarching goal is to remove internal ACI translation and embrace OCI natively as soon as the image-spec is finalized and the format is stable enough for production use.

OCI runtime spec: executing Linux containers

In parallel to the image format, OCI is developing the so-called runtime-spec for describing the runtime execution environment that container engines should provide. This specification is being developed in close tandem with runc, a shared community effort at creating a reference implementation of the specification.

To provide the best possible support for the OCI runtime specification, rkt is gaining better integration with runc as its internal application executor. This is made possible by rkt’s modular architecture, which allows runc to be integrated as an alternative stage1 environment. This architecture allows the new implementation to be developed and used without any disruptive impact on users, and it will allow rkt to reduce feature duplication and be better aligned with the rest of the ecosystem on its OCI journey.

rkt is home to innovative thinking about how a container engine works in the world of orchestrated clusters. From the CNI abstraction for making network interaction modular, to the CRI formalizing the way clusters interact with the container engine on each member node to run applications securely, simply, scalably, and reliably, rkt has been a center of cutting-edge code and a source of productive discussion with the wider community. Join us in improving both Kubernetes and the standards that define the containers and pods that package software today. If you’re new to rkt, check out these introductory rkt videos on our blog. If you’re a veteran container cluster admin or developer, take a look at the rkt documentation to start experimenting, or clone the rkt repository on GitHub and start hacking.