Container misconceptions

TL;DR: containers are not VMs; stop calling everything “Docker”; don’t use Kubernetes for tiny projects, use Swarm instead; Kubernetes will only solve your org’s problems if you are willing to go all-in. Anything in between will fail the same way it failed before.

Update (Tue, 9 May 2019 18:54 UTC): fixed the statement regarding “Docker images”. There is such a thing as a Docker image; other container image formats do not support the layered nature of Docker images. Thanks to Esli Heyvaert for pointing it out!





The rise in popularity of container technology has come with many misconceptions, maybe because people tend to describe new concepts with old ones, instead of starting from scratch.

I’m no master either. If I’m writing this, it’s because I’ve fallen into each and every one of these.

Containers are NOT lightweight VMs

If they look like VMs, and behave like VMs, they might as well be… VMs!

When I got introduced to containers, I was told the phrase “containers are lightweight VMs” once a day. It’s easy to explain the behavior of containers using that analogy, but believing it has some pretty bad consequences, like people building their images on top of full-blown operating system images.

Wrong! Containers are not virtual machines.

Think of containers as stripped-down, lonely processes.

They run like any other process, but their namespace (what they see) is very limited. They have their own filesystem, their own process namespace, their own network namespace, limits on hardware resources… Yet they all share the same kernel, because they are just processes.
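To make the “just processes” point concrete, here is a minimal sketch that reads the namespaces a process belongs to. It assumes a Linux host, where each entry under /proc/<pid>/ns is a symlink whose target looks like pid:[4026531836]. The function names are made up for illustration, and reading another process’s entries may require extra privileges.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]":
# the namespace type, then an identifier for that particular namespace.
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<ident>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, identifier)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("ident"))


def read_namespaces(pid="self", proc_root="/proc"):
    """Return {entry name: namespace identifier} for one process (Linux only)."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    namespaces = {}
    for entry in sorted(os.listdir(ns_dir)):
        _, ident = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        namespaces[entry] = ident
    return namespaces


def shared_namespaces(a, b):
    """Names of the namespaces that two processes have in common."""
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    try:
        for name, ident in read_namespaces().items():
            print(f"{name}: {ident}")
    except OSError as exc:
        print(f"could not read namespaces (not Linux?): {exc}")
```

Comparing the output for a process on the host with the output for a process inside a container should show different identifiers for the namespaces the runtime isolated, while both still run on the same kernel.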

Once you think of them this way, you’ll build your image around a single binary, and maybe some configuration, but nothing else. No shell, no package manager, etc.

Docker this, Docker that

Some people call everything “Docker”. I’ve subconsciously fallen into this one many times. Docker image, Docker container, Docker registry…

But no more!

“Docker” should only be used to talk about these:

Docker, Inc., the company behind the Docker Engine;

the Docker Engine, a client-server API for administering containers;

the Docker daemon: dockerd, the server component of the Docker Engine, the runtime that wraps around the Kernel’s technologies that power containers (chroot, cgroups, etc.) and exposes an HTTP endpoint for the Docker utility to talk to;

the Docker utility: docker, the client component of the Docker Engine that talks to a Docker daemon through its HTTP endpoint;

the Docker image format: Docker’s own container image specification;

and the Docker Hub, a container registry hosted by Docker, Inc.

Everything else is a “container something” or, in some cases, an “OCI something”.
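The split between the Docker utility and the Docker daemon is easier to see if you talk to the daemon’s HTTP endpoint yourself. The sketch below uses only the Python standard library. It assumes the daemon listens on the Unix socket /var/run/docker.sock (a common default, but your installation may differ) and queries the Engine API’s /version endpoint. The class and function names are invented for this example.

```python
import http.client
import json
import socket

# Assumption: a common default location for the daemon's socket.
DEFAULT_SOCKET = "/var/run/docker.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTP connection that travels over a Unix socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # The host name only fills the Host header; routing uses the socket.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def engine_get(path, socket_path=DEFAULT_SOCKET):
    """Send a GET request to the daemon and decode its JSON answer."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"daemon answered {response.status}: {body!r}")
        return json.loads(body)
    finally:
        conn.close()


if __name__ == "__main__":
    try:
        print(engine_get("/version").get("Version"))
    except (OSError, RuntimeError, ValueError) as exc:
        print(f"could not query a daemon at {DEFAULT_SOCKET}: {exc}")
```

This is roughly the kind of request the docker utility sends on your behalf; any other client that speaks the same HTTP API can take its place.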

Dockerfiles

There are two things under the name of “Dockerfile”:

A manifest file for building a container image; and

the syntax for writing such a file.

A Dockerfile tells the build tool what to do in order to build a container image. I like it a lot, as it gives me (the service administrator) a summarized list of instructions to build a given package, instead of having to read the sometimes overly extensive documentation about it.

A container image is just a tarball, though, so while the Dockerfile gives us a very easy way of designing images, the Dockerfile is not an essential component of container technology.
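As a rough illustration of the “just a tarball” idea, the sketch below packs a binary and a configuration file into an in-memory tar archive without any Dockerfile involved. It is deliberately simplified: a real image also carries metadata next to its filesystem archives, and the file names and helper functions here are made up for the example.

```python
import io
import tarfile


def build_layer(files):
    """Pack {path: bytes} into an in-memory tar archive and return its bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path, content in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            info.mode = 0o755  # executable, so the binary can be run
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_layer(layer):
    """Return the file names stored in an archive produced by build_layer."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as archive:
        return [member.name for member in archive.getmembers()]


if __name__ == "__main__":
    layer = build_layer({
        "app/server": b"placeholder for a real binary",  # hypothetical file
        "app/config.yaml": b"port: 8080\n",              # hypothetical file
    })
    print(list_layer(layer))
```

Anything that can produce such an archive plus the accompanying metadata can build an image, which is why the Dockerfile is a convenience and not a requirement.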

Orchestrators

A container orchestrator is a package that automates the administration of services powered by containers. It handles clustering of compute nodes into a single logical unit, spinning containers up and down based on many different inputs, upgrades and downgrades, provisioning resources, managing the software-defined network fabric, etc.
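The “spinning containers up and down” part can be pictured as a loop that compares what you asked for with what is actually running. The following is a toy model of that idea, not how any particular orchestrator is implemented; the service names and both functions are invented for the example.

```python
def reconcile(desired, running):
    """Compare desired and running replica counts and list the actions needed.

    Both arguments map a service name to a number of containers.
    """
    actions = []
    for service in sorted(set(desired) | set(running)):
        want = desired.get(service, 0)
        have = running.get(service, 0)
        if want > have:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))
    return actions


def apply(running, actions):
    """Return the state that results from carrying out the given actions."""
    state = dict(running)
    for verb, service, count in actions:
        delta = count if verb == "start" else -count
        state[service] = state.get(service, 0) + delta
        if state[service] == 0:
            del state[service]
    return state


if __name__ == "__main__":
    desired = {"web": 3, "db": 1}
    running = {"web": 1, "cache": 2}
    for action in reconcile(desired, running):
        print(action)
```

A real orchestrator runs a comparison like this continuously, across many nodes, and has to cope with failures in the middle of it, which is where most of the complexity comes from.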

Kubernetes

Kubernetes, the most popular orchestrator of them all, does all of the above and more. It was born at Google, and later donated to the CNCF. It has the bad reputation of having a very steep learning curve, and having been through that, I believe I know why.

For the last 1-2 years, we have been bombarded with messages telling us “move to Kubernetes, what are you even waiting for?”, as if Kubernetes were the magical solution to all of our problems.

And while it may not be the solution to all of our problems, it can abstract some very complicated tasks from us.

But Kubernetes isn’t targeted at individuals. Some have promoted Kubernetes for personal projects, and while that’s feasible, it’s like killing a fly with a cannon.

Kubernetes is aimed at orgs. They can be of any size, but they must be orgs that run many different applications and that are willing to move to microservices.

There’s no point in moving to K8s if your application is still a monolith; you’ll just have a distributed monolith instead.

The point is, we have told people of any background that the solution to their problems resides in Kubernetes, making them take the effort of learning Kubernetes from scratch, without any prior distributed systems knowledge, and without any idea whatsoever of the how and why of microservices.

That has given K8s the bad reputation of being too hard to understand. And it’s true, Kubernetes is not easy, but it was never meant to be easy: it’s datacenter-scale technology!

Swarm

Swarm is Docker, Inc.’s orchestrator. It started development five years ago. It’s built into the Docker Engine, so running it on a development machine works the same way as running it on production servers.

In my opinion, it is much less powerful than Kubernetes, and I would vote against using it in a business environment.

That said, I’m a happy admin of a single-node Swarm running all of my personal services at home. But that’s it. I wouldn’t use it for anything with more than 1-2 nodes, but for setups that small, I feel it is the right tool for the job.

Nomad

Nomad is HashiCorp’s horse in the orchestrator race.

I haven’t had the time to use it, but from what I’ve read it’s quite different from the other two.

I can’t comment on it, but I’m waiting for the day it pops up as the first thing in my to-do list.





There are more orchestrators, I know, but I feel those three represent the three categories of orchestrators we have in the market right now: the big ones, the small ones and the weird ones.