tl;dr - Check out Kubernetes features like PodSecurityPolicy and NetworkPolicy. There are also fantastic, fun, analogy-laden talks from Kubecon 2017 (Austin) and Kubecon 2018 (Copenhagen). CIS standards for Kubernetes clusters exist, and companies like Aqua produce tools like kube-bench that let you test your clusters against the CIS benchmarks. It’s also important to remember to secure the machine as well as the Kubernetes cluster, so the usual Unix server administration advice applies.

While Kubernetes’s various setup methods (kubeadm, kops, kubespray) are doing all they can to create secure-by-construction clusters, Kubernetes is an ever-evolving platform and cluster operators must do their part to keep it as safe as possible.

This is by no means a definitive guide, but rather a stepladder of features (similar to the “operational intelligence” concept I discussed a while ago) that I think good security-conscious cluster operators will have considered.

Level 0: Get educated

There are some amazing talks on Kubernetes security given by people smarter/more experienced than me, and you should listen to what they have to say:

In addition to this, you obviously need to read the Kubernetes documentation and have a firm grasp of the concepts, how they interconnect, and what pieces make up the Kubernetes platform.

Level 1: Pre-Cluster (setup, basic machine security)

What this looks like depends heavily on which operating system you’re using to run your cluster, but the idea here is to secure the underlying operating system as much as you can.

One corner you can cut while actually improving security is to use a minimal (and if possible security-focused) distribution like Container Linux or maybe even Alpine Linux. For a lot of reasons (one of the biggest being kubeadm support), you might choose Ubuntu/Debian/Fedora, which are fine too of course; the attack surface is a bit bigger, but they can also be hardened. There are lots of guides on hardening Ubuntu, so this is a bit hand-wavy, but skim through those articles ASAP to at least get an idea.

It should be obvious, but no matter which OS you use, ensure that password-based SSH authentication is disabled ASAP. Just about the only port you should need open at the beginning (before you set up a cluster) is SSH’s default port, 22. I’m not a big believer in changing the default port for SSH (it smacks of security-by-obscurity), but you could do that as well, to something like 2223. Usually within seconds of bringing a machine online, you’ll have bad actors probing the server to figure out what kind of software it’s running and whether it’s vulnerable by default.

Ensure your system receives proper security updates. Ideally this should be automated, but I generally just log in and update the system every once in a while (which obviously doesn’t scale, but I’m not managing a large number of servers just yet).

Ensure your TLS certs are set up properly. By default, Kubernetes performs all its communication over TLS-protected channels, which is fantastic, but if you’re setting things up manually make sure you don’t mess up the steps (things likely won’t run if you do).

Level 2: In-Cluster

Now that you’ve got your hopefully secure setup going, it’s time to do what you can inside the Kubernetes ecosystem to keep the cluster itself secure.

Keep Kubernetes up to date. Kubernetes moves fast, and unfortunately this often means going through messy upgrades. Most articles I remember seeing espouse migrating one machine at a time: empty a node with a command like kubectl drain, then bring new nodes into the cluster one at a time. Tools like kops upgrade / kubeadm upgrade offer a nicer upgrade path and are easy to use, but otherwise this can be very painful. There’s of course the Kubernetes documentation on how to do upgrades.

Use Kubernetes’ RBAC authorization system, and heavily restrict permissions for different users/accounts. While this might be harder/impossible if you’re on an old version of Kubernetes, if you’ve upgraded past ~v1.8, where RBAC reached GA (Generally Available) status, you should be able to enable it easily (if you’re on a version where it isn’t already the default).
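As a rough sketch of what "heavily restricted" can look like, here is a namespaced Role that only allows reading Pods, bound to a single service account. The names (my-app, pod-reader, my-app-sa) are hypothetical placeholders, not anything from my own cluster:

```yaml
---
# Role: read-only access to Pods in one namespace (all names are hypothetical).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: my-app
rules:
  - apiGroups: [""]            # "" means the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
# RoleBinding: grant the Role above to exactly one service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: my-app
subjects:
  - kind: ServiceAccount
    name: my-app-sa
    namespace: my-app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```

Preferring Role/RoleBinding over ClusterRole/ClusterRoleBinding keeps the grant scoped to one namespace, which is usually what you want for application workloads.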

Use NetworkPolicy to restrict intercommunication inside the cluster. It’s always a good idea to practice “defense in depth”, which is just a fancy way to say “make sure attackers have to get through multiple layers of defenses”: assume any single layer can be breached. It might also make sense to restrict access to platform-specific services (e.g. AWS’s EC2 Metadata Service). This is often pretty easy if you set up a deny-all rule in every namespace on the cluster, so that enabling communication is always explicit.

Here’s an example of a NetworkPolicy I use with my canal (Calico + Flannel) enabled cluster to deny all traffic in a namespace:

```yaml
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: <your namespace here>
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
    - Ingress
    - Egress
```
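With a deny-all policy in place, communication then has to be enabled explicitly. The following is a minimal sketch of two such policies, assuming hypothetical labels (app: frontend, app: backend), a hypothetical namespace my-app, and port 8080. The second policy is one way to approach the metadata-service restriction mentioned earlier; note that NetworkPolicies are additive, so it opens all other egress for the selected pods while carving out the link-local metadata address:

```yaml
---
# Allow only frontend pods to reach backend pods on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
---
# Allow frontend egress anywhere except the cloud metadata endpoint.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-except-metadata
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32   # EC2-style metadata service address
```

Whether these are actually enforced depends on your network plugin supporting NetworkPolicy at all.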

Use PodSecurityPolicy to restrict what pods can do. There are a lot of tweakable options you can employ to allow Pods to perform certain actions and/or access system resources.
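To give a feel for those options, here is a sketch of a fairly restrictive policy. Treat it as illustrative only: the name is hypothetical, the policy/v1beta1 API group applies to clusters from roughly v1.10 onward, and PodSecurityPolicy was removed entirely in Kubernetes v1.25, so this only applies to older clusters with the PodSecurityPolicy admission plugin enabled:

```yaml
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-example       # hypothetical name
spec:
  privileged: false              # no privileged containers
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  hostNetwork: false             # no access to host namespaces
  hostIPC: false
  hostPID: false
  readOnlyRootFilesystem: true
  volumes:                       # no hostPath volumes allowed
    - configMap
    - emptyDir
    - projected
    - secret
    - downwardAPI
    - persistentVolumeClaim
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: MustRunAs
    ranges:
      - min: 1
        max: 65535
  fsGroup:
    rule: MustRunAs
    ranges:
      - min: 1
        max: 65535
```

A policy only takes effect for pods whose user or service account has been granted the use verb on it via RBAC.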

Configure securityContext for your pods.
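A minimal sketch of what that can look like, assuming a hypothetical image and an arbitrary non-root UID of 10001 (your application has to actually work as a non-root user with a read-only root filesystem for this to run):

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example         # hypothetical name
spec:
  securityContext:               # pod-level defaults
    runAsNonRoot: true
    runAsUser: 10001
    fsGroup: 10001
  containers:
    - name: app
      image: registry.example.com/my-app:1.0.0   # hypothetical image
      securityContext:           # container-level settings
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```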

Limit access to your internal API server. Obviously, you’re going to want to make sure that not every pod can reach the internal API server; otherwise an attacker who compromises some external-facing service you’re running can attack it directly.

Ensure the default service account token isn’t used. You can prevent the mounting of the default access token by setting automountServiceAccountToken: false in your Pod configurations (or by using an AdmissionController to add it every time).
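For illustration, the setting can go on the ServiceAccount, on the Pod, or both. The names and image below are hypothetical:

```yaml
---
# ServiceAccount that never has its token auto-mounted.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: no-token
  namespace: my-app
automountServiceAccountToken: false
---
# Pod that opts out explicitly as well; the Pod-level value takes precedence.
apiVersion: v1
kind: Pod
metadata:
  name: no-api-access
  namespace: my-app
spec:
  serviceAccountName: no-token
  automountServiceAccountToken: false
  containers:
    - name: app
      image: registry.example.com/my-app:1.0.0   # hypothetical image
```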

Use/create advanced AdmissionControllers. AdmissionControllers let you enforce pieces of your security policy by integrating with the platform itself; check out the documentation for more information.

Use a tool like Notary (with IBM’s Portieris) to ensure your container pipeline isn’t producing compromised containers/data. An easier way to achieve the same result might be to write a smaller custom AdmissionController that actually only whitelists certain containers, and maybe hook it up to your registry.
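If you go the custom route, the cluster side of it is a webhook registration along these lines. This is only a sketch: the service name, namespace, and path are hypothetical, the webhook server that actually checks images against an allowlist is something you would have to write and deploy yourself, and clusters older than v1.16 would need the v1beta1 version of this API instead:

```yaml
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: image-allowlist.example.com      # hypothetical name
webhooks:
  - name: image-allowlist.example.com    # must be a fully qualified name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail                  # reject pods if the webhook is down
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
        scope: "Namespaced"
    clientConfig:
      service:                           # hypothetical in-cluster service
        namespace: security
        name: image-allowlist
        path: /validate
      caBundle: <base64-encoded CA certificate>
```

Setting failurePolicy to Fail is the stricter choice, but it also means an outage of the webhook blocks pod creation, so think about which failure mode you prefer.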

Consider using alternate runtimes for sketchy workloads. Sometimes it might make sense to just run potentially dangerous workloads (from services that might be quite dynamic) in more isolation than normal. Depending on which container engine you’re using, there are many options:

Consider enabling mutual TLS authentication between your services. This can be very complicated to set up, but you do have some options: you can use side-car proxies like Envoy, bigger proxies like Linkerd, or tools that tie everything together like Istio (make sure to take a look at the SPIFFE identity standard & SPIRE). While this might seem like a LOT of trouble for very little value, it can make all the difference depending on the industry you serve (for example if HIPAA compliance is mandatory), and in the case where an attacker is already inside the cluster.
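As one example of how small the configuration can be once a mesh is installed, this sketch assumes a reasonably recent Istio with sidecars already injected into a hypothetical my-app namespace. It asks Istio to reject any plaintext traffic to workloads in that namespace:

```yaml
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-app        # hypothetical namespace
spec:
  mtls:
    mode: STRICT           # only accept mutual-TLS traffic
```

The hard part is everything around this (installing the mesh, injecting sidecars, migrating clients), not the resource itself.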

Level 3: Outside-Cluster

This, IMO, is pretty high up in the tree (far from the low hanging fruit), but is actually really easy to get to, thanks to automation.

Take a look at the CIS benchmark for Kubernetes. Unfortunately it’s not as straightforward to download the latest benchmark from their site, but hopefully you can figure it out. There’s information in those

Run advanced toolsets that implement the CIS benchmark. There are a bunch of tools that actually run the CIS benchmark against your cluster, for example Aqua’s kube-bench.
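One common way to run kube-bench is as a one-off Job on the cluster. The manifest below is a trimmed-down sketch modeled on the job definition the kube-bench project publishes; the exact set of host paths it needs depends on your distribution and how the cluster was installed, so treat the mounts here as assumptions and prefer the project’s own manifest:

```yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true                    # lets kube-bench inspect host processes
      restartPolicy: Never
      containers:
        - name: kube-bench
          image: docker.io/aquasec/kube-bench:latest
          command: ["kube-bench"]
          volumeMounts:
            - name: etc-kubernetes
              mountPath: /etc/kubernetes
              readOnly: true
            - name: var-lib-kubelet
              mountPath: /var/lib/kubelet
              readOnly: true
      volumes:
        - name: etc-kubernetes
          hostPath:
            path: /etc/kubernetes
        - name: var-lib-kubelet
          hostPath:
            path: /var/lib/kubelet
```

The results end up in the pod’s logs, so reading them is a matter of kubectl logs on the Job’s pod.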

Take a look at advanced tools made by security companies like Aqua or Twistlock. If your cluster is important enough, it may make sense to pay some of these companies. I’m not at all associated with them but have just seen them present and contribute to Open Source security efforts in the past so figured I’d note them here.

Run simple tools like nmap to ensure that your Ingresses / Services are not exposing more than they should. Some mis-entered or copy-pasted configuration might be exposing your Prometheus instance to the world rather than just to a local network.

Level 4: Humans again

Do everything you can to protect your clusters from human error, and attacks from outside your cluster as well. This can mean so much that I consider it level 4 – and I’m not sure it ever actually ends. I’m using this as a catch-all, but some examples:

Get your developers to write security-conscious code

Use tools like distroless to lessen the attack surface of containers

Encourage your developers to learn about and use GnuPG where appropriate

Make it easy to rotate credentials for your services (automation is great, maybe take a look at my previous post on the “MakeInfra” pattern, or use an actual well-vetted/excellent solution like Ansible)

Level 5: ???

There’s always more to do. Becoming secure and staying secure is a moving target. Sorry :(

Set up intrusion detection services like Tripwire (or something else?)

???

Wrap-up

Hopefully you’ve enjoyed some of these pointers. I do ask that you take this post with a grain of salt – while I am confident what I’ve written is accurate to the best of my ability, I don’t run any production safety-critical systems, and I think you should reserve most of your without-salt consumption for people who can give more concrete experience/case-studies.