Istio is a service mesh. Service meshes, in general, exist as a complement to container orchestrators (e.g. Kubernetes), providing additional, service-centric features surrounding traffic management, security, and observability. Istio is arguably the most popular service mesh (using GitHub stars as a metric). This blog post assumes working familiarity with Kubernetes and microservices, but does not assume deep familiarity with Istio.

This blog post takes a critical look at a subset of Istio’s features. In particular, we will focus on Istio’s security features that are used to control services’ network interactions: restricting egress traffic, restricting ingress traffic, and requiring a TLS client certificate from a service’s callers (i.e. mutual TLS). We will walk through lab-based examples drawn from Istio’s documentation in order to concretely illustrate the limits of each of these features, possible misconfigurations, and what can be done to ensure operators’ intended security goals are met. For example, this post highlights an aspect of Istio’s AuthorizationPolicy object whose behavior is changing in the forthcoming Istio 1.5 release, which will allow operators to express mesh-level authorization rules that cannot be overridden at the namespace or service-level.

Table of Contents

Preliminaries

Istio Primer

Before digging into these security-related features, a brief overview of Istio’s internals is required. The figure below, which is from Istio’s documentation, depicts Istio’s core components:

The main takeaway from the figure above is that Istio is separated into two logical pieces: the control plane and the data plane, which exist as Kubernetes objects (e.g. Pods).

The data plane, which is where this blog post will focus, is made up of services (i.e. applications) deployed to the service mesh. Pods in Kubernetes are composed of one or more containers that share Linux namespaces, such as the network namespace, and init containers are containers that run and exit before the others in the Pod. Istio joins services to the data plane at deploy-time by injecting a sidecar proxy ( istio-proxy ) and an init container ( istio-init ) into a service’s Pod, alongside the container that actually executes the service’s application code. In order to “hook” network traffic coming into or out of a service, istio-init uses a Bash script and the CAP_NET_ADMIN capability to set iptables rules in the service’s Pod’s network namespace, such that incoming and outgoing TCP traffic is forwarded to the istio-proxy container. The forthcoming Istio 1.5 release will switch to a Golang-based iptables setup program, rather than the current Bash one.
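The inbound half of that redirection can be sketched with a few simplified iptables commands. This is not the verbatim script: chain names match Istio's defaults, the capture port shown (15006) is Istio 1.4's default inbound port, and the outbound rules and most exclusions are omitted.

```sh
# Simplified sketch of istio-init's inbound redirection, run inside the
# Pod's network namespace (requires CAP_NET_ADMIN).

INBOUND_CAPTURE_PORT=15006   # Envoy's inbound capture port (Istio 1.4 default)

# Chain that actually redirects matched traffic to the proxy
iptables -t nat -N ISTIO_IN_REDIRECT
iptables -t nat -A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports "${INBOUND_CAPTURE_PORT}"

# Send all incoming TCP through an intermediate chain...
iptables -t nat -N ISTIO_INBOUND
iptables -t nat -A PREROUTING -p tcp -j ISTIO_INBOUND

# ...which excludes SSH, then redirects everything else to Envoy
iptables -t nat -A ISTIO_INBOUND -p tcp --dport 22 -j RETURN
iptables -t nat -A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
```

Because the rules live in the Pod's network namespace, the redirection is invisible to the application container, which simply sees connections arriving on its usual ports.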

The istio-proxy container is based on the Envoy proxy, and it communicates with the control plane, which programs the proxy at runtime to realize various Istio features, such as path-level authorization rules (an AuthorizationPolicy ), egress restrictions, ensuring those calling the proxy’s associated service present a TLS client certificate, etc.

The control plane comprises the core components of Istio itself, which primarily exist to coordinate and configure the istio-proxy sidecars that accompany each application container (i.e. the data plane). For example, Istio’s Pilot component realizes the specific Envoy configurations for the istio-proxy sidecars and propagates them to the relevant sidecars. See the following figure from Istio’s documentation:

Reference Cluster

This post uses a local Istio environment for its examples, by way of minikube , a VM-based tool for creating local Kubernetes clusters. The instructions below can be used by readers who wish to follow along with interactive examples.

All installation instructions assume an Ubuntu 18.04.4 LTS workstation capable of virtualization, from which the KVM-based minikube driver is used, though minikube works with other operating systems and hypervisors, and so do the command line tools needed for Kubernetes and Istio. See the following pages for installing KVM-related tools, minikube , kubectl , and istioctl on Ubuntu:

In addition, execute minikube config set vm-driver kvm2 to ensure the KVM-based hypervisor is used. Check that all tools have been added to your PATH by ensuring you can execute kubectl version , minikube version , and istioctl version .

For clarity (not necessity), the Kubernetes cluster and Istio mesh will be rebuilt several times during this post, so that each topic starts from a clean slate. The following steps, which are based on this walkthrough in Istio’s documentation, create the reference cluster each topic uses. The walkthrough guides operators through setting up automatic mutual TLS and enforcing various policies over mixed workloads. It will be noted inline when a new reference cluster should be created for an example:

Create a new Kubernetes cluster: minikube start --memory=16384 --cpus=4 --kubernetes-version=v1.14.2

Install Istio via its Operator: kubectl apply -f https://istio.io/operator.yaml

Apply the IstioControlPlane object:

kubectl apply -f - <<EOF
apiVersion: install.istio.io/v1alpha2
kind: IstioControlPlane
metadata:
  namespace: istio-operator
  name: example-istiocontrolplane
spec:
  profile: demo
EOF

Wait 120 seconds for Istio components to be created.

Apply the Istio demo profile, and enable automatic mutual TLS:

istioctl manifest apply --set profile=demo \
  --set values.global.mtls.auto=true \
  --set values.global.mtls.enabled=false

Create the “full” namespace, which will house Pods that have Istio sidecars and TLS client certificates: kubectl create ns full

Create two Pods in the “full” namespace, and wait for them to be created. Sample Pods from Istio are used.

kubectl apply -f <(istioctl kube-inject -f istio-1.4.5/samples/httpbin/httpbin.yaml) -n full
kubectl apply -f <(istioctl kube-inject -f istio-1.4.5/samples/sleep/sleep.yaml) -n full

Exec into the “sleep” Pod within the “full” namespace

kubectl get pods -n full
kubectl exec -it SLEEP_POD_NAME /bin/sh -n full -c sleep

curl the “httpbin” Pod within the “full” namespace from the “sleep” Pod in the same namespace, and note the X-Forwarded-Client-Cert response header, populated with the “sleep” Pod’s client certificate metadata (i.e. auto mTLS is working).

/ # curl httpbin.full:8000/headers
{
  "headers": {
    "Accept": "*/*",
    "Content-Length": "0",
    "Host": "httpbin.full:8000",
    "User-Agent": "curl/7.64.0",
    "X-B3-Parentspanid": "1fc3b2728d362128",
    "X-B3-Sampled": "1",
    "X-B3-Spanid": "a535e44fdde2ffa5",
    "X-B3-Traceid": "27388be1c233991e1fc3b2728d362128",
    "X-Forwarded-Client-Cert": "By=spiffe://cluster.local/ns/full/sa/httpbin;Hash=c4b21a3e248da71ce5053cb225132758b6276eb63a272955c81384908d076be5;Subject=\"\";URI=spiffe://cluster.local/ns/full/sa/sleep"
  }
}

Create the “legacy” namespace, which will house Pods that do not have Istio sidecars or mutual TLS certificates: kubectl create ns legacy

Create two Pods in the “legacy” namespace

kubectl apply -f istio-1.4.5/samples/httpbin/httpbin.yaml -n legacy
kubectl apply -f istio-1.4.5/samples/sleep/sleep.yaml -n legacy

Within the “sleep” Pod in the “legacy” namespace, curl the “httpbin” service in the “full” namespace, and note the lack of an X-Forwarded-Client-Cert response header, indicating that the “sleep” Pod within the “legacy” namespace does not have a client certificate.

kubectl get pods -n legacy
kubectl exec -it -n legacy SLEEP_POD_NAME /bin/sh

/ # curl httpbin.full:8000/headers
{
  "headers": {
    "Accept": "*/*",
    "Content-Length": "0",
    "Host": "httpbin.full:8000",
    "User-Agent": "curl/7.64.0",
    "X-B3-Sampled": "1",
    "X-B3-Spanid": "22684ed7b0cce0da",
    "X-B3-Traceid": "0d58bf15f67e67d422684ed7b0cce0da"
  }
}

Security Analysis

The following subsections present security analysis of these Istio features:

IPv6

Mutual TLS

Egress restrictions on Pods

The overall perspective is geared towards a security-conscious operator who is interested in Istio’s security-related features for limiting their workloads’ network interactions. This post attempts to highlight the features’ limits, issues that may be encountered in practice, and mitigation steps.

IPv6 Inbound Packet Processing Issue

IPv6 support in Istio is considered “experimental”, and appears to lack tests.

In at least one case, istio-proxy ‘s treatment of IPv6 traffic differs from that of IPv4 traffic in a security-critical way. If a wildcard is specified for a Pod’s ports to-be-redirected to Istio, which is the default set during Istio installation, the IPv4 iptables logic for incoming packets will:

Jump packets from the PREROUTING chain (i.e. packets have just been received) to an intermediate chain, ISTIO_INBOUND .

Apply rules for hardcoded (i.e. SSH) and user-specified exclusions that jump those packets to RETURN (i.e. stop Istio-related packet processing).

Apply a catch-all rule that redirects all remaining inbound traffic to istio-proxy , by jumping to the ISTIO_IN_REDIRECT chain, which has rules to redirect traffic to the proxy’s port.

The IPv6 logic takes those first two steps, but it does not redirect all remaining traffic to Istio; in this case, the rules within the intermediate ISTIO_INBOUND chain are exhausted, and the packet continues through and beyond the PREROUTING chain (i.e. it is not processed further by Istio).

The figure above shows the source code from Istio’s Bash-based iptables setup script, which takes the steps described above. On the left is the IPv6 inbound packet logic, and on the right is the IPv4 logic. As shown and described, if a wildcard is specified for ports to-be-redirected to istio-proxy , the IPv6 code on the left applies the hardcoded SSH and user-supplied port exceptions, and then moves on to installing rules related to outbound packets. In the same branch, the IPv4 code on the right applies the same exceptions, but then installs a catch-all rule that redirects all remaining traffic to istio-proxy .

Thus, users may expect all inbound IPv6:TCP traffic to be handed to istio-proxy , but the default wildcard value for INBOUND_PORTS_INCLUDE does not have this effect for IPv6:TCP traffic. Do note that if users specify an explicit list of ports to be intercepted (not the default), those rules are properly installed by the IPv6 logic; this corresponds to the non-default else branch here, as the wildcard if branch is the default.
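The asymmetry can be summarized by contrasting the catch-all rules the script does (and does not) install. The commands below are a simplified sketch, with chain names per Istio's defaults, not an excerpt of the script itself:

```sh
# IPv4: after the SSH/user-specified exclusions, a catch-all rule hands
# everything else to Envoy
iptables -t nat -A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT

# IPv6: the exclusions are installed, but the analogous catch-all
#   ip6tables -t nat -A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
# is never added in the wildcard case, so remaining IPv6 TCP packets fall
# off the end of the chain and reach the application directly.
```

In other words, the IPv6 ISTIO_INBOUND chain contains only RETURN rules in the wildcard case, and a chain made entirely of exclusions excludes everything.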

Mutual TLS

One popular aspect of Istio is its ability to automatically provision TLS certificates and allow operators to enforce the use of TLS mutual authentication between services. See the “High-level architecture”, “Istio identity”, and “PKI” sections at https://istio.io/docs/concepts/security/#authentication-architecture for a primer on the components Istio uses during certificate issuance and use.

For this section, we will deploy two simple services into two different Kubernetes namespaces: full , which will host two Pods joined to the Istio mesh and with TLS client certificates; and legacy , which will host two Pods not joined to the Istio mesh (i.e. lack an istio-proxy sidecar) and without TLS client certificates. See the reference cluster section for setup instructions. These examples mirror the following page in Istio’s documentation: https://istio.io/docs/tasks/security/authentication/mtls-migration/.

We will then walk through four ways in which we can unexpectedly reach services inside the full namespace from the legacy namespace (i.e. lacking a TLS client certificate), despite various, associated mutual TLS-based restrictions on such interactions. And we will describe one way in which we can unexpectedly reach Istio’s control plane services from the legacy namespace, despite enabling mutual TLS for control plane services.

AuthenticationPolicy Pitfall

As discussed in the Istio documentation page we have been following (to create the reference cluster), an operator may, after some testing, desire to enforce mutual TLS. That is, Pods in the “full” namespace should only receive mutual TLS connections.

To do so, the documentation page we have been following says: “Run the following command to enforce all Envoy sidecars to only receive mutual TLS traffic”:

kubectl apply -f - <<EOF apiVersion: "authentication.istio.io/v1alpha1" kind: "MeshPolicy" metadata: name: "default" spec: peers: - mtls: {} EOF

Further, this command is described elsewhere in Istio’s documentation:

This policy specifies that all workloads in the mesh will only accept encrypted requests using TLS. As you can see, this authentication policy has the kind: MeshPolicy. The name of the policy must be default, and it contains no targets specification (as it is intended to apply to all services in the mesh).

After running this kubectl command, we can return to the “sleep” Pod in the “legacy” namespace and observe it can no longer access the “httpbin” Pod in the “full” namespace.

curl httpbin.full:8000/headers
curl: (56) Recv failure: Connection reset by peer
/ #

Operators may call it a day here, but it is worth noting that this does not prevent access to TCP ports that are not specified in “httpbin”‘s associated API objects (i.e. what we added to the mesh with kubectl apply -f <(istioctl kube-inject -f istio-1.4.5/samples/httpbin/httpbin.yaml) -n full ).

Exec into the “httpbin” Pod within the “full” namespace

kubectl get pods -n full
kubectl exec -it HTTPBIN_POD_NAME /bin/bash -n full -c httpbin

Inside “httpbin”, start a new copy of its web service on a random TCP port

root@httpbin-76887cf49c-w5xvb:/# ps auxxwww
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  85980 24900 ?        Ss   06:58   0:00 /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
root         8  0.0  0.2 130360 34716 ?        S    06:58   0:00 /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
root@httpbin-76887cf49c-w5xvb:/# /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:1337 httpbin:app -k gevent&
[1] 20
root@httpbin-76887cf49c-w5xvb:/# [2020-02-20 07:26:55 +0000] [20] [INFO] Starting gunicorn 19.9.0
[2020-02-20 07:26:55 +0000] [20] [INFO] Listening at: http://0.0.0.0:1337 (20)
[2020-02-20 07:26:55 +0000] [20] [INFO] Using worker: gevent
[2020-02-20 07:26:55 +0000] [23] [INFO] Booting worker with pid: 23

Note the “httpbin” Pod’s IP address. Traffic to the “unspecified” port 1337 will not be properly routed if we use the service’s name. It is common for insecure workloads, such as legacy instances in a cloud VPC, to have a direct route to a Pod’s IP.

kubectl get pods -owide -nfull
NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
httpbin-76887cf49c-w5xvb   2/2     Running   0          32m   172.17.0.5   minikube   <none>           <none>
sleep-65b6f697c4-pflm5     2/2     Running   0          32m   172.17.0.7   minikube   <none>           <none>

Return to a shell in the “sleep” Pod within the “legacy” namespace, and note you can interact with the newly started service in the “full” namespace’s “httpbin” Pod, despite applying an AuthenticationPolicy whose documentation states that its effect is “all workloads in the mesh will only accept encrypted requests using TLS”.

kubectl exec -it SLEEP_POD_NAME -n legacy /bin/sh
/ # curl 172.17.0.5:1337/headers
{
  "headers": {
    "Accept": "*/*",
    "Host": "172.17.0.5:1337",
    "User-Agent": "curl/7.64.0"
  }
}
/ #

In order to prevent this behavior, an AuthorizationPolicy must be applied to the “httpbin” Pod. If an AuthorizationPolicy is applied to a Pod, traffic is denied by default, so traffic to “unspecified” ports (i.e. those not in the workload’s API objects) is blocked. The following AuthorizationPolicy , noted in Istio’s documentation, will only allow requests from authenticated principals (in our case, callers with a TLS client certificate):

kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin
  namespace: full
spec:
  selector:
    matchLabels:
      app: httpbin
  rules:
  - from:
    - source:
        principals: ["*"]
    to:
    - operation:
        methods: ["GET", "POST"]
EOF

After applying this AuthorizationPolicy , we can see that requests to the “full” namespace’s “httpbin” Pod’s unspecified TCP port from the “sleep” Pod in the “legacy” namespace fail:

curl 172.17.0.5:1337/headers
curl: (52) Empty reply from server
/ #

In this section we saw how following Istio’s prescribed AuthenticationPolicy for enforcing mesh-wide mutual TLS among services fails to protect unspecified TCP ports in an application container (e.g. an application server’s debug interface). To block unauthorized access (i.e. callers lacking a TLS client certificate) to such unspecified TCP ports, you must also apply an AuthorizationPolicy specifying a principals block to workloads; a wildcard for principals will allow any caller with a valid TLS client certificate.
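As a sanity check on mutual TLS state, Istio 1.4’s istioctl can report whether mTLS is both expected and in use for a given workload. The Pod name below is illustrative; substitute the one from kubectl get pods :

```sh
# Reports, per destination service, the authentication policy and
# destination rule in effect as seen from the sleep Pod's sidecar
kubectl get pods -n full
istioctl authn tls-check sleep-65b6f697c4-pflm5.full httpbin.full.svc.cluster.local
```

Note that this reports what the sidecars are configured to do; as shown above, it says nothing about ports that are not part of the service’s API objects.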

AuthorizationPolicy Pitfalls

For the sake of discussion, assume we are now discussing a mesh without any AuthenticationPolicy objects, and in a new reference cluster.

It is worth noting that AuthorizationPolicy objects only have an ALLOW action, and are additive in effect. Concretely, say an operator deploys the following AuthorizationPolicy to ensure that all requests are authenticated with mutual TLS:

kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-mtls
  namespace: istio-system
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["*"]
EOF

This is mesh-wide in effect, since it specifies Istio’s root namespace in its metadata field. After applying this AuthorizationPolicy , accessing the “httpbin” Pod in the “full” namespace from the “sleep” Pod in the “legacy” namespace fails:

/ # curl httpbin.full:8000/headers
RBAC: access denied/ #

However, suppose the following AuthorizationPolicy is subsequently added, which targets the “httpbin” Pod in the “full” namespace, and unintentionally lacks a rule saying the request must come from an authenticated principal:

kubectl -n full apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allowget
  namespace: full
spec:
  selector:
    matchLabels:
      app: httpbin
  rules:
  - to:
    - operation:
        methods: ["GET"]
EOF

The result is that the “sleep” Pod in the “legacy” namespace can now access the “httpbin” Pod in the “full” namespace, because there is a policy that allows the traffic.

curl httpbin.full:8000/headers
{
  "headers": {
    "Accept": "*/*",
    "Content-Length": "0",
    "Host": "httpbin.full:8000",
    "User-Agent": "curl/7.64.0",
    "X-B3-Sampled": "1",
    "X-B3-Spanid": "12836faa4a645f69",
    "X-B3-Traceid": "7f4e7dcaa944040612836faa4a645f69"
  }
}
/ #

Technically, in this specific case, the AuthenticationPolicy object listed earlier in AuthenticationPolicy Pitfall would prevent this unintentional exposure because it will be evaluated as well. However, this is not a general mitigation, because AuthorizationPolicy objects are not just used in the context of mutual TLS enforcement (i.e. authentication)—they can be used to express, for example, path- or port-level authorization rules, so the inability to prevent certain namespace or Pod-level exemptions at the mesh-level is something to keep in mind.

As a more holistic fix, the forthcoming Istio 1.5 release adds a DENY action to the AuthorizationPolicy object. DENY rules are evaluated before ALLOW rules, and traffic that matches a DENY rule is dropped before any ALLOW rules are considered. An operator can, for example, install DENY -action AuthorizationPolicy objects in Istio’s root namespace (which defaults to istio-system ), and these will not be “overridden” by AuthorizationPolicy objects with ALLOW rules targeting a specific namespace or workload, as we saw above. Operators may wish to use Kubernetes RBAC over Istio’s CRDs to ensure such mesh-level DENY rules cannot be modified by unauthorized users; AuthorizationPolicy objects with mesh-level effects live in the root namespace (they are still a namespace-scoped CRD), so unauthorized users should not have a RoleBinding granting privileges within that namespace. See this post from Istio for more details on the DENY action that will be available in Istio 1.5.
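As a sketch of what such a mesh-level rule might look like (field names follow the Istio 1.5 preview of the AuthorizationPolicy API; verify against the actual release before relying on it), a policy denying all requests from unauthenticated callers could be written as:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-plaintext
  namespace: istio-system      # Istio's root namespace => mesh-wide effect
spec:
  action: DENY                 # evaluated before any ALLOW rules
  rules:
  - from:
    - source:
        notPrincipals: ["*"]   # matches callers without a TLS client certificate
EOF
```

Unlike the ALLOW policies above, a later namespace-scoped ALLOW policy cannot carve an exception out of this rule.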

AuthenticationPolicy objects follow this “more specific” precedence, too, so even if there is a MeshPolicy (cluster-scoped CRD) enforcing mutual authentication, it can be overridden at a specific workload (with a namespace-scoped AuthenticationPolicy CRD). This is of interest in multi-tenant scenarios, where a tenant (e.g. business unit A) may have Kubernetes RBAC bindings to modify objects in their Kubernetes namespace (e.g. namespace-scoped AuthenticationPolicy CRDs), but where operators wish to enforce a mesh-wide baseline that cannot be overridden by any tenant.

To see a concrete example of the “more specific” AuthenticationPolicy behavior, in a new reference cluster, apply the following AuthenticationPolicy , which enforces mutual TLS authentication for all workloads in the mesh:

kubectl apply -f - <<EOF apiVersion: "authentication.istio.io/v1alpha1" kind: "MeshPolicy" metadata: name: "default" spec: peers: - mtls: {} EOF

In the “sleep” Pod within the “legacy” namespace, note you can no longer interact with the “httpbin” Pod in the “full” namespace:

curl httpbin.full:8000/headers
curl: (56) Recv failure: Connection reset by peer
/ #

Then, apply the following AuthenticationPolicy , which targets a specifically named workload (“httpbin”) in a specific namespace (“full”), and does not require callers to be authenticated principals:

cat <<EOF | kubectl apply -n full -f -
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "overwrite-example"
spec:
  targets:
  - name: httpbin
EOF

Now, in the “sleep” Pod within the “legacy” namespace, send a request to the “httpbin” Pod within the “full” namespace, and note this is successful, indicating that the MeshPolicy requiring mutual TLS authentication for all workloads (a cluster-scoped CRD) has been overridden by a more-specific AuthenticationPolicy object (a namespace-scoped CRD):

/ # curl httpbin.full:8000/headers
{
  "headers": {
    "Accept": "*/*",
    "Content-Length": "0",
    "Host": "httpbin.full:8000",
    "User-Agent": "curl/7.64.0",
    "X-B3-Sampled": "1",
    "X-B3-Spanid": "94070a3cfead3064",
    "X-B3-Traceid": "f200c41073fdc90194070a3cfead3064"
  }
}

In this section we saw how precedence in the evaluation of AuthorizationPolicy and AuthenticationPolicy objects can result in an operator’s intended mesh-wide security baseline being overridden at the namespace or service level; creating such overrides requires only the narrower, namespace-scoped Kubernetes RBAC permissions, not mesh-level ones.

UDP Pitfall

As we discussed at the beginning of the blog post, Istio only “hooks” TCP traffic and does not support UDP traffic, so UDP traffic will be passed to services inside the Pod unless NetworkPolicy objects are added to drop it. What may not be clear to users is that this is the case even if the aforementioned mutual TLS-related AuthorizationPolicy and AuthenticationPolicy objects are defined:

In a new reference cluster, apply the following:

kubectl apply -f - <<EOF apiVersion: "authentication.istio.io/v1alpha1" kind: "MeshPolicy" metadata: name: "default" spec: peers: - mtls: {} EOF

kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-mtls
  namespace: istio-system
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["*"]
EOF

Based on the examples we walked through earlier, this combination of an AuthenticationPolicy and AuthorizationPolicy applied to all workloads in the mesh should be our best bet for ensuring access to any workload requires a valid TLS client certificate.

Then, inside the “httpbin” Pod within the “full” namespace, install netcat , start a UDP listener, and keep the terminal window open:

kubectl get pods -n full
kubectl exec -it -n full HTTPBIN_POD_NAME /bin/bash -c httpbin
apt update; apt install -y netcat
nc -luv -p1111

Note the “httpbin” Pod’s IP address:

kubectl get pods -n full -owide
NAME                       READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
httpbin-76887cf49c-sh2hz   2/2     Running   0          4m23s   172.17.0.6   minikube   <none>           <none>
sleep-65b6f697c4-2qwjh     2/2     Running   0          4m18s   172.17.0.7   minikube   <none>           <none>

Inside the “sleep” Pod within the “legacy” namespace, connect to the “full” namespace’s “httpbin” Pod’s UDP service started in step two:

kubectl get pods -n legacy
kubectl exec -it SLEEP_POD_NAME -n legacy /bin/sh
nc -vu 172.17.0.6 1111

Note that the “httpbin” Pod in the “full” namespace receives what is typed into the “legacy” namespace’s “sleep” Pod’s netcat .
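Since Istio cannot help here, dropping such UDP traffic falls to Kubernetes NetworkPolicy (assuming the cluster’s CNI plugin enforces NetworkPolicy, which minikube’s default does not). A sketch for the “full” namespace, allowing only TCP ingress so that unsolicited UDP is caught by the policy’s default deny:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tcp-ingress-only
  namespace: full
spec:
  podSelector: {}            # applies to every Pod in the namespace
  policyTypes: ["Ingress"]
  ingress:
  - ports:
    - protocol: TCP          # no port given => all TCP ports; UDP is never allowed
```

With this applied, the nc -luv listener above would receive nothing, while the TCP paths mediated by istio-proxy continue to work.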

controlPlaneSecurityEnabled Bypass

Istio has a configuration directive, controlPlaneSecurityEnabled , which will be enabled by default in the forthcoming Istio 1.5 release. This enables, but does not enforce, mutual TLS among control plane components (e.g. Pilot, Galley). In fact, plaintext connections from a Pod lacking an istio-proxy sidecar to control plane components are allowed even after applying (to e.g. Pilot) an AuthenticationPolicy and MeshPolicy object enforcing the use of mutual TLS (as we walked through earlier for “httpbin”), indicating that AuthorizationPolicy objects have no effect on the control plane components. For example, a Pod without an istio-proxy sidecar or TLS client certificate is still able to interact with Pilot’s debug endpoint, which allows retrieving various information from the cluster, including the Envoy configuration of istio-proxy sidecars in the mesh. See this GitHub issue for more details and reproduction steps.

Egress Restrictions

Another common ask of Istio is how to limit egress traffic. That is, how can I prevent my Pods from sending traffic outside of the mesh (e.g. to the Internet)? Istio has several blog posts on this topic. They all have some variation of the following language:

A malicious application can bypass the Istio sidecar proxy and access any external service without Istio control. — https://istio.io/docs/tasks/traffic-management/egress/egress-control/#security-note

Istio gives detailed advice for how this can be addressed—namely, Kubernetes NetworkPolicy objects must be used in conjunction with Istio’s own egress restrictions, and Istio’s documentation gives concrete examples of the NetworkPolicy objects to apply.

However, continuing with the theme of this post, the curious reader may be wondering: how difficult is it for a “malicious” application container to bypass the istio-proxy sidecar? Several examples come to mind, and we can illustrate them by creating a new reference cluster and then drawing on examples from this Istio guide; that is, we will enable in-Istio egress restrictions and will not apply NetworkPolicy objects.

Create a test Pod with: kubectl apply -n default -f <(istioctl kube-inject -f istio-1.4.5/samples/sleep/sleep.yaml)

Ensure Istio’s global.outboundTrafficPolicy.mode is set to ALLOW_ANY , which does not enable any restrictions in the istio-proxy sidecar around external traffic:

kubectl get configmap istio -n istio-system -o yaml | grep -o "mode: ALLOW_ANY"
mode: ALLOW_ANY
mode: ALLOW_ANY

Send some traffic to the Internet with curl from the “sleep” Pod in the “default” namespace, note that it is successful, and also install bind-tools :

kubectl get pods -n default
kubectl exec -n default -it SLEEP_POD_NAME /bin/sh -c sleep
curl -L https://google.com
apk update; apk add bind-tools

Change Istio’s global.outboundTrafficPolicy.mode to REGISTRY_ONLY , which instructs the istio-proxy sidecar to limit traffic to known services (i.e. those in Istio’s service registry). It may take about one minute to propagate to istio-proxy .

kubectl get configmap istio -n istio-system -o yaml | sed 's/mode: ALLOW_ANY/mode: REGISTRY_ONLY/g' | kubectl replace -n istio-system -f -

Attempt to send traffic to the Internet from the “sleep” Pod in the “default” namespace again, and note this is blocked:

/ # curl -L https://google.com
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to google.com:443
/ #

Next, we will see various ways this REGISTRY_ONLY egress restriction can be bypassed by a “malicious” application container.

UDP-based Bypass

Use dig to perform a DNS query (i.e. UDP traffic) from the “sleep” Pod in the “default” namespace, and note that this traffic successfully reaches the Internet, despite the REGISTRY_ONLY value for global.outboundTrafficPolicy.mode . This is the same istio-proxy sidecar behavior discussed earlier: UDP traffic is not “hooked”, so it is never rerouted to the istio-proxy sidecar where the REGISTRY_ONLY filtering is performed.

dig @8.8.8.8 www.google.com
...snip...
;; ANSWER SECTION:
www.google.com.         279     IN      A       216.58.195.68

;; Query time: 6 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Feb 20 10:19:42 UTC 2020
;; MSG SIZE  rcvd: 59

ICMP-based Bypass

Use ping to send ICMP traffic from the “sleep” Pod in the “default” namespace, which is not handled and is passed through, just like the UDP example above, bypassing REGISTRY_ONLY .

/ # ping -c1 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=53 time=7.289 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 7.289/7.289/7.289 ms
/ #
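These two bypasses are exactly why Istio’s documentation insists on pairing REGISTRY_ONLY with NetworkPolicy objects. A sketch that would stop both the UDP and ICMP bypasses for the “default” namespace follows (CNI enforcement of NetworkPolicy is assumed, and the kube-dns label is the common upstream default; adjust for your DNS deployment):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: default
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
  # Allow DNS lookups, but only to the in-cluster resolver
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
  # Allow TCP, which istio-proxy mediates; other UDP and all ICMP fall
  # through to the policy's default deny
  - ports:
    - protocol: TCP
```

Note this does not address the TCP-based bypasses in the next sections, since TCP egress is still permitted here and relies on istio-proxy for filtering.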

UID 1337-based Bypass

The istio-proxy sidecar container runs in the Pod under UID 1337 , and Istio’s iptables rules exempt traffic originating from that UID so that the proxy’s own traffic is not redirected back to itself. Since it is common for containers to have the CAP_SETUID capability (Docker, Kubernetes’ default container runtime, does not drop it), we can change to UID 1337 from our application container, and no restrictions will be applied to our traffic:

In the same “sleep” Pod, where a curl from the application container is blocked by REGISTRY_ONLY (see step five here), add a new user whose UID is 1337 (create a bogus password to satisfy adduser ):

/ # adduser --uid 1337 envoyuser
Changing password for envoyuser
New password:
Retype password:
passwd: password for envoyuser changed by root
/ # su - envoyuser
sleep-65b6f697c4-hbcch:~$ curl -L https://google.com
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="tex ...snip...

In most cases, we need to be root in order to use non-dropped capabilities, because Docker does not support ambient capabilities. But running processes in Pods as root is common, and one could argue it is the default: unless otherwise specified, Kubernetes will use the container image’s metadata to pick the genesis UID , and Docker images default to root .

Besides oft-granted capabilities, a Pod’s spec can specify a UID to run its containers under, and operators can place restrictions around specified UID s with Pod Security Policies. A common pattern is to use the MustRunAsNonRoot directive, which enforces the use of a non-zero UID . However, even with this in place, a user can still specify a UID of 1337 to bypass Istio’s REGISTRY_ONLY egress mode. The behavior is the same as above, but no capability is required, as the user relies on the runtime to run the workload under the 1337 UID , which is not beholden to REGISTRY_ONLY egress restrictions.
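One mitigation is to forbid UID 1337 (and 0 ) for application containers by using a Pod Security Policy whose runAsUser rule is stricter than MustRunAsNonRoot . A sketch, with an illustrative range:

```yaml
# Excerpt of a PodSecurityPolicy spec; only the runAsUser portion is shown
runAsUser:
  rule: MustRunAs        # stricter than MustRunAsNonRoot
  ranges:
  - min: 10000           # any range excluding 0 and 1337 works
    max: 65535
```

This blocks the RunAsUser variant of the bypass, though it does not stop a root process with CAP_SETUID from switching to 1337 at runtime; capabilities must be dropped for that, as discussed below.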

Namespace and Capability Esoterica

There are a number of ways in which Linux capabilities and namespaces accessible to containers in a Pod could allow for a bypass of Istio’s REGISTRY_ONLY egress enforcement.

If an application container has the CAP_NET_ADMIN capability, it can reprogram the iptables rules noted earlier in its network namespace, so that TCP traffic bypasses the istio-proxy sidecar, where REGISTRY_ONLY is enforced.

The CAP_NET_ADMIN capability is not granted by Docker or Kubernetes by default, so an operator would have to add this capability to their application container. The CAP_NET_ADMIN capability would not allow a Pod to bypass Kubernetes NetworkPolicy objects, since those reside in the Node’s network namespace, which cannot be accessed unless a Pod has enough privileges to break out entirely (e.g. CAP_SYS_ADMIN is required to issue setns(2) ).
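By contrast, a default-deny egress NetworkPolicy (a minimal sketch with an illustrative name) cannot be undone from inside the Pod, even with CAP_NET_ADMIN , because enforcement happens outside the Pod’s network namespace:

```yaml
# Sketch: default-deny egress for all Pods in a namespace. Enforced by the
# CNI plugin in the Node's network namespace, so iptables tampering inside
# the Pod's own network namespace cannot bypass it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress   # illustrative name
  namespace: default
spec:
  podSelector: {}             # selects every Pod in the namespace
  policyTypes: ["Egress"]     # no egress rules listed => all egress denied
```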

The directive shareProcessNamespace , whose Kubernetes API server support is enabled by default, can be specified in a Pod spec (it is not used by default), which puts all containers in a Pod into the same PID namespace. Usage of this directive cannot be restricted with Pod Security Policies, and, if specified in a Pod’s spec, can allow containers to read each other’s filesystem objects and environment variables. Kubernetes’ documentation points out a specific case relating to the /proc/$pid/root link, which “means that filesystem secrets are protected only by filesystem permissions.” More concretely, processes that share a PID namespace can reference (but not necessarily access) each other’s entries in procfs . However, procfs applies varying permission checks to its entries. For example, the /proc/pid/root entry is a symbolic link to a given process’s root filesystem, which can be used to access files whose permissions are broad (i.e. a reference to another container’s filesystem is achieved). The /proc/pid/root entry itself, however, has stricter permission requirements, which are detailed in the proc(5) man page: access to the /proc/pid/root link requires PTRACE_MODE_READ_FSCREDS , whose requirements are detailed in the ptrace(2) man page:
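Opting into a shared PID namespace takes a single directive in the Pod spec; a minimal sketch (names are illustrative) looks like this:

```yaml
# Sketch: a Pod opted into a shared PID namespace. All containers,
# including an injected istio-proxy sidecar, will see each other's
# processes in /proc (e.g. via `ps aux` from the app container).
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-demo     # illustrative name
spec:
  shareProcessNamespace: true
  containers:
  - name: app
    image: alpine           # placeholder image
    command: ["sleep", "3600"]
```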

A typical Kubernetes container will not have the CAP_SYS_PTRACE capability with respect to step three in the figure above (it is not granted by default), so in order to access a sidecar’s filesystem through the /proc/pid/root link, it must run under the same UID as the sidecar’s processes, either by specifying that UID in the Pod’s spec as described earlier, or by changing to the sidecar’s UID with CAP_SETUID (granted by default in Docker and Kubernetes). A container lacking CAP_SETUID (through capability-set evaluation or no_new_privs ) and running as a different UID than its sidecar (even UID 0 ) generally cannot access such sensitive procfs entries (e.g. /proc/pid/root , /proc/pid/environ ). shareProcessNamespace is therefore not an interesting way to bypass the istio-proxy sidecar’s egress restrictions: in order to explore the sidecar’s filesystem, we already need to run under its UID of 1337 , through CAP_SETUID or the RunAsUser Pod spec directive, and if that is the case, no egress rules are applied anyway, as we saw earlier when we switched to UID 1337 . Do note that /proc/pid/cmdline does not require such ptrace(2) -related access checks and can be read by any other process in the PID namespace. This is useful to keep in mind, as tokens and passwords are often passed as flags at a program’s invocation, so they will show up in /proc/pid/cmdline .
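Putting these observations together, an application container can be hardened against switching to the sidecar’s UID with a securityContext like the following sketch (container name and image are illustrative):

```yaml
# Sketch: harden the app container so it cannot change to the sidecar's
# UID (1337) via setuid(2), per the CAP_SETUID / no_new_privs discussion.
containers:
- name: app
  image: alpine                      # placeholder image
  command: ["sleep", "3600"]
  securityContext:
    runAsUser: 1000                  # fixed non-sidecar, non-root UID
    allowPrivilegeEscalation: false  # sets no_new_privs on the process
    capabilities:
      drop: ["SETUID", "SETGID"]     # remove the default UID-switching caps
```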

Conclusion

In this blog post we used lab-based examples to illustrate common misconceptions and pitfalls encountered when using Istio to limit workloads’ network traffic:

“unsupported” traffic, such as UDP and ICMP, makes it through to workloads ostensibly protected by Istio’s most strict mutual TLS configuration, and operators have no indication this is occurring (e.g. there is no Istio log message on inbound UDP packets when mutual TLS is enforced); Kubernetes NetworkPolicy objects must be used to drop such traffic.

IPv6 is not well-supported in Istio, and appears to have security-relevant issues; for the time being it should be classified as “unsupported” traffic.

Setting just the prescribed mesh-wide AuthenticationPolicy for mutual TLS leaves TCP ports not specified in a workload’s API objects accessible; an AuthorizationPolicy that applies to the workload is needed for these “unspecified” ports to be blocked.
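A minimal sketch of such a workload-scoped AuthorizationPolicy (the names, labels, and service account are illustrative) might look like:

```yaml
# Sketch: once any AuthorizationPolicy selects a workload, traffic not
# explicitly allowed by its rules, including to ports left unspecified in
# the workload's API objects, is denied.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin-allow       # illustrative name
  namespace: default
spec:
  selector:
    matchLabels:
      app: httpbin          # illustrative workload label
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/sleep"]  # illustrative caller
```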

Operators cannot express mesh-wide AuthorizationPolicy rules that cannot be superseded by namespace or Pod-specific exemptions, but support for this case with the DENY action is coming in Istio 1.5.

AuthenticationPolicy objects also have a precedence in their order of operations: namespace- or workload-scoped policy is always able to override cluster-scoped policy. If an operator wishes to prevent such overriding exemptions, prevent access to AuthenticationPolicy CRDs altogether with Kubernetes RBAC.

To robustly enforce external egress controls, Kubernetes NetworkPolicy objects must be used, since they will handle all types of traffic and cannot be bypassed by a misbehaving Pod, as they exist in the Node’s network namespace, which cannot be accessed through privileges commonly granted to Pods.

The shareProcessNamespace Pod spec directive can have unintended effects when security is enforced in a sidecar container, especially when care is not taken to restrict capabilities (e.g. CAP_SETUID , CAP_SYS_PTRACE ), which can allow dangerous interactions with the sidecar container’s procfs entries, visible to the application container due to the shared PID namespace. Usage of the shareProcessNamespace directive cannot be restricted by a Pod Security Policy, but it can be with an admission-related tool such as k-rail; in general, though, the security boundary between containers in a Pod is weak, since various Linux namespaces are shared by design.

Istio’s controlPlaneSecurityEnabled directive does not behave as most users would expect, and, when enabled, still allows plaintext, non-authenticated traffic to various control plane endpoints; operators can mitigate this by using NetworkPolicy objects to limit connections from untrusted workloads into the Istio control plane, but note that ports in-use by Istio’s control plane are not always fully documented (e.g. Pilot will use port 15012 in Istio 1.5, but this is not documented).

An overall point to recognize is that Istio generally must be paired with Kubernetes NetworkPolicy objects in order to achieve a secure result.

Like many tools with a large set of flexible features, Istio receives security review and support, but its features may not be intuitive or behave as one would expect. As a general point, it is always good to evaluate new infrastructure features (and especially security-related ones) with a critical eye, and to understand the internals of their implementations, so that their limits and potential applications can be correctly understood.
