CONFERENCE SUMMARY

KubeCon NA 2019: Top Ten Takeaways (Part 1)

Engineering workflows, platforms, and related tooling

The Datawire team have all returned home after another great KubeCon NA, and we thoroughly enjoyed our time in sunny (or rainy?) San Diego. We all attended lots of great sessions, had many insightful conversations at the booth, and also presented several sessions. At our evening team dinners we shared our reflections and insights on each day, which I have distilled into a top 10 list of key takeaways from KubeCon NA 2019:

1. Kubernetes needs a Ruby on Rails moment
2. Vendors have moved higher up the stack (“build versus buy” analysis is worth doing)
3. The edge(s) are becoming increasingly important
4. The cluster is a unit of deployment: cattle vs pets moving up the stack
5. Workflows may be diverging? GitOps vs UI-driven config
6. Reducing SRE toil is important, and directly impacts the bottom line, productivity, and fun
7. Multi-cloud is big business
8. “Deep systems” (microservices) create new problems in understandability, observability, and debuggability
9. Cloud platforms and tooling are embracing event-driven computing
10. The CNCF community is amazing, but the tech is challenging to navigate as a beginner

I’ll cover points 1–5 in this post, and will focus on the takeaways related to engineer workflow, platforms, and related tooling.

I will publish the remaining takeaways in a follow-up post, which will focus on operations, new architecture paradigms, and the end user perspective.

Abhay Saxena rocking the KubeCon keynote stage

1. Kubernetes needs a Ruby on Rails moment

In Bryan Liles’ thought-provoking Thursday morning keynote, “In Search of the Kubernetes ‘Rails’ Moment”, he perfectly channelled the theme of many conversations I engaged in and overheard at KubeCon. Bryan, a senior staff engineer at VMware, argued that Kubernetes’ “complexity is necessary complexity”, but we need to make it easier for folks new to the space to get started and, equally important, make it easier to do the right thing. This can include reducing the amount (and complexity) of YAML configuration required, enabling self-service configuration and — as fantastically argued by Ian Coldwater in their keynote, “Hello From the Other Side: Dispatches From a Kubernetes Attacker” — providing sane defaults around security.

On a related theme, at this KubeCon there was definitely a focus on sharing end user stories, which was great to see. However, a lot of these stories came from large organisations with an early adopter skew in regard to cloud native technologies. My Datawire colleague, Alex Gervais, and I were discussing this, and we couldn’t help but recognise that a lot of these end users were building on CNCF technology such as Kubernetes and Envoy, but doing so from first principles. For example, Tinder had implemented their own service mesh using Envoy proxy, Walmart created their own fleet management control plane, and several organisations had created their own continuous delivery pipelines and logging and metrics capture platforms.

Don’t get me wrong, I very much appreciate these organisations sharing their stories and learnings, and I also understand that many of them simply had to build this tech, as it wasn’t available or the existing options didn’t scale to their requirements. However, I’m not sure that enough of the rationale for building versus buying (or adapting from OSS) was provided. I also didn’t hear very much detail about the people, effort, and resources required for designing and implementing their own tech. My concern is that people new to the space might have walked away from the event thinking that they need to do the same, when now there are many great higher-level tools available within the open source and commercial space. Which nicely leads into my next takeaway…

2. Vendors have moved higher up the stack (“build versus buy” analysis is worth doing)

Dan Kohn’s opening keynote framed this takeaway perfectly: the CNCF is home to foundational components within the cloud native ecosystem, but it is up to end users to build, buy (or adapt from OSS), and integrate the technologies in order to create a platform. And on this topic, it’s worth mentioning that every organisation needs a platform, whether they realise it or not. I believe that many of us who have been part of this ecosystem since the beginning think this is obvious, but it isn’t — particularly for folks new to the cloud native space. I know this next point is anecdotal, but I’ve almost lost count of the number of times over the past three years that someone has conflated Kubernetes and PaaS in conversations with me.

As I wandered around the vendor booths at KubeCon I was struck by the evolution from the past two NA events. Back in Austin in 2017 there were a lot of vendors offering storage, networking, and security components for Kubernetes. At this KubeCon there were a lot of vendors across a range of domains offering complete solutions. For example, there were products providing: control planes (for cloud, networking, storage, security etc); continuous delivery pipelines; observability suites (with some focusing on the elusive “single pane of glass”); serverless platforms; machine learning pipelines; and more.

I had a series of great chats while touring the sponsor booths, and many vendors appeared to be truly interested in understanding needs and working alongside people to determine if their products are a good fit. Of course, there were a few that were not interested in my requirements, were over-promising with their solutions, or were simply not engaged with the community, but this is an easy(ish) signal for qualifying them out. Matt Klein also made a similar observation in regard to the dangers of vendor FUD on Twitter.

On this note, I believe that some of the most important skills for a team or organisation migrating to the cloud native space are the ability to identify their top level requirements (e.g. IaaS vs KaaS vs PaaS, multi-cloud or not, current priorities etc), recognise which of their workflows may need to change (e.g. a move from “gated” promotion of application deployments to the self-service release of functionality), and determine the tradeoffs for “build versus buy” in relation to components and solutions.

For folks interested in this topic, there was a great discussion on the build versus buy tradeoffs in a talk by Dave Sudia and Toni Rib, “Balancing Power and Pain: Moving a Startup From a PaaS to Kubernetes”. My colleague Rafi Schloming and I also had a fascinating chat with Dave at the Datawire booth after his talk. He mused that “there tends to be two phases for scaling infrastructure at a startup: 1) throw money at it!! And 2) stop throwing money at it!!”. Dave and Toni were happy to buy services when migrating their applications at GoSpotCheck to Kubernetes, when spending time and money building something similar was not core to the business. However, they did state that the pricing had to scale in a rational and cloud-friendly way.

The key takeaway from my conversation with Dave was to prioritise the implementation of components and solutions within the stack. Citing examples of organisations in 2018 building their own service mesh and delivery pipeline platforms, he mentioned that the cloud native ecosystem moves very fast, and often if you can wait six months, you can build upon better-formed tech or buy a more complete solution e.g. Linkerd and Harness.

3. The edge(s) are becoming increasingly important

From Walmart’s keynote about running Kubernetes clusters at the edge, to the Futurewei team’s presentation of the edge IoT platform KubeEdge, and to my colleague Flynn’s presentation about lessons learned testing the Ambassador Edge gateway, there were two things we could all agree on at this KubeCon: the term “edge” is somewhat overloaded, but it is very important.

Broadly speaking, I heard three definitions of the edge at KubeCon:

Device edge e.g. IoT, tablet, phone etc

Point of Presence (PoP) edge

Kubernetes edge, or more traditionally: the data center, cluster, or network edge

The device edge tends to focus on IoT devices in the field (the remote “edge”) connecting into the cloud or a Kubernetes cluster. The PoP edge is all about running Kubernetes clusters close to end users, and I’ll cover this more in the next takeaway. The Kubernetes edge is focused on the traditional space of the networking edge stack, and is concerned with getting end user traffic to the backend business services, and doing so in a safe and reliable way. This includes technologies like an OSI layer 3–7 load balancer, web application firewall (WAF), edge cache, reverse proxies, API gateway, and developer portal.

At Datawire we have recognised there are two primary challenges with network edge technologies and API gateways when adopting Kubernetes. We’ve also presented several strategies for managing APIs and the Kubernetes network edge. Anecdotally, I believe there is quite a bit of confusion around the high-level networking concepts within the cloud native ecosystem, and in several chats I had at the booth, folks were conflating service mesh with API gateway, and often Istio was involved in the conversation. Granted, the Istio service mesh does have a concept of an Ingress gateway, and so somewhat supports both use cases. However, the design of Istio is skewed towards service mesh principles and service-to-service comms, and as it currently stands it only offers the gateway as a single component of a typical edge stack.

My colleague Rafi and I agreed that there will most likely be a convergence between the ingress (north-south) and service-to-service (east-west) Kubernetes communication technologies as the requirements are broadly aligned, but at the moment the control planes are typically focusing exclusively on the edge or inter-service communications. This is one of the reasons why we’re focusing primarily on the edge use case with the Ambassador Edge Stack, and choosing to integrate closely with service meshes like HashiCorp’s Consul, Buoyant’s Linkerd, and Google’s Istio.

4. The cluster is a unit of deployment: cattle vs pets moving up the stack

Several organisations talked about running Kubernetes clusters at the point-of-presence (PoP) edge. At previous KubeCons we’ve seen Chick-fil-A present the details of how they run a Kubernetes cluster at each restaurant location (which also means that they are running 2000+ Ambassador API gateway deployments). I’ve already mentioned Walmart’s keynote session several times, where they demonstrated their federated architecture that ran a Kubernetes cluster in each store in order to manage the point of sale (checkout) devices. Rafal Kowalski from Grape Up also talked about running Kubernetes clusters in a car, via the talk “Kubernetes in Your 4x4 — Continuous Deployment Directly to the Car”. One common theme throughout these talks was the need for increased automation for cluster management and the ability to rapidly rebuild clusters from scratch.

If we take a step back for a moment, we can see the “cattle versus pets” meme has been around since the emergence of cloud-managed VMs. At this KubeCon the meme appears to have spread to the notion of Kubernetes clusters. For those new to the space, the general idea behind this meme is that instances should be treated as replaceable “cattle”, rather than irreplaceable “pets” e.g. if a VM gets corrupted you simply shut it down and replace it, rather than try to nurse it back to health.

The key to the creation of this meme was the emergence of the cloud vendor-based APIs/SDKs that allowed scripted VM creation and deletion, and automated configuration management tooling like CFEngine and Puppet that support automated provisioning. In a similar spirit with Kubernetes, the evolution of the Cluster API and release of Helm 3 are providing automated cluster creation and effective application provisioning. These topics were well covered at KubeCon in “SIG Cluster Lifecycle (Cluster API)” and the “Helm 3 Deep Dive” sessions.
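The cattle-versus-pets idea applied to clusters boils down to declarative reconciliation: declare the desired set of clusters, compare it to what actually exists, and replace (rather than hand-repair) anything that has drifted. A minimal sketch of that loop is shown below; the `ClusterSpec` fields and the action strings are purely illustrative, not the API of Cluster API or any real tool.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class ClusterSpec:
    """Desired state for one cluster (e.g. one per store or PoP).

    The fields here are hypothetical; a real system would carry far
    more detail (machine images, networking, addons, and so on).
    """
    name: str
    k8s_version: str
    node_count: int

def reconcile(desired: Dict[str, ClusterSpec],
              actual: Dict[str, ClusterSpec]) -> List[str]:
    """Return the actions needed to make `actual` match `desired`.

    Clusters are treated as cattle: a drifted cluster is rebuilt
    wholesale from its declarative spec, never nursed back to health.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(f"create {name}")
        elif actual[name] != spec:
            # Drift detected: replace the cluster rather than patch it.
            actions.append(f"replace {name}")
    for name in actual:
        if name not in desired:
            actions.append(f"delete {name}")
    return actions
```

The point of the sketch is that the desired state lives in (version-controlled) data, so rebuilding a corrupted cluster from scratch is the same operation as creating a new one.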

5. Workflows may be diverging? GitOps vs UI-driven config

There appears to be a bit of a workflow schism emerging within the ecosystem. This takeaway was driven largely from booth and hallway conversations, but I heard lots of folks extolling the virtues of GitOps and declarative configuration (with the usual suspects from Weaveworks, and also user stories from Fidelity and CERN), and an almost equal number of folks wanting a UI to drive configuration. Now obviously these two approaches aren’t mutually exclusive, but they each do favour different types of workflow.

I partly wonder if this is driven by the arrival of more engineers new to the cloud native ecosystem; in response to a question asked in the opening keynote, at least 50% of attendees put up their hand to indicate that this was their first KubeCon. A fair number of conversations I had at the booth were with folks new to cloud native technology and Kubernetes — they were attending the conference to learn more — and one thing I noticed was that people were applying their existing workflows, patterns, and mental models to this new space. This is quite understandable, but it does mean that there can be a tendency to use new technology in old ways, which often doesn’t realise the full potential — you’ve got to know how to break the existing rules effectively in order to get the most benefit.

As an example, if we look at network proxies, one comment my colleague Alex made was that engineers wanted to use the CNCF Envoy Proxy in the same way as they currently use something like NGINX or HAProxy. Some people we chatted to saw Envoy as a drop-in “cloud native” proxy replacement, and they wanted to use the same control planes and workflows as they do now, which often don’t support important cloud native principles such as self-service, standardised APIs, and integration with other cloud native standards and tech.

One way this divergence could be addressed is by using UIs as an alternative way to generate declarative config that can be fed into a GitOps pipeline, while still supporting the creation of config manually or via other automated processes.
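To make the "UI feeding a GitOps pipeline" idea concrete, here is a minimal sketch: a function that takes the values a UI form might collect (name, image, replicas) and renders a Kubernetes Deployment manifest as YAML text. The rendered text would then be committed to Git for the GitOps pipeline to apply, rather than being pushed to the cluster directly. The `render_deployment` function is hypothetical; a real implementation would use a proper YAML or templating library rather than an f-string.

```python
def render_deployment(name: str, image: str, replicas: int) -> str:
    """Render a minimal Kubernetes Deployment manifest as YAML text.

    A UI (or any other automated process) collects the inputs; the
    output is declarative config destined for a Git repository, so the
    GitOps pipeline remains the single path to the cluster.
    """
    return f"""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {name}
spec:
  replicas: {replicas}
  selector:
    matchLabels:
      app: {name}
  template:
    metadata:
      labels:
        app: {name}
    spec:
      containers:
      - name: {name}
        image: {image}
"""
```

Because the UI only ever writes config (not cluster state), hand-edited manifests and generated manifests flow through the same review and deployment pipeline.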

Wrapping up the top 10 takeaways: part 1

This first part of the KubeCon NA 2019 takeaways series is focused on engineering workflows, platforms, and related tooling. Workflows can be somewhat of a hidden part of the cloud native story, but they have a massive impact on engineering productivity and delivering value to end users. The platforms and tooling you choose should be based on the most appropriate workflow for your organisation, and not the other way around. KubeCon was full of user stories, and there were many tooling announcements. Hopefully this article has shone some light on where to look.

The second part of the top 10 takeaways from KubeCon NA 2019 is available now! You can read it here.

You can learn more about the Ambassador API gateway at www.getambassador.io and you can sign up to get notified of the upcoming release of the Ambassador Edge Stack.

If you have any questions, please join the team in the Datawire OSS Slack.