Today on The InfoQ Podcast, Wes talks with Joe Beda. Joe is one of the co-creators of Kubernetes. What started in the fall of 2013 with Craig McLuckie, Joe Beda, and Brendan Burns working on cloud infrastructure has become the default orchestrator for cloud-native architectures. Today on the show, the two discuss the recent purchase of Heptio by VMware, the Kubernetes privilege escalation flaw (and the response to it), Kubernetes Enhancement Proposals, the CNCF and the organization of Kubernetes, and some of the future hopes for the platform.

Show Notes

What were you called away from QCon for last year?

02:35 I had to make a confidential trip to Barcelona to announce the acquisition of Heptio by VMware.

03:00 VMware acquiring Heptio was the stars aligning in the right way at the right time.

03:10 The biggest thing we looked at: we view ourselves as a cloud native company, and one of the promises of cloud native is taking advantage of cloud to the fullest, in the most flexible way.

03:35 You can decouple moving to cloud from taking advantage of cloud patterns, regardless of where you are running.

03:45 We saw Heptio as the perfect partner to continue this mission.

03:50 VMware had dreams of being its own public cloud; they’ve since adjusted strategy and see themselves as a partner for enterprises as they go on the journey to cloud.

04:20 I was blown away by the amount of energy and dedication - that community of VMware users and admins is really vibrant.

04:35 To a large degree, there’s a lot of overlap between that and the folks at KubeCon.

04:45 One of the exciting opportunities for both me and VMware is to create a bridge between these worlds.

04:50 In doing so, we bring some of the empathy for the real problems that enterprises face into the cloud native world.

05:00 We also bring some of the efficiencies, the velocity, the relationship between the infrastructure and developers, to the VI admin world.

What does the near future look like for Kubernetes for VMWare customers today?

05:25 There’s a set of vendors that are supporting Kubernetes on top of VMware today.

05:40 VMware itself is a platform, so others can build tools to run Kubernetes themselves.

05:50 We want to make vSphere a great place to run Kubernetes, and this is happening across a bunch of different dimensions.

06:00 We have VMware developers working upstream, leading some of the efforts to break the cloud providers out of the Kubernetes core so that they can be extensible.

06:15 We are taking the vCenter provider and continuing to enhance it, making it a great complement to running Kubernetes on vSphere.

06:20 We’re also brainstorming other ideas - Heptio coming into the company has been a catalyst for thinking about how vSphere can shine through into Kubernetes.

What was it about Docker that made you think we needed an orchestrator?

07:30 So much of this was rooted in how Google ran its internal system Borg.

07:45 Craig and I started Google Compute Engine, and it was controversial inside the company - even though it’s obvious now that EC2 is the bedrock that so much of the cloud gets built upon.

08:00 At the time, it was controversial because nobody at Google used VMs – everything happened in containers.

08:10 Google was instrumental in creating the kernel capabilities for isolation of containers using cgroups.

08:25 As we built GCE it was controversial because Google didn’t need it internally - so we decided to build something better that transcended VMs.

08:35 We built out GCE, and I was showing it to a manager - you could create something and run it - and his response was “So, now what?”

08:50 That’s been the story of cloud in so many ways - we’ve got to the point of getting to a VM, and there are so many choices for the “now what”.

09:00 Google had an opinion on the “now what” due to its experience with Borg, so a big piece of Kubernetes and the early motivation was about delivering a Borg-like experience.

09:20 If Kubernetes had only worked on Google Cloud Platform it wouldn’t have had the effect it has; it needed to be able to run on multiple platforms.

09:35 The way that Google ships and runs software isn’t the same as everyone else’s, so we looked at Docker and saw it as a piece of the puzzle.

09:50 At that time, Docker Swarm wasn’t a thing, but the real value came when we used Docker across a cluster of computers.

10:00 We felt an enormous amount of pressure to get the ideas out there before someone else did, because it seemed so obvious.

Did you expect it to become the standard container orchestrator?

10:15 I don’t think anybody expected it to be as successful as it has been.

10:25 The ingredients to make something like Kubernetes happen were way beyond the initial set of committers.

10:35 It’s the community that makes it happen.

Why was Kubernetes 1.0 released with the CNCF?

10:55 The first thing to recognise is that we released Kubernetes as open-source a year before it reached 1.0.

11:05 We learned a lot in that year - the power of partnerships, feedback about how tied it was to Google.

11:20 In order for Kubernetes to hit its potential, it needed some sort of vendor-neutral governance.

11:30 It was mostly Craig who drove the creation of the CNCF.

11:40 There was friction between a lot of the big players, and we really wanted to create a way of having folks come together and discuss cloud native problems larger than just Kubernetes.

11:50 It doesn’t guarantee that everyone’s going to get along, but it does reduce some of those friction barriers.

What do you think helped Kubernetes become the orchestrator of choice?

12:15 I think that the orchestration wars had multiple factors.

12:25 Having community ownership of the project and a vibrant community was a critical piece of the puzzle.

12:30 It was partly due to the fact that the CNCF existed.

12:35 If you compare Mesos, which was an Apache project, the community of vendors supporting it was essentially a single company.

12:45 When you go and talk to customers, Mesos and Mesosphere were really seen as synonymous.

12:55 Kubernetes hit the Goldilocks zone in a number of ways: the right abstraction between a raw container, a VM and a PaaS.

13:05 We also hit the Goldilocks zone between something that was a big project to install and manage and something so developer-focussed that it hits problems when promoting to production.

13:20 It’s hard to pinpoint one thing, but there were a lot of things that went into it.

13:25 A lot of things seem obvious in hindsight, but they weren’t obvious along the way.

13:35 There were a lot of watershed moments along the way - for me, early on when we launched Kubernetes we reached out to Amazon, because we wanted them to come and support it as a cloud-neutral project.

13:55 Their response was that they support things that customers ask for: a polite rejection.

14:00 At re:Invent, where Amazon’s EKS was announced, it was clear that enough customers were asking that they couldn’t ignore it any more, and they had to follow through.

14:15 That was the end of the beginning - we were entering a phase after that where all of the cloud vendors were on board.

What do you think about the community and platforms that are building on top of Kubernetes now?

14:40 That was always the plan: some of the early friction in the container community was between Kubernetes and Docker.

14:55 Some of that was about whether Docker was aimed at end users, or was something to be built upon.

15:00 What we’ve seen over time is that parts of Docker have been extracted out, like containerd and OCI - so we’ve seen it being broken apart a bit.

15:25 We saw Kubernetes as something to be built upon, not necessarily the end goal.

15:35 The fact that the YAML configuration is explicit and verbose makes it easier to build on top of Kubernetes.
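
To make that explicitness concrete, here is a minimal, hypothetical Deployment manifest - even a trivial workload spells out its labels, selector, replica count and container image in full (the names and image below are made up for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web         # selector must match the pod template labels
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
```

Because every field is explicit and machine-readable, tooling that builds on Kubernetes can generate, patch or validate these objects field by field, with no hidden defaults to reverse-engineer.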

15:50 Brendan pushed early on for doing key extensibility features in Kubernetes.

16:00 Custom Resource Definitions (CRDs) became part of the extensibility mechanism of Kubernetes.
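
As a sketch of what that mechanism looks like: a CRD registers a new resource type with the API server, and from then on it can be created, listed and watched like any built-in object. The group and kind below (`example.com`/`Backup`) are hypothetical, and this uses the `v1beta1` form of the API that was current around the Kubernetes 1.13 era:

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com   # must be <plural>.<group>
spec:
  group: example.com          # hypothetical API group
  version: v1alpha1
  scope: Namespaced
  names:
    kind: Backup
    plural: backups
    singular: backup
```

Once applied, `kubectl get backups` works, and a custom controller can watch these objects to automate behaviour on top of the platform.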

16:15 We’re seeing people building on Kubernetes patterns to automate more and more things.

What was the Kubernetes 1.13 privilege escalation fix?

16:45 We’re learning lessons along the way - that particular flaw was narrower than it could have been, but it was very serious.

16:55 The community response to it was good - there was a lot of work from a lot of folks setting up the security response on this.

17:15 Everything went as planned for dealing with a serious security issue.

17:30 A big part of the story for upcoming releases is that we’re seeing a slowdown in new features coming in.

17:40 In some ways, this is deliberate: we want Kubernetes to be boring.

17:55 Boring is good for enterprise - it means that it’s predictable, you can rely on it.

18:05 We also want to make sure that Kubernetes as an ecosystem can still move fast.

18:10 When you look at the feature burn-down for 1.13 or 1.14, a lot of the features are extensibility features that people can build on top of.

What did the security response look like for the flaw?

18:35 We have committees, special interest groups (SIGs) and working groups.

18:40 Committees, of which we have a few, have closed membership and conversations - everything else is very open.

18:45 The three committees we have are the steering committee, the code of conduct committee, and the security committee.

19:05 There’s a mailing list to send mail to, and a small set of folks that evaluate and co-ordinate the reaction, and a trusted set of vendors and highly trusted users that get some early warnings.

19:20 A lot of this was based on other open-source projects, and it has been an evolving story over time - but when something happens everything snaps into place and we can mount a reasonably timely response.

What does the Kubernetes release process look like?

19:50 We’re continuing to refine what things look like.

20:00 There’s running the release itself: what the release dates are going to be, what the feature freeze date is, how we run the tests, generating the release notes.

20:15 They are either true volunteers or folks working on behalf of a company.

20:20 There is a value within the Kubernetes community of chopping wood and carrying water.

20:35 We want to avoid the tragedy of the commons that happens in some open-source projects; driving the release process is a lot of work, but it’s a great example of people working together.

20:45 Longer term, we have introduced SIG Architecture, an overarching architectural review group that looks at changes that sweep across Kubernetes.

21:00 As part of that, we’ve been introducing a Kubernetes Enhancement Proposal (KEP) process, which is a way for folks to make a change with a defined process and review.

21:20 That’s a work in progress, but it’s a way of making the project boring.

Does it borrow from Java’s JEPs?

21:30 There’s a lot that we looked at: JEPs from Java, PEPs from Python, RFCs from Rust - there’s a lot of prior art.

21:40 Each community has its own foibles and its own uniqueness, so you adapt it to something that makes sense.

What’s coming up in 1.14?

21:55 I’ve not been tracking it as closely as I probably should - a lot of it has been extensions, stability, refactoring.

22:05 We’re going to see more progress on breaking out cloud providers into being external components, because that’s one of the things that is built in and one of the reasons someone might want to fork the codebase.

22:15 We’re continuing to see investment in Custom Resource Definitions for extending Kubernetes.

22:25 One example of what Heptio/VMware has been working on for a couple of releases is audit hooks, so you can get a list of all the actions that have been going on in the API server.

22:35 This is currently possible by changing command-line flags on the API server, but that means it’s not accessible to those running on managed services, because they generally don’t give you access to modify the command-line flags.

22:45 Being able to dynamically register these things from within the cluster is going to enable a whole new ecosystem that takes that audit stream and does useful things with it.
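
At the time of this conversation the dynamic registration mechanism was the alpha AuditSink API (`auditregistration.k8s.io/v1alpha1`, behind the DynamicAuditing feature gate; the alpha API was later removed in Kubernetes 1.19). A rough sketch with a hypothetical webhook URL:

```yaml
apiVersion: auditregistration.k8s.io/v1alpha1
kind: AuditSink
metadata:
  name: example-sink
spec:
  policy:
    level: Metadata          # record request metadata only
    stages:
    - ResponseComplete       # fire once the response has been sent
  webhook:
    throttle:
      qps: 10
      burst: 15
    clientConfig:
      url: "https://audit.example.com/sink"   # hypothetical receiver
```

Each matching API request is delivered to the webhook at the configured stage, so an in-cluster service can consume the audit stream without anyone modifying the API server’s flags.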

What is the relationship between CNCF’s Technical Oversight Committee and Kubernetes?

23:25 There’s a confusing number of acronyms and relationships.

23:30 The CNCF is the overarching foundation, supported in large part by members who pay dues and events such as KubeCon.

23:35 There’s a governing body, which elects people to the Technical Oversight Committee (TOC).

23:45 The goal of the CNCF in general is to curate a set of projects and provide support to those projects.

23:50 The TOC decides which projects are part of CNCF and also has some critical input as to how those projects are supported.

24:00 This is the first set of elections since the CNCF was formed, so the first turnover of the committee with incoming ideas.

24:15 The TOC is not as diverse as we think it should be: all men, concentrated in North America and more vendor-heavy than users would like to see.

24:30 That’s something we’re actively discussing and looking to improve.

24:40 A couple of seats on the TOC are appointed by the TOC itself, and so we’re looking at how we can tackle diversity to fill those posts.

What have you seen people struggle with when moving to production with Kubernetes?

25:15 As you move to production, it’s important to weigh what your requirements are, what your skillset is, and whether you’re going to spend money or sweat equity to make it work.

25:30 The benefit of Kubernetes being open-source is that there’s plenty of folks out there using Kubernetes in production without any help from anybody outside of community resources.

25:40 At KubeCon, I meet people from a retailer in a South American country who are running Kubernetes in every store, or for back-end systems.

25:50 As we look world-wide, there’s a democratisation of technology that open-source provides.

25:55 If your needs are relatively simple, we’ll tell people to buy an off-the-shelf service, like GKE.

26:10 Those types of offerings are great if you find your needs are cookie-cutter.

26:15 What we find is that as you deal with more and more complex enterprises, they have all evolved like the Galapagos islands with a unique ecosystem.

26:25 As they look to adopt a technology like Kubernetes, often the off-the-shelf solutions won’t suffice, and they’ll need some flexibility to adapt it.

26:35 Sometimes this comes in consistency between clouds, sometimes there are needs for specific network requirements - and in those cases, engaging with a vendor makes a lot of sense.

26:50 From VMware/Heptio’s point of view, we have Heptio HKS, which we’re looking to fold into the VMware portfolio - a toolkit for deploying and managing Kubernetes, along with the expertise of our customer reliability engineering and field engineering teams.

27:15 If you’re looking for something more opinionated, VMware PKS is a solution that brings in a lot of VMware’s opinionated technology.

27:30 There’s a lot of different vendors that have offerings with different levels of support, automation and customisability.

27:45 The challenge is that there’s so many choices; the advantage is that there are so many choices.

27:50 It’s good that there will be something there that works for you, but it may take some time to evaluate those options.

What does the next three years look like for Kubernetes?