Someone I know had worked at an organization that needed to scale out its OpenShift clusters/footprint. Constrained by the speed of its procurement department, the organization was wondering whether it could get by with vanilla Kubernetes. Following on from this, I posted a thread on Twitter about why Kubernetes on its own is not enough. Much to my pleasant surprise it generated a lot of interest, so I wanted to do the subject justice in the form of a blog post.
My thesis is that ‘vanilla’ Kubernetes, on its own and in isolation, is not enough to satisfy the requirements of a container orchestration platform. There are extra things you need in order to complete the ecosystem, namely:
- Storage orchestration
Kubernetes comes with in-tree drivers, i.e. storage plugins built directly into the Kubernetes code base. These are deprecated, and the Kubernetes storage special interest group is driving people towards plugins that implement the Container Storage Interface (CSI) API. The long and short of it is that you do not get enterprise-grade CSI plugins out of the box with vanilla Kubernetes.
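One quick way to see which side of this line an existing cluster sits on is to look at the `provisioner` field of its StorageClasses. The Python below is a minimal sketch, assuming the StorageClass list has already been fetched as JSON (for example with `kubectl get storageclass -o json`) and relying on the convention that in-tree provisioners carry a `kubernetes.io/` prefix (such as `kubernetes.io/aws-ebs`) while CSI drivers use their own names (such as `ebs.csi.aws.com`). The function name and sample data are hypothetical, and a provisioner without the prefix could also be an external, non-CSI provisioner, hence the bucket name.

```python
from typing import Dict, List

IN_TREE_PREFIX = "kubernetes.io/"


def classify_storage_classes(sc_list: Dict) -> Dict[str, List[str]]:
    """Group StorageClass names by the kind of provisioner they use.

    Expects the JSON shape of `kubectl get storageclass -o json`:
    a dict with an "items" list of StorageClass objects.
    """
    result: Dict[str, List[str]] = {"in_tree": [], "csi_or_external": [], "no_provisioner": []}
    for sc in sc_list.get("items", []):
        name = sc["metadata"]["name"]
        provisioner = sc.get("provisioner", "")
        if provisioner == "kubernetes.io/no-provisioner":
            # Static provisioning (e.g. local PVs): nothing is created dynamically.
            result["no_provisioner"].append(name)
        elif provisioner.startswith(IN_TREE_PREFIX):
            # Legacy in-tree plugin: a candidate for migration to a CSI driver.
            result["in_tree"].append(name)
        else:
            # A CSI driver, or some other external provisioner.
            result["csi_or_external"].append(name)
    return result


# Hypothetical sample data.
sample = {"items": [
    {"metadata": {"name": "gp2"}, "provisioner": "kubernetes.io/aws-ebs"},
    {"metadata": {"name": "gp3"}, "provisioner": "ebs.csi.aws.com"},
    {"metadata": {"name": "local"}, "provisioner": "kubernetes.io/no-provisioner"},
]}
print(classify_storage_classes(sample))
# {'in_tree': ['gp2'], 'csi_or_external': ['gp3'], 'no_provisioner': ['local']}
```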
- Overlay network software
A Kubernetes cluster cannot function without a pod network, provided by a Container Network Interface (CNI) plugin. If you provision clusters as I do with Kubespray, you end up with Calico by default; Flannel, one of the earliest overlay network CNIs for Kubernetes, is also popular.
- Package management
If you want to move away from hand-maintained YAML, you broadly have two choices: use Helm, the de facto package manager for Kubernetes, or adopt techniques that allow code to be pushed directly to a cluster (next point . . .).
- CI/CD Tooling and general “Developer experience”
Pushing code from source code repositories directly to Kubernetes clusters is fast becoming popular; both the Flux operator from Weaveworks and Jenkins X enable this, and again this is not something you get out of the box with vanilla Kubernetes.
- Load balancers
Most Kubernetes-as-a-service platforms in the public cloud can expose services through load-balanced endpoints. Most on-premises Kubernetes clusters, by contrast, require a separate load balancer; otherwise you are reliant on NodePorts. Products from Citrix and F5 are popular in this space.
- Ingress control
An ingress controller routes (north-south) traffic to services based on rules; examples of ingress controllers include Traefik, Envoy, Istio, etc. . . .
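To illustrate what routing “based on rules” means, here is a deliberately simplified Python model of Ingress host/path matching. It follows the general shape of Kubernetes Ingress rules (an optional host, a path, and a `Prefix` or `Exact` path type, with the longest matching path winning and `Exact` preferred on a tie), but it is only a sketch, not how any particular controller is implemented. It ignores wildcard hosts, TLS and default backends, and the `Rule` class, hostnames and service names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    host: Optional[str]  # None matches any host
    path: str
    path_type: str       # "Prefix" or "Exact"
    service: str


def _segments(path: str) -> List[str]:
    # "/foo/bar/" -> ["foo", "bar"]; empty segments are ignored.
    return [s for s in path.split("/") if s]


def _matches(rule: Rule, path: str) -> bool:
    if rule.path_type == "Exact":
        return path == rule.path
    # Prefix: element-wise match on path segments, so /foo matches /foo/bar but not /foobar.
    rule_segs = _segments(rule.path)
    return _segments(path)[: len(rule_segs)] == rule_segs


def route(rules: List[Rule], host: str, path: str) -> Optional[str]:
    """Return the service a request would be routed to, or None if no rule matches."""
    candidates = [r for r in rules if r.host in (None, host) and _matches(r, path)]
    if not candidates:
        return None
    # Longest rule path wins; on a tie, Exact beats Prefix.
    best = max(candidates, key=lambda r: (len(r.path), r.path_type == "Exact"))
    return best.service


rules = [
    Rule("shop.example.com", "/", "Prefix", "frontend"),
    Rule("shop.example.com", "/api", "Prefix", "api"),
    Rule(None, "/healthz", "Exact", "health"),
]
print(route(rules, "shop.example.com", "/api/orders"))  # api
print(route(rules, "shop.example.com", "/apidocs"))     # frontend
```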
- Monitoring and observability
Kubernetes components expose their metrics in the Prometheus format, so that they can be scraped and visualized. You still have to make provision for scraping and storing those metrics, and for the visualization element, which is where Grafana usually comes in.
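For a feel of what “available to be scraped” looks like, Kubernetes components serve their metrics as plain text in the Prometheus exposition format on a `/metrics` endpoint. The Python below is a minimal, simplified parser for that format, run against a small made-up sample; it assumes there are no escaped characters inside label values, and in practice Prometheus itself does the scraping and storage.

```python
import re
from typing import Dict, List, Tuple

# metric name, optional {labels}, value, optional timestamp
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        m = _SAMPLE.match(line)
        if m is None:
            continue  # not a line this simplified parser understands
        name, labels, value = m.group(1), m.group(2) or "", m.group(3)
        samples.append((name, dict(_LABEL.findall(labels)), float(value)))
    return samples


# Made-up sample in the exposition format.
SAMPLE = """\
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
rest_client_requests_total{code="200",method="GET"} 17
"""
samples = parse_metrics(SAMPLE)
for name, labels, value in samples:
    print(name, labels, value)
```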
- Secure image registries
The internet can be a dangerous place. As the SolarWinds supply chain attack showed, in which attackers slipped a backdoor into SolarWinds' Orion software updates, giving people free rein to pull down whatever images they want from wherever they want can be dangerous. For this very reason, it is recommended that a secure registry is used.
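As a minimal sketch of the kind of rule a secure registry policy enforces, the Python below works out which registry an image reference points at and checks it against an allowlist. It roughly follows Docker's convention that the first path component is only treated as a registry host if it contains a “.” or “:” or is `localhost`, with everything else resolving to Docker Hub (`docker.io`). The registry name `registry.example.com` and the function names are hypothetical, and in a real cluster this check belongs in an admission policy rather than a script.

```python
from typing import Iterable

DEFAULT_REGISTRY = "docker.io"


def registry_of(image: str) -> str:
    """Return the registry host an image reference would be pulled from."""
    first, sep, _ = image.partition("/")
    # The first component is a registry host only if it looks like one;
    # "nginx" or "bitnami/redis" therefore resolve to Docker Hub.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return DEFAULT_REGISTRY


def is_allowed(image: str, allowed_registries: Iterable[str]) -> bool:
    """True if the image comes from one of the approved registries."""
    return registry_of(image) in set(allowed_registries)


allowed = ["registry.example.com"]
print(is_allowed("registry.example.com/team/app:1.0", allowed))  # True
print(is_allowed("nginx:1.25", allowed))                        # False
```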
- Data protection solutions
Kubernetes may be a relatively new platform, but data protection is as important as ever, and you have more to worry about than just persistent volumes: secrets, for example. There are free tools such as Velero available; however, the finest level of backup/restore granularity it goes down to is the namespace, and the same credentials/keys need to be used per object store across the organization. For example, if you used AWS S3 as a backup target, you would have to use the same S3 access key ID and secret access key across your organization, which is something you might not want if, say, you are a bank.
- Linux distros
You ideally need a supported and hardened Linux distribution. For OpenShift 4, Red Hat mandates the use of RHEL CoreOS for control plane nodes and RHEL or RHEL CoreOS for compute nodes. VMware Tanzu clusters use virtual machine image templates in OVA format based on hardened versions of Linux, and I suspect that all the major cloud providers use hardened versions of Linux too.
- Kubernetes cluster lifecycle management
Something is required to upgrade and scale out your cluster. For clusters deployed with kubeadm, “the hard way” or with Kubespray, in-place upgrades are not recommended: should an upgrade fail for any reason (and they do), you can be left with nodes on different versions of Kubernetes. For this very reason, it is recommended that a new cluster is stood up at the desired version and applications are migrated from the old cluster onto the new one.
Which “Extra Bits” Do I Actually Need?
If you are planning to build a database-as-a-service platform on top of Kubernetes (including Azure Arc-enabled data services), you ideally need everything in the list above except ingress control, because ingress controllers tend to deal with HTTP and gRPC, protocols that databases tend not to use. If you are building a complete platform, you are likely to need everything listed above.
What’s Wrong with Building the “Full Stack” out of Open Source?
First off, if you work in any kind of regulated industry, you can only run software in production that comes with a commercial support contract. Secondly, although you can build the entire stack yourself, consider all the components you would then need to test and validate. If your organization is a development-intensive SaaS company this might work for you; however, if your focus is on consuming platforms rather than building them, a product in which the vendor has done all the integration heavy lifting for you might be a better way to go.
But Surely the Public Cloud Takes All of This Pain Away?
The public cloud does remove some pain points, most notably by providing something closer to a full Kubernetes stack, along with cluster lifecycle management; however, you still have to worry about:
- Cross availability zone high availability for stateful applications
- Data protection
- Overlay network CNI: is this “production grade” out of the box?
- Managing persistent volumes at scale
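Cross-zone high availability for stateful applications is a good example of something that remains your problem. As a minimal sketch, assuming you have already collected which node each replica runs on and each node's labels, the Python below uses the well-known `topology.kubernetes.io/zone` node label to work out how many replicas would survive the loss of the busiest zone, and whether a majority (as a quorum-based database would typically need) remains. The pod, node and zone names are hypothetical; scheduling features such as topology spread constraints or pod anti-affinity are what actually place the replicas, and this only checks the outcome.

```python
from collections import Counter
from typing import Dict

ZONE_LABEL = "topology.kubernetes.io/zone"


def zone_spread(pod_to_node: Dict[str, str], node_labels: Dict[str, Dict[str, str]]) -> Counter:
    """Count replicas per availability zone."""
    return Counter(node_labels[node].get(ZONE_LABEL, "unknown") for node in pod_to_node.values())


def replicas_after_zone_loss(pod_to_node: Dict[str, str], node_labels: Dict[str, Dict[str, str]]) -> int:
    """Replicas left running if the zone hosting the most replicas goes down."""
    spread = zone_spread(pod_to_node, node_labels)
    if not spread:
        return 0
    return sum(spread.values()) - max(spread.values())


def keeps_quorum(pod_to_node: Dict[str, str], node_labels: Dict[str, Dict[str, str]]) -> bool:
    """True if a majority of replicas survives the worst single-zone outage."""
    replicas = len(pod_to_node)
    return replicas_after_zone_loss(pod_to_node, node_labels) >= replicas // 2 + 1


# Hypothetical nodes spread over three zones.
node_labels = {
    "n1": {ZONE_LABEL: "zone-a"},
    "n2": {ZONE_LABEL: "zone-b"},
    "n3": {ZONE_LABEL: "zone-c"},
}
spread_ok = {"db-0": "n1", "db-1": "n2", "db-2": "n3"}   # one replica per zone
spread_bad = {"db-0": "n1", "db-1": "n1", "db-2": "n2"}  # two replicas share zone-a
print(keeps_quorum(spread_ok, node_labels))   # True
print(keeps_quorum(spread_bad, node_labels))  # False
```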
What Does Open Source Actually Give Me?
Provided the correct notices are observed, you can use the software. Kubernetes is licensed under the Apache License 2.0, which (in Section 7, “Disclaimer of Warranty”) provides the software on an “AS IS” basis, without warranties or conditions of any kind.
So I Shouldn’t Use Kubernetes, Then?
No, I’m not saying that at all. What I am saying is that you need to be mindful of the lifecycle of the applications running on your cluster, of the cluster itself and, of course, of security: in short, “day two” considerations. TL;DR: I am advocating an eyes-wide-open approach to adopting Kubernetes.
What Should I Do ?
So as not to fail the “it depends” test, we need to put what you want to do, and where you are coming from, into context. Again, if you work in any kind of regulated industry, you can only run software in production that is commercially supported, which straight off the bat rules out community open source software that comes without commercial support. Secondly, you need to consider what you want:
- If you use and like GCP, want a Google-flavored hybrid/on-premises development experience and like things such as Cloud Run (Knative and then some), then Google Anthos is a good choice.
- If you are a large VMware shop and like the idea of a single-pane-of-glass solution to rule them all, covering both virtual machines and Kubernetes clusters, you might gravitate towards VMware Tanzu.
- If you work in a regulated industry and want a robust, mature platform-as-a-service with a good developer experience that auditors like from a governance standpoint, Red Hat OpenShift fits the bill.
- If you like, or are heavily invested in, AWS and want a hybrid/on-premises AWS-like Kubernetes experience, then EKS Anywhere might be for you.
- If you already use one of the major cloud vendors, each has its own Kubernetes-as-a-service offering: AKS, GKE and EKS.
- And so on . . .
Are You a Builder or a Consumer?
To quote Kelsey Hightower:
If you are a platform builder, you are more likely to lean towards a Kubernetes-based platform/distribution that furnishes a good “developer experience”. If you are more of a platform consumer, only interested in deploying applications/platforms and getting them up and running as expediently as possible, you are more likely to lean towards a distribution that is big on ease of management.
The Skills Gap
I suspect it is difficult to hire quality staff in most areas of IT, so if this is a concern when adopting Kubernetes:
- Consider the platform options available to you and their respective learning curves
- Leverage the tribal knowledge in the Kubernetes/CNCF community
- Prefer vendors that will act as partners and foster consultative client/supplier relationships.
- When you pick vendors, ascertain their level of knowledge of the broader ecosystem their products/services fit into, and what their community presence is like.
- Be pragmatic: accept that, with the relentless pace of change in technology right now, self-directed, on-the-job learning is a thing, and embrace individuals who are prepared to roll their sleeves up and learn.
Ten years or so ago J2EE was a big thing; not so much now. If I had a time machine and could jump ten years into the future, Kubernetes might well be considered legacy, supplanted by a new kid on the block. Technology never stands still; accept this, and value people who accept and adapt to change.
There is no “easy button” to press here and no one-size-fits-all solution. The best advice I can provide is to consider not just Kubernetes but the wider ecosystem required to support it. Be pragmatic about what you want, where you are coming from and where you are going. If I were to distil my advice down as far as possible, it would be to consider:
- The full stack required to implement a platform and not just Kubernetes in isolation
- The day 2 requirements of applications running on Kubernetes and their lifecycle requirements
- The day 2 requirements of the Kubernetes cluster and its lifecycle
I don’t expect anyone to read this post and come away with a solution that satisfies their exact needs; my hope is that I have provided some food for thought and grist to the mill, so that people can make informed decisions.