Single or Multi-Tenant Clusters: When and Why

August 15, 2023

Season 1, Episode 11

In this episode, Jon Shanks and Jay Keshur dive deep into the topic of single vs. multi-tenancy in Kubernetes. They discuss the complexities and considerations when deciding on the architecture of Kubernetes clusters. The conversation touches on the origins of Kubernetes, its evolution, and how businesses can make informed decisions based on their specific needs.

In This Episode, You Will Learn:

  • The differences between single and multi-tenancy in Kubernetes.
  • The considerations and challenges of managing multi-tenant environments.
  • The origins and evolution of Kubernetes and its relation to Google’s Borg.
  • The impact of cloud-managed Kubernetes services on flexibility and functionality.

Themes Covered in the Podcast:

  1. Kubernetes Origins and Evolution: Kubernetes, inspired by Google’s Borg, was a collaboration between Red Hat and Google. It started as a bare-bones project and evolved over time with community and vendor inputs.
  2. Single vs. Multi-Tenancy: Multi-tenancy increases the risk profile due to shared resources. However, Kubernetes was designed to handle multiple workloads, making the decision complex.
  3. Security Concerns: Multi-tenancy can introduce security risks. Proper Role-Based Access Control (RBAC) and policies are essential to ensure that workloads do not interfere with each other.
  4. Operational Challenges: Managing multi-tenant environments can be complex. Decisions on scaling, load balancing, and other operational aspects need careful consideration.
  5. Cloud-Managed Kubernetes: Services like EKS, AKS, and GKE offer managed Kubernetes but may limit some functionalities of the open-source version.

Quick Takeaways:

  1. Kubernetes: An open-source container orchestration platform inspired by Google’s Borg.
  2. Single-Tenancy: A setup where only one tenant (user or application) uses the resources.
  3. Multi-Tenancy: A setup where multiple tenants share the same resources.
  4. RBAC (Role-Based Access Control): A method of regulating access to computer or network resources based on the roles of individual users.
  5. EKS, AKS, GKE: Cloud-managed Kubernetes services offered by AWS, Azure, and Google Cloud respectively.
  6. Ingress: A way to manage access to services in a Kubernetes cluster from outside.
  7. Cert-Manager: An open-source tool used for managing certificates in Kubernetes.
  8. Service Mesh: A dedicated infrastructure layer for handling service-to-service communication.
  9. Cluster Scaling: Adjusting the number of nodes in a cluster to handle the workload.
  10. Operational Challenges: The difficulties faced in managing and maintaining a system.

Follow for more:

Jon Shanks: LinkedIn

Jay Keshur: LinkedIn

Jon & Jay’s startup: Appvia


Jon: [00:00:02] So yes, you might be able to keep upgrading your cluster, but if you’re never upgrading the things in the cluster, at some point it’s going to be problematic because you don’t know whether those things work with those versions of Kubernetes. So you’ve still then got complexity when you start enhancing things in the cluster to keep everything aligned and in sync. Hello, welcome to Cloud Unplugged. I’m Jon Shanks.

Jay: [00:00:27] And I’m Jay Keshur.

Jon: [00:00:28] And I think today we’re going to bite this very big topic called single versus multitenancy around Kubernetes.

Jay: [00:00:36] Cluster architectures.

Jon: [00:00:38] Basically cluster architectures, yes. Like, what do you have to consider when making the decision and what are the things that go into that decision making as well as the definition of what we mean by architectures, I suppose.

Jay: [00:00:51] Kick us off. What does tenancy mean in this world? Jon’s got a lot of ideas in this space.

Jon: [00:00:53] I have so many ideas about the tenancy.

Jay: [00:00:57] He actually pulled out a tenancy agreement and was like, it’s all around.

Jon: [00:01:05] Yeah, tenancy, just for simplicity terms, I think especially for Kubernetes, I would deem it is more about whether you’re sharing it or not. And I suppose it’s possession.

Jay: [00:01:15] I mean, obviously that’s multitenancy when you’re sharing something.

Jon: [00:01:18] And when you’re not sharing it, then it’s single tenancy.

Jay: [00:01:20] But what is it that’s sharing it? As in what are the constructs of things that could share it?

Jon: [00:01:25] So applications, I would say, was the specific bit, or the workloads. Somebody has to be owning these applications. Obviously, they didn’t magically arrive in a cluster one day by themselves. So because of that, I guess it’s who has context on these apps that I can go and speak to, to help me inform a decision on whether they should or shouldn’t be single or multi-tenanted in a cluster. And those are specific things around cost, security, ease of management, operational requirements, performance, and those types of decisions.

Jay: [00:01:57] Cool. Do you agree with that? I think so. Let me just try to summarize. So workloads are obviously owned by people, and if you have many different groupings of those people, whether it be teams, business units, whatever, if there’s many of those to a single cluster, then it’s multi-tenanted because it’s many to one. And if it’s one-to-one, then it’s single tenancy.

Jon: [00:02:24] Yeah, exactly.

Jay: [00:02:24] Cool.

Jon: [00:02:25] So if nothing else is sharing it, then yeah. So if those applications are all yours and that terminology of yours means a business unit, or if that terminology of yours means the project, then that is the tenant description. So then that is the definition of the tenant in that context, whatever yours, I guess, starts to mean in a business that you were saying. 

Jay: [00:02:4] Nice. So I guess let’s maybe go through the different reasons that you kind of just described there a little bit on…

Jon: [00:02:48] What were those different reasons?

Jay: [00:02:53] What were they? Tell me, what were they Jon? What were the different reasons for going?

Jon: [00:02:55] Did you disagree with those reasons? Or do you think there are missed reasons?

Jay: [00:02:57] Say them again.

Jon: [00:02:58] So there was cost, security, ease of management, operational concerns, by which I kind of mean, like, what happens if I patch this? Does it impact somebody else? Do I need to make decisions on that? Performance, maybe there are performance requirements and limits that stop me from being multi-tenanted, because one application might take it all up and then that becomes problematic for the tenancy for other people. And those are the main caveats, basically. I guess they’re the big-ticket items, the big-ticket items that most things will fall under. I guess we can dig into separate episodes. I think you can probably go deep on all of them.

Jay: [00:03:36] I’m not sure. Yeah, I mean, some are simpler, right? So like the ease of management, to a certain degree. Is it going to be easier to manage one thing over several things? Probably. There are obviously different ways of doing that, but I think there are certain topics like cost and security that you can literally spend hours and hours talking about, because there are so many different ways to mitigate or share: how you’re trying to mitigate all this risk, or kind of share cost out to the different teams or applications that are using their clusters.

Jon: [00:04:12] Yeah, I’d definitely say those two are like the big pieces, the mega buckets of kind of information that we need to kind of explore there, like the decision making.

Jay: [00:04:22] I guess should we go through maybe a little bit of a scenario? So let’s say you’re a central platform team and you’re working inside an organization, trying to gather some requirements to figure out how you’re going to be giving out these Kubernetes clusters to business units so that they can deploy some applications onto them, right? So I’m going to be gathering all these requirements from everyone, trying to understand, me as a Kubernetes expert, how I’m going to decide what best fits the organization’s requirements. So what things would I, as this platform team, ask, I guess?

Jon: [00:05:00] I guess if you’re the platform, just so we can, because we’ve spoken about these things before, I just want to make sure we define it, because people might not have heard our opinions. But you as a platform team are obviously there to be measured on success by the speed at which you’re getting applications live, and the quality and the scalability and security of the application. You’re there to get the best out of that element and then be, I guess as a platform team, as frictionless as possible. So that’s probably making decisions for people where it’s seamless and you don’t necessarily need to know. And maybe you can somehow just place your app in, and those requirements, if you’ve been really good, are taken care of, or they’re led by you to then make it really easy for the application, and it just decides where it should go. If there were multi-tenancies of multi-tenancies, I guess, because you might have shared groups: maybe applications that are less sensitive you might want to share, maybe very highly sensitive things you might share with more sensitive things. I guess it depends on how many multi-tenanted clusters you might have.

Jay: [00:06:04] And why you’re multi-tenanted more so than how many.

Jon: [00:06:10] The rationale behind it. And then the architecture, just so we cover that off too. We’re also, just for clarity, not just purely Kubernetes, but things then that makes life easier for a developer. 

Jay: [00:06:21] Yes, exactly that.

Jon: [00:06:22] So things can just scale up. The clusters will scale up. DNS, certificate management.

Jay: [00:06:29] Certain, I guess, commodity things within Kubernetes are taken care of for you, or given to you almost as a service. So the platform team are there to provide you with Kubernetes as a service. And that Kubernetes isn’t just a cluster at the basic level, it’s a cluster and the sort of standard components that come with that cluster. Or we could talk about what that is. In fact, maybe we should do that now. I guess some of the sorts of typical components that you would probably see in a cluster, sort of starting from the top, is an ingress. So a way to access the things inside of that cluster from outside of it. There will be something to manage certificates, typically something like cert-manager, which is an open-source product that could also do certificate management for non-external-facing certificates. Exactly. So having its own internal cluster certificate issuer, which could enable mutual TLS between services. So you’ve got ingress, cert-manager, maybe something to manage the DNS. So let’s say you didn’t have a wildcard on a load balancer or something like that. It would go off and provision the necessary DNS records and point them to the new ingress that exists. What else?
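For readers who want to see what this stack looks like concretely, here is a rough sketch of the kind of Ingress manifest a platform team might hand out. Everything here is illustrative, not from the episode: the app name, namespace, hostname, ingress class, and issuer name are made up, and it assumes cert-manager and an NGINX ingress controller are installed in the cluster.

```yaml
# Hypothetical example: expose "my-app" through the cluster's ingress,
# with cert-manager issuing the TLS certificate automatically.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: team-a
  annotations:
    # Assumes a cert-manager ClusterIssuer named "letsencrypt-prod" exists
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
  tls:
    - hosts:
        - my-app.example.com
      secretName: my-app-tls   # cert-manager stores the issued cert here
```

If something like external-dns is also running in the cluster, it can watch Ingress resources like this one and create the matching DNS record, which is the "manage the DNS" piece described above.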

Jon: [00:07:47] Service meshes.

Jay: [00:07:48] Service meshes, that’s a popular hot topic. What do service meshes do?

Jon: [00:07:53] So you could then do auth between services and networky kind of elements, like this service can talk to that service, and there can also be mutual TLS on there and things like that. Some do that.

Jay: [00:08:08] And they’re quite cool because obviously, I guess with service meshes you have the ability to be quite advanced with how you’re managing the traffic before and after it gets to your workload. So rate limiting or doing sort of blue-green deployments, canary releases, that type of thing.

Jon: [00:08:23] And knowing whether it’s healthy and all that kind of stuff. So it does help, generally speaking if they don’t have insane requirements in the service discovery. But if it’s high traffic, then obviously some of them are kind of user space. So it can be not always very fast, but it just depends. But just thinking about all those things because obviously, you need to get on the topic, we’re not going to list every single cluster component.

Jay: [00:08:49] No, but the next couple of ones that I was going to list were quite important. To do with scaling. Not important? No, not at all. No. Who wants to scale? Huh?

Jon: [00:08:57] No one scales.

Jay: [00:08:58] So the main ways to sort of scale workloads or clusters within Kubernetes are horizontal scaling, i.e., having more pods; vertical scaling, which is giving the pod that you have more resources; and then cluster scaling, i.e., giving more nodes to the cluster so that it can schedule more things within it. They’re quite important, I guess, because obviously a cluster doesn’t really have a function without workloads being in it, and nodes for those workloads to go onto.
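As a concrete sketch of the first of those, horizontal scaling, this is roughly what a HorizontalPodAutoscaler manifest looks like. The deployment name, replica counts, and CPU threshold are invented for illustration:

```yaml
# Hypothetical HPA: scale "my-app" between 2 and 10 pods,
# targeting ~70% average CPU utilization across the pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: team-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Vertical scaling and cluster scaling are typically handled by separate components, e.g. the Vertical Pod Autoscaler and the cluster autoscaler (or a cloud provider’s managed node-scaling equivalent).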

Jon: [00:09:28] So DNS, how do things get populated? So there’s an easy name to go and consume the service. Load balancing is obviously something that’s going to load balance to the service, but you’re talking about in-cluster load balancing at that point. So like the ingress itself, not obviously a load balancer from the cloud.

Jay: [00:09:44] The in-cluster load balancer is obviously the service. This is the load balancing that happens for things coming into the cluster. Ingressing in.

Jon: [00:09:55] Exactly. So then there’ll be like ingresses, so obviously distributing the traffic, and then service meshes potentially, and certificates, I think I just mentioned that. And then the auto-scaling for obviously managing how much compute is needed overall to service the number of applications that come into that cluster. So they’re all the things that you would majorly want. But then it could be logging agents, monitoring agents, all these other pieces that also get deployed that you might want, which is obviously important.

Jay: [00:10:26] True, yeah, exactly.

Jon: [00:10:27] You need to get logs out somewhere centrally and things like that.

Jay: [00:10:30] I mean there could be a bunch of sort of in-cluster services that either attach to what’s running in that cluster and give you some outcome.

Jon: [00:10:39] Maybe policy. Proxy for authentication. It could be single sign-on proxying or whatever else. There’s a plethora of platform-related components that, if they’re done well and they’re good services, improve the experience for others in the end.

Jay: [00:10:59]They solve problems. And hopefully, they solve those problems well enough to improve the experience overall.

Jon: [00:11:05] So knowing that all those things are going into Kubernetes potentially, or some or like one or many of those things might go in, I guess depending on whether you do or don’t need them. What are the architectural decisions that you might be thinking about when it comes to kind of single or multitenancy? Because I guess if you’re multitenant then some of the types of technology you might end up needing might change.

Jay: [00:11:30] So if you’re, I guess, multitenant, then because you’re sharing, obviously, then maybe you want to do things like protect workloads from each other and making sure that you’re kind of architecting the policies that you have in place with things like the Open Policy Agent or sort of admission controllers to enable the workloads to not be sort of noisy neighbors. So also have things like limits and quotas in place so that one team or one tenant can’t saturate the cluster and have like an unfair internal denial of service. So that could fit again sort of more into the security side of things. Or we could talk about it from the sort of operational side.
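The limits and quotas Jay mentions here map onto Kubernetes ResourceQuota and LimitRange objects. A minimal sketch, with a made-up tenant namespace and made-up numbers:

```yaml
# Hypothetical quota for a tenant namespace: caps total CPU, memory,
# and pod count so one team can't saturate the shared cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
---
# A LimitRange gives per-container defaults, so workloads that don't set
# explicit requests/limits still count sanely against the quota.
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
```

Policy beyond resource caps, the "noisy neighbor" and admission-control side, is what tools like Open Policy Agent / Gatekeeper layer on top of this.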

Jon: [00:12:16] I was just thinking, not just from a Kubernetes perspective, what you might need to do within the cluster, within the context of Kubernetes, but the things you might then have to put in the cluster because you’re more concerned. So then suddenly you might start with the things you’ve mentioned because they’re very app-centric, but because you then think about the architecture, because you’re then more concerned if you are going down the multitenant route, it then changes what else needs to be in there potentially to reduce the risk that you’re saying.

Jay: [00:12:4] And then I guess because you’re reducing risk, then naturally you’re potentially increasing cost because you have to run something in the cluster to offload that somewhere.

Jon: [00:12:55] Or maybe increasing complexity because there are more things, and you need to know how to run all those things. But also probably the primary thing is the RBAC. Imagine when you get into multi-tenancy like that has to be very well nailed.

Jay: [00:13:10] Yeah, exactly.

Jon: [00:13:11] You don’t really want some accident there where some group or some user or some service account suddenly has free rein access to other namespaces, or I guess kind of like environments, aren’t they a little bit synonymous, or something can go and get the secrets of every single thing running in the cluster. So like the risk profile, again, it becomes a little bit more, even though you’re thinking of tenants. When you’re thinking about it suddenly becoming multitenant, normally the starting point of that usually segues straight into a security mindset, I think, before anything else. So they’re normally your first qualification questions. So you start to understand it could be at a service-per-service level at the beginning when you’re starting out. So you might be just working things out, because you might not know everything upfront, because some projects might not even be started yet. So you can’t obviously find requirements out for future things. You can only go on what you know at the moment. So that platform may or may not be able to be shared with absolutely everything in the business, but it might be able to be shared with the current things.

Jay: [00:14:08] Yeah, right. So with these requirements that I have right now, let’s call it a multi-tenanted environment, so it can be shared, and then you’ve got the fact that you want to give these shared services kind of equal weighting, equal access, things like that. So like you said, RBAC comes in, it is quite important. So based on your role, you can only have access to your environment within your cluster, and everyone else in the cluster has the same principle. Then you’ve got the impact on the shared services as well. So because you have all of the things that we just spoke about, like ingress, like cert-manager, you have to have quite a lot of, say, management of those things to ensure they’re capable of allowing this set of users, these applications, to be in that cluster and to be a shared service. So your ingress pods don’t get over-saturated and they’re scaling properly, so that if there’s more demand in the cluster, then the shared services can scale as well, things like that.
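The per-tenant RBAC described here, "based on your role, you can only have access to your environment", boils down to namespace-scoped Roles and RoleBindings. A sketch with invented names (the group, namespace, and resource list are illustrative, not from the episode):

```yaml
# Hypothetical Role: what members of "team-a" may do,
# scoped strictly to the "team-a" namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-a-developer
  namespace: team-a
rules:
  - apiGroups: [""]   # core API group
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
# Bind the Role to an identity-provider group named "team-a".
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-developers
  namespace: team-a
subjects:
  - kind: Group
    name: team-a
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-a-developer
  apiGroup: rbac.authorization.k8s.io
```

Using a namespaced Role here, rather than a cluster-wide ClusterRole and ClusterRoleBinding, is what keeps one tenant’s access fenced into its own namespace.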

Jon: [00:15:11] So I guess, if we pick one up a little bit, say all the others didn’t matter. So say cost didn’t matter, and all those other concerns didn’t matter: ease of management and operational risk, if you could remove them. So there was only one requirement, and it was just kind of like a security one. That’s all it was.

Jay: [00:15:28] Got all the money in the world.

Jon: [00:15:32] No no no. Just for argument’s sake. Obviously, you can’t do that. By default, a better security posture is not to share.

Jay: [00:15:44] Yeah, for sure. Exactly.

Jon: [00:15:44] So you already have a default stance, which is: if I don’t share, the probability of something getting hacked is reduced, versus something that is sharing, or many things sharing. Obviously, as the amount of sharing happens, the greater the risk, I guess, to some degree, because you’re increasing the risk, exactly, over time. And I guess that goes for the in-cluster services that you’re then providing back to others. Those need to operate in a way that also reduces risk to the things. So even though they’re seen as a service and not the application itself, they obviously have a risk profile. Something could happen to them. You could try and DDoS one of them, maybe, or something like that.

Jay: [00:16:26] That could affect the application in the end.

Jon: [00:16:28] That could affect all applications versus just one.

Jay: [00:16:31] Yeah, exactly.

Jon: [00:16:31] Because I didn’t share. And so the amount of effort for somebody to go through to cause a problem, be it a security problem or an operational problem, is reduced because it becomes harder if you’re splitting them all out and easier if everything’s together like a honey pot. It’s all there for the taking. I’ve just got to work out how to get in there and that’s probably easier.

Jay: [00:16:53] It’s funny, isn’t it? Because Kubernetes, in the way that it is constructed and the way that it’s been designed, it’s been designed to be multi-tenanted, this concept of environments and things like that.

Jon: [00:17:08] They always said that it was never designed to be multitenant when they were talking about it, because they were like, it really wasn’t thought about. Maybe it depends on the definition of what a tenant means.

Jay: [00:17:17] But I guess what I meant is it was supposed to have multiple workloads in it. However, those workloads are kind of managed in terms of their possession and things like that. But I guess as you have realized, the more things you have in it and the more separated those are, or the little groups that they fall into, the more complex they are, and therefore the higher security risk that they have and all that kind of stuff.

Jon: [00:17:43] Do you remember when Kubernetes first came out? It was just replica sets. There weren’t even deployments. And RBAC kind of wasn’t there at all. So it wasn’t designed for that at all. Right. From the beginning, vendors that were aligned to it had their own objectives about what this thing they were building on top of needed to be.

Jay: [00:18:04] But they had namespaces, right, so namespaces.

Jon: [00:18:07] There was no way of controlling even the namespace elements at the very beginning. There was no RBAC. That came later.

Jay: [00:18:12] No, but you had ABAC, and with ABAC, obviously you had…

Jon: [00:18:16] But it’s not role-based, obviously.

Jay: [00:18:19] But that’s, it didn’t need to be role-based because you had the concept…

Jon: [00:18:22] You talk about it as if it had an agenda. What I’m saying is it’s an open-source project. It had some form of agenda, but it was mostly community-driven, and the community drove it. And actually, when we say community, it was probably mostly commercial vendors driving it, because they’re building on top of it, so it’s become a thing to build on top of. And it’s probably those requirements that fed into it, more so than, say, a business with a bunch of clients, if you see what I mean, because it wasn’t built for business use.

Jay: [00:18:52] I mean, it was built for business use. Google’s, right?

Jon: [00:18:56] Well, no, Borg was Google.

Jay: [00:18:58] Yeah, and then Kubernetes came out of that…

Jon: [00:19:01] I think Red Hat and Google collaborated together at the beginning to open-source Kubernetes from Borg. So it was like, we want to create an open-source version of it.

Jay: [00:19:09] Version of it, exactly. So some of that was already embedded.

Jon: [00:19:12] Kind of, but not really that similar, because it’s quite different from Borg, to a degree. So, yeah, it was like starting a fresh project: all very similar principles, and some of the things came from it, but not really the same. It was very bare-bones compared to what they already had in Google. People can correct me if I’m wrong, but from what I’ve understood, Borg was obviously much more about Google and how Google operates. So there were loads of problems solved there for them. But actually, open source didn’t have an opinion when it first started, and then obviously opinions formed over time, and those opinions have just evolved and evolved and evolved even more, which then makes it like, what am I doing with it? Yeah, exactly. Well, how do I make a decision on it? Because now there are a lot of things in it to make a decision on, and I don’t know what I’m supposed to be doing.

Jay: [00:19:57] And also those opinions now have different Kubernetes versions, or vendor versions of Kubernetes. So you have obviously the popular cloud-managed Kubernetes services, where they don’t give you the full functionality of upstream open-source Kubernetes, but they’ll simplify the management of the control plane. But that comes at a cost of flexibility.

Jon: [00:20:20] By the cloud, you mean like EKS?

Jay: [00:20:22] EKS, AKS, GKE, et cetera.

Jon: [00:20:26] We only ever name those three. The other cloud vendors, we don’t know. We only know three.

Jay: [00:20:32] Yeah, exactly. I guess it’s always going to be driven somewhat by how to monetize technology, isn’t it? Whether that monetization of technology is by a customer directly or by a third party that’s trying to monetize from the technology to a customer.

Jon: [00:20:52] Yeah, how they got there, I guess, is not, I mean, it’s got there in the end. I guess what you didn’t see is what layer of simplicity they put over that. So when we talk about ease of management and things like that, it might have been easy with the vendor’s thing. And then downstream, from a back-end implementation perspective, to make it more community and less opinionated, they’ve had to make it unopinionated in a way that meets all needs. And then they’ve got an opinion over the top, which simplified the less opinionated, which you didn’t see. So then you’re like, now I’m just dealing with the unopinionated complexity, which seems really complicated. I mean, RBAC, cluster role bindings, cluster roles, and role bindings, it’s just like, wow, okay. And so you’ve got to get it all right, and the verbs, for someone just to come at it blindly and be like, oh, I’m going to do some RBAC in Kubernetes. Right? There are some users that need access to a namespace. Well, it’s not quite as simple as that, actually. You’ve got to think about all the resources: is it a deployment, a pod, all the different API groups? It’s a lot. So it explodes into a load of complexity, and then that becomes hard, because you’re like, wow, okay, this is way more involved.

Jay: [00:22:07] I mean, this is slightly off-topic, but I’m going to do it anyway.

Jon: [00:22:12] I think I’ve got another podcast to go to anyway, so I’ll just leave you on your own. See you later guys…

Jay: [00:22:18] But the way that you’ve just described it is: I’ve got a user, I need access to a namespace. That simplicity, if you know about Kubernetes, is really tough to implement with all the security controls around it. However, there are such cool things happening in the industry now with ChatGPT and all that kind of stuff. So you’ve got natural language processing plus scraping and understanding of the Internet, and all these products can then translate that into an opinionated way of how to implement something.

Jon: [00:22:50] If you could validate it.

Jay: [00:22:53] If you can validate it, because.

Jon: [00:22:54] You can’t tell, unless you knew it, whether it was right or wrong. So you still need some understanding of what it’s produced that you need to qualify. And truthiness, yeah, otherwise you’d be like, I don’t know whether what you just did was right or wrong, because I don’t know Kubernetes.

Jay: [00:23:10] I don’t know if it’s right or wrong or what other gaps that I haven’t described there might be.

Jon: [00:23:13] Exactly, until there’s some validation on it all. But yeah, I think from application world, then, to platform world, application world is a little bit more linear to some degree, from a very superficial high level. People are working on apps, and apps have needs, and usually, well, most apps need to have an environment. And that’s the terminology used, environments, not namespaces. No one ever says that apart from in Kubernetes. No one was like, hey, I need a namespace for my app, you know, I need a dev namespace and whatever. Right. That became a Kubernetes construct.

Jay: [00:23:46] And Linux. Really.

Jon: [00:23:48] Yeah. And then you’ve also got namespaces within the Linux kernel itself.

Jay: [00:23:51] Exactly. But that’s probably what it’s evolved from though, isn’t it? Because it’s that concept of segregation.

Jon: [00:23:57] And isolation, but not the environment, which is actually to do with the workloads. The terminologies are a little bit more agnostic to anything, but yet the use case for it, to begin with, is if I don’t have any apps, I wouldn’t use Kubernetes. Obviously. I’m only using it to deploy something. So I guess all those things we’re talking about on the requirement side is the more you’re layering in, so to start with security, even from a cloud perspective, if you’re not even sharing the cloud account, you’re not even sharing the VPC, it’s even more secure. If you’re not sharing the networking, it’s going to be even more secure. The more you share anything, obviously, the higher the risk profile goes up. Clusters too, so just architectural principles. Isolation provides obviously a better security boundary.

Jay: [00:24:42] I guess let’s maybe go on a little bit of a journey to describe some of this stuff. Right? So, yes, 100%, the less you share, logically, the smaller the security vector or the threat landscape there is. However, there’s the operational complexity involved in having more things. So let’s say if I had two clusters, now it’s an extra one to manage.

Jon: [00:25:09] Just to clarify that, and not just manage the cluster, but the things in it.

Jay: [00:25:12] All of the things in it.

Jon: [00:25:14] All of the things we described before; it could be even more than the things we described, or less.

Jay: [00:25:19] Exactly. So if I just have a single cluster, and then I have all of those components and then all of the workloads within them, then great, I do like one upgrade. It’s going to be really hard, because I have to understand what the impact is to all of the services that are in the cluster, i.e., the applications that are in the cluster, and the components that I have to upgrade alongside that Kubernetes version. Now, if I have to do that twice, and I’ve got two states of running workloads to manage, that’s really hard. However, in the last, say, four or five years, the big cloud vendors that we always talk about have obviously got their own managed Kubernetes offerings, and they take some of that burden away. Right. So where people had their default position because that management complexity and overhead was just so high, i.e., they would optimize for ease of management and only have multi-tenanted clusters, and only a few of those multi-tenanted clusters, maybe like a non-prod cluster and a production cluster or something like that, they can now use the innovation of the cloud vendors and use the services to take some of that complexity away, and now they can optimize for security in a better way. But then you still have cost to optimize for. Right. And that’s the thing, maybe we’re starting to see it a little bit now with things like Fargate and some of the other offerings, but there are still big monumental things that you have to decide about.

Jon: [00:26:49] Because those things are obviously ancillary. But EKS can be very expensive because you’ve still got a cluster. It’s the VMs that are expensive, the control plane less so. Obviously there’s still a cost, but there’s less to it. But that’s the same with another service. If you take it as an example, Amazon’s ECS can end up really expensive and it can be harder to share. Right. So it’s not just a Kubernetes problem, that’s kind of a cloud problem, I think, generally speaking.

Jay: [00:27:17] Compute is expensive and the service they’ve added on top of compute means it’s even more expensive.

Jon: [00:27:22] Yeah, exactly.

Jay: [00:27:22] And it’s value-based.

Jon: [00:27:25] So if the infrastructure we're using was free, would you then share? No, because you're like, well, I'm not paying either way.

Jay: [00:27:34] Exactly.

Jon: [00:27:34] So I guess if you take cost out of the equation, you'd probably do the right thing and just go with the security principle, for sure. But that was interesting about operational overheads, because, as you say, the cloud vendors now provide services: they manage the control plane and the provisioning of the actual cluster itself, and they make upgrades easier. Depending on which cloud provider you're on, it's a bit of a sliding scale from how simple to how hard it is.

Jay: [00:27:55] And then how long it takes as well.

Jon: [00:27:57] And how long it takes. Yeah. But the things in the cluster that they don't know about, and whether those are compatible with the version of Kubernetes, less so. Yes, you might be able to keep upgrading your cluster, but if you're never upgrading the things in the cluster, at some point it's going to be problematic, because you don't know whether those things work with those versions of Kubernetes.

Jay: [00:28:17] Exactly.

Jon: [00:28:17] So you've still got complexity when you start enhancing things in the cluster, to keep everything aligned and in sync.

Jay: [00:28:25] This is, ladies and gentlemen, why platform teams will always exist to some degree, right?

Jon: [00:28:3] ChatGPT.

Jay: [00:28:35] What are you talking about? Platform.

Jon: [00:28:40] Maybe it’s ChatGPT all over the world, ChatGPT-4.

Jay: [00:28:43] But yeah, I mean, it's, one, understanding all of those components that go into a cluster and how they all work together, because there are different ways of configuring them all. Even if you are choosing just the three things that you kind of have to have, right? Just those three things. There are, like, a million different ways of configuring just those three things inside your cluster, let alone how your organization is using them, and making sure you're testing the happy and unhappy paths and all that kind of stuff, all the time.

Jon: [00:29:16] Yeah, it's a little bit of a tangent, but there's an assumption a lot of the time that because you're using something, the problems get removed, as in, all the pain is gone. And it's not quite that: it reduces some element of it, but never the full thing. Even in the cloud, you can go and use a cloud service, as we've just said, and it will solve some problems. But even with services like RDS and things like that, moving between versions, sometimes you just can't take the leap too far, it's got to be staged, and there's all this complexity. Your assumption is, I don't really have anything to do, I'm just going to consume this thing and we're good to go, because I'm using the cloud. And it's, well, kind of, but not really. It depends. So it's an "it depends" kind of question in the end, which is a bit frustrating.

Jay: [00:30:09] The value of PaaS can be diminished.

Jon: [00:30:11] Well, that's where platform teams are coming from, because there is an unfinished story in it, right? The requirements aren't fully met.

Jay: [00:30:19] But I think that's probably because of how flexible they have to be. If we're talking about infrastructure as a service, and moving up a layer to platform as a service, and then one more to software as a service: the cloud vendors have obviously got businesses to run, right? Their business is to enable other businesses, which have lots of different concerns, lots of different things they might want to do. So they expose a load of different options in how you configure those things. They'll never be.

Jon: [00:30:50] They can’t know all the shapes of all the things.

Jay: [00:30:53] They just want the money.

Jon: [00:30:54] This is the service. You go and work out the rest. Because I don’t know your business and what you do or don’t need.

Jay: [00:30:58] We don’t know your business. We’re going to try to simplify the things that we can. But that’s a really low bar.

Jon: [00:31:05] Which is why these decisions matter. With container orchestrators, like I just mentioned ECS, and you've got Kubernetes, which, if we're in Amazon, say, means EKS. But whatever it is, somebody's got to run an orchestrator, and it's running the containers. That's the value prop, I suppose: I'm going to schedule where this should run based on some opinions on the data. Right. And obviously there are ways to do it in Kubernetes and ways to do it with other things, but the principles stay the same. When you're doing that, the lens we're looking at it from is: well, if I don't have to build anything extra, if I built something once and it's just there, you can just go and consume it. Right. So the speed of consumption is way faster. But then I do have to do RBAC, and you need to know those other things. Right. So it's, oh, not quite as fast actually, because there is still something I need to do, but at least I've removed the infrastructure element around it. So there's some speed-up because the infrastructure exists, which is usually what tends to happen. Like, someone gets something going.
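
That "I do have to do RBAC" step is usually where namespaces come in: each tenant team gets its own namespace, with a Role and RoleBinding scoped to it rather than cluster-wide rights. A minimal sketch, assuming a hypothetical `team-a` namespace and a `team-a-devs` group from your identity provider:

```yaml
# Hypothetical tenant namespace: each team gets its own.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# Role granting day-to-day workload access inside team-a only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-a-developer
  namespace: team-a
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "configmaps", "deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Bind the team's group (as known to your identity provider) to that Role.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-developer-binding
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-devs
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-a-developer
  apiGroup: rbac.authorization.k8s.io
```

Because the Role is namespaced, a `team-a-devs` member can manage Deployments in `team-a` but can't even list Pods in another tenant's namespace.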

Jay: [00:32:07] Yeah, exactly.

Jon: [00:32:07] And they're like, yeah, I've got something going, and it's kind of working. This project's hosted now, there are a few apps in there, and that's worked well. And someone else might hear about it and be like, yeah, we've got another project and we need hosting. Have you got some? Yeah, I've already got some infrastructure. No-brainer. Just reuse that. What's the problem?

Jay: [00:32:27] You don’t have to wait ten weeks for someone at a data center to give you something.

Jon: [00:32:30] Or even go through an approval system. I'm like, well, actually, I'd need to put a ticket in for them to get a cloud account. So there are all these conversations. Or I could just create a namespace and bypass the process, and then you can get your app in, and sometimes that's the friction-free route. But the reason that process was there was to ask the right questions of the business. Because it was so slow, people just wanted to bypass it, so you're then sharing, and you end up multi-tenant just by proxy, because people wanted to remove friction. So sometimes the rationale behind it has nothing to do with cost or any of those decision factors.

Jay: [00:33:13] Well, it probably is, but implicit. Yeah, very implicit.

Jon: [00:33:17] It's more emotional there. I want to help these people out, I want to be helpful, and I can get this team going much quicker by reusing what's there than anyone else can with all that process. And all your stuff? They just don't understand, they don't understand, that's why they've got the process. I understand more, because I'm from an engineering background, maybe, whatever, but I'm more connected to the developer side, so I'm just going to get them going. And so the intentions are really good. They're trying to do a good job.

Jay: [00:33:48] It is. Right. But I guess what you're saying is the due diligence isn't in place. And that governance and due diligence doesn't necessarily mean you have to have a process.

Jon: [00:34:01] Slowness. By process you mean speed reduction, right? Is that what you meant?

Jay: [00:34:07] Sometimes you're like, oh, what, you're going off and building a service? We already have multiple versions of what you're doing. So it depends on how much you trust those people that just want, like, a namespace or whatever created.

Jon: [00:34:22] Yeah, I'm being facetious a little bit, but I kind of understand it. Sometimes it's organic and has no real intention. Right. There's a thing that exists, and you're figuring it out as you go, it's a very organic thing. And you've got people with experience, who've maybe done the organic journey and then realized some of these questions, and people that maybe have learned more from the industry, who are asking the right questions upfront to make the right decisions, so that they don't then have to work out how they're going to split this app out later. Because someone's rumbled us. We've just been rumbled. And now it's like, this can't share that.

Jay: [00:35:00] This huge cluster you've created with, like, 500 apps from different units.

Jon: [00:35:03] Exactly. And we've just been hacked and we've got fines. Like, we can't be sharing, this is not okay.

Jay: [00:35:10] But Jon, you sorted this person out that wanted all these things, talking about.

Jon: [00:35:13] Jon, that was the wrong person.

Jay: [00:35:16] Was it Tom Hanks? Tom Shanks? Jon Shanks.

Jon: [00:35:27] It was Kay Jekur, not to be confused with Jay Keshur. Do not confuse them, please, because that happens a lot. People confuse the two. Very different. Very different guy.

Jay: [00:35:37] Different guy.

Jon: [00:35:37] So, the architecture. Obviously, if you're not thinking about it upfront, yes, it probably will be faster not to worry about it. You've provisioned something, it's there, and you can quickly do a few environments or namespaces and whiz some generic RBAC together, whether it's secure or not secure, and whether you have what were PSPs (Pod Security Policies), which are now PSS (Pod Security Standards). So yes, ideally you're designing for those and looking at the security of what things can and can't do in the cluster. But you might not, you might skip it, it might not be there by default, and then people can literally do anything, because there's nothing really restricting anything. Later on, it's probably going to bite you at some point. But you didn't know. So I guess the moral of the story is you kind of need to know, because there's a huge amount of risk in not knowing those things and not asking the right questions at the beginning. Is cost really, really important? And if it is, it's a bit of a trade-off of cost against security, which it seems to always be to a certain degree. Yeah, if we're always thinking isolation is the pinnacle, then you're dropping down the bar.
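
For reference on the PSP-to-PSS change Jon mentions: PodSecurityPolicies were removed in Kubernetes 1.25, and Pod Security Standards are now applied by labelling each namespace for the built-in Pod Security Admission controller. A sketch, using an illustrative `team-a` namespace:

```yaml
# Pod Security Standards replaced PodSecurityPolicies (removed in Kubernetes 1.25).
# Enforcement is now a set of namespace labels read by the built-in
# Pod Security Admission controller; "team-a" is an illustrative namespace name.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted      # warn clients on violations
    pod-security.kubernetes.io/audit: restricted     # log violations to the audit log
```

With `enforce: restricted`, pods that run as root, use host namespaces, or skip a seccomp profile are rejected at admission, which is exactly the "nothing restricting anything" gap described above.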

Jay: [00:36:43] And really, I guess, through experience we probably know the questions to ask in figuring out which one you're optimizing for, or the reasons you would optimize for one over the other, and then what controls you would put in to make sure that, in optimizing for that one thing, you haven't let the other one go completely. Right? If you're optimizing for cost, you don't want to just not think about security at all. Maybe we should have, like, another couple of episodes.

Jon: [00:37:14] I think it's good to be all-rounded. Everything's going to be a trade-off, so we can go deep on cost and we can go deep on security. But really you also have to compare and contrast, because it is a trade-off, for sure. It's not like, oh, I'll just do all the cost-reduction things there and not worry about security. I think we definitely should go deep, but what's important is knowing you're always trading off on something. A hundred percent. In life, it's never going to be really cheap and really secure and really performant and really resilient.

Jay: [00:37:45] Exactly.

Jon: [00:37:4] Because the price points of a cloud vendor don't work like that. You pay for those things.

Jay: [00:37:53] I mean, it’s just not how life works. You can’t have it all.

Jon: [00:37:59] It’s free and secure?

Jay: [00:38:02] And it's the best free, secure thing that I could have. How amazing. Wow.

Jon: [00:38:07] Yeah, I think it's worthwhile. We'll obviously split it out, but when you are making decisions, you always have to think about those principles. Am I biased because I want to make this easy to manage? Is that compromising something now? Or is it maybe the right thing to do on cost?

Jay: [00:38:25] Exactly.

Jon: [00:38:26] Is cost now taking more of a priority over security? Is security taking more priority over operational ease? Are we getting ourselves wedded to the wrong decision because we're biasing towards a specific thing without the facts?

Jay: [00:38:41] And maybe there are other ways to solve some of these things, right? So kind of hiring engineers that have been through it, or having software solutions that take care of these things, or automating in a slightly different way, or introducing new in-cluster, out-of-cluster components that again, make some of this stuff easier.

Jon: [00:39:00] Do you think all these things are reasons why people just don’t want to use Kubernetes?

Jay: [00:39:04] Yeah, it's so complex. Honestly, it is so complex. Right? Because Kubernetes was just designed to schedule some workloads on some infrastructure. That's it. It's an API so that you can go and use some compute and disk and memory and all that kind of stuff. But now I've got this whole language to learn, I've got this API to interact with. Yeah, exactly. But I mean, that's how it started. It just started as an API, and then it became an ecosystem. An ever-growing, ever-evolving ecosystem that you have to patch, maintain, and know the best practices of. But the value of it, when you've got it right, is so high to the business.
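
Jay's "it's just an API for compute, disk and memory" point shows up in even the smallest workload spec: you declare what you need, and the scheduler decides where it runs. A minimal sketch, with illustrative names, image, and sizes:

```yaml
# Minimal Deployment: declare the desired state; the control plane converges to it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 2                        # how many copies to keep running
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginx:1.25          # illustrative image
          resources:
            requests:                # what the scheduler uses to place the pod
              cpu: "100m"
              memory: 128Mi
            limits:                  # hard caps enforced at runtime
              cpu: "500m"
              memory: 256Mi
```

Everything else discussed in this episode, RBAC, policies, upgrades, ingress, observability, is the ecosystem that grew up around declarations like this one.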

Jon: [00:39:46] And you understand it. I suppose it's like anything, though, isn't it? You know, but people obviously won't. I had a really lame attempt at learning French.

Jay: [00:39:59] Yeah. The value of that was really high. Right.

Jon: [00:40:04] Had I ever succeeded, the value would have been really high, because I would have actually been bilingual and able to hold conversations when I go to French-speaking countries. However, I never could master it very well, so it was very frustrating. You want to get the value out of something, but sometimes it's kind of uphill.

Jay: [00:40:23] Gets stuck at the first hurdle.

Jon: [00:40:25] Yeah.

Jay: [00:40:25] Is it bonshour or bonjour?

Jon: [00:40:28] What is it? Jesus. You’ve also got to hear it as well. So it’s a little bit obviously not quite the same.

Jay: [00:40:35] But you have a northern accent whilst you’re speaking French.

Jon: [00:40:37] I deliberately speak French even more. I make it pretty aggressive. Like anything, things just take a bit of time or a bit of research. Obviously, Kubernetes isn’t as hard as learning a language, so it’s not anywhere near the same.

Jay: [00:40:52] Exactly.

Jon: [00:40:53] But just as a principle, we shouldn't get put off just because we can't immediately do it. I guess that's the appeal of things that are simplified.

Jay: [00:41:02] The way that I almost think of Kubernetes is like a car, right? So you can have a car and if you don’t know how to use it, it can be very dangerous. You’ll crash it, and it will cause a lot of damage to you or whatever you’re trying to do. But if you’ve got a really nice car and you know how to use it, then it’s going to get you from A to B a lot faster.

Jon: [00:41:22] Yeah. Some people have accidents. Doesn’t stop you from ever driving.

Jay: [00:41:26] Yeah, exactly.

Jon: [00:41:26] Just because other people didn't do a good job of it. And so, yeah, you might hear horror stories, like RBAC and security and all these things. But the reason that might have happened was what we just described earlier.

Jay: [00:41:38] It was to get from A to B first.

Jon: [00:41:39] Yeah. They just did it organically, they didn't prep, they didn't really learn, and then it bit them. So you can't look at the end and be like, oh, it's obviously really hard, or, I don't want that to happen to me.

Jay: [00:41:50] But I guess that’s part of why we talk about these problems, right, is because I think we’ve been through some of this stuff and we’re here to kind of help and stuff.

Jon: [00:41:59] We do need to learn to drive.

Jay: [00:42:05] I mean, it sounds like someone definitely needs to learn how to speak French.

Jon: [00:42:09] Maybe I’m a little bit bitter. There’s a little bit of this inside me because I could never do it. I will have to pick that up at some point in life.

Jay: [00:42:19] Indeed. So, look, we’re going to come back and talk to you about security.

Jon: [00:42:23] We'll do a deep dive into security. What we'll do is security first, but start to compare what it means for the other things as we're doing it. Then we'll do cost, and start to compare what it means for the other things as we're doing it. And then maybe bundle the remaining things together, if there isn't a whole episode in each. Is that right?

Jay: [00:42:38] Yes. Jon, how do you say bye in French?

Jon: [00:42:41] Yeah. Anyway, great speed to everybody. Adios.
