Single or Multi-Tenant Clusters: Security

August 15, 2023

Season 1, Episode 12

In this episode of Cloud Unplugged, Jon Shanks and Jay Keshur delve into the intricacies of security decisions when considering single and multi-tenancy on Kubernetes. They discuss the challenges and considerations of ensuring security, the importance of encryption, and the nuances of soft and hard multi-tenancy.

In This Episode, You Will Learn:

  • The difference between single and multi-tenancy in Kubernetes.
  • The significance of encrypting secrets and ensuring data security.
  • The concept of soft and hard multi-tenancy and their implications.
  • The importance of role-based access control (RBAC) in managing cluster access.

Themes Covered in the Podcast:

  1. Security Decisions in Kubernetes:
    • Understanding the challenges of ensuring security in both single and multi-tenancy setups.
    • The role of encryption in safeguarding data and secrets.
  2. Soft vs. Hard Multi-Tenancy:
    • Exploring the differences between soft and hard multi-tenancy.
    • The implications of each approach and their use cases.
  3. Role-Based Access Control (RBAC):
    • The significance of RBAC in managing access to the Kubernetes cluster.
    • Ensuring proper authentication and authorization mechanisms.

Quick Takeaways:

  1. Multi-Tenancy: A setup where multiple users or tenants share the same resources.
  2. Single-Tenancy: A setup where each user or tenant has their own set of resources.
  3. Encryption: The process of converting information into a code to prevent unauthorized access.
  4. Role-Based Access Control (RBAC): A method of regulating access to resources based on the roles of individual users.
  5. Kubernetes: An open-source platform designed to automate deploying, scaling, and operating application containers.
  6. Secrets: Sensitive data like passwords, tokens, or keys that need to be stored securely.
  7. Control Plane: The set of components that manage the overall state of the Kubernetes system.
  8. Node Pools: Groups of nodes that share the same configuration.
  9. Ingress: An API object that manages external access to services within a cluster.
  10. Certificate Management: The process of managing digital certificates to secure information.

Follow for more:
Jon Shanks: LinkedIn
Jay Keshur: LinkedIn
Jon & Jay’s startup: Appvia


Transcript

[00:00:00] Jon: Hello, welcome to Cloud Unplugged. I’m Jon Shanks. And I’m Jay Keshur. And as promised, we are gonna take a little bit more of a deep dive. We don’t have endless amounts of dives, as I’ve caveated, obviously, just to set expectations, around security decisions when you’re going single and multi-tenancy on Kubernetes.

[00:00:23] Jon: How and why you would make those decisions, what you make them on. So I guess we were saying before that if you were being fully secure and there weren’t any other issues, you’d obviously go for full isolation. So if you had no other constraints, just security. Yeah, exactly. There’s no other constraints that you had to worry about.

[00:00:38] Jon: Cost wasn’t a problem. There was magic tools that made it really easy and simple. You didn’t have to worry about anything and everything scaled and did it all for you and all the magic was there. Then you would just naturally say, actually isolate all the workloads. Also, somehow the other issues were solved, like networking and how they’re gonna communicate with each other, these apps and all this other stuff.

[00:00:58] Jon: So it was a, they aren’t constraints. Then from a pure security perspective, isolating the workloads is gonna be better from a security posture perspective than sharing things, applications and workload sharing infrastructure. Yep. Do you agree with that? Just be voice. Yeah. Yeah, that 

[00:01:14] Jay: makes sense. Isolating the workloads isolating the teams.

[00:01:17] Jay: I guess isolating the tenants from each other, cuz you could go super, super secure and isolate even that tenant’s workloads. So take it down 

[00:01:30] Jon: a little bit. Every single workload is isolated, even in, that’s literally, I, 

[00:01:34] Jon: The worst, the mega cost issue. We’re saying it’s not a problem, right?

[00:01:36] Jon: It’s not a problem. No 

[00:01:37] Jay: cost issues. The worst management overhead 

[00:01:39] Jon: or whatever, but not problems, loads 

[00:01:41] Jay: of problems if you’re like, I’ve got this thing and I don’t want any potential threat at all on this other thing that I’m running. Two different data classifications and all this kind of stuff.

[00:01:53] Jay: Two different user groups accessing it. You just don’t want to expose either in, in any way. Then you might [00:02:00] even isolate those, right? Yeah. Completely different 

[00:02:02] Jon: clusters. This is the fictitious world we’re living in right now, with all of the money. So if all the things weren’t issues, magically, then that would be the ultimate

[00:02:10] Jon: goal for security: full isolation. No workloads are sharing even the VM or anything. So you go full blown, everything is fully isolated, but obviously that isn’t the world we live in. So there is, I know, an element of sharing. There’s so much magic in this fictitious world. Yeah.

[00:02:29] Jon: So I guess then you have to think about security at that point. Yeah. It’s gonna weigh in. And we’re talking about things that do share, because we do know there are costs. Costs and management. Exactly. 

[00:02:39] Jay: Performance, reliability, resilience, 

[00:02:42] Jon: et cetera. Yeah, exactly. All those are factors, and because of those factors, they now start to weigh in, because obviously doing it the way we suggested would be an insane amount of engineering to go and work out how you’re gonna do it.

[00:02:54] Jon: And the cost of it, if you’re in cloud, would be high anyway. So affordability of anybody doing it is. Why not share? 

[00:03:03] Jay: Alright, away from this weird, fictitious world, what’s more of a common pattern if, say, you’ve got a single-use, single cluster for a tenant? What types of things or aspects of security would you want to mitigate, just to conform to best practice?

[00:03:20] Jay: Just so we’re grounding in something. I guess 

[00:03:22] Jon: if you are sharing, you’re saying, or even if you’re single tenanted, you’re not sharing, so I’m not sharing, it’s just my own cluster. Yeah, you’re 

[00:03:28] Jay: just your own cluster, but just baselining, I guess, what level of security you’d have, even if you weren’t sharing, and then we can build on the level of security or the risk that you might want to mitigate if you were sharing.

[00:03:42] Jay: Yeah. Do you see, so we 

[00:03:44] Jon: can contrast the difference. So some questions would be, are there gonna be any secrets involved in what’s gonna be hosted? Of course, yeah. There’s always, oh, there is of course. So there’s always gonna be secrets. Think about it. If. 

[00:03:55] Jay: We’re talking real general sense, right?

[00:03:57] Jay: So we’re just talking kind of normal [00:04:00] application workloads. Those workloads might, yeah. So we assume have access to databases and things like 

[00:04:04] Jon: that. So it’s not just a generic, a really simple basic app. It’s not, yeah. These basic apps, these are, 

[00:04:10] Jay: this is a little bit more 

[00:04:12] Jon: complicated. No. So it’s more complex and it’s got some, it’s gonna have some secrets in there.

[00:04:15] Jon: And either way, I suppose you’re gonna have RBAC within the cluster itself, as in access to the API. You need to obviously talk to the API with some authentication. Why does it have to be role-based? We need auth, and then the auth needs some access to something. Yeah. But otherwise, there is no auth, basically. There’s no RBAC at all.

[00:04:35] Jon: It was just flat, which obviously you wouldn’t want to have even if it was owned by you. Yeah. 

[00:04:39] Jay: There’s different kinds of access 

[00:04:41] Jon: methods. So I was saying RBAC, 

[00:04:43] Jay: role-based access. Role-based. Yeah. But it doesn’t have to be role-based. It could just be ABAC, which is not 

[00:04:49] Jon: that, right? Yeah. But you want to be specific, cuz I’m presuming if we’ve just gone from a, you’re authenticated.

[00:04:55] Jon: But you don’t. And if it can’t be a simple app, cause what you just said is can’t be a simple app, then it can’t be a simple team either. But it’s only one 

[00:05:00] Jay: team is the point. So you might, you 

[00:05:03] Jon: may not need that. I would cuz it’s gonna be, I thought before we’re talking about, there’s other things in this cluster, presumably Ingress and cert manager and those things.

[00:05:10] Jon: So you’re saying they don’t need RBAC? They do, yeah. All right, so then we need RBAC then, because those things have to get into the cluster 

[00:05:17] Jay: somehow. So now we’re talking about a platform team that’s created them. Okay, so there’s RBAC. That’s the premise we started with. Yeah. Yeah, that’s my fault.

[00:05:24] Jay: You would need RBAC because you have a platform team, so just to shoot 

[00:05:29] Jon: there from as if everyone listens to every single episode. Yeah, exactly. Where we decided just to recap decisions were before. It’s a platform team managed cluster that’s gonna be provided to bring one to many different services that enhance the experience for developers, certificate management, auto scaling, ingress.

[00:05:50] Jon: DNS, all those types of things. There’s way more, obviously, than those things we’ve listed, and it wasn’t just bare-bones Kubernetes only to host some apps, it was gonna get enhanced. [00:06:00] So because of that, even thinking about security, I don’t think a security team would be best pleased if

[00:06:06] Jon: literally anyone that accesses that cluster or supports anything in that cluster just has access to anything generically. It’s not a great security model. Probably isn’t. No. No. 

[00:06:17] Jay: So I guess also because that platform team is managing that, they don’t wanna keep making changes to stuff that people have broken.

[00:06:25] Jay: Yeah. Because they have access. So yeah, RBAC is a good place to start. Yeah. What’s next? 
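A minimal sketch of the RBAC idea discussed above, written as Python dictionaries that mirror a Kubernetes Role and RoleBinding. The namespace `team-a`, the group `team-a-developers`, and the helper `role_allows` are hypothetical names chosen for illustration; the helper is a simplified check, not how the API server evaluates rules.

```python
import json

# Hypothetical Role: developers in namespace "team-a" may manage Deployments
# and read Pods, but nothing else (for example, no access to Secrets).
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "app-deployer", "namespace": "team-a"},
    "rules": [
        {
            "apiGroups": ["apps"],
            "resources": ["deployments"],
            "verbs": ["get", "list", "watch", "create", "update", "patch"],
        },
        {
            "apiGroups": [""],  # "" is the core API group
            "resources": ["pods", "pods/log"],
            "verbs": ["get", "list", "watch"],
        },
    ],
}

# Hypothetical RoleBinding: grants the Role above to a group from the identity provider.
role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "app-deployer-binding", "namespace": "team-a"},
    "subjects": [
        {
            "kind": "Group",
            "name": "team-a-developers",
            "apiGroup": "rbac.authorization.k8s.io",
        }
    ],
    "roleRef": {
        "kind": "Role",
        "name": "app-deployer",
        "apiGroup": "rbac.authorization.k8s.io",
    },
}


def role_allows(role_obj, api_group, resource, verb):
    """Simplified check: is there a rule listing this group, resource and verb?

    Ignores wildcards and resourceNames, which real RBAC evaluation supports.
    """
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in role_obj["rules"]
    )


# Print both manifests as JSON, which kubectl also accepts.
print(json.dumps([role, role_binding], indent=2))
```

The point of the sketch is the contrast with a "flat" cluster: anything not listed in a rule is denied, so the platform team’s ingress or certificate components can be given their own, separate roles.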

[00:06:30] Jon: If we’re talking about secrets, and there’s gonna be secrets, which obviously there have to be anyway, I’ve caveated even for myself, cause things are running in it and they’re probably gonna need secrets, then there’s the encryption of those secrets. If it’s in the cloud, then you can probably use KMS or something to encrypt those secrets behind the scenes.

[00:06:44] Jon: The actual operating system itself, obviously you might wanna start there. The slimmer the better. The fewer things on it, the better. The more read-only the root file system is, obviously that’s good. And you can create a separate partition for, say, the Docker containers, which can’t be read-only cuz they’ve gotta be pulled down.

[00:07:02] Jon: Let’s just go back, 

[00:07:03] Jay: couple of steps, secrets and encryption. Now obviously we’re talking about the cloud provided Kubernetes clusters. Yeah. Kubernetes, it doesn’t have 

[00:07:11] Jon: to, I’m just saying whatever it is, you want to encrypt the backend, which is gonna be etcd, that’s storing the data in there.

[00:07:18] Jon: You wanna make sure that’s encrypted at rest, but then also there is a Secrets object in there and you probably wanna make sure that’s encrypted with something unique. 

[00:07:27] Jay: Isn’t that just part of the data store that’s encrypted? 

[00:07:29] Jon: You can use KMS to actually encrypt the secrets themselves. Yeah. There 

[00:07:33] Jay: is a concept in Kubernetes, isn’t there?

[00:07:36] Jay: Like they, they think about secrets in a different way and slightly more, controls around it. Yeah. Because you And it has different, almost a different 

[00:07:44] Jon: partitioning. Exactly. Cuz you could in theory have, this is all a little bit theoretical, but somebody might be allowed to look at all data from etcd because they’re responsible operationally for it.

[00:07:59] Jon: That doesn’t mean [00:08:00] projects are comfortable with them also being able to see, say, the secrets or whatever. I guess it just depends on roles and responsibilities. Secrets are quite sensitive to that project, so the project might want encryption at the secret level, obviously.

[00:08:14] Jon: So you do wanna make sure that just because, sorry, at 

[00:08:16] Jay: the rest level, cuz accessing those secrets would be managed anyway through RBAC. Yeah. 

[00:08:21] Jon: So it’s just, now I’m thinking about it, you could export the data from etcd, is what I’m talking about. Okay, cool. Yes. In which case then it’s just all, yeah, flat.

[00:08:29] Jon: Cool. Yeah, because it’s now not encrypted. That means you get secrets too. So you might be like, I’m not comfortable with that. If you do export the data from etcd, somebody could then just see the secrets because they’re not encrypted. So then you could encrypt the secrets with something else so that even if they exported it, they wouldn’t be able to read the secrets; they’d need a key.

[00:08:45] Jon: See, that’s what I’m talking about. Great. So 

[00:08:47] Jay: it’s encrypted in the database and then it’s encrypted at rest. 

[00:08:51] Jon: Yeah. Would you do that? Yeah, absolutely. 

[00:08:53] Jay: Makes sense. Yeah. , 
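A small sketch of the concern raised in this exchange. By default the values in a Secret are only base64-encoded, so anyone who can read the object, or an unencrypted etcd export, can recover them. The second half mirrors an API server `EncryptionConfiguration` that sends Secrets through a KMS plugin. The secret name, the plugin name, and the socket path are hypothetical, and on managed clusters this is usually switched on through the cloud provider’s own settings rather than by editing this file.

```python
import base64

# A Secret as it appears in an API response or an unencrypted data-store export.
# The name and the password are made up for this example.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials", "namespace": "team-a"},
    "type": "Opaque",
    "data": {"password": base64.b64encode(b"s3cr3t-example").decode("ascii")},
}


def decode_secret_data(secret_obj):
    """Recover the plain values: base64 is an encoding, not encryption."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret_obj["data"].items()
    }


# Sketch of an API server encryption configuration that envelope-encrypts
# Secrets via a KMS plugin before they are written to etcd.
encryption_config = {
    "apiVersion": "apiserver.config.k8s.io/v1",
    "kind": "EncryptionConfiguration",
    "resources": [
        {
            "resources": ["secrets"],
            "providers": [
                {
                    "kms": {
                        "apiVersion": "v2",
                        "name": "example-kms-plugin",  # hypothetical plugin name
                        "endpoint": "unix:///var/run/kms/example.sock",  # hypothetical path
                        "timeout": "3s",
                    }
                },
                # Fallback so data written before encryption was enabled stays readable.
                {"identity": {}},
            ],
        }
    ],
}

print(decode_secret_data(secret))
```

With a configuration along these lines, an etcd export contains ciphertext for Secrets, and reading them needs access to the key held in the KMS, which is the separation of duties described above.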

[00:08:55] Jon: and then obviously the operating systems. I don’t know what you’ve seen, different operating systems. For obviously the root file system being encrypted.

[00:09:03] Jon: There’s obviously been a change, as there are loads of OSes nowadays. Yeah, 

[00:09:06] Jay: so obviously with the cloud-provided, managed Kubernetes services, the control plane is always Linux for the most part, right? Yeah. Then you have a bunch of node groups, node pools or whatever to manage. You can have those on Windows, you can have ’em on Linux, you can have ’em on many different flavors of Linux.

[00:09:25] Jay: Let’s just assume that it’s a flavor of Linux then. You’ve obviously got file shares to manage within that and where those Docker containers are running from. Obviously, potentially, that’s some of your business logic that you are running within those containers, and having them on unencrypted

[00:09:45] Jay: drives isn’t great, cuz it’s cloud infrastructure, right? And it’s virtualized, which means that at any point it could be scheduled into someone else’s account or whatever, and there is a process of potentially gaining [00:10:00] access to things that have been running on that before. So you obviously need to protect yourself from those types of measures too.

[00:10:06] Jay: So having encrypted volumes, having containers that are running from encrypted volumes, makes sense. Yeah. 

[00:10:14] Jon: And then preventing installation on the root file system obviously is good too. So if you can make that read-only and then have a separate file system for where the Docker images are gonna be coming from, and that can be writable, then you can put the controls around

[00:10:27] Jon: the containers themselves. Yeah. And what they have access to or don’t have access to. But then that stops people also being able to write to the main root file system and overwrite certain things on there, pretending that they are the real processes. Real immutability really. Yeah. Pure immutability.

[00:10:44] Jon: Obviously some people will also try and encrypt the root file system, but then obviously it’s a bit more complicated at that point with all the key management and decryption. You can just divide it up. 

[00:10:53] Jay: Yeah. I’m not sure why you would ever really need to, if you are using, if you do 

[00:10:57] Jon: it that way, you don’t, cuz you can’t do anything anyway.

[00:11:00] Jon: Yeah, exactly. But people, certain operating systems, it’s not 

[00:11:02] Jay: possible. Maybe if you’re baking your own image rather than just using something off the shelf. Yeah. 

[00:11:09] Jon: Yeah, so that’s normally you’re starting at that level cause you’ve gotta provide these things. Obviously networking and all those things will be important, but you’re gonna be inside a Kubernetes your topology of what you’re thinking about upfront.

[00:11:20] Jon: Is the infrastructure just all shared equally? But this is, sorry, 

[00:11:24] Jay: we’re talking about the single shared process. 

[00:11:26] Jon: Single shared, yeah. Yeah, but I’m saying even if it’s single shared within my project, are the applications all equal in risk? Oh, I see. Okay. Am I comfortable with all these apps just being equal in security even though I’m not sharing with anybody?

[00:11:39] Jon: Or are there certain apps Actually I’m a little bit worried about. I don’t know why you would be, but you might have a real legitimate need. 

[00:11:45] Jay: So this is, the concept of soft and hard multi-tenancy. I know , what’s 

[00:11:50] Jon: the difference between soft and hard multi-tenancy? Yeah. Another 

[00:11:53] Jay: thing. So hard multi-tenancy, yeah, I think is [00:12:00] when you are in single clusters, i.e. you’ve got hard separation between the control plane and everything else. And then soft multi-tenancy is where you’re sharing, say, the control plane, but not the worker nodes. You can’t have multiple, 

[00:12:19] Jon: is that right?

[00:12:20] Jon: Sounds like it should be the other way around, like soft multi-tenancy is just multi-tenancy, or tenancy. Cause obviously to have a tenant would presumably mean it’s being shared 

[00:12:29] Jay: full Soft segregation, hard 

[00:12:31] Jon: mo segregation. Yeah. It feels like soft would be that it’s just a control plane, but then it’s flat and the infrastructure isn’t, and hard would be the infrastructure.

[00:12:40] Jon: The control plane is shared, but the infrastructure isn’t, is in its hard. And a little bit like so 

[00:12:45] Jay: hard is basically, it’s all. It’s the control plane and the 

[00:12:48] Jon: worker nodes aren’t shared, but then it’s not shared, or it’s single tenancy. So there’s no tenant. The 

[00:12:54] Jay: platform, I guess, is the concept 

[00:12:56] Jon: of tenant, is that right?

[00:12:57] Jon: It sounds odd where it’s a tenant coming from, you’re just a tenant of one 

[00:13:00] Jay: tenancy. I need to, like, Google this. There is a concept of soft and hard and I can’t remember, 

[00:13:07] Jon: is it in relation to multi-tenancy within itself? As in you can be really, yeah, strict within multi-tenancy, as strict as you can be with taints and tolerations, which we’ve not got onto, which we should get onto at some point.

[00:13:18] Jon: But whereas then the infrastructure nodes aren’t shared from an application perspective, I don’t know. I’m just guessing from what you’ve said. And I’ve heard the term, but I’ve never really paid attention. It just sounds like that’s what it might mean. 

[00:13:30] Jay: So hard multi-tenancy is obviously strong isolation.

[00:13:33] Jay: Yeah. Which is exactly like node groups, what we talked about, with not sharing the cluster and not sharing the shared services. So it’s hard isolation. 

[00:13:42] Jon: Oh, it’s cuz called shared hard isolation. Yeah. 

[00:13:44] Jay: So it’s called isolation rather than tenancy. Yeah, I was gonna say cause. And then soft is when there is a little bit of softness to that.

[00:13:53] Jay: And so you’ll share the cluster API potentially, but not the, it’s up to you what your [00:14:00] level of softness is. So whether you want to share the API and not the node 

[00:14:06] Jon: pools. So it’s soft isolation. Hard isolation. Exactly that. Exactly. 

[00:14:10] Jay: And then there’s nothing in between. And then your versions of softness.

[00:14:14] Jon: Change. Very soft. Yeah, I know. Exactly. Medium soft. This is 

[00:14:18] Jay: getting to a weird place for you. Yes. So basically, super hard tenant isolation is separate clusters, completely separate clusters, single use. It’s a single tenant. Exactly. Single tenant. And then there’s ways to have soft isolation in a multi-tenant cluster, which is

[00:14:40] Jay: where you would have things like isolated worker nodes. 
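A sketch of how isolated worker nodes inside a shared cluster are commonly expressed: the tenant’s node pool carries a label and a taint, and only that tenant’s pods carry the matching `nodeSelector` and toleration. The key `tenant`, the value `team-a`, and the functions `tolerates` and `can_schedule` are hypothetical; the matching logic is a simplified model of the scheduler’s behaviour, not its actual implementation.

```python
def tolerates(toleration, taint):
    """Simplified model of whether one toleration matches one taint."""
    # A toleration with no effect set matches taints of any effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with operator Exists tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def can_schedule(pod_spec, node):
    """True if the node satisfies the pod's nodeSelector and the pod tolerates its hard taints."""
    for key, value in pod_spec.get("nodeSelector", {}).items():
        if node.get("labels", {}).get(key) != value:
            return False
    for taint in node.get("taints", []):
        if taint["effect"] in ("NoSchedule", "NoExecute") and not any(
            tolerates(t, taint) for t in pod_spec.get("tolerations", [])
        ):
            return False
    return True


# Dedicated node pool for one tenant: labelled so its pods can select it,
# tainted so that everyone else's pods are kept off it.
tenant_node = {
    "labels": {"tenant": "team-a"},
    "taints": [{"key": "tenant", "value": "team-a", "effect": "NoSchedule"}],
}
shared_node = {"labels": {"pool": "shared"}, "taints": []}

team_a_pod = {
    "nodeSelector": {"tenant": "team-a"},
    "tolerations": [
        {"key": "tenant", "operator": "Equal", "value": "team-a", "effect": "NoSchedule"}
    ],
}
other_pod = {}  # no selector, no tolerations

print(can_schedule(team_a_pod, tenant_node), can_schedule(other_pod, tenant_node))
```

Both halves matter: the taint keeps other tenants off the dedicated nodes, and the `nodeSelector` keeps the tenant’s own pods from landing on shared nodes. The control plane and API are still shared, which is why this counts as soft isolation.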

[00:14:45] Jon: Yeah. But there isn’t much else you can really do because, like, obviously the control plane needs the control plane. The API is still the API, RBAC is still RBAC. If you’re using, if you’re putting it on, you could, so 

[00:14:56] Jay: really could have different shared services for those work.

[00:15:01] Jon: Worker nodes. Yeah. But that’s nothing to do with Kubernetes 

[00:15:03] Jay: specifically. Yeah, I know, but in, in this world of isolation that we’re talking about. 

[00:15:07] Jon: Oh, yeah. I suppose from Plat more generic platform saying Yeah, exactly. But they might not even run in Kubernetes, those services. How as in they could just be services that are ancillary.

[00:15:15] Jon: Like I meant, 

[00:15:15] Jay: I meant Ingress. I 

[00:15:16] Jon: have ingress. Oh, in-cluster services? In-cluster, not just like an ALB or some of the services. 

[00:15:22] Jay: Even your in cluster services might be segregated in this 

[00:15:25] Jon: world. Yeah. Each project has its own ingress. Exactly. Each project has its own thing. I see. Yeah. Yeah.

[00:15:30] Jon: So the only thing that’s extreme, yeah, 

[00:15:31] Jay: I guess there’s just yeah, suppose there is. Yeah. That is an option. Obviously Kubernetes is just a bunch of options in the end. Probably isn’t something that you’d likely do because now there’s not much benefit to be gained when you are not running that api, Kubernetes API in the control plane, cuz you could just get cloud providers to do it for you at a very small cost in fact, free sometimes.

[00:15:55] Jay: Soft isolation tends to happen less so [00:16:00] in that world. So the different ways of then isolating, apart from just, say, the worker nodes, is obviously the clusters, in this world where you just have the single. 

[00:16:11] Jon: So many just probably got so confused. 

[00:16:14] Jay: I know exactly. I’m just like going off a bit of a tangent.

[00:16:16] Jay: But in, in the single 

[00:16:18] Jon: single, soft, hard, isolated, multi-tenant, single tenancy, hard, multi-tenancy, softness, words. This is what this means. 

[00:16:25] Jay: So what other controls would you have in the single ? 

[00:16:30] Jon: Jay, you took a hole. Jay. Exactly. 

[00:16:33] Jay: Coming back. Coming back to it. 

[00:16:36] Jon: Erm, I guess I’ll rewind back, cuz I guess we’re inferring a huge amount of knowledge to a degree.

[00:16:41] Jon: So like anything, you’re gonna have an operating system or an AMI or some image inside the cloud vendor. They have ones that are designed for running containers, operating systems that are specific just for running containers, not just generic operating systems. That’s obviously better, cause the principle is everything’s gonna be a container.

[00:16:59] Jon: And obviously there’s security groups and all these other things that isolate containers. So already from Docker’s perspective, you’ve got cgroups and all the other things that are gonna isolate, which is what it’s about, getting the, I guess what they deem, software virtualization isolated at the operating system level. That’s basically what gets provisioned, and then there’ll be a kubelet on there that is obviously talking back to the control plane and

[00:17:25] Jon: posting information about itself to be like, hey, this is my CPU and memory, and this is who I am, this is what’s running, et cetera, and then that’s gonna talk to the runtime. That’s then gonna obviously run containers on it. But then the control plane that you’re speaking to in Kubernetes, if you’d made those decisions,

[00:17:41] Jon: obviously takes Kubernetes objects that’ll be: here’s my app, this is what I wanna deploy, here’s a container. This might need a secret. What we’re then saying is, oh, there’s a Secret object. Now I’m a bit worried about that, cause that’s going in some data store. What about if somebody exported that data? Yeah.

[00:17:56] Jon: Could they export the secrets as well? No, cause we need to encrypt that too. [00:18:00] Yeah. So now you’ve got some levels of security from when people are actually putting objects into Kubernetes and telling the API what they want. It’s gonna go off and reconcile and make those things happen that you’ve asked for, if it can.

[00:18:12] Jon: But then for the workloads underneath, you’ve already made decisions on even what the operating system is under that, that’s then isolated, and even the partitioning on it, and then the encryption of that partition even, just in case. That is ancillary to Kubernetes, but isn’t Kubernetes itself. Yeah, 

[00:18:28] Jay: exactly.

[00:18:29] Jay: So you, I guess you’re just talking about the actual. Not within the API at all, really. It’s just the actual node itself and how basic the containers 

[00:18:38] Jon: are running on those nodes. Yeah, exactly. Elements, right? Nothing Kubernetes-centric at that point. Yeah. The only bit that is, then, is obviously the kubelet that I mentioned and the control plane, but actually where those containers are gonna run,

[00:18:48] Jon: you’d wanna do that anyway, even if you weren’t using Kubernetes, is what I’m saying. So they’re just best practices from an OS perspective. Yeah, and there’s best practices from a Kubernetes perspective. There’s best practices from the data store perspective of Kubernetes behind it. So there’s layers and then, 

[00:19:03] Jay: alright, so let’s jump into the Kubernetes layer now at that stage.

[00:19:07] Jay: Alright, so you’ve done nodes, they’re hardened, running using the best practices that you’ve just described. What’s next? How would you isolate, or how would you achieve a kind of standard level of security at that level? Actually, another thing maybe at that node level is whether you are allowing some of those processes on those nodes to run in certain spaces, or being able to be escalated into, like, privileged mode or whatever.

[00:19:36] Jay: So that’s where Kubernetes touches the node. 

[00:19:41] Jon: I guess level controls. A few things even before that, which is why it’s a big topic: do I want this endpoint for the Kubernetes API to be public? Does it need to be private? Exactly, right at the start. Yeah. What about access to it? What about

[00:19:54] Jon: the kubelet that I mentioned, running on the VMs, which has also got to talk to it: is that encrypted? Yeah. Does that have a [00:20:00] certificate as well for mutual TLS? Is it authenticating itself, so something can’t pretend to be that kubelet and start getting access to data and information? There’s all these layers because obviously it’s an API.

[00:20:12] Jon: Kubernetes itself has to make sure that the things talking to it, that it needs to function, are also hardened. And then also the decision of how private and how public do I wanna operate? Am I just gonna be like, I don’t mind, it’s public cause I’m gonna rely on auth, and that auth might be OpenID. So I’m relying on OpenID Connect and that’s fine.

[00:20:33] Jon: And my security posture is I’m okay with that because it’s an endpoint that’s encrypted and I trust single sign-on and OpenID flows and then the RBAC. Or: that’s still too risky. I still want all of that, plus put it on the VPN and don’t make it public, or lock it down to any IPs that I know and then firewall it off, but it could still be public.

[00:20:55] Jon: So there’s lots of choices even right at the beginning to be made before even you get to running workloads on when you’ve got your, just your control plane. Yeah. Given just exactly installation stage. 
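A minimal sketch of the "lock it down to any IPs that I know" option for a public API endpoint, using only Python’s standard library. The CIDR ranges are placeholders (a documentation range and a made-up VPN range), and `is_allowed` is a hypothetical helper; in practice the allow list is normally enforced by the cloud provider’s control plane settings or a firewall rather than by application code.

```python
import ipaddress

# Hypothetical allow list: an office range (a reserved documentation range is
# used here) and an internal VPN range.
ALLOWED_CIDRS = ["203.0.113.0/24", "10.20.0.0/16"]


def is_allowed(client_ip, cidrs=ALLOWED_CIDRS):
    """Return True if the client address falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


for candidate in ("203.0.113.7", "10.20.5.9", "198.51.100.1"):
    print(candidate, "allowed" if is_allowed(candidate) else "blocked")
```

This only narrows who can reach the endpoint. Authentication such as OpenID Connect and authorization through RBAC still apply to every request that gets through.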

[00:21:06] Jay: And then I guess when you are getting a little bit closer to understanding how to run workloads. Yeah.

[00:21:12] Jay: What’s next? How are you gonna , I’m just 

[00:21:14] Jon: so eager 

[00:21:14] Jay: to get something wrong. Yeah, I know. Exactly. I have this cluster, right? Let 

[00:21:17] Jon: gimme, gimme validate. Running anything on this cluster’s a bit gung-ho, it seems; there’s no RBAC on it, apparently. It’s all, there’s RBAC. We’ve 

[00:21:27] Jay: there is RBAC.

[00:21:27] Jay: Come on. And we realized it’s soft. Classic. No, no isolation. Exactly. 

[00:21:36] Jon: Yeah. So I do think, though, those decisions, even those early decisions, make it harder. Cuz if it’s private, then obviously the VPN is there, right? You need some form of VPN, otherwise how are you actually even accessing it?

[00:21:49] Jon: Yeah. 

[00:21:50] Jay: Or not necessarily VPN, you could have a bastion or a jump box where you are. True. That’s somewhere. Yeah, exactly. 

[00:21:56] Jon: But there is, you’re just moving the T. You can’t directly talk to it without, [00:22:00] yeah. I mean you could directly talk to it if you’re on the network of it, you’re moving the boundary to something else that can internally talk to it.

[00:22:06] Jon: Yeah, exactly. An internal boundary now. Yeah. Yeah. So that can make it a little bit hard, obviously, to then do all the automation around everything. You’ve then gotta go off and deploy the things in after the cluster’s built, like the ingress, like the sos. How is that thing, what is the thing that’s talking to the API that’s now on a private network and does it have access to it?

[00:22:26] Jon: Yeah. Obviously from a Kubernetes perspective, that stuff isn’t always there. All the things you might want won’t be there out of the box. Some things might be from the cloud vendor that you can choose from, and there’ll be a bunch of other things that you are gonna have to then put on even before anyone’s deploying the app, Jay, even before.

[00:22:42] Jon: Wow. Sounds like a 

[00:22:43] Jay: lot of work. So I still wanna blow my up. So can you 

[00:22:47] Jon: anyway, four months later where we’ve got through all of these decisions, , what you mean? Jay got another job. Does he not working anymore? . Where’s working now? Somewhere that 

[00:22:59] Jay: clusters ready. Someone that lets me use bloody Kubernetes

[00:23:06] Jay: Yeah. So what kind of controls are you gonna put in place so that the workloads can run in a, if like in a secure 

[00:23:14] Jon: way? . So I’m gonna do some the workloads. I’m gonna do some security. Oh, security thing you asked for. Oh, thanks. Yeah, thanks. Cool. And that’s it. That’s the only, so that’s the things they’re gonna do.

[00:23:25] Jon: And you able to run, you secure, 

[00:23:27] Jay: this is why Jon is paid the big bucks. Yeah, exactly. Security for security 

[00:23:30] Jon: sake. Just, I’m gonna do the security things. Secure things. Don’t worry, Uhhuh. They’re all done. Yeah. What is it you’re asking for? So you want to 

[00:23:40] Jay: what? What is best practice around running workloads in a secure way that where, where Kubernetes configuration meets best practice for running processes on 

[00:23:52] Jon: nodes.

[00:23:53] Jon: Oh, so you are saying now, from an infrastructure level, you want to isolate some apps to specific bits of infrastructure [00:24:00] so they don’t fully share the infra, is that what you’re saying? Not 

[00:24:03] Jay: necessarily the infrastructure, but so that they can’t escalate out and break out of the app. I see.

[00:24:08] Jay: Break out of the container. There’s a 

[00:24:10] Jon: few things you could do, and some of them overlap, cause you can obviously solve the problem in different ways. But you can use PSS, which was originally PSP. Pod Security Policies. Pod Security Standards. Is it standards or system? There’s pod security. 

[00:24:25] Jay: I was gonna say, Pod Security Admission, which is the first kind of layer of that.

[00:24:30] Jay: Yeah, this is the first layer, 

[00:24:30] Jon: but pod security, the actual policy itself, which is now being diminished because they’ve moved a lot of it to be like more driven by policy agents. Yeah, exactly. That’s thing, it’s evolved a little bit where before you’d have a positive security policy and that was some.

[00:24:45] Jon: Kind of generic elements that was limited on scope of how much it would do anyway. That’s I don’t really want people mounting in the host network. I say people, containers. I don’t want containers. Same. There’s just many people. I don’t want containers being able to be orchestrated to then mount in file system elements, specific things from other containers, et cetera.

[00:25:06] Jon: So you’d have isolation. Other things would be: the actual process inside the container shouldn’t be root, ’cause then if it gets exploited, obviously it’s running as root, which can install things on that container and do other stuff. So you can start to enforce all these standards as policies to make sure those things are... read-only file system.

[00:25:25] Jon: Again, you might be like, actually, it needs to be a read-only root file system. Can’t speak today. And

[00:25:31] Jay: then moving across user groups or user namespaces in the Linux kernel, being privileged or whatever.

[00:25:37] Jon: Yeah. Seccomp policies as well. Exactly. The types of system calls that could be made by that process that’s running inside that container.

[00:25:44] Jon: So it could even be really locked down. You can use SELinux. You can drop capabilities as well. You can drop capabilities.
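
The hardening settings mentioned in this stretch of the conversation (no host network, non-root process, read-only root file system, no privileged mode, a seccomp profile, dropped capabilities) can be pictured as a simple check over a pod spec. The following Python sketch is illustrative only: the field names follow the Kubernetes pod `securityContext` schema, but the `hardening_violations` helper, the example pod, and the image name are assumptions made for this example, not something the speakers describe using.

```python
from typing import Any, Dict, List

# Hypothetical pod spec fragment, shaped like the "spec" of a Kubernetes Pod.
HARDENED_POD: Dict[str, Any] = {
    "hostNetwork": False,
    "containers": [
        {
            "name": "app",
            "image": "registry.example.com/app:1.0",  # placeholder image name
            "securityContext": {
                "runAsNonRoot": True,
                "readOnlyRootFilesystem": True,
                "privileged": False,
                "allowPrivilegeEscalation": False,
                "seccompProfile": {"type": "RuntimeDefault"},
                "capabilities": {"drop": ["ALL"]},
            },
        }
    ],
}


def hardening_violations(pod_spec: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a pod spec (illustrative rules only)."""
    problems: List[str] = []
    if pod_spec.get("hostNetwork", False):
        problems.append("pod uses the host network")
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        if sc.get("runAsNonRoot") is not True:
            problems.append(f"{name}: process may run as root")
        if sc.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: root file system is writable")
        if sc.get("privileged", False):
            problems.append(f"{name}: container is privileged")
        # Treat an unset value as "escalation allowed" to stay on the strict side.
        if sc.get("allowPrivilegeEscalation", True):
            problems.append(f"{name}: privilege escalation is allowed")
        if sc.get("seccompProfile", {}).get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: no seccomp profile set")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: Linux capabilities are not dropped")
    return problems


if __name__ == "__main__":
    print(hardening_violations(HARDENED_POD))
```

In a real cluster this kind of rule would normally be enforced by an admission mechanism or policy agent rather than an ad hoc script; the sketch only shows what the individual settings are checking for.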

[00:25:51] Jay: Yeah. So that you never expose capabilities that the container shouldn’t have, like binding to certain networks under certain ranges, or [00:26:00] whatever.

[00:26:00] Jay: There’s

[00:26:01] Jon: a lot. There’s a lot, right? It’s quite low level to a point, because you need to understand what that would mean. You basically

[00:26:07] Jay: need to be a Linux admin. It’s the same skills that you would have as a Linux administrator, really,

[00:26:14] Jon: back in the day, if you were aware of, like, system calls and things like that.

[00:26:18] Jon: Yeah. If you were low level enough, yeah, to understand how processes work, and user space and kernel space and all those things, then yeah, I suppose it infers quite a lot of knowledge around how to secure what’s running on it. But if they’re Windows containers, obviously it’s different. You’ve

[00:26:35] Jay: got very similar concepts nowadays, obviously, because Windows and Linux are merging to a certain degree,

[00:26:41] Jon: aren’t they?

[00:26:41] Jon: Yeah. It’s definitely aligned more over the years, hasn’t it? Yeah. Yeah. Obviously in different ways. That’s not my specialty, so I can’t really comment on these. No, nor mine, which is why I’m just glossing over it really quickly. Kind of the same nowadays, aren’t they? Moving on.

[00:26:54] Jon: Yeah. Not my real specialty, but yeah. So those would be the main things. What would you be doing, say, if a team had some scenario where there were some front-end services and back-end services, and maybe one of the back-end services is PCI in nature, but it’s single tenant anyway?

[00:27:15] Jon: Would you still wanna architect differently or would you treat it the same? Potentially, 

[00:27:19] Jay: I guess at the front end, really, because you’ve already got an ingress layer, you’re keeping that separate, and so all traffic coming into your cluster might be on potentially a different node pool, and then everything else is exposed internally in the cluster anyway, ’cause you’re not binding certain pods directly to a load balancer.

[00:27:43] Jay: You’re going through an internal service, i.e. a load balancer exposed through ingress. With NGINX load balancers. Exactly. And

[00:27:50] Jon: then, in cloud, there’ll be a

[00:27:52] Jay: cloud load balancer to the ingress, which is the internal kind of load balancer inside the cluster most of the

[00:27:59] Jon: [00:28:00] time.

[00:28:00] Jon: Not always, but

[00:28:01] Jay: NGINX. Yeah, exactly. And then that is routing traffic to your pod through the Service definition of that pod. So it’s kind of load balancing internally to your pod or pods at that point. It’s internal anyway, so I’m not sure you would necessarily need to segregate that off to a certain space.

[00:28:22] Jay: But if you wanted 

[00:28:23] Jon: to apply... but say this external-facing thing is the ingress. Yeah. That ingress is living on what VM? Like, where? So that

[00:28:35] Jay: You’d probably have a separate node pool for where ingress

[00:28:38] Jon: is. Okay, so you’re, like, thinking about carving it out, ’cause if it’s flat then it could be on the

[00:28:43] Jay: same node.

[00:28:43] Jay: Same, exactly. It could be on the same nodes. Exactly that. So you’d have a different node pool for where the ingress is. And then, as I was saying, everything beyond that stage could be on one sort of larger back-end node pool. Even this front-end service is just another layer of back end really, because the front end is ingress,

[00:29:00] Jon: so the load balancer is going to it indirectly, but it’s still going to it through the HTTP protocol.

[00:29:07] Jon: Even though there are stages of the routing. Yeah, but I guess we’ll just be talking about risk profile from an application perspective. It’s like, right, some infrastructure: we’ve done some of the hardening at an infrastructure level. There’s a bunch of other stuff we’ve mentioned around hardening Kubernetes and what you can and can’t do.

[00:29:26] Jon: But now, from an application level: fine, the cluster’s private, so I can’t actually talk to the API of Kubernetes. Yeah. But I can talk to a service that is hosted inside that private cluster. Yes. And that’s public. That’s public. Yeah. And that’s now an NGINX proxy, right? Yep. That’s essentially then proxying traffic to a service endpoint, which is basically routing it then to an app.

[00:29:50] Jon: Yeah,

[00:29:50] Jay: potentially an NGINX with a web application firewall. All right. In front of it. In front of it, yeah. So I’d put a WAF in front of it just to mitigate some of the kind of [00:30:00] common risks, like SQL injections and things like that. And then it would pass traffic down to your actual application, and you probably want that traffic to be protected.

[00:30:09] Jay: So your traffic to the ingress is gonna be over HTTPS and encrypted. And then from the ingress, which is on one node, to a pod on another node, it’s gonna have encrypted traffic between those two. And then I guess you’re talking about another back-end service.

[00:30:29] Jay: What’s encrypting

[00:30:29] Jon: the traffic? What will be doing the traffic encryption? ’Cause obviously the application is there. Is the application designed for encryption, and they’re gonna have to do it at the app level? And you

[00:30:39] Jay: could do it that way. But I guess to make developer experience really nice, you’d probably wanna provide a bunch of in-cluster services, and these are the things that we talked about previously.

[00:30:49] Jay: So you can have something like cert-manager running. So for the ingress it gets a certificate from a free service, Let’s Encrypt. And that’s a free certificate that lasts for 90 days, and it renews itself and all that kind of stuff. And then maybe internally you have an internal certificate issuer, and cert-manager is automatically generating a cert using that issuer and giving it to the pod, mounting it as a secret for your application to then use, or even having a separate kind of sidecar alongside your application to terminate TLS.
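
To make the "lasts for 90 days and renews itself" point concrete, here is a minimal Python sketch of how an automated renewer might decide when to renew. It is a sketch under stated assumptions: the rule of renewing once one third of the certificate lifetime remains is chosen for illustration (check your certificate tool’s own defaults), and the function names `renewal_time` and `needs_renewal` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime) -> datetime:
    """Time at which renewal should start: when one third of the lifetime remains.

    The one-third rule is an assumption made for this example.
    """
    lifetime = not_after - not_before
    return not_after - lifetime / 3


def needs_renewal(now: datetime, not_before: datetime, not_after: datetime) -> bool:
    """True once the current time has reached the renewal point."""
    return now >= renewal_time(not_before, not_after)


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    expires = issued + timedelta(days=90)  # 90-day certificate, as in the episode
    print("renew from:", renewal_time(issued, expires))
    print("day 30:", needs_renewal(issued + timedelta(days=30), issued, expires))
    print("day 75:", needs_renewal(issued + timedelta(days=75), issued, expires))
```

The point of automating this is that nobody has to remember the expiry date; the short lifetime only works because renewal is hands-off.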

[00:31:28] Jay: So there’s lots of different patterns in Kubernetes, lots of different add-ons. You can use the Envoy kind of proxy, you can use NGINX, service meshes, lots

[00:31:38] Jon: of things. So whatever you use, I suppose what you’re saying is it actually has to terminate...

[00:31:45] Jon: The TLS has to terminate at the container level. Like, the pod level really? Exactly. Yeah. So that then obviously it’s process level, as close to the actual... not going over a network anymore when it then talks to the service. But anything going over the network always needs to be encrypted. But otherwise, [00:32:00] localhost is obviously fine.

[00:32:01] Jon: Yeah, it’s one thing to another. Great. That’s all right. But anything traversing the actual network: not okay to be unencrypted.

[00:32:08] Jay: Anything traversing different

[00:32:10] Jon: nodes. Yeah. Yeah. Anything over the actual network beyond localhost. I guess you’ve

[00:32:15] Jay: got a different concept of networks, right? Because you’ve obviously got the kind of Kubernetes-layer network and then the

[00:32:21] Jon: kind of host level, but even then that’s pod to pod, right?

[00:32:24] Jon: So you’d wanna make sure, ’cause it could be on another node. Yeah, so I suppose any network that isn’t localhost, just to be on the safe side. You don’t know whether you’re on the same node or not, but it’s still gonna route it as normal traffic. So you wanna make sure that’s encrypted.

[00:32:40] Jon: And then what about if you need storage or something in my app? Yeah. So you have the storage interface, and that’s gonna be configured. God, we’re gonna make Kubernetes sound so complicated. Yeah. Please don’t let any of this put you off using it. But the more I’m talking, you start to realize, like, this is just quite a lot.

[00:33:01] Jon: But there’s a lot. Yeah, exactly. There’s a lot. And this is supposed to be security related, but yeah. So we’ve talked

[00:33:06] Jay: about,

[00:33:07] Jon: Security. You can talk about the runtimes. We haven’t. Yeah, exactly.

[00:33:09] Jay: But we’ve talked about the security of the file share that your images are running from, but obviously your containers, once they’re running, actually need to mount some storage.

[00:33:22] Jay: Storage... yeah, in that container. And you probably want to encrypt that, again using some sort of external key. In Amazon, you obviously have KMS to encrypt the EBS volumes or... yes. All the vendors do.

[00:33:34] Jon: Yeah, exactly. They all have a key vault element, yeah, exactly, to be able to, like, obviously encrypt.

[00:33:38] Jon: Yep. Nowadays. So that’s good. You’ve got that. But there’s also standardizations that have happened over the years, as obviously vendors... cloud’s a vendor as well, so when I say vendors I’m including cloud in it. The more variety there became, the more standardization on interfaces. So then you had... there’s loads of [00:34:00] storage that you could get.

[00:34:01] Jon: So we’re gonna have the Container Storage Interface, called the CSI, which is

[00:34:04] Jay: also the container secret interface. So there’s, like, a CSI driver and a CSI driver, like, ones for storage or ones for secrets. Very

[00:34:14] Jon: fun. But, and then there’s obviously the runtime interfaces, which is all the different ways of running containers.

[00:34:20] Jon: Yep. And that’s standardized over the years. And then there’s the OCI, which is the Open Container Initiative, which is actually: all these runtimes that you’re all creating, we need to standardize on them, ’cause there’s so many of them. We’re gonna standardize on that. That’s when the OCI happens. There’s just been loads of evolution around, not just Kubernetes specifically, just options around the space.

[00:34:38] Jon: Just options around the 

[00:34:39] Jay: space and then standards on those 

[00:34:41] Jon: options. Yeah. But really, to be honest, who cares? Just use, do you know what I mean? Use runc, use containerd, like, all this stuff. You just need to run a container. From an application perspective, you’re gonna need storage; we’re saying encrypt it. And we are gonna need to run containers.

[00:34:57] Jon: Make sure the OS is,

[00:35:25] Jay: Like, we haven’t even touched on, say, some of the shared services, right? So let’s say that ingress that you set up, or Let’s Encrypt, or whatever.

[00:35:34] Jay: You then have a bunch of other things. You have kind of encryption modes, you have ciphers that you could be using. It just 

[00:35:43] Jon: It’s a lot. Even the TLS element of the actual certs. Yeah, just that, getting those versions right. Just supporting old versions. Like, all the detail of all the stuff is that, yeah, from a security perspective, there’s a lot, ’cause there’s a lot of components, and therefore all those things have to be done [00:36:00] well from a security perspective.
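
As one small illustration of "getting those versions right", the Python standard library’s `ssl` module lets a client refuse old TLS versions and restrict cipher suites. This is a minimal sketch, not a recommendation from the speakers: the choice of TLS 1.2 as the floor and the `ECDHE+AESGCM` cipher string are assumptions for the example, and the function name `strict_client_context` is hypothetical.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects protocol versions below TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Assumed policy for this example: nothing older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Restrict TLS 1.2 cipher suites (TLS 1.3 suites are configured separately by OpenSSL).
    ctx.set_ciphers("ECDHE+AESGCM")
    return ctx


if __name__ == "__main__":
    context = strict_client_context()
    print(context.minimum_version)
    print(len(context.get_ciphers()), "cipher suites enabled")
```

The same two decisions, minimum protocol version and allowed ciphers, show up in ingress controllers, proxies, and service meshes, each with its own configuration syntax, which is part of why the speakers call it a lot.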

[00:36:01] Jon: The more components, the more things there are to secure, which is what makes it quite a headache. And we, again,

[00:36:07] Jay: haven’t talked about some of the risks that you might want to mitigate, so third-party supplier risk, and where you are getting your containers from, and things like that. Whether you are scanning your containers to make sure they don’t have any vulnerabilities before you run them in your cluster with potentially your other workloads.

[00:36:25] Jay: What’s the process they’re going through? Are you standardizing on that? Who are you allowing to even deploy to your cluster? Is that a service? Are you letting developers access this cluster directly? Are you going through, like, a CI/CD system? Are you going through more of a kind of declarative approach with GitOps?

[00:36:44] Jay: What does that look like? So there’s so many layers

[00:36:48] Jon: to... But I suppose those layers, maybe it’s more the platform responsibility, because to be fair, those have always existed. Thinking about it, operating system security has always been a thing. Certificates have always been a thing.

[00:37:02] Jon: Patching, managing all that, has always been a thing. Encrypting storage, always been a thing. Encryption at rest, always been a thing. Encryption in transit, always been a thing. The principles have always existed, so it’s always been hard. SELinux has been around for ages: making sure you audit, building SELinux policies from the audit.

[00:37:17] Jon: All those things that you used to have to do anyway. All been things, because processes are running on VMs that have an operating system, and however it’s got there, nothing’s changed really. Nothing’s changed from that aspect. Yeah. The only thing is you can now share the virtual machine, and there’s more security boundaries within the Linux kernel that allow for that.

[00:37:39] Jon: That’s the evolutionary bit, but all the other things are still the same. But I think if you haven’t come from that background and maybe you’re learning for the first time, it could be overwhelming for sure, ’cause they’re all new concepts to you potentially.

[00:37:51] Jon: But actually they’ve been in the industry for a really long time. Yeah. Just a different shift. Sounds like it’s all Kubernetes, but actually in some ways, [00:38:00] some of it isn’t. Yeah, true.

[00:38:02] Jay: But I guess even in Kubernetes, like the cloud providers, if you’re using a managed Kubernetes offering, then they’re taking care of some of that complexity, but 

[00:38:12] Jon: you still have to tell it things. Yeah, sure.

[00:38:14] Jon: Things you want, because, yeah, you then need to know those things that are, sure, good things to pass in to the configuration of Kubernetes, that we spoke about. And maybe if you didn’t know: out of the box, the defaults might not be any of those things, because it wouldn’t work. You’re gonna get a private cluster and be like, why can’t I talk to the cluster?

[00:38:31] Jon: I have my access to cluster. Yeah, exactly. So you might not realize that the out-of-the-box defaults are normally the reverse of those things, ’cause it can’t make assumptions. So I think that’s when people might reuse a Terraform module or reuse something designed for consumption in the end. Yeah, exactly.

[00:38:47] Jon: Not designed for the default boundaries and security. Yeah, exactly. Bad defaults. Yeah, exactly. Bad from a security perspective, maybe. Exactly. But not bad from a consumption one. Yeah, you’re right. What about network policies? Oh

[00:39:00] Jay: yeah, exactly. So segregating which pods can talk to other pods. It’s another kind of layer of boundary control and security

[00:39:09] Jon: policy.
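
A network policy of the kind mentioned here can be pictured as a manifest that says which pods may send traffic to which other pods. The Python sketch below builds such a manifest as a plain dictionary and includes a deliberately simplified evaluator. The manifest fields follow the Kubernetes `networking.k8s.io/v1` `NetworkPolicy` schema; the label values, namespace, port, and the helper names `backend_ingress_policy` and `is_allowed` are hypothetical, and the evaluator ignores namespaces, ports, and everything else a real network plugin would consider.

```python
import json
from typing import Any, Dict


def backend_ingress_policy(namespace: str, backend_labels: Dict[str, str],
                           allowed_from_labels: Dict[str, str], port: int) -> Dict[str, Any]:
    """Build a NetworkPolicy manifest allowing ingress to backend pods from selected pods only."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "backend-allow-from-ingress", "namespace": namespace},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": backend_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # A bare podSelector here matches pods in the policy's own namespace.
                    "from": [{"podSelector": {"matchLabels": allowed_from_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


def labels_match(selector: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """True if every key/value in the selector is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def is_allowed(policy: Dict[str, Any], source_labels: Dict[str, str]) -> bool:
    """Simplified check: does any ingress rule admit a pod with these labels?"""
    return any(
        labels_match(peer["podSelector"]["matchLabels"], source_labels)
        for rule in policy["spec"]["ingress"]
        for peer in rule["from"]
    )


if __name__ == "__main__":
    policy = backend_ingress_policy("shop", {"tier": "backend"}, {"app": "ingress-nginx"}, 8080)
    print(json.dumps(policy, indent=2))
    print(is_allowed(policy, {"app": "ingress-nginx"}))
    print(is_allowed(policy, {"app": "some-other-tenant"}))
```

The idea carries over to multi-tenant clusters: once a pod is selected by a policy, traffic that no rule admits is dropped, so one tenant’s pods cannot reach another’s by default.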

[00:39:09] Jon: And then signing container images, so that you know that they’re coming from a place that you trust. Or you can even use OpenID to sign, actually: keyless signing, can’t you, as well? Yep. Oh, did you? Yeah, we... Oh yeah. Did we? Yes. Oh, did

[00:39:28] Jay: Somebody... Avie wrote a blog on it.

[00:39:29] Jon: Fair enough. But yeah, so there is a blog about that. That’s true. But yeah, there’s still all those things: you’ve not just encrypted, but you want to encrypt maybe the container image. Then you need to decrypt it with something, and that’s gonna give the trust. You can also use the hash as well, which is actually: each layer of the container image is hashed, rather than, like, a semver version.

[00:39:52] Jay: Referring to the actual hash, which is a thing that doesn’t change, rather than a version tag. Yeah,

[00:39:58] Jon: the tag, yeah, does [00:40:00] change. Exactly. So then it’s actually: that’s the hash that I’m expecting. Yeah. Yeah. And then basically, I think, it does the sum of all the layers. Exactly. And then: does the sum match?

[00:40:09] Jon: Yeah. And then if it doesn’t, then I know something’s altered or that’s not right. Yeah. From what I’m expecting. So that’s another thing you can do. Something’s injected something into a layer or whatever else. Exactly. ’Cause the container is made up of many layers and each layer has a hash. So there’s loads and loads and loads of layers to this, and layers of security that go on. Layers to

[00:40:27] Jay: images and layers to security.
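
The idea of pinning an image by hash rather than by tag can be sketched in a few lines of Python. This is a simplified model, stated as an assumption: the "manifest" here is just a JSON list of layer digests, and the image digest is the SHA-256 of those manifest bytes. Real registries follow the OCI image specification, which has more fields, and the helper names below are hypothetical.

```python
import hashlib
import json
from typing import List


def sha256_digest(data: bytes) -> str:
    """Content digest in the familiar 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(layers: List[bytes]) -> bytes:
    """Simplified manifest: a JSON document listing each layer's digest, in order."""
    return json.dumps({"layers": [sha256_digest(layer) for layer in layers]},
                      sort_keys=True).encode("utf-8")


def verify_image(manifest: bytes, layers: List[bytes], pinned_digest: str) -> bool:
    """Check the manifest against the pinned digest, then every layer against the manifest."""
    if sha256_digest(manifest) != pinned_digest:
        return False  # the manifest is not the one we pinned
    expected = json.loads(manifest)["layers"]
    return expected == [sha256_digest(layer) for layer in layers]


if __name__ == "__main__":
    layers = [b"base os layer", b"application layer"]
    manifest = build_manifest(layers)
    pinned = sha256_digest(manifest)  # what a deployment would reference instead of a tag
    print(verify_image(manifest, layers, pinned))
    print(verify_image(manifest, [b"base os layer", b"tampered layer"], pinned))
```

Because the pinned digest covers the list of layer digests, changing any layer changes the result, whereas a tag can be moved to point at different content without the name changing.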

[00:40:30] Jon: Probably what we have worked out is you probably do a lot of these things anyway, to a point from, if you have 

[00:40:39] Jay: the background in Windows and Linux administration, then you’re probably used to a lot of these best practices overall. But if you are not, then some of these concepts

[00:40:49] Jon: are gonna be new to you, but some of them don’t matter whether you are single or multi.

[00:40:52] Jon: Yeah. It’s: if I’m pulling a container down, how the hell do I know it’s my container? Exactly right. If I’ve not encrypted it or I’m not validating any of it, if

[00:41:00] Jay: I’m not checking that it has come from a reputable source. Yeah,

[00:41:02] Jon: exactly. So I have no awareness of actually what’s running in that container in the end if I can’t trust that it was mine. That’s nothing to do with single or multi-tenancy.

[00:41:11] Jon: But it obviously matters more if it’s multi-tenancy. Exactly. Matters more, because if there’s loads of things sharing that host, it’s gonna affect everything. It’s gonna affect everything rather than just the app alone. So this is all, like, the decision process, but doing these things is obviously good, but the impact when it goes wrong

[00:41:29] Jon: changes in multi. Yeah, exactly. And I think that’s the thing, isn’t it? Yeah. But then doing it everywhere when it’s singular is also harder, because how do I know that’s done everywhere? Yeah. Repeatedly. And how do I get visibility of all of that? The management of it all. Yeah. Yeah, exactly. Then changes, and then there’s the...

[00:41:47] Jon: And you’re like, oh, this is quite a lot of money to do it this way. Yeah, but it’s more secure. Yeah. You’re like, yeah, but I’ve gotta scale my team.

[00:41:53] Jay: Yeah. Cost in the implementation of it. Yeah. The maintenance of it, the things that you might run in the cluster to [00:42:00] support that way of working. Yeah. Yeah.

[00:42:01] Jay: It’s quite high. It’s

[00:42:02] Jon: very high. So if you were to go and get a security product and run it against your cloud account, against Kubernetes, against all that stuff, you’re probably gonna get loads of problems. Yeah. Probably based on most of the things we’ve just mentioned. It’ll spit out loads of these issues, and then you’re gonna have a big book of work anyway. Yeah. That someone’s gonna have to do, whether you like it or not. If you’ve tried to be secure... if you haven’t been secure, then you’re not secure, isn’t it? It’s pretty simple. Yeah. And all these are the things you probably should be doing if you care about security, but if no one knows, is there a problem?

[00:42:35] Jon: Yes. If a tree falls and no one hears it, did the tree fall? I’m

[00:42:38] Jay: pretty sure there’s someone that knows. I’m pretty sure there’s someone that’s in there right now. Yeah. With a lot of knowledge. We’ve talked about security quite a bit. We haven’t really given the other aspect of this, which is cost, because obviously, like we were saying earlier, everything’s a bit of a trade-off.

[00:42:57] Jay: These are really the baseline minimum standards that we would potentially harden a node, a cluster, an organization’s capability on. But we’ll have to talk about how to have the same or similar sorts of patterns for managing cost, at both the single- and multi-tenant level. And we’ll do that in

[00:43:18] Jon: another episode.

[00:43:19] Jon: Cool. All right. Are you gonna do that? Yeah. Talk about cost. You love money? Yeah, I love the cost. Love the money. Cool. All right. Thank you. Adios.