August 15, 2023
Season 1, Episode 23
In this enlightening episode, Jon sits down with Chris Stura from PwC UK to delve into the intricate world of platform engineering and its pivotal role in business success.
Guest Introduction: Chris Stura
Chris, known as the “Cloud Guy” at PwC, shares his journey from the early days of cloud adoption to the present, highlighting the evolution of technology and the challenges businesses face in the realm of cloud cost assurance.
Follow for more:
Jon Shanks: LinkedIn (https://www.linkedin.com/in/jonathanshanks/)
Jay Keshur: LinkedIn (https://www.linkedin.com/in/jaykeshur/)
Jon & Jay’s startup: Appvia (https://cloudunplugged.io/)
Jon [00:00:02] Hey, Chris, welcome to the Podcast. It’d be good to do an introduction for yourself, where you’re from, and a bit of background, and I can kind of sit and talk about some of the things we’re going to chat about today.
Chris [00:00:14] Yeah, absolutely. So currently, as you can see from my hoodie, I’m the Cloud Guy at PwC. The official title is Director of Financial Services in consulting, helping customers to migrate to the public cloud. One thing that’s not quite official yet is that I have the credentials to gain the first AWS Ambassador badge for PwC UK. I’ll be one of the first that we have in that space. It’s quite a coveted area. You can see I’m a big AWS fan. I used to work over at CloudReach, and we were one of the people that brought PwC to market originally early on. And from then until now, quite a lot of things have changed. You can’t tell from my accent, but I’m British. I was born in Oxford and grew up in San Diego, California. So I’ve definitely been around the block and the world a bit, with the possibility to experience a lot of different things. In my time, I haven’t always worked in financial services. I’ve been in a lot of different industries, from the public sector, private sector, telco, media, technology, product companies, consultancies, you name it. I’ve done a bit of everything from software build to advising customers. So I’ve seen a lot of interesting things over time and have had the good opportunity to experience a lot of technological evolutions. I saw things like Java emerge. So I’m old enough to remember the birth of Java and its beta stages. The different versions of J2EE, a lot of the containerization stuff, and the operating system wars have been kind of a wild ride over time. And now we’re seeing the public cloud, its evolution, and the evolution of the kind of battleground that you see between the different cloud providers in that space as well. So I think I’ve been very fortunate over the period of my career to see technology grow up and expand so much over that time.
Jon [00:02:01] That was good. To not get into the technology side, I’m kind of curious because you just said you’re from England. Do you come back over here often, then? Are you always coming back to the UK? And do you work between both places or are you mostly?
Chris [00:02:15] No, so I’m based in the UK full-time. So I work in London. I live next to Heathrow Airport, quite close to Heathrow Airport. And I have done it for the past eight or nine years. Before that, I was doing consultancy services software build around Europe. So I haven’t worked back in the States since my university days.
Jon [00:02:32] Oh, right. I was going to say because San Diego to Heathrow is quite a different lifestyle, isn’t it? So I didn’t know if you’d.
Chris [00:02:39] Well, there is a direct flight.
Jon [00:02:41] But did you cut your teeth, then, from the technology perspective in the States? Did you work over there, or was it more studied and then came over here?
Chris [00:02:50] So I did some work over there, but not much; again, it was early-stage career. I’m an engineer at heart, and my expertise is in distributed systems, distributed computing. I was focused on that very early on, before threading was a thing. You know, you had kind of the virtualization of threads on a single CPU, and then the use of multiple systems and big compute grids. And that was kind of the area that I had picked as the most interesting to go and exploit. So more the software-focused side of things, but fortunately, I mean, if you think about cloud technologies today, the cloud is just a big distributed system. And this is what you’re seeing in a lot of the big systems that are out there. I think Kubernetes is one of the prime ones.
Jon [00:03:27] Yeah, it’s come around hugely over the years. I’m kind of curious to know, then, where that was in your journey. Because I’m guessing you were over here in the UK then, and started your career here after studying out in the States, I guess from the software side, by the sound of things, and distributed systems. But then when did you get into the cloud side? When was your first exposure, from outside of that, to cloud technologies? And I’m not sure whereabouts it was at the time. Was it like an S3-only play at the beginning, or an EC2 play, back when you started?
Chris [00:04:06] So I started using the cloud early on. At the time, I was doing some work in Italy, in the North, Lake Garda. And even before AWS came online, we started to see services based on VMware-type virtualization. You started to get a VMware-as-a-service sort of thing, not VMware Cloud. And then some of the cloud services that came off the back of that. So you’d have kind of KVM-based systems and early Linux distributions capable of mild containerization, something called OpenVZ. And so we started using those very early on, because I was working for a kind of small software development type ISP, with the initial SaaS-type offerings that you started to see in the market. And a lot of the designs around those systems, for efficiency’s sake, were built in a not quite containerized fashion as we know it today, but definitely with very light virtualization built into them. We were starting to use API-driven architectures and event-driven architectures, though at the time we didn’t have a lot of the software systems that we know today. So you didn’t have things like Kafka or Confluent. You didn’t have the distributed databases that you see today, the Cockroach Labs of the world, or CockroachDB. You had to kind of build things from the ground up. And there were kind of basic components you’d assemble. And you also started to get some of the early-stage open source application servers at the time, in the Java world, things like Glassfish and stuff like that. And so in the software that we used to build to deliver the SaaS solutions to clients, we would go out and deploy these technologies. The very first thing we used on AWS was SES at the time. And we’re talking years ago. So it’s been quite a journey. And when you started to get things like EC2, that was just a game changer, because we were able to build systems more dynamically.
And I remember us looking very early on at not only how we would deploy the infrastructure as code, the DevOps as you know it today, but, going back years, at using the APIs themselves within the software to auto-scale the software that was built. And this was all for efficiency’s sake.
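The idea Chris describes here, software using infrastructure APIs to scale itself, can be sketched in a few lines. This is only an illustration and not something shown in the episode: the function name `desired_instance_count`, the load figures, and the limits are all hypothetical, and a real system would pass the result to a cloud provider’s scaling API rather than simply return it.

```python
import math


def desired_instance_count(current, observed_load, target_load,
                           minimum=1, maximum=20):
    """Return how many instances to run so that the average load per
    instance moves toward target_load.

    All parameters are hypothetical, for illustration only:
    current        -- number of instances running now
    observed_load  -- average load per instance right now (e.g. CPU %)
    target_load    -- the per-instance load we would like to hold
    minimum/maximum -- hard limits so the fleet (and the bill) stay bounded
    """
    if current < 1 or target_load <= 0:
        raise ValueError("current must be >= 1 and target_load must be > 0")
    # Scale the fleet in proportion to how far we are from the target.
    raw = math.ceil(current * observed_load / target_load)
    # Clamp to the configured limits.
    return max(minimum, min(maximum, raw))
```

The `maximum` limit is the design choice worth noticing: it is the point where the engineering decision and the budget decision meet, which is the tension discussed later in the conversation.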
Jon [00:06:09] Yeah, so were you then working with application teams? I guess you were like the bridge from an infrastructure perspective. I suppose kind of DevOps-y movements were happening around then anyway, weren’t they? So it was kind of a look.
Chris [00:06:21] Yeah, I’ve always kind of straddled the two worlds, because I came from the software space and building applications. Being able to manipulate the infrastructure was something that I saw as coming into the development space. It was a way for us to avoid reaching out to the classic system administrator. And it made the development of the software a bit more efficient from an engineering standpoint. So we had the APIs, we’d get access to the APIs, and then we’d basically manipulate those within the software as much as we could. Then you started to get the deployment pipelines and the rest of the maturity around the DevOps space, which, in my view, went a bit too far to the infrastructure side in the early days. And if you think about the balance between who was the system administrator at the time and the engineering function, I think the system administrator role evolved much faster than the developer role evolved into the infrastructure space to take advantage of that in the software. On a lot of the projects that I had worked on early on, we took that more from the engineering side, because they were smaller companies I was working for, and the use of a system administrator in its own right was kind of an overhead that we didn’t see as necessary. A lot of the developers took on that role. So there was DevOps from the other side of the fence. But of course, in the evolution of what we’ve seen in the industry, that’s happened more on the SA side towards DevOps, DevSecOps, et cetera, where you set up the infrastructure in a somewhat static fashion, though using things like DevOps, CI/CD pipelines and whatnot for the management. So you have the rapid iteration behind that, but it’s not necessarily connected to the software systems you’re deploying.
I think there’s been a nice movement around SRE, when Google started to introduce that methodology, where we’ve started to see that come back a bit: system administrators caring about the applications, the applications being tuned for the infrastructure they run on, and a lot of the metrics that you’re getting from the observability then being leveraged to make the software better. It’s a bit more cooperation between infrastructure and software. As we move into the realm of Kubernetes, I think infrastructure is largely disappearing and that whole thing is starting to blend back up again.
Jon [00:08:27] Yeah, I kind of agree on that. I think the DevOps movement had a sense of irony behind it in some ways, because it was supposed to close down the delta between Ops and Dev, but the industry created a role behind it, therefore creating a silo between Ops and Dev, but called it DevOps. So it felt like a little bit of a rebranding, even though it was the same problem, and then everybody was trying to abstract the work, like the roles themselves, over time, even though the principles and methodologies were set out and aligned about what they were there to do and the challenges that were there to overcome. The irony is that it did create pillars, like a whole pillar of DevOps people who would do that work outside of the Dev team, and the Devs wouldn’t necessarily know how they did it. And then they just got a dependency on the same people, who were brokering between them and the infrastructure. So really it was like an interface that became human between what the Dev needed and the actual infrastructure, rather than an API and programmatically understanding how to use it. So it’s quite funny to me, it’s kind of ironic.
Chris [00:09:37] But I think there was a lot of good in that, right? Because in the traditional sense, the infrastructure that you had before was static. So you had to go out, you had to buy tin, you had to order tin, you had to make a business case for tin, and the tin then had to last X amount of time, right? Even though customer requirements changed, you still had to kind of make your stuff fit in the tin. And so from a development standpoint, you were always very cautious in what you asked for. You’d go in and you’d say, yeah, I’m good with this, which is probably oversized for what I need it to be, right? And you had all sorts of problems with managing that infrastructure when it went wrong, or you couldn’t get the right pieces, or you couldn’t get it fixed in time. The old world of the tin was completely different from what you then had in the infrastructure space. So I think one of the biggest developments that I found interesting, especially with the adoption of AWS, was when they introduced Auto Scaling Groups on EC2. That was a real game changer, because you could communicate back with the DevOps teams and you could have infrastructure that scaled. And you were able to build stateless software, so you could take advantage of that scalability. You started to have to think about the design of systems and decouple the data parts of the system from the computational side of the system. Again, something that you would do classically in distributed systems, but wasn’t really done, because you were looking at three-tier architectures at the time, right? So the applications that you were building were the kind of Java enterprise applications, you were building EJBs, you were building Java server pages, and this was all a mishmash between the application server, the database, and a web front-end forwarding proxy.
So the types of applications you were building weren’t the same as the big microservices applications of today, where you’re able to segment the problem in different places, orchestrate between the different services easily, and add horizontal scalability and components with an ease that you just couldn’t before. You still had the constraints of the physical infrastructure. So I think even though we didn’t move fast enough from the development side to take advantage of the software-defined infrastructure, there were a lot of benefits that wove their way into the applications as they came online.
Jon [00:11:46] Yeah, I think what’s interesting is that the milestone movements in the market and the industry came from… I mean, this is theoretical to a point, or kind of educated as well from reading, but I’m sure there’s somewhere in between. But it’s for an engineering business like Amazon to take a problem. They took the problem and found an engineering solution to the problem, which is how to present infrastructure as APIs. How do I present these things to be more like a commodity to teams to be able to be enabled versus process and methodologies, which is another way to solve the problems? It’s like, well, we’ve got this issue with development teams not being effective or fast enough or being detached from infrastructure. So we’ll come up with methodologies and principles over it versus solving from an engineering perspective. The same with Google, solving the scale from an engineering perspective, containers, how do we move faster than VMs? And so I guess different organizations saw the problem and took a different approach to the same problem that maybe some other people did, and they yielded different values to kind of move the industry forward along with the principles. It’s kind of been quite interesting to see business constraints at scale with engineering talent and then what they kind of came up with to solve the problems that then kind of moved the whole market along really because it kind of became revolutionary, didn’t it?
Chris [00:13:04] So, I mean, there are a few things in what you said, some of which I agree with, some of which I don’t. Amazon did add APIs to infrastructure, but I don’t think that’s what changed the game from a cloud perspective. I think what Amazon did was shift the procurement cycle. So before, scaling was a financial problem. And what those APIs did was make that a technical problem, you know, for the system administrator or other. The processes didn’t go away, they just changed. So the early part of cloud consultancy was all about, let’s set up a landing zone, let’s introduce cloud governance, right? These were major themes, especially early on in cloud adoption. And for the people that got this wrong, it went dramatically wrong. Things got out of control, they didn’t even know what they were spending anymore, right? And then FinOps started to come into place. We had to worry about how we were spending. And this just led to, you know, the same governance in different ways. Ultimately, see, the accountants made their way back into the area anyway. It just took them a bit longer to realize that the real revolution there was that the technology departments now could procure what they needed, when they needed it, to make the application run as best they saw fit and to evolve what they needed over time. And this brings a whole series of challenges in the FinOps space: how do I financially plan for that flexibility? And I think that’s a problem that today no one’s solved yet. I went around re:Invent when I was there last year, and I realized that from the year before, the number of companies attempting to solve FinOps had multiplied as opposed to consolidating. And this told me that it’s a problem, it’s out there.
People haven’t found a solution to that yet, because the moment in which you start to see maturity in that space, you’ll start to see a lot of consolidation: businesses will start to fold one into the other and you’ll start to see a few main players emerge there.
Jon [00:14:52] Do you think that to pick on that, the reason it’s not solved is because of like, you have to bear with me on this train of thought because I need to articulate it in a way that makes sense, but do you think it’s to do with like the accountability, because who owns the budget is very different to who’s controlling the infrastructure. And the way the infrastructure is being led is through applications and then maybe deterministic scale, right? So it just depends on how you’ve managed to deliver and what the parameters are in the cloud. Hence, also a problem, because it’s not one-dimensional, right? So it’s multifaceted on how and what you’ve done in the cloud will then predicate the cost in the end. And do you think that’s kind of the reason it isn’t solved is because all the different roles involved are conduits to the cost? And therefore it’s quite hard then to work out not just the prediction of the cost, but who’s owning control of it in the end fully. Do you think that’s why people are struggling to solve it?
Chris [00:15:51] I agree with the complexity point. So managing the cost of anything in the cloud is quite complex. On Amazon, I think there is something in the region of SKUs and combinations of things that you can buy. So just the price-list complexity is absurd. Now, most big businesses can’t or won’t just use one cloud provider, they’ll use many cloud providers. So that complexity is multiplied many times over. In addition, engineers aren’t good at budgeting. They’re not accountants. They’re there to solve engineering problems. And that combines with a bit of a generational shift. So with the cloud, we got used, as consumers, to having things immediately that don’t break. I don’t know if you’ve ever experienced life pre-cloud and how the internet was back then, but that was the land of Error 500. So every once in a while, you had capacity problems, you had web server errors, because you had a finite scale. Now in the cloud, you don’t see that anymore. You’ve got the inverse problem. You have spikes in people’s budgets. And of course, the problem is that trust is very difficult to build and it takes moments to destroy. And if we think about the future generations, the millennials, Gen Zs and Gen Alpha, they don’t put up with the classic BS my generation would have put up with. If something breaks, they lose confidence immediately. It’s very difficult to build that back. It’s very difficult to acquire them as a customer in the first place. They’re used to getting things for free. They’re generations of instant gratification, the digital natives. So as a business, even if you were a harsh CFO, you’d do a lot of damage to your business if you started to say, beyond this point, we’re not spending. Because your entire business would collapse. Your customers would just disappear overnight. You can’t do that anymore. And all businesses today are run on, or have a front door in, or have an interaction with technology.
And this was, of course, exacerbated during the pandemic. We saw exactly how much technology could do and how far we were linked into it across all echelons of basically everything. We even started connecting to the internet to order a pint of beer when they opened up the pubs. So this stuff had to work. It was super important. I think the highlight of this is that, going into Christmas, there are always a few wobbles on the payment system when you go and pay with your credit card. And the amount of anger, bad media, and attention that type of wobble sends through the ecosystem is incredible. It’s an example of our dependence on technology. So they’ve got this interesting challenge of trying to balance the books whilst not being able to take away the service. And this necessarily means that the finance teams need to be more technically savvy. And that takes an evolution, because the people sitting in those departments weren’t native technologists. And so now you’ve got an even more difficult transition. Before it was, how do we bring developers and system administrators closer together? And those people, at least, were both talking tech. Now you’ve got an even bigger challenge, which is, they’re the techie guys, and how do you bring them closer to the people that were managing dimensions in SAP? It’s a bigger jump. And that’s why you haven’t quite found any business that has solved that completely. I mean, think about PwC. Even we have a product called Cloud Cost Assurance that attempts to solve this problem. But even as a traditional accounting firm with technology people in it, even we haven’t solved that problem entirely. We’re trying. We acknowledge it’s a problem. We acknowledge we may be able to play in that space. But does it mean that we’ve solved the problem entirely?
Jon [00:19:25] It’s difficult because of all of the conflicting elements around what you’re saying, which makes a lot of sense, to be honest. Being agile, not planning everything up front, trying to build as you go, trying to iterate as you go, makes it unpredictable in some ways by design. You’re supposed to design on the go to find the right outcome in the end, which in itself means you don’t know what it’s going to be at the end. So then you’re asked to predict something, not just from a consumption perspective, but even from an application perspective, in how it’s even going to look at the end. Whereas you’re told that with waterfall, if you were to do those things up front, you’d never get out the gate and you’d be left behind, because it takes so long to build it and prove it, so it’s not a good way to work. So it’s kind of conflicted in some ways. The models feed each other. On one side, it’s designed for, yeah, just use what you want. And as much as you use, you’ll just pay for it. At the same time, you’re like, I don’t know how much we’re going to use because we don’t know what we’re going to build. And then some of the people at the top are saying, how much is it going to cost? Because we need to budget for this. And no one knows. What is funny about all of that, though, is that out of all of the cloud tools and conversations, I don’t hear about optimization tooling, as in things that optimize the service you’re building to be cheaper. It’s all about how you manage the cost of the cloud, if you see what I mean.
Chris [00:20:54] It’s cultural, right? So there is a tool out there. And I think it looks the right way. So at re:Invent, not this year but the year before, Amazon released a framework for environmentally-friendly application design. So they started thinking about the carbon footprint of applications. Now, in the cloud, because computing is electricity and electricity is sometimes powered by carbon when not by renewables, in reality, the efficiency of what you build is directly proportional to the amount of emissions that you’re going to output, minus the location. Because some locations, of course, are greener than others. But the thing is, that’s something that the generation that’s building the applications of today cares about. I think you’d be hard-pressed to walk into a development department and find engineers passionate about meeting the CFO’s budget. I’m not sure you’d find that. And then you have to consider also the outsourcing phenomenon. So I’ve never seen requirements go into an engineering house that say, build to this cost spec, minus the fact that they say, build to this time frame so that I only pay you this much. They’re not thinking about the ongoing operational costs. That’s not inside the NFRs. It’s not something that is dictated in the user stories. It’s not something that’s tested against. And so naturally, the engineers aren’t going to care about it. And then culturally, it’s just something that engineers don’t care about, because they were used to building on infrastructure which was basically already there. And the motivation behind optimizing wasn’t to save money. It was because they couldn’t get the tin fast enough. And so if the application or the needs evolved, they would optimize the application to make it fit in the same box. And this is where you get things like application performance management. You get all the profilers and all that kind of stuff. So you can figure out, what are the hot spots in the application?
What is the bare minimum I need to do to fit this same thing in the box with a higher load? No one designs for that upfront. Now, fast forward to this generation, and they think about this: am I designing an application that has an enormous carbon footprint? Let’s take something big which has a nice carbon footprint. Say you’re writing an AI/ML data pipeline in Python, and you’re running this on something like Databricks. Python is not the most efficient language. It’s all interpreted. And that’s going to have an enormously higher footprint than, say, the same thing written in Rust on something like Apache Arrow. So you can make different design decisions, which make the whole thing more efficient just by the nature of the compute that you’re running. But they’re more difficult. So why should you do it? Because they’re going to compromise your timeline. So how do you communicate that from a business standpoint? There is a disconnect in the world of how we build software, which doesn’t lead us to integrate financial metrics. And then, again, the problem that the finance people have today is with applications that are already built. And when you think about performance efficiency, the greatest thing you can do to improve performance is actually to change the way the applications are built. That today is enormously expensive. And therefore, there is a cost of running it, yes. But the cost of changing it may be prohibitive. So they can’t always intervene. It’s a bit too late. It’s kind of like the train has left the station to a certain extent. There’s a bit of a chicken and egg problem around this, which is equally difficult. Let’s go back to the infrastructure side, DevOps. The infrastructure-as-code bits are still separate for the majority of applications. And then you have the release cycle of the cloud providers every six months. The latest, greatest thing comes out of all three of them.
How many stacks take advantage of the new stuff? How does it fit? That’s not even contemplated.
Jon [00:24:24] That’s interesting. So I guess if you’re saying that the financial teams are not involved, where finance meets engineering, what you’re kind of saying is that the financial impact of your engineering probably does correlate, to a point, with the emissions too. So I guess they go hand in hand. The more you’re consuming, the more emissions, probably, and the more money you’re spending. So they’re a little bit interlinked. But then are you saying that teams should be designing for cost as a condition upfront, in the decisions they’re making technically, and that’s probably the right process around it? Or are you saying maybe it’s a technological problem, and technology needs to solve the technology problem? Both, I don’t know. What’s your view?
Chris [00:25:11] Well, I think it needs to be taken into consideration. So you need to create a culture in engineering of thinking about not so much the cost, but how efficient it is. Efficiency needs to become cool again. Efficiency currently, I mean, maybe it’s more cool today than it was five years ago, but it’s not as cool as it was when I was building software in C on 386s at a certain point. One of the experiences I had in my life was looking at a performance engineering team whose job was to take lines of assembly code out of a post-compiled application, and they were paid or given bonuses based on how many lines of assembly, how many instructions, they could pull out. Now that’s something that made sense when you were running server applications on 386s, because otherwise the application just didn’t work and there was no other solution, right? You were too far down on Moore’s Law, and it hadn’t yet solved the problem for us. So you had that challenge, and there was a culture of doing that at the time, and it was incentivized, more because otherwise the application was perceived as unreliable. No engineer wants to build unreliable things. That’s just not culturally acceptable in engineering. You want to solve the problem in the best way you possibly can. Now fast forward to today, especially in the cloud world: reliability, that’s no problem at all, as long as your wallet can scale as far as the application does. And the wallet’s not sitting in the pocket of the engineers.
Jon [00:26:37] True. But then, I guess, is this around cost consciousness? Because some of these things that you describe are cultural, and it becomes behavioral, and you’ve kind of touched on generational elements, on how people are perceiving the services that they’re consuming as well, generationally, where everything becomes more accessible. So your expectations become normalized to some degree, because it’s all you’ve known. Therefore you just expect it to be the thing you’ve always known, if not better. But at the same time, if developers have not been made to be conscious of costs at all, and are also not rewarded for being cost-conscious, then no tool will help, I guess, unless the tool was to coerce the behavior. Because I kind of imagine if teams knew how much things were costing, they’d probably be quite surprised. You know, if they could see upfront, this is going to cost X or this costs you Y. And if they were able to see that easily from a service-to-service perspective, without any integration work to do, just as part of their development life cycle, I reckon they would probably do the right thing.
Chris [00:27:46] Well, it depends. So everybody’s motivated by different things. I think people do care, but when they’re in a startup, when their actions have an impact. So if you’re in a startup and you design something bad, and here’s an example: at a certain point, there was a customer that told me that they forgot to turn something off that cost them grand over a night. In a startup, that could be terminal in terms of an error. So clearly, because the teams work tighter together, they’re all vested in the success of the business. Now, fast forward that to a very much larger business. And actually, it’s about the KPIs that have been set out to you by the system. And no engineer is gonna have KPIs on how much the code that he wrote is costing in production. Because A, it’s difficult to measure, and B, there would be an impact in terms of time to market. Because the other thing you have to balance out is that newer generations want things faster. So let’s go back to the granddaddies of this, the Spotify guys, right? Wizards at, let’s roll out the new feature and see if it works quickly. Daily release cycles are perfect. But designing things for performance needs time. And if you’re applying that all the time, you’re gonna diminish that time to market, which may reduce your competitiveness. So there are trade-offs to be had. And I think what businesses are starting to learn is that there’s a cost for some of these things that they’re bringing to market. It’s not just about method and technology. And you need to be able to try and manage those costs. This is why I’m a big fan of site reliability engineering, because that encourages optimization all the time. So you’re looking at something in production, and you have an entire team of people who are laser-focused on making sure it never goes down, it’s more efficient, it’s more cost-effective, et cetera.
Those guys have metrics for something which is running, which they have the observability data, and then they can change the application to make it better. And that’s their mandate, they’re measured on that. Unfortunately, you rarely see cyber reliability engineering done well in any business.
Jon [00:29:46] That’s true. I still think though, if there was a tool. I’m still a believer that if the information’s present and it presents itself in the right ways, then those KPIs, even in a large org, if there was a way to see the cost of the services you’re iterating on, because it was made simple to do, and everybody could kind of see that. And it wasn’t obscured, right? Or obfuscated, where you just can’t tell and no one knows. Then I’m sure there probably would be KPIs, because it’d be much more accessible to make them KPIs. After all, you’d know what it was and where it’s going across everywhere. I just think it’s because it’s not so visible that it becomes an impossible type of KPI, maybe, because people will probably balk at it, like you’re saying: oh well, I was trying to work all this out. It’s gonna slow us down on delivery. You say we need to now work this out, which means we’re gonna be slower. Something has to give at some point, and maybe it probably will over time, where if it’s effortless to know and it doesn’t take you engineering effort to know, then you might always try and design to be cost-effective anyway, because it’s so visible to everyone, including you. I would have thought so. I just don’t think cloud costs attach to application delivery very well, because it still attaches more to infrastructure and it doesn’t have any Conway’s law elements to it. It’s just like names that you see, right, it’s just like bars on a spreadsheet or account IDs, unless you put all the engineering effort into doing a really good job on tagging and everything else. And just generally speaking, it’s so divorced from the application delivery in truth and so on.
Chris [00:31:23] But I think it’s also difficult to know. So when you get into the realm of building the applications themselves, there’s innate complexity in the way that the algorithms are put together. And there is such a phenomenon as the kind of sleeper type algorithm. These are algorithms that, when run over a reasonable size of data or a sample portion of data, perform within their KPIs. But then there’s something in the calculation, the order of magnitude, the order of the algorithm, called big O. And one of the things that scared me was I went around and I asked, you know, a class, a group of engineers straight out of university, do you guys know what the order of an algorithm is? Do you know how to calculate the big O? And they said, the big what? And the problem with that is that the cardinality of algorithms, and this is true both for data and for their efficiency, is what drives that cost metric. I once had an application that I had to debug, which had an unbroken loop. And under normal conditions, a few hundred records, it caused no problem at all. All of a sudden we had a client use case where they had over a million records in there and it was taking minutes to run the same thing. And everybody asked themselves, why is that the case? Why is that the case? It was missing a statement in a line of code. With those things, it’s very difficult to see what happens. I mean, the best prediction engine in the world is not gonna look into the algorithms. It’s going to look at what stack you are deployed on and try and make some predictions: if you’re using that stack and it has that shape and on average, this is the number of transactions you can flow through, then yeah, we think it’s gonna do that. But actually, the order of algorithms can have exponents, and they can be exponents of two and whatnot. And those can dramatically reduce your performance and increase your costs. 
It’s a much more complex problem to solve in terms of how I know what that application is gonna cost me when I deploy it. It’s not easily resolved.
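Editor’s note: the “sleeper” effect Chris describes can be sketched in a few lines. This is a minimal illustration, not the application he debugged: the duplicate-detection task, the function names and the record counts are all assumptions chosen to show how the same job costs linear or quadratic work depending on how the loop is written.

```python
def find_duplicates_pairwise(records):
    """Quadratic approach: compare every record with every later record."""
    comparisons = 0
    duplicates = set()
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            comparisons += 1
            if records[i] == records[j]:
                duplicates.add(records[i])
    return duplicates, comparisons


def find_duplicates_with_set(records):
    """Linear approach: one set membership check per record."""
    lookups = 0
    seen = set()
    duplicates = set()
    for record in records:
        lookups += 1
        if record in seen:
            duplicates.add(record)
        seen.add(record)
    return duplicates, lookups


def work_for_size(n):
    """Return (quadratic work, linear work) for n unique records plus one duplicate."""
    records = list(range(n)) + [0]  # hypothetical data: value 0 appears twice
    _, quadratic = find_duplicates_pairwise(records)
    _, linear = find_duplicates_with_set(records)
    return quadratic, linear


if __name__ == "__main__":
    for n in (99, 999):
        quadratic, linear = work_for_size(n)
        print(f"{n + 1} records: pairwise={quadratic} set-based={linear}")
```

Going from 100 to 1,000 records multiplies the set-based work by 10 and the pairwise work by roughly 100, which is why a design that looks fine on a sample can become expensive on production-sized data.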
Jon [00:33:14] Yeah, that’s fair. I suppose there are too many permutations. I still think though there is an opportunity to, at a bare minimum, organize the information and the data in the right ways, because at least you can forecast it. If it’s in a specific environment and it cost you X and you’re gonna be moving that across multiple environments, there is a very easy calculation on the overall cost and the projections of what you do. I mean, it’s forecastable. Obviously the damage is done in one environment, you’ve already incurred a cost on whatever it is you’ve engineered, but the magnitude at which you’re gonna scale that out, I suppose, at a bare minimum could be reduced.
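Editor’s note: the “very easy calculation” Jon mentions might look something like the sketch below. Everything in it is an assumption for illustration: the function name, the environment names, the scale factors and the cost figure are hypothetical, and it assumes cost scales linearly with environment size, which Chris’s point about algorithmic complexity suggests will not always hold.

```python
def forecast_monthly_cost(observed_cost, scale_factors):
    """Project per-environment cost from one observed environment.

    observed_cost: monthly cost measured in the reference environment.
    scale_factors: environment name -> assumed size relative to the
        reference environment (1.0 means the same size).
    Returns (cost per environment, total cost).
    """
    per_environment = {
        name: observed_cost * factor for name, factor in scale_factors.items()
    }
    return per_environment, sum(per_environment.values())


if __name__ == "__main__":
    # Hypothetical figures: 200 per month observed in dev.
    per_env, total = forecast_monthly_cost(
        200.0, {"dev": 1.0, "staging": 1.5, "prod": 4.0}
    )
    for name, cost in per_env.items():
        print(f"{name}: {cost:.2f}")
    print(f"total: {total:.2f}")
```

The value of even a rough projection like this is that the cost of rolling out to further environments is visible before the rollout rather than on the next bill.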
Chris [00:33:48] And so you’re looking at the observability. And this goes back to what you would expect of a proper SRE team, as I was saying before, right? If you have a team in place that is looking at, in production, how is that performing against the actual data that you have? What are the trends of that? And zeroing in on where you can make efficiencies, laser-focused on making that better. Give them KPIs around cost, performance, and carbon, which is more likely to yield reliable results than trying to do things at deployment time.
Jon [00:34:18] And what do you think all that means in platform engineering, which is yet another overlay to some description, right? So you kind of have the platform engineering principles of reducing the accessibility of the cloud into like a set of commoditized things, right? Workflows and all this other stuff to make it even easier to go even faster, which then maybe makes it even harder, in the grand scheme, to work out.
Chris [00:34:43] I suppose it depends on how you design things. We can go back to the operating system wars, right? There’s a bit of a lesson to be learned on platforms and composition from that. So you had Windows, which was largely a monolithic type design platform. And then you had Unix, which of course was built on small components and composable. The Unix paradigm is a kind of onion kernel paradigm that they use in the ring of applications that then are used to make the operating system hang together. What you find is focus provides robustness, scalability, and efficiency. Now the problem is more the way that businesses tend to build these sorts of things. So I have multiple conversations with businesses that run programs or projects. And if you have a program, maybe even outsourced, say to somebody like us or any of the other SIs, and you say, build me a platform that does Kubernetes, we’re going to build you a platform that does Kubernetes to your requirements for today. But guess what? Kubernetes will continue to evolve. And the requirements that you gave me may not have been entirely accurate, or you may decide that different sorts of things happen when you start to deploy things on there, and you realize, well, you forgot something. So how do you maintain the platform? There’s a question and I always think that it’s better. There’s a certain point, somebody, I can’t name because I don’t know how many people said it, but somebody came out and said, you know, every company is a software company. If that were true, then every piece of software released by a company would be a product and not a project. And therefore when businesses operate with organizations like us, they wouldn’t say, PwC, go out and build me Kubernetes. I can do that for you. Love to do it. Brilliant. But you need to be bought into it. So you need a team internally, which is going to work with us, build that platform, and then take over that platform, evolve that platform. 
You should be using organizations like us for burst. So when you need velocity above the baseline, then we come in and we can kind of help you with that. You need knowledge that you don’t have, we come in and give you that knowledge that you don’t have, but you need to take that away, bring that into your organization, make that part of what makes you unique, build the product, and then you’re successful even in the platform methodology. Build the platform as a program, and eventually you’ll end up with a cascade of inefficiency. You’ll have no means to untangle it.
Jon [00:36:59] And what’s your opinion on platforms that are not necessarily, because there’s all this movement on platform as a product? And I think depending on your size and scale as a business, people probably deem lots of things a platform. It’s such an ambiguous term anyway, right? So people would be like, well, I’ve got CI and a few scripts and some Terraform. I mean, is it a platform? I’m not sure, but I mean, you’ve done something. I’m not a hundred percent sure whether that’s classified as a platform or not. I don’t know. And then I guess what’s your opinion on how businesses should be thinking about platforms in general to support the business, because scripts and glue and duct tape might be fine if the scale is small enough, as in you can get by integrating a few things loosely or even tightly. But then the minute you try and roll that out at scale across loads of different teams, it’s not necessarily thought about from a workflow perspective or for the end user en masse, because you didn’t need to. And then, I don’t know, I’m just talking around it, but what’s your perception of platforms and then products as a platform, and when you should and shouldn’t be adopting those things?
Chris [00:38:05] I mean, in general, software design was most successful when it was based on components. Platforms are compositions of components that offer services to make developing software easier. You should ideally, when you’re building software that solves a business problem, write as little code in the plumbing and as much code solving the business problem as you can. The business logic to boilerplate ratio, right? If that’s high, your software has high value. If it’s low, your software has low value and you’re likely to have a lot of repetition within that, which again is the economics of software: the more code you have to maintain, the more difficult and more expensive that’s going to be over the lifetime. So platforms in general are a good idea because they embody this component-based design principle. They’re giving you a lot of the plumbing for free. That’s all standardized and it can be focused on. All right. Now, equally, they’re also difficult to build because they’re solving horizontal problems, not vertical problems. And so they tend to be much more expensive to build and maintain, because the engineering effort, and hence the quality of engineering and the operating model and delivery model that you need to put in place around building and maintaining something like that, is going to be very different to building a verticalized solution, simply because of the complexity: the platform engineer doesn’t know how the platform is going to be used.
Jon [00:39:23] True. That’s very well articulated. I think I agree with that, but I think there is a perception of market manipulation to some degree on, you know, I’d say it was market manipulation. That’s probably a bit harsh, but an influence of the market on what it means and then the perception of what it means. Do you see what I mean? Like people will perceive what they’re doing to maybe be actually horizontally delivered, like you’re saying, but actually with the way they’ve done it, it kind of wasn’t, because they don’t know the difference of whether what they’ve done was vertically solved or horizontally solved and whether they have driven component thinking and good architectural design principles. You know, they might not understand whether they are or aren’t doing what is good platform design thinking in what they’re delivering. And therefore assume they built a platform. So I’m like, there you go. Yeah.
Chris [00:40:12] If we take the pure definition of a platform, the moment in which you have more than one use case running on a single underpinning piece of software or a set of software components, you have a platform. Whether that platform is good, scalable, and meets both the requirements in the most efficient way then goes down to the design and the thought that you’ve put into it. The take is, is if you’ve designed a piece of software, which fits a component and it was vertically designed for time to market. And most vertical designs are, you’re probably not going to get the best result when you reuse that on an entirely different use case, which sort of kind of fits kind of the square peg round hole type of situation. Yeah. Hammer hard. It’ll fit. It’s not necessarily going to give you the real, the right result. This doesn’t mean that there aren’t things within the verticalized solution that can be adapted, thought of, extrapolated, turned into a platform, and then repurposed against both use cases. So there is a way in which you can evolve things from vertical to horizontal. They still require the same, if not more engineering effort to do so. The fact of the definition of the platform, then you have the barrier between technologists and business owners, right? For the business, the definition of the acceleration and the time to market will dictate the definition of the platform. And by that definition, what I’ve described before, even though it is a square peg round hole is still faster time to market. So they are leveraging a platform in a business sense. It’s not necessarily engineering Nirvana.
Jon [00:41:38] Yeah. I guess it’s the perception of the value, the attachment to value. Because usually platforms, I mean, you can correct me if I’m wrong, but I don’t think I’ve ever seen many businesses correlate revenue targets back into engineering functions at a platform level. You might see things trickle down. If you’re part of a product team, you know, you might have a P&L, you might have a revenue target as a whole. What you then don’t see is the attachment, maybe, from a business objective downstream of like, did you correlate yourselves to that revenue target, you know, cross-functionally or even in that vertical that you’re operating within, to say, yeah, we did all this and it accelerated the revenue number, because that’s what matters to the business. It’s very rare you ever see that. I think it’s very hard then to know whether what you’re doing is bringing true business value. It becomes subjective, or technically subjective, on, you know, what you’ve done. I think that’s where the problem lies. Was it value for money or not? I guess for the business side.
Chris [00:42:39] I think it depends on the, I mean, I’ve held the role of technical manager as well. And I’ve always found it quite easy to articulate the value of engineering in business terms. The key drivers are normally productivity, time to market, and operating costs. For example, if you’re using common components and designs, those things are easier to maintain and operate. They require fewer people. They have fewer errors. They’re easier to fix. They tend to be fixed across a broader spectrum. And the drivers, the calculations around, I’ve reduced the opportunity cost by coming to market faster. That’s undeniable. The size of the engineering team and the time necessary to deliver the output is again very easily measurable. So the benefits of, say, the platform which is providing that acceleration are quite easy to calculate. It’s not rocket science. Let’s put it that way. Is it articulated in business normally? Not so much, but equally there is very poor articulation on the requirement side. So think about the functional requirements and the gathering of those. You tend to have a product owner, who has a backlog of functional requirements. The business owner asking for those requirements would rarely peg a growth target to his KPI or his bonus saying, you give me that, I’m going to make this amount of money. And so if you don’t have that cascade, as an engineering manager, maybe that’s where I say, like, yeah, you asked for that. By building it from scratch, it would have taken me X, instead of taking me Y, which is much smaller. So clearly there’s a value there and I can articulate that further up the chain. It’s the business that needs to evolve itself in terms of why are you asking for that and what does it mean for the business in terms of value? Then that’s value up the chain. It’s value to customers.
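Editor’s note: the X-versus-Y comparison Chris describes can be written down as simple arithmetic. This is only a sketch under stated assumptions: the function name, team size, day rate and durations are hypothetical, and it covers only delivery cost and time to market, not the operating-cost driver he also mentions.

```python
def platform_value(team_size, day_rate, days_from_scratch, days_on_platform):
    """Estimate what building on a platform saved versus building from scratch.

    All inputs are hypothetical planning figures, not measured data.
    Returns the days gained in time to market and the delivery cost avoided.
    """
    days_saved = days_from_scratch - days_on_platform
    return {
        "days_saved": days_saved,
        "delivery_cost_saved": team_size * day_rate * days_saved,
    }


if __name__ == "__main__":
    # Hypothetical: 5 engineers at 500 per day, 60 days from scratch vs 20 on the platform.
    print(platform_value(team_size=5, day_rate=500, days_from_scratch=60, days_on_platform=20))
```

The days saved can then be set against whatever revenue the business owner attached to the requirement, which is the part Chris says is usually missing.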
Jon [00:44:21] I think, from my observation, one of the biggest problems I see is that disconnect from technology to business still, even after all the evolution of tech and the evolution of cloud and the things we were talking about. I know it got spoken about as well just now and before, around even the FinOps aspect and the financial aspects, and all of those things are just recognitions of a detachment, I think, in how technology is kind of meeting the business all the time, and it’s kind of always slightly out of sync somehow and the operating.
Chris [00:44:54] Yeah, I think it’s, I mean, for me, one of the big trends was, and what was it years ago, maybe years ago, you had banks, which for me are technology companies in denial. They’re the ones that say, we’re not technology companies, we’re banks. And I was like, but actually, you just deliver technology products. But they insisted they’re banks. And so what they did many, many years ago is they outsourced all of their technology abroad to the lowest bidder, right? That led to enormous amounts of technical debt, which we’re experiencing today, atrophy in the products, you know, high operating costs, high cost to serve, a series of things which plagued them. And now you’re starting to see a trend more towards insourcing again, because they said, well, maybe we are a bit more technological. Maybe we should do that. Unfortunately, the engineers have fallen a bit out of love with that. And they pay a bit of a sin tax to get them back in. So that mistake is quite costly. And it goes back to that point, right? If businesses acknowledged that they are software businesses, then you wouldn’t have this disconnect. Because if you were a software business, would you hire a COO that didn’t understand anything about technology? You probably wouldn’t. It’s the fact that you accept that, or you are kind of in denial that you’re a software business, or think you can thrive without software in some way, ignore the fact that it exists, and still be a player in the market. That’s becoming less and less true, shifting into the realm of, now we’ve got generative AI. Now, you’re just seeing the boundaries of technology push further and further and deeper and deeper into business. So the message there is that businesses just really need to hire people in the business that understand technology. There’s very little escaping that.
Jon [00:46:32] Well, we’re kind of coming to an end for the podcast. It’s been fantastic talking to you about all this stuff. If somebody wanted to get hold of you, how does somebody find you, by the way?
Chris [00:46:42] Well, I’m the easiest person in the world to find on LinkedIn, through my profile. Just look up my name, Christopher Stura, PwC. And I’m a consultant, so I’m a gun for hire. I mean, anybody who wants to have a conversation, they can get in touch and have as much of my time as they like.
Jon [00:46:56] Perfect. Anyway, it’s been an absolute pleasure speaking to you, and thanks for all the insights as well, so that’s been great.