As technology evolves, IT talent shortages and system complexity make it essential to have structured guidance for building scalable, user-focused platforms. This book provides platform engineers and architects with practical strategies to develop internal development platforms that enhance software delivery and operations.
You’ll learn how to identify end users, understand their needs, and define platform goals with a focus on self-service solutions for cloud-native environments. Using real-world examples, the book demonstrates how to build platforms within and for the cloud, leveraging Kubernetes. It also explores the benefits of a product-centric approach to platform engineering, emphasizing early end-user involvement and flexible design principles that adapt to future requirements.
Additionally, the book covers techniques for maintaining a sustainable platform while minimizing technical debt. By the end, you’ll have the knowledge to design, define, and implement platform capabilities that align with your organization’s goals.
Page count: 639
Year of publication: 2024
Platform Engineering for Architects
Crafting modern platforms as a product
Max Körbächer
Andreas Grabner
Hilliary Lipsig
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
The authors acknowledge the use of cutting-edge AI spell checkers, such as Grammarly, with the sole aim of enhancing the language and clarity of the book, thereby ensuring a smooth reading experience for readers. It's important to note that the content itself has been crafted by the author and edited by a professional publishing team.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Preet Ahuja
Publishing Product Manager: Suwarna Patil
Book Project Manager: Uma Devi
Senior Editor: Adrija Mitra
Technical Editor: Nithik Cheruvakodan
Copy Editor: Safis Editing
Proofreader: Adrija Mitra
Indexer: Tejal Soni
Production Designer: Alishon Mendonca
Senior DevRel Marketing Executive: Rohan Dobhal
First published: October 2024
Production reference: 1270924
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK
ISBN 978-1-83620-359-9
www.packtpub.com
To my wife, best friend, partner in crime, and North Star, Lesya, for being my loving partner and biggest supporter, who always has my back and calls my creativity.
- Max Körbächer
To my parents, who allowed and supported me to follow a different career path than they initially had in mind. Thanks for being the best role models one could wish for.
- Andreas Grabner
To my husband, Scott, without whom I’d never have become an engineer. To my parents, Annette and Calvin, who have loved me and believed in me – your encouragement guides me. To my parents-in-law, Joe and Mary Ann, who have been there every step, and to my kids, who support my dreams and still think I’m fun to hang around. Thank you for everything, all of you.
- Hilliary Lipsig
Max Körbächer is a technology advisor and platform architect who focuses on utilizing cloud-native technologies and open source to simplify the challenges of complex systems. He is the founder and managing director of Liquid Reply, a cloud-native engineering and consulting company. His work history includes roles as an enterprise architect in the media and power utility industry and as a demand manager planning medium and large IT projects. Max is also the founder and, currently, co-chair of the CNCF Environmental Sustainability Technical Advisory Group, CNCF Ambassador, Linux Foundation Europe Advisory Board member, and initiator and organizer of the Kubernetes Community Days Munich and Ukraine.
I want to thank my family – especially my wife, Lesya; my best writing supporter, Felix, who stayed for all the late-night writing sessions; and all the open source community people for discussing ideas, solutions, and the future of cloud native.
Andreas Grabner is a technical advocate for making distributed systems observable and making automated data-driven decisions across the software development lifecycle. In his capacity as a CNCF ambassador and a DevRel at Dynatrace, he connects and educates global software engineering communities on building and continuously validating digital services for resiliency, high availability, and security.
Since his early days, he has been passionate about software quality and performance engineering as it results in building excellent digital products. Andi uses his advocacy platforms to share best practices on topics such as observability, progressive delivery, DevOps, site reliability engineering, platform engineering, and digital business operations!
I want to thank my wife, Gabi, for being supportive throughout the whole process, especially while writing during our vacation. Also, a big thanks to my employer and all the global tech communities that supported me along the way!
Hilliary Lipsig is an autodidact and start-up veteran who has frequently learned and applied technologies to get a job done. She’s had her hand in every part of the application delivery process, honing her skills originally as a quality engineer. Hilliary is an IT polyglot, able to talk the lingo of both the Operations and Development teams. She’s currently a Principal Site Reliability Engineer at Red Hat Inc., working on Kubernetes-based platforms. She’s passionate about GitOps, continuous integration, scalable processes, consistency in tooling, and good developer documentation. Her open source activities include contributions to the CNCF Glossary and she’s a member of the Code of Conduct Committee for Kubernetes.
A big thank you to my family, friends, and colleagues who supported me, let me spitball ideas at them, and even volunteered to proofread for me. Your belief in me helped me believe in me.
Peter Portante has been a Red Hat, Inc. employee since late 2011, working as a senior principal software engineer in the Performance and Scale team for the first 12 years and as an Enterprise Account Solution Architect since the spring of 2023. Prior to Red Hat, he worked at HP/Compaq/Digital on HP ePrintCenter, Tabblo, TruClusters, gWLM, POSIX Threads (two-level scheduling in Tru64 Unix and OpenVMS), and the OpenVMS Print Symbiont.
Fábio Falcão has been a technology professional for over 20 years, with solid training as a SysAdmin and in-depth knowledge of DevOps methodologies and tools. Having graduated in computer science and being passionate about technology and how it helps improve people’s lives, he decided to pursue this career as a child and, today, really loves what he does, helping teams achieve their goals. From Brazil and living in Portugal since 2022, he joined IBM to work with the data lineage team, in addition to supporting several other products linked to artificial intelligence.
I would first like to thank my parents (in memoriam) for all their dedication and encouragement throughout my life. I would also like to thank my wife and children, who support me when I don’t have enough time for them, and my friends, who always help me grow professionally and personally with their advice and chats.
Lukasz Bielinski is a seasoned senior DevOps leader with 15 years of experience in IT, specializing in containerization and digital transformation for large-scale systems across banking, telecommunications, pharmaceuticals, and software industries. He has successfully led and executed numerous Kubernetes projects, significantly enhancing operational efficiency and system reliability. Lukasz co-founded and formalized a start-up where he played a crucial role in introducing Kubernetes and OpenShift into organizations, driving successful digital transformations. His technical acumen spans a wide range of DevOps tools and practices, making him a trusted advisor and leader in the field.
I would like to express my gratitude to my family for their patience and support as I dedicated time to reviewing this book. Special thanks to the authors for entrusting me with this important task and to my colleagues for their ongoing encouragement. I’m grateful to be part of a community that values continuous learning and collaboration.
Thomas Schuetz is a cloud native architect with a keen interest in cloud-native application delivery. He teaches at an Austrian University of Applied Sciences, focusing on cloud-native technologies, where he shares his industry insights and practical experiences with students. Involved in the cloud-native community as a CNCF Ambassador, Thomas is enthusiastic about open source projects, contributing as a Keptn GC member. His approach to software delivery and troubleshooting combines practical know-how with a passion for education, aiming to make cloud-native technologies accessible to all.
I am deeply honored to have been invited to review this book by thought leaders in the cloud-native community. Contributing to this work has been both satisfying and rewarding, and I thoroughly enjoyed reading it. I would also like to express my gratitude to the entire cloud-native community for their continuous innovation and for making advancements such as platform engineering possible.
Biswajyoti Chowdhury has over 13 years of experience in cloud-native platform engineering, has led the development of multiple internal development platform initiatives, and has successfully migrated several large enterprises to the cloud and established their development platforms. He has worked with prominent technology consulting firms such as Accenture, Wipro, and Cognizant. Currently, he serves as an architect at Financial Software & Systems, where he is focused on building a payments-centric development platform. Biswajyoti is passionate about open source, cloud-native infrastructure, and tools that enhance developer productivity. Biswajyoti holds industry certifications such as GCP PCA, ACE, and is a Red Hat Accredited Professional in Cloud-Native development.
I would like to thank my family, my mentor, and my colleagues, who have put up with me and shaped me into who I am today.
Hey, welcome to Platform Engineering for Architects! Platform engineering is the practice of creating environments that can build, test, validate, deploy, and operate software in a secure and cost-efficient way. Platform engineering is about automation, and about enabling the platform’s users, developers, and operations to focus on value creation. A platform, often defined as an Internal Development Platform or Internal Developer Portal (IDP), abstracts away the complexity of the underlying infrastructure and all the moving components required to support the software life cycle from its setup until it goes live in production. But platform engineering is more than just technologies that have to play well together. It requires an open mindset and a holistic approach to define the purpose of the platform, along with following principles in the decision-making process and fostering a culture of change and innovation.
In Platform Engineering for Architects, we will engage you in building up a product mindset to build a solution that ages and matures with time but stays young at heart. Step by step, we will create our strategic direction and define our target architecture for the platform. This will become a living artifact for you and everyone working with you. Throughout the book, we will cover the four different pillars of a platform and the relevant decisions you have to take: the infrastructure representation by Kubernetes, the automation, the self-service capabilities, and the built-in observability and security. By the end, you will be equipped with tools to handle costs including actual infrastructure costs and technical debts. If both are managed well, they can even become your best allies in overcoming organizational obstacles and politics.
For every aspect and topic we write about in this book, we also provide additional sources that cover full technical details. Our goal is to provide you with a framework of references and an approach for defining platform architectures that can mature over time without being tied to a specific technology or version. We are well aware that the only constant in life is change, and this holds true for platform engineering. That’s why we also encourage you as a platform engineer and architect to keep up to date with those changes and bring the product mindset to your users and stakeholders.
The book is for platform engineers and architects, DevOps engineers, and cloud architects who want to transform their way of implementing cloud-native platforms to use a platform as a product.
This book is also for IT leaders, decision-makers, and IT strategists who are searching for new approaches to improve their systems landscape and software delivery, covered by a holistic approach that goes beyond the simple “you build it, you run it.”
Chapter 1, Platform Engineering and the Art of Crafting Platforms, provides an introduction to platforms and IDPs. It covers the relevance of a product mindset and the ambition to build a system desired by users.
Chapter 2, Understanding Platform Architecture to Build Platforms as a Product, will guide you through all the relevant groundwork and approaches to creating your platform architecture. You will discover the value of the platform as a product, the first implementation of the thinnest viable platform, and how to observe and measure its success and adoption.
Chapter 3, Building the Foundation for Supporting Platform Capabilities, will walk you through the mandatory steps and processes of defining a solid foundation of a platform that can grow from an initial set of features toward key enterprise-supporting platform capabilities.
Chapter 4, Architecting the Platform Core – Kubernetes as a Unified Layer, provides insights into what makes Kubernetes the preferred platform for platform engineers. You will learn about the core integrations and relevant decisions we have to make before we can focus on extra enhancements.
Chapter 5, Integration, Delivery, and Deployment – Automation is Ubiquitous, provides you with a stable understanding of the complexity around building, deploying, testing, validating, securing, operating, releasing, and scaling software and how we can centralize and automate this experience with self-service capabilities.
Chapter 6, Build for Developers and Their Self-Service, reviews concepts around IDP integrations and shares best practices for building resilient, flexible, and user-oriented platforms.
Chapter 7, Building Secure and Compliant Products, elaborates on security standards, frameworks, and trends; how to leverage the software bill of materials; and how to define the right actions to secure the platform without limiting your capabilities. Furthermore, we show you how to ensure the app delivery process will provide hardened and secure software/container packages and how to use policy engine technologies.
Chapter 8, Cost Management and Best Practices, explains the concept of cost-increasing elements of a platform and how to optimize those costs. You will learn about tagging strategies, general cost optimization scenarios, how to use observability to identify optimization potential, and best practices to put them into practice.
Chapter 9, Choosing Technical Debt to Unbreak Platforms, provides you with tools, frameworks, and methods to actively manage your technical debts. Like with costs, technical debts can grow and will have a negative impact on your platform if untreated.
Chapter 10, Crafting Platform Products for the Future, emphasizes the imperative of change and our role as platform engineers in fostering change in a controlled way, balancing reliability and innovation.
You should understand the basics of cloud computing, Kubernetes, the ideas around platform engineering, and how to define those architecture-wise.
Software covered in the book
Kubernetes
Backstage
CI/CD solutions such as GitHub Actions
Keptn
Argo CD
Crossplane
Prometheus
OpenTelemetry
Harbor
OpenFeature
Renovate Bot
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Platform-Engineering-for-Architects. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter/X handles. Here is an example: “The directory path is nested within the chart, and the use of this directory called crds/ allows Helm to pause while the CRDs are added to a cluster before continuing with the chart execution.”
A block of code is set as follows:
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  version: v1
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
    shortNames:
    - ct
Any command-line input or output is written as follows:
$ kubectl label nodes platform-worker2 reserved=reserved
node/platform-worker2 labeled
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “These platform logs should also, as much as possible, be clear of any Personally Identifiable Information (PII).”
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Share Your Thoughts
Once you’ve read Platform Engineering for Architects, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below
https://packt.link/free-ebook/978-1-83620-359-9
Submit your proof of purchase
That’s it! We’ll send your free PDF and other benefits to your email directly.
In the first part, you will learn more about the foundation of platform engineering and platform architecture. We will sharpen your understanding of a platform product mindset, showing you that platform engineering is more than just building systems. Then, we will guide you through the process of creating a platform architecture, from defining the purpose of a platform to the thinnest viable platform, as well as how to measure a platform’s success and acceptance. In Chapter 3, we will focus on designing the foundation of a platform to enhance the user experience and avoid technical complexity.
This part has the following chapters:
Chapter 1, Platform Engineering and the Art of Crafting Platforms
Chapter 2, Understanding Platform Architecture to Build Platforms as a Product
Chapter 3, Building the Foundation for Supporting Platform Capabilities
In this first chapter, we will learn how to identify when our organization is in the right state to plan a platform. For this, we will clarify why platforms have become such a relevant topic, how a product mindset fits into this, and what the checkpoints are to find out whether we are ready for a platform or not. We will learn about the platform differences and which platform types are most commonly built.
Next, we will delve into the three core elements of a platform: the pervasive cloud, the developer experience, and the main attributes of a platform. Overall, we will see recurring elements of cloud-native engineering. This leads us to the question of whether we really need yet another abstraction layer. We will also consider whether a platform will help us to overcome the problem of a high cognitive load caused by overengineered complex systems and development processes or just end up being yet another layer. We will reflect on some of those layers to find an answer for ourselves.
Finally, we will go into aspects that go beyond the technology and the implementation of platforms. It is crucial to understand the sociotechnical aspects and put the human, our actual stakeholders, at the center. This allows us to define a better platform product and find approaches for a close collaboration.
In this chapter, we’re going to cover the following main topics:
The demand for platforms as a product
Implementing developer- and product-focused solutions
Do we need yet another abstraction layer?
Sociotechnical aspects
In the cloud-native environment, hardly any other topic has built up such a myth in recent years as the term platform and the associated role of the platform engineer. As with the introduction of the first usable CI/CD pipelines, this gold rush led to rapid adoption, often without sense or reason. Now that we have arrived in the valley of knowledge, we can deal extensively with the question: do you need a platform, and if so, how do you design and implement it to ensure that it lasts into the future?
To answer this question, we should first look at what constitutes such a platform. A platform is the combination of different capabilities that are required to master traditional and cloud-native environments so that it supports the end user in the development, delivery, and operation of an application. Platforms can be an enabler to turn non-cloud native infrastructures into valuable resources. However, most computing platforms today provide some sort of API that can be used to automate the deployment and instrumentation of the available resources and build the foundation of a platform. Platforms provide consistency across any kind of resources for the end users and grant access to its capabilities via a self-service API, templates, CLI, or other solutions. The following example also highlights that a platform is composed of many components:
Figure 1.1: Example of a platform/IDP
The topic of platforms usually appears in the context of cloud native, but why is that so? Cloud-native technologies enable organizations to build and run scalable applications in public, private, and hybrid clouds. This approach is best illustrated by functionalities such as containers, standardized service provisioning, immutable infrastructure, and declarative APIs. Such functionalities realize loosely coupled systems that are resilient, manageable, and observable. These enable developers to make frequent changes with minimal effort. In short, a platform is an enabler for cloud-native computing and uses its tooling to instrumentalize it.
The experience of a software engineer on a cloud-native platform differs from developing software natively toward a cloud provider. Building systems focusing on one Cloud Service Provider (CSP) will bind you to the logic of that closed ecosystem. You will surely have a similar effect when you build on cloud-native platforms due to the fact that those are often Kubernetes-centric, utilizing the heavy unification of integrations toward the Kubernetes API. However, the catch is that cloud-native platforms deliver the same experience without you recognizing the underlying infrastructure. As most companies have at least two to three cloud or cloud-like service providers and already have difficulties in adapting those, a cloud-native platform is a game changer [1]. Developing software on a cloud-native platform changes the mindset and architecture. However, without adopting that mindset, the chance of failure is high.
However, there are more aspects to consider for a platform than just unified infrastructure management. Platforms have to be made for a purpose. The common definition of whom platforms are built for, and who the stakeholders for platform engineers are, states that those are exclusively developers. These definitions fall short of mentioning that a whole organization, operational teams, and other specialist teams also benefit from a platform. A platform provides software engineers with a simple access point to build, test, deploy, release, and operate their software. It provides deep insights into the usage and allows the caretaker and administrators to maintain the infrastructure, platform, and integrations fearlessly. To translate this into business terminology, a platform can provide a faster time to market, with more flexibility to change and adjust its components, while keeping reliability and robustness high.
What does this mean for a company now? Due to the shortage of IT professionals on the market, the fast pace of changes in IT, and the overload of training teams for cloud technologies and providers, a platform introduces the right breakpoints for competencies. We need these breakpoints to declutter the trend of putting multiple disciplines into a single role such as DevOps. Also, platform engineers utilizing DevOps methodologies are not DevOps. We actively need to protect this role from repeating the mistakes made with the DevOps role and stay sharp in its definition. Platform engineers integrate experts’ provided capabilities, simplify their usage of those capabilities through their platform for developers, and enable self-service for the engineers. However, no developer will need to become an expert in multiple topics such as security, observability, infrastructure configuration and automation, and so on. This is in contrast to a common picture of DevOps, who need to become experts with anything that is required within their silo for their application to keep them alive. We will need DevOps in the future for the advanced handling of applications, but we must make their lives easier, too.
The platform provides an integration layer for the bottom-up capabilities that require special knowledge such as security, databases, or even the deployment of VMs or bare-metal servers, as well as top-down usage by developers and DevOps. As visualized in the following figure, the platform engineering team is responsible for providing this layer.
Figure 1.2: Capabilities and responsibilities in a platform-driven organization
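The integration layer described above can be illustrated with a minimal Python sketch. It is not taken from the book: the Platform class, its register and request methods, and the capability and application names are all hypothetical and exist only to show the two directions, with expert teams plugging capabilities in bottom-up and developers consuming them top-down through self-service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Platform:
    """Hypothetical integration layer between expert teams and developers."""

    # Maps a capability name to the provisioning function an expert team supplies.
    capabilities: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, provision: Callable[[str], str]) -> None:
        """Bottom-up: an expert team plugs its capability into the platform."""
        self.capabilities[name] = provision

    def request(self, name: str, app: str) -> str:
        """Top-down: a developer self-serves a capability for an application."""
        if name not in self.capabilities:
            raise KeyError(f"capability '{name}' is not offered by the platform")
        return self.capabilities[name](app)


# Illustrative wiring only; real capabilities would call actual provisioning tools.
platform = Platform()
platform.register("database", lambda app: f"postgres instance for {app}")
platform.register("security-scan", lambda app: f"image scan scheduled for {app}")

print(platform.request("database", "checkout-service"))
```

In this sketch, the developer only needs to know the capability name, while the knowledge of how a database or a security scan is actually provisioned stays with the team that registered it.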
Of course, this also means that another team of experts must be trained and educated. However, a comparably small team of platform engineers can usually build and run huge environments. A platform ideally reduces the cognitive load for any other team within the company and lets them focus on their core value again by simplifying the machinery around the development process. This platform helps to reduce the stress and improve transparency. Companies from all over the world frequently share their experience with platforms and platform teams, and the typical tenor is on how they solved problems they couldn’t tackle before, or how much this has improved the quality of their products and services.
These platforms are often called Internal Developer Platforms (IDPs) because they are usually built for an enterprise’s internal development team. Throughout this book, we will use the terms platform, IDP, platform product, and cloud-native platform interchangeably. However, we’ll first highlight certain aspects just a little bit more:
Platform: General term for the cross-cutting layer of technology that allows unification of services for developers
IDP: Emphasizing the aspect of developer, Software Development Life Cycle (SDLC), and tools needed to develop software
Platform product or platform as a product: Highlighting the dedicated team taking care of the evolution ability and long-term commitment of a platform, as well as establishing a different mindset
Cloud-native platform: Focusing on the abstraction and enablement to use standardized APIs and integrations
That perspective might feel fine-grained, but the term platform itself often leads to more confusion. A cloud platform is also a platform, right? A Software as a Service (SaaS) could also be seen as a platform. Referencing a cloud-native platform or IDP gives the right direction and understanding. Depending on your organization’s maturity, it is therefore also essential to clarify these terms and establish a common understanding, language, and shared knowledge.
To highlight the positive impact a platform can have, we can look at three totally different companies and their results from using IDPs. All of these cases primarily focus on Backstage as the developer portal and entry point for the IDP.
Spotify, as the inventor of Backstage and the mother of the IDP movement, claims that the following is true of their internal Backstage users:
2.3x more active on GitHub
2x more code changes
2x more deployments
Onboarding time for new developers dropped from 60 days to 20 days
The Expedia Group reports different numbers:
It takes four minutes, on average, to create a new component or app
Over 4,000 users are using the IDP for at least 20 minutes per day
The technical documentation is viewed over 50,000 times per month
Just over 15% of the internal developer tools are integrated with Backstage, already reducing context-switching
Now, the last company we should take a look at is Toyota:
Projects now ship artifacts weekly instead of monthly
8-12 weeks of overhead effort are saved per team, resulting in over $5 million in reduced costs, or in time and budget freed up for value creation
Standardized deployment templates reduce failures and speed up deployment
All those numbers are interesting to understand in the context of a digital-native company, a travel technology corporation, and one of the biggest car manufacturers. Each of them shows a clear positive effect [2].
Speaking of organization, introducing new solutions is commonly done as a project. So, at some point in time, someone decided to invest money in building their own platform. This approach faces one fundamental problem: the deadline. Projects are required to reach a target within a given time and budget frame. Once the project runs out of time or money, the focus shifts to operating and maintaining what was built. These two parts of the life cycle are treated as separate things, creating a period of sunrise followed by a sunset. To explain this a bit, you can see heavy investments, communications, and excitement during the implementation phase. However, after hitting the deadline, the project turns into a dead object that requires maintenance. DevOps didn’t change this behavior; it often simply got new names, new roles, and different processes. In the end, budget, people, and attention are turned away to the next project, while just a fraction of the former budget stays. This is frustrating for engineers who have worked hard on the implementation, and it will become frustrating for the organization over time when costs for pure maintenance keep increasing. Meanwhile, the people who built it might leave or join other projects. This short-term view on implementing systems has slowly killed many good projects and team spirits. More importantly, it shows that the business value of the solution isn’t clear. When an implementation, such as a platform, can provide explicit value, there shouldn’t be a reason to turn away attention and cause its sundown.
While running many implementations as projects is a valid approach, it is a death sentence for platform implementations. Regular implementations can become feature complete; they can keep existing after they are done. But a platform will always be moving, always being upgraded, and always gaining new features. When working with open source tools and cloud providers, you will learn early how fast tools and software move in their own development cycles. Features, fixes, and security patches are continuously published. This is a significant challenge for larger organizations as they are still used to far slower release cycles. The upside of keeping up with this rapid development is that you, as an organization, can profit from new features and capabilities frequently. It is an innovation driver and enabler, allowing you to implement systems in other ways and solve problems you might not be aware of. Is a problem that isn’t painful for you a real problem? Organizations tend not to consider such things as issues since they are used to only identifying painful processes and approaches.
Let us look at an example. In the current year (2024), the European Union released a law to improve companies’ reporting on their carbon emissions. On a high level, this also includes IT resources. Also, within the last few years, multiple open source foundations and projects have been started to bring transparency to the energy consumption of software. A year ago, we would only have been able to report very rough, highly estimated numbers when it comes to energy consumption for a data center, for a server, and with some manual processing, for a piece of software. Today, we can obtain fine-grained information for any application running on bare metal, hypervisor, or containerized within Kubernetes as tooling has evolved to provide this data. Public CSPs provide more and more insights into their own energy consumption. What can we expect for the coming year? We can expect even better numbers, including the regional carbon mix of the energy and end-to-end visibility and transparency of such numbers. With platforms and platform engineering teams, such transparency will naturally come over time. It doesn’t require a project turning IT upside down. Calling out the demand for it will result in platform engineering teams implementing those capabilities into the platform’s core to benefit everyone who uses that platform to build, deploy, release, and operate software.
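As a rough illustration of the kind of reporting described above, operational emissions are commonly estimated as the energy consumed multiplied by the carbon intensity of the regional energy mix. The following minimal sketch assumes hypothetical workload names, regions, energy figures, and intensity values; none of them come from a real measurement or a specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadEnergy:
    """Energy used by one workload in one region over a reporting period."""
    name: str
    region: str
    energy_kwh: float


# Hypothetical carbon intensity of the regional energy mix,
# in grams of CO2-equivalent per kWh (illustrative values only).
CARBON_INTENSITY_G_PER_KWH = {
    "region-a": 100.0,
    "region-b": 400.0,
}


def estimate_emissions_kg(workload: WorkloadEnergy) -> float:
    """Estimate emissions as energy (kWh) x regional intensity (g/kWh), in kg."""
    intensity = CARBON_INTENSITY_G_PER_KWH[workload.region]
    return workload.energy_kwh * intensity / 1000.0


def report(workloads: list[WorkloadEnergy]) -> dict[str, float]:
    """Sum the estimated emissions per workload name across regions."""
    totals: dict[str, float] = {}
    for w in workloads:
        totals[w.name] = totals.get(w.name, 0.0) + estimate_emissions_kg(w)
    return totals


if __name__ == "__main__":
    sample = [
        WorkloadEnergy("checkout", "region-a", 50.0),
        WorkloadEnergy("checkout", "region-b", 10.0),
        WorkloadEnergy("search", "region-b", 25.0),
    ]
    for name, kg in report(sample).items():
        print(f"{name}: {kg:.1f} kg CO2e")
```

A platform team could expose such a report as a shared capability, so that every team using the platform gets the same numbers without running its own measurement project.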
This is called a product mindset and it feels natural for platforms to adapt to the demands of their environment.
Platforms as a product are user-centric, listening to and actively researching the end user demand to keep improving their services. A product is also aware of its value. Similar to any app on your mobile phone, it uses its own value to refinance further development and new features. Here, there are no deadlines and no sunsets. The goal is just to strive to become better with every release. What this gives to an organization is an expert team that keeps actively working on central enablement for providing your business with a platform of value generation.
Designing and developing a platform as a product goes beyond the pure engineering aspect. It faces organizational challenges that should be considered when you actively decide to build a platform. In fact, you have to deliver valid numbers on your benefits and show that your platform carries its own costs. This must be in the mindset of the product owner and platform engineers. The idea here is not to become business people but to be able to clearly communicate the reason for existence and, more importantly, to be a product that doesn’t have a deadline.
Right now, you can find three different types of platforms as products:
- IDPs:
  - Provide a best-in-class experience for software engineers
  - Enable the development and operations teams for the end-to-end support and visibility of their software
  - Bring governance, compliance, and security
  - Establish a self-service for the development teams and simplify the deployment process
- Data science and machine learning platforms:
  - Similar to IDPs and often evolve out of those
  - Leverage their scalability to research, analyze, and process data cost-efficiently
  - Overcome complex implementation and make them generally available
  - Provide direct, secure access to relevant data sources
- Low-code/business platforms:
  - A strongly driven trend toward platforms whose solutions make it possible to implement new features with relatively little to almost no coding
  - We will see them more in the years to come
In this book, we will focus on the product-centric view of architecting IDPs.
Like with any other complex environment, we first have to ask – do we really need a platform? Do we know what we will use it for?
Although platforms can provide a lot of benefits, they are not always the answer to your organizational questions. The signs that you are not ready for a platform yet are as follows:
- You have only monolithic applications
- You don’t have your own development team
- Your DevOps, SysAdmin, or infrastructure team is heavily overworked or siloed
- You have very simple applications that can run anywhere
- You are having a hard time providing a budget for training to grow the skills of your teams
- You usually run commercial, off-the-shelf solutions
On the other hand, when does an IDP make sense to you? The following criteria are indicators that you’re ready for an IDP:
- You have requirements for multiple infrastructure environments or foster a multi-cloud strategy
- You require advanced control over your environments (security, compliance, and deep insights into infrastructure and application behavior)
- Your development team is continuously overloaded with non-valuable tasks
- You have a curious and interested DevOps or infrastructure team that has taken its first steps toward a platform without knowing it
- Your application requires some kind of orchestration due to microservice architecture because many components or different integrations need to play well together
- You want to enable your organization to optimize your IT for costs, transparency, quality, or security
Before you define a platform product for your organization, you should answer all the points on the checklist. It makes sense to have multiple points illustrating why you need it. For example, having a team ask for an IDP, or having someone mention that they’ve heard of it at a conference, is not a strong foundation for making such a decision. The introduction of platforms is a journey, and from our experience, it can become the central focal point for one’s company in a relatively short amount of time. Under such pressure, you still need to have a purpose and direction.
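The two checklists above can be captured as a simple self-assessment. This is only a sketch under stated assumptions: the shortened item labels, the `assess` function, and the decision rule (any "not ready" sign blocks the decision, and "multiple points" is read as at least two indicators) are illustrative choices, not a rule given in the text.

```python
# Shortened labels for the signs that an organization is not ready yet.
NOT_READY_SIGNS = (
    "only monolithic applications",
    "no own development team",
    "overworked or siloed DevOps/SysAdmin/infrastructure team",
    "very simple applications that can run anywhere",
    "hard to provide a training budget",
    "mostly commercial off-the-shelf solutions",
)

# Shortened labels for the indicators that an IDP makes sense.
READY_SIGNS = (
    "multiple infrastructure environments or multi-cloud strategy",
    "need for advanced control over environments",
    "development team overloaded with non-valuable tasks",
    "curious DevOps or infrastructure team already moving toward a platform",
    "applications need orchestration (microservices, many integrations)",
    "goal to optimize IT for cost, transparency, quality, or security",
)


def assess(observed: set[str], min_ready: int = 2) -> str:
    """Return 'not ready', 'ready', or 'undecided' for the observed items.

    Assumed rule: any not-ready sign blocks the decision; otherwise at
    least `min_ready` indicators are needed to call the organization ready.
    """
    unknown = observed - set(NOT_READY_SIGNS) - set(READY_SIGNS)
    if unknown:
        raise ValueError(f"Unknown checklist items: {sorted(unknown)}")
    blockers = [s for s in NOT_READY_SIGNS if s in observed]
    drivers = [s for s in READY_SIGNS if s in observed]
    if blockers:
        return "not ready"
    if len(drivers) >= min_ready:
        return "ready"
    return "undecided"


if __name__ == "__main__":
    print(assess({READY_SIGNS[0], READY_SIGNS[2]}))
```

The value of writing the checklist down this way is less the verdict than the record of which points were actually answered and why.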
Now that we have learned how to define the purpose of our platform, we will need to discuss the question of whether we really need this additional abstraction layer.
Let’s briefly summarize what we have seen so far. A platform puts a bracket around, and a layer of abstraction on top of, your existing infrastructure and environments. The platform enhances it with further capabilities so that your development teams can utilize it in an automated, self-serving way.
From a technical perspective, this represents the next layer of abstraction. Therefore, it is only right to discuss this new fabric we put on top. Going from the bottom to the top, we see the bare metal, followed by the hypervisor for virtualization; this is topped by cloud providers. Some might include containers, Kubernetes, or serverless components – and now, we will add our platform. These are at least four layers, each promising to make the layer underneath simpler, glued together by a hard-to-define meta-level of scripts, Infrastructure as Code (IaC), cloud libraries, and automation. So, do we really need yet another layer, or are we using it to keep ourselves busy building things?
There is no simple answer to that question, but by looking into the purpose of each layer, you might be able to grow your own understanding of it.
Hypervisors were initially introduced to simplify the supply of hosts so that software could run and better utilize servers. Today, they still serve the same purpose but could be replaced by Kubernetes and container runtimes. A key argument against this replacement is that a virtual machine provides better isolation and higher security. Without getting too deep into this discussion, there are options that provide very solid isolation, such as Kata Containers. The only component that causes headaches is the container with the OS and its runtime. Looking some years into the future, WebAssembly (Wasm) could be one part of the answer to that problem. Without an operating system in the container and with nothing but bare binary files, there are almost no gates open for attacks. However, let’s give it some more time.
Infrastructure as a Service (IaaS) providers and public cloud providers enhance this with software-defined storage and networks, reducing the complexity of building your own data center and managing all physical dependencies. In addition, they provide further capabilities for commonly used scenarios, such as databases, load balancing, user management, message queues, or ML playgrounds and pre-trained AI models. This is a very useful implementation, which has led to a rapid development of the industry and an extension of what is possible. However, this also moved the whole industry in a problematic direction. The technology is developing faster than people, and organizations in particular, are able to adapt to it. We see that there is a shortage of professional engineers across the globe, while businesses are looking into providing more digital services every year. Solutions and their dependencies are therefore built natively for the cloud. The return on this effort can be significant. You are able to manage any kind of infrastructure and services with a relatively small group of people globally. Yet the reality is also that the average company has between two and three IaaS and public CSPs, plus (often) its own computing capacities, as well as around 10 SaaS providers, which can go up to around 50 for major enterprises [1]. CSPs also have between 40 and 200 different services. In other words, we are able to achieve a lot today, but the complexity of those environments has also become significant.
To tame this scale, IaC and Cloud Development Kits (CDKs) have become the tools of choice to manage your landing zones and software integrations. The fun part of the story is that practices such as DevOps, which are commonly misinterpreted, have made things even worse. These misinterpretations have now led to the sudden expectation for developers to also set up and maintain the infrastructure for their software needs.
Last but not least, we have systems based on containers, Kubernetes, or serverless. Dozens of options exist for each of them to provision those environments, deploy the code, and run the components. It’s understandable that there are too many layers you have to take care of. However, their development is reasonable, as you don’t want to do things in the old way of getting software up and running. Pushing code to images and from there to a runtime you choose simplifies the provisioning process.
Overall, to represent the level of complexity, we can think of a three-dimensional object such as a cube made of cubes. The following illustration shows the different service layers, representing different maturity of abstraction and how layer after layer comes together to form an IT environment.
Figure 1.3: The multi-dimensional complexity of computing abstraction and simplification
Now, that figure is oversimplified, considering the hundreds and thousands of options you have in each dimension. However, it still gives a first good hint: if you need a platform, build a wrapper around this construction and tame its immense complexity to harness its power.
In order to manage all these layers, we need to know about and use many tools, as well as follow various processes. It becomes difficult to focus on the actual job and create value while spending a large amount of time on things that should simplify our work. This is called cognitive load. Originally made famous by Daniel Bryant, the term puts a bracket around the job overload and mental stress of many developers, as well as other specialists within IT. Reducing the cognitive load brings more happiness and satisfaction, but also effectiveness and reliability, to the engineers. The following graphic gives a simplified view of what a professional has had to handle across the different decades. Going forward, we have to reduce this load. AI could be part of this, alongside new concepts for running computing processes, and platforms of course.
Figure 1.4: The extended cognitive load with a projection to an ideal future
Not only does technology change over time but it also piles up. This means that we have to run and maintain legacy systems while changing architecture styles and introducing new programming paradigms and new tools. This also changes the responsibilities and extends them far beyond the typical borders of one’s job description from some years back. Breaking down this problem can reveal an answer to the question of whether we need a platform to solve our problems or whether implementing one will increase the complexity again.
In the end, every organization is different. Some are stuck in the early 2000s, and others continuously try to adapt to what comes next. Even within the same organization, you can often find drastic differences. One department might run everything on some VMs in their own data center, while the next might deploy functions within a global CDN or edge provider. Therefore, it’s on you to draw up a vision, strategy, and goal for your platform, if you really need it.
The complexity we experience and the pressure it puts on engineers need to be countered because IT tends to become more complicated over time. In the upcoming section, we will focus on implementing the right solution for developers to overcome the troublesome direction we are heading in, which can even lead to burnout.
Throughout the next few years, we will see an evolution of cloud computing. In this context, platforms will play a crucial role. On the one hand, the cloud will be everywhere, becoming an abstraction for infrastructure. It doesn’t matter whether this is in the form of edge computing or very specialized services or offers. On the other hand, as we have learned in the previous section, we have to focus on delivering environments that enable the best experience possible for developers and other roles, so those people can focus on generating value. Bringing these elements together practically is the key enablement for an IT organization to keep up the speed with the market while delivering continuous value to your company.
The pervasive cloud is not a single solution. It clusters a variety of cloud-computing capabilities that are undergoing a transformative shift to drive business and innovation significantly. The key advancements focus on the integration of cloud technologies anywhere, from private data centers through distributed computational networks to the edge. However, the pervasive cloud goes beyond that. It follows concepts to bridge physical gaps through sensors, IoT components, mobile devices, and other smart connected solutions. Therefore, it is also known under other terms such as ubiquitous computing, ambient intelligence, or everyware.
Gartner, the research company, assumes that six further technologies will shape the pervasive cloud and define its nature [3]:
- Augmented FinOps: Combines DevOps methods with cost optimization and budgeting
- Cloud Development Environments (CDEs): Simplify and unify the development environment, reducing human errors and ensuring reproducibility
- Cloud sustainability: Achieving environmental, social, and economic benefits, reducing the harmful impact of the strongly growing cloud computing tech, and leveraging its power for good
- Cloud-native: Implementing cloud characteristics as defined before
- Cloud-out to the edge: CSP capabilities extended to the edge
- Wasm: The potential ubiquitous runtime and binary format for everywhere, but not necessarily everything
However, we need to ask why this is now relevant to us as platform engineers, architects, and developers.
First, you can find many of the technologies that we are already working on in these definitions and assumptions. Cloud-native, FinOps, edge, and CDEs are daily realities, while sustainable IT and Wasm have experienced heavy development in recent years. That’s all relevant in making it clear that we are not discussing sci-fi technologies that won’t be attainable within the next 100 years. It’s happening right now and it is ready to be used. We develop and innovate all of those foundations; it just might not be as visible and prominent as GenAI.
Second, to extract the maximum value from cloud investments, businesses must adopt automated operational scaling, leverage cloud-native platform tools, and implement effective governance. These platforms integrate essential services such as SaaS, Platform as a Service (PaaS), and IaaS to create comprehensive product offerings with modular capabilities. IT leaders are encouraged to utilize the modular nature of these platforms to maintain adaptability and agility in the face of rapid market disruptions. Imagine the complexity of such environments without a platform that tames this wide range of motion. Even so, with all that complexity, we need to keep the product mindset in focus, or else it will be hard to provide reliable IT services and solutions in the future.
Figure 1.5: Cloud concepts are found everywhere in a pervasive cloud
Looking at the preceding diagram, you can find elements of the pervasive cloud everywhere. We shouldn’t look at this figure as if those are separate items. Everything is connected. Apps on phones talk with services in the cloud or in local hubs, corporates have multiple networks connecting various computing environments with each other, and we have entirely skipped more progressive concepts such as Web3 here.
Important note
IT as we know it today is undergoing a heavy transformation, both in the visible and invisible spectrum. With every step we take, we increase its complexity while facing demographic pressure and a shortage of professionals. Sooner or later, most companies will be required to have their own platform. If they don’t, they will buy it as a service.
It is not sustainable to hope that every developer will be able to cover the extremely wide landscape of tools and technologies without burning out within a few years. Therefore, the quality of user experience is pivotal in determining the adoption and success of a platform. A well-designed platform is intuitive, easy to navigate, and aligned with the developers’ expectations and workflows. Enhancing the experience involves streamlining interactions, minimizing friction points, and providing a visually, technically, and functionally pleasing environment. This not only improves user satisfaction but also boosts productivity and engagement. The question is how to achieve this.
When designing the platform, we must consider that every developer might have different preferences. It starts directly with the question of how the user interacts with the platform. Developers might ask various questions, such as the following: Do we need to set up a portal? Is pushing code to a Git service enough? Can I interact with the platform via a CLI? It can be hard to tell, but successful platforms provide all of those interactions. Starting with an API-centric approach enables any of the other paths to be offered alongside it. A strong API is the core of a good platform. In reality, most platforms still provide multiple different interfaces. The rapid development of tools that unify these interfaces will overcome such challenges, and on a greenfield platform, such a tool can be placed directly into the core.
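To make the API-centric idea more concrete, the following minimal sketch shows a single core API with two thin front ends, a CLI and a Git push handler, that both delegate to it. Everything here is hypothetical: `PlatformAPI`, `create_environment`, the `platform` command name, and the shape of the push event are illustrative assumptions, not the interface of any real platform product.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    team: str
    name: str


class PlatformAPI:
    """Hypothetical core API; every interface goes through it."""

    def __init__(self) -> None:
        self._environments: list[Environment] = []

    def create_environment(self, team: str, name: str) -> Environment:
        env = Environment(team=team, name=name)
        if env in self._environments:
            raise ValueError(f"{team}/{name} already exists")
        self._environments.append(env)
        return env

    def list_environments(self, team: str) -> list[Environment]:
        return [e for e in self._environments if e.team == team]


def cli(api: PlatformAPI, argv: list[str]) -> Environment:
    """CLI front end: parses arguments, then delegates to the API."""
    parser = argparse.ArgumentParser(prog="platform")
    parser.add_argument("team")
    parser.add_argument("name")
    args = parser.parse_args(argv)
    return api.create_environment(args.team, args.name)


def on_git_push(api: PlatformAPI, event: dict) -> Environment:
    """Git front end: maps an assumed push event to the same API call."""
    return api.create_environment(event["team"], event["branch"])


if __name__ == "__main__":
    api = PlatformAPI()
    cli(api, ["payments", "staging"])
    on_git_push(api, {"team": "payments", "branch": "feature-x"})
    print([e.name for e in api.list_environments("payments")])
```

Because validation and state live only in the core, a portal could be added later as a third front end without duplicating any rules.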
An example of such a core is Kratix. The Apache 2.0-licensed open source platform describes itself as a “... platform framework for building composable IDPs.” In the following figure, you can see how Kratix positions itself between all the common tools we use today and provides one entry point.
Figure 1.6: Kratix overview as a central integration component
Kratix achieves this through the concept of Promises. Technically, a Promise is a YAML document that defines a contract between the platform and its users. Every team has to go through a complex onboarding process, not because of the platform itself but because of other dependencies such as CI/CD, Git repositories, and linking everything together. With Kratix Promises, you can encapsulate all those steps or combine multiple Promises into one.
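The idea of encapsulating onboarding steps in a contract and combining several contracts into one can be sketched in a few lines. Note that this is a simplified stand-in written in Python, not Kratix's actual Promise schema or API: the `Contract` type, the `combine` function, and the step names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """A simplified stand-in for a platform contract (not Kratix's schema)."""
    name: str
    steps: tuple[str, ...]


def combine(name: str, *contracts: Contract) -> Contract:
    """Merge several contracts into one, keeping step order and dropping repeats."""
    seen: set[str] = set()
    merged: list[str] = []
    for contract in contracts:
        for step in contract.steps:
            if step not in seen:
                seen.add(step)
                merged.append(step)
    return Contract(name=name, steps=tuple(merged))


if __name__ == "__main__":
    git = Contract("git-repo", ("create repository", "set branch protection"))
    cicd = Contract("ci-cd", ("create repository", "create pipeline"))
    onboarding = combine("team-onboarding", git, cicd)
    print(onboarding.steps)
```

From the user's point of view, the combined contract is a single request, while the platform team remains free to change the steps behind it.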
Now, Kratix helps simplify the platform foundation for the developer experience, yet something is missing. The other side of the coin is a developer portal. Backstage is an example: an open source, Apache 2.0-licensed solution developed by Spotify. Kratix and Backstage work well together and integrate seamlessly. Backstage is a framework that enables GUIs to be created declaratively, with the aim of unifying infrastructure tooling, services, and documentation to produce a fantastic developer experience. Backstage comes with three core features: the service definition, the Backstage service catalog, and its plugin system, through which you can enable further features such as docs.
Figure 1.7: Backstage’s three core features
At this point, we have seen the challenges that need to be solved, and we have taken a sneak peek into the solution space. That should give us a feeling of the current possibilities before we dive into details throughout the next chapters.
A platform