Generative AI (GenAI) is revolutionizing industries, from chatbots to recommendation engines to content creation, but deploying these systems at scale poses significant challenges in infrastructure, scalability, security, and cost management.
This book is your practical guide to designing, optimizing, and deploying GenAI workloads with Kubernetes (K8s), the leading container orchestration platform trusted by AI pioneers. Whether you're working with large language models, transformer systems, or other GenAI applications, this book helps you confidently take projects from concept to production. You’ll get to grips with foundational concepts in machine learning and GenAI, understanding how to align projects with business goals and KPIs. From there, you'll set up Kubernetes clusters in the cloud, deploy your first workload, and build a solid infrastructure. But your learning doesn't stop at deployment. The chapters highlight essential strategies for scaling GenAI workloads in production, covering model optimization, workflow automation, scaling, GPU efficiency, observability, security, and resilience.
By the end of this book, you’ll be fully equipped to confidently design and deploy scalable, secure, resilient, and cost-effective GenAI solutions on Kubernetes.
Page count: 437
Publication year: 2025
Kubernetes for Generative AI Solutions
A complete guide to designing, optimizing, and deploying Generative AI workloads on Kubernetes
Ashok Srirama
Sukirti Gupta
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Kartikey Pandey
Relationship Lead: Prachi Rana
Project Manager: Sonam Pandey
Content Engineer: Sarada Biswas
Technical Editor: Simran Ali
Copy Editor: Safis Editing
Indexer: Manju Arasan
Proofreader: Sarada Biswas
Production Designer: Shankar Kalbhor
Growth Lead: Shreyans Singh
First published: June 2025
Production reference: 1230525
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK
ISBN 978-1-83620-993-5
www.packtpub.com
To my mother, my wife, and everyone whose presence shaped the person I’ve become.
To my special needs children, thank you for teaching me life’s true meaning: empathy, resilience, and humility.
– Ashok Srirama
To my wife and children, whose love fills my life with purpose and joy. To my parents, for all their guidance and unwavering support, and to my brother, for always being my ally, offering constant encouragement and strength.
– Sukirti Gupta
Over the past few years, I’ve had the privilege of working closely with organizations that are pushing the limits of what’s possible with cloud-native technologies. From Serverless modernization to Kubernetes-scale container orchestration, one thing has become abundantly clear—Generative AI is no longer a futuristic concept. It’s here, it’s powerful, and it’s redefining how we build, deploy, and scale intelligent systems.
When I first reviewed this manuscript, I was struck by how comprehensive yet hands-on it is. Kubernetes for Generative AI Solutions does more than teach you how to run LLMs or deploy state-of-the-art models—it gives you a production-grade blueprint. From observability to cost optimization, from secure scaling to HA/DR patterns, this book meets developers, architects, and product leaders where they are—and takes them forward with confidence.
I’ve had the privilege of working with Ashok Srirama and Sukirti Gupta, who are recognized thought leaders in cloud-native technologies. Their combined experience at AWS and in modernizing large-scale systems is evident throughout this book. Ashok’s deep technical expertise in EKS and GenAI infrastructure, paired with Sukirti’s strategic perspective on product development and GTM expertise, makes this book both visionary and grounded.
What I especially appreciated is the pragmatic tone—not just code, but also battle-tested patterns, modular Terraform stacks, observability practices, and real-world tips that could save you weeks, if not months, of trial and error. As someone who spends a lot of time enabling teams to scale with Kubernetes and Serverless on AWS, I found myself nodding along and learning from the practical insights this book delivers.
Whether you’re a solutions architect trying to productionize LLM apps, a start-up founder exploring OpenAI and Amazon Bedrock, or an enterprise leader looking to make sense of GPUs on Kubernetes, this book is a must-have on your desk.
Ashok and Sukirti have given us a gift—a guide not just to building GenAI systems, but to building them right.
Rajdeep Saha
Principal Solutions Architect, AWS
Bestselling Author, Mentor, Speaker at KubeCon, AWS re:Invent
Ashok Srirama is a principal specialist solutions architect at AWS, where he leads initiatives to architect scalable, secure, and cost-efficient container-based solutions for enterprise customers. With over 19 years of experience in IT, Ashok brings profound expertise in cloud architecture, Kubernetes, container platforms, and, most recently, Generative AI.
Before joining AWS, Ashok held pivotal cloud architecture roles at AIG and IBM, where he led digital transformation initiatives and cloud migration projects across insurance and communication sectors. His technical acumen spans distributed architecture design, infrastructure automation, and application modernization using containers and serverless technologies.
As a recognized thought leader in cloud-native architecture, Ashok has authored numerous technical publications, including 20+ official AWS blogs and technical guides on Amazon EKS networking, observability, security, and container CI/CD pipelines. He has presented at more than 25 public events, including AWS re:Invent, AWS Summits, and start-up CTO cohorts, sharing his expertise with the broader technical community.
Ashok’s commitment to technical excellence is reflected in his extensive certification portfolio, which encompasses all 12 AWS technical certifications and the complete suite of Kubernetes certifications from the Linux Foundation. His achievements have earned him the coveted AWS Gold Jacket and Kubestronaut accreditation.
Beyond his architectural work, Ashok is passionate about enabling developers to simplify the complexity of running GenAI workloads at scale using cloud-native tools.
Sukirti Gupta is a technologist and product management leader at Amazon Web Services (AWS), where he leads the adoption of Generative AI technologies across start-up ecosystems. With over 15 years of experience in cloud computing, AI/ML, and data center technologies, he has played influential roles in shaping product narratives and engineering solutions for high-impact workloads across AWS, AMD, and Intel.
At AWS, Sukirti leads initiatives that help start-ups integrate GenAI into their product strategy, enabling them to innovate with powerful infrastructure and tools. His previous roles include leading cloud product development at AMD and managing GTM strategy for Intel’s flagship computing platforms, where he helped drive billion-dollar revenue programs.
Sukirti holds a B.Tech. from IIT (BHU), Varanasi, an M.S. in electrical engineering from the University of Cincinnati, and an MBA in strategy and marketing from Santa Clara University.
In addition to his corporate work, Sukirti loves to mentor AI start-ups through IIT’s accelerator programs and frequently writes on Medium about GenAI trends and product leadership.
Swati Tyagi is an AI/ML leader with over a decade of experience, specializing in Generative AI, large language models (LLMs), and responsible AI. She has contributed to impactful AI initiatives across finance, healthcare, and education, including in her current role as a senior machine learning engineer at JPMorgan Chase, where she focuses on deploying scalable and ethical AI solutions.
Swati’s expertise spans large-scale model deployment, explainability, bias mitigation, and hyper-personalization. She brings together deep technical knowledge in AI/ML with hands-on experience in MLOps, cloud-native architectures (AWS), and production-grade model development. Her leadership extends to advisory roles with LLM start-ups and AI education initiatives. She actively serves on technical program committees, editorial boards, and advisory panels across the AI community.
Swati holds a Ph.D. in statistics and machine learning from the University of Delaware, a master’s in business analytics from IIT Delhi, and a bachelor’s degree in computer science. A committed lifelong learner and IEEE Senior Member, she continues to advance the field of responsible AI through cutting-edge research, mentorship, and active community engagement.
Dhirendra Kumar is a seasoned IT professional with over 22 years of diverse industry experience, including healthcare and infrastructure companies, currently serving as a senior cloud architect for a hedge fund company, building and designing distributed systems. His passion lies in crafting innovative solutions, showcasing a commitment to lifelong learning. Beyond his professional endeavors, Kumar is an enthusiastic contributor to open source projects within the Cloud Native Computing Foundation and other cloud-native initiatives. In his free time, he actively supports cutting-edge solutions that drive the industry forward.
He has also worked on books such as Implementing GitOps with Kubernetes, Cloud Native Development with Azure, and AWS Cloud Engineering Guide.
I would like to thank my wife, Geethashri, and my parents, for the patience and support they provided during my review of this book.
This section introduces Generative AI (GenAI) fundamentals, tracing its evolution from traditional neural networks to transformers, and outlines the complete GenAI project life cycle. It explores how containers and Kubernetes address the challenges of GenAI workloads, and provides a guide to getting started with Kubernetes in the cloud.
This part has the following chapters:
Chapter 1, Generative AI Fundamentals
Chapter 2, Kubernetes – Introduction and Integration with GenAI
Chapter 3, Getting Started with Kubernetes in the Cloud