Kubernetes for Generative AI Solutions E-Book

Ashok Srirama

Publisher: Packt Publishing
Category: Science and New Technologies
Language: English

Description

Generative AI (GenAI) is revolutionizing industries, from chatbots to recommendation engines to content creation, but deploying these systems at scale poses significant challenges in infrastructure, scalability, security, and cost management.
This book is your practical guide to designing, optimizing, and deploying GenAI workloads with Kubernetes (K8s), the leading container orchestration platform trusted by AI pioneers. Whether you’re working with large language models, transformer systems, or other GenAI applications, this book helps you confidently take projects from concept to production. You’ll get to grips with foundational concepts in machine learning and GenAI, and learn how to align projects with business goals and KPIs. From there, you’ll set up Kubernetes clusters in the cloud, deploy your first workload, and build a solid infrastructure. But your learning doesn’t stop at deployment. The chapters highlight essential strategies for running GenAI workloads in production, covering model optimization, workflow automation, scaling, GPU efficiency, observability, security, and resilience.
By the end of this book, you’ll be fully equipped to confidently design and deploy scalable, secure, resilient, and cost-effective GenAI solutions on Kubernetes.

Details

You can read the e-book in the Legimi apps or in any app that supports the following formats:

EPUB

MOBI

Page count: 441

Year of publication: 2025

Sample

Kubernetes for Generative AI Solutions

A complete guide to designing, optimizing, and deploying Generative AI workloads on Kubernetes

Ashok Srirama

Sukirti Gupta

Kubernetes for Generative AI Solutions

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Portfolio Director: Kartikey Pandey

Relationship Lead: Prachi Rana

Project Manager: Sonam Pandey

Content Engineer: Sarada Biswas

Technical Editor: Simran Ali

Copy Editor: Safis Editing

Indexer: Manju Arasan

Proofreader: Sarada Biswas

Production Designer: Shankar Kalbhor

Growth Lead: Shreyans Singh

First published: June 2025

Production reference: 1230525

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK

ISBN 978-1-83620-993-5

www.packtpub.com

To my mother, my wife, and everyone whose presence shaped the person I’ve become.

To my special needs children, thank you for teaching me life’s true meaning: empathy, resilience, and humility.

– Ashok Srirama

To my wife and children, whose love fills my life with purpose and joy. To my parents, for all their guidance and unwavering support, and to my brother, for always being my ally, offering constant encouragement and strength.

– Sukirti Gupta

Foreword

Over the past few years, I’ve had the privilege of working closely with organizations that are pushing the limits of what’s possible with cloud-native technologies. From Serverless modernization to Kubernetes-scale container orchestration, one thing has become abundantly clear—Generative AI is no longer a futuristic concept. It’s here, it’s powerful, and it’s redefining how we build, deploy, and scale intelligent systems.

When I first reviewed this manuscript, I was struck by how comprehensive yet hands-on it is. Kubernetes for Generative AI Solutions does more than teach you how to run LLMs or deploy state-of-the-art models—it gives you a production-grade blueprint. From observability to cost optimization, from secure scaling to HA/DR patterns, this book meets developers, architects, and product leaders where they are—and takes them forward with confidence.

I’ve had the privilege of working with Ashok Srirama and Sukirti Gupta, who are recognized thought leaders in cloud-native technologies. Their combined experience at AWS and in modernizing large-scale systems is evident throughout this book. Ashok’s deep technical expertise in EKS and GenAI infrastructure, paired with Sukirti’s strategic perspective on product development and GTM, makes this book both visionary and grounded.

What I especially appreciated is the pragmatic tone—not just code, but also battle-tested patterns, modular Terraform stacks, observability practices, and real-world tips that could save you weeks, if not months, of trial and error. As someone who spends a lot of time enabling teams to scale with Kubernetes and Serverless on AWS, I found myself nodding along and learning from the practical insights this book delivers.

Whether you’re a solutions architect trying to productionize LLM apps, a start-up founder exploring OpenAI and Amazon Bedrock, or an enterprise leader looking to make sense of GPUs on Kubernetes, this book is a must-have on your desk.

Ashok and Sukirti have given us a gift—a guide not just to building GenAI systems, but to building them right.

Rajdeep Saha

Principal Solutions Architect, AWS

Bestselling Author, Mentor, Speaker at KubeCon, AWS re:Invent

Contributors

About the authors

Ashok Srirama is a principal specialist solutions architect at AWS, where he leads initiatives to architect scalable, secure, and cost-efficient container-based solutions for enterprise customers. With over 19 years of experience in IT, Ashok brings profound expertise in cloud architecture, Kubernetes, container platforms, and, most recently, Generative AI.

Before joining AWS, Ashok held pivotal cloud architecture roles at AIG and IBM, where he led digital transformation initiatives and cloud migration projects across the insurance and communication sectors. His technical acumen spans distributed architecture design, infrastructure automation, and application modernization using containers and serverless technologies.

As a recognized thought leader in cloud-native architecture, Ashok has authored numerous technical publications, including 20+ official AWS blogs and technical guides on Amazon EKS networking, observability, security, and container CI/CD pipelines. He has presented at more than 25 public events, including AWS re:Invent, AWS Summits, and start-up CTO cohorts, sharing his expertise with the broader technical community.

Ashok’s commitment to technical excellence is reflected in his extensive certification portfolio, which encompasses all 12 AWS technical certifications and the complete suite of Kubernetes certifications from the Linux Foundation. His achievements have earned him the coveted AWS Gold Jacket and Kubestronaut accreditation.

Beyond his architectural work, Ashok is passionate about enabling developers to simplify the complexity of running GenAI workloads at scale using cloud-native tools.

Sukirti Gupta is a technologist and product management leader at Amazon Web Services (AWS), where he leads the adoption of Generative AI technologies across start-up ecosystems. With over 15 years of experience in cloud computing, AI/ML, and data center technologies, he has played influential roles in shaping product narratives and engineering solutions for high-impact workloads across AWS, AMD, and Intel.

At AWS, Sukirti leads initiatives that help start-ups integrate GenAI into their product strategy, enabling them to innovate with powerful infrastructure and tools. His previous roles include leading cloud product development at AMD and managing GTM strategy for Intel’s flagship computing platforms, where he helped drive billion-dollar revenue programs.

Sukirti holds a B.Tech. from IIT (BHU), Varanasi, an M.S. in electrical engineering from the University of Cincinnati, and an MBA in strategy and marketing from Santa Clara University.

In addition to his corporate work, Sukirti loves to mentor AI start-ups through IIT’s accelerator programs and frequently writes on Medium about GenAI trends and product leadership.

About the reviewers

Swati Tyagi is an AI/ML leader with over a decade of experience, specializing in Generative AI, large language models (LLMs), and responsible AI. She has contributed to impactful AI initiatives across finance, healthcare, and education, including in her current role as a senior machine learning engineer at JPMorgan Chase, where she focuses on deploying scalable and ethical AI solutions.

Swati’s expertise spans large-scale model deployment, explainability, bias mitigation, and hyper-personalization. She brings together deep technical knowledge in AI/ML with hands-on experience in MLOps, cloud-native architectures (AWS), and production-grade model development. Her leadership extends to advisory roles with LLM start-ups and AI education initiatives. She actively serves on technical program committees, editorial boards, and advisory panels across the AI community.

Swati holds a Ph.D. in statistics and machine learning from the University of Delaware, a master’s in business analytics from IIT Delhi, and a bachelor’s degree in computer science. A committed lifelong learner and IEEE Senior Member, she continues to advance the field of responsible AI through cutting-edge research, mentorship, and active community engagement.

Dhirendra Kumar is a seasoned IT professional with over 22 years of experience across diverse industries, including healthcare and infrastructure companies. He currently serves as a senior cloud architect for a hedge fund company, where he designs and builds distributed systems. His passion lies in crafting innovative solutions, and he is committed to lifelong learning. Beyond his professional endeavors, Kumar is an enthusiastic contributor to open source projects within the Cloud Native Computing Foundation and other cloud-native initiatives. In his free time, he actively supports cutting-edge solutions that drive the industry forward.

He has also worked on books such as Implementing GitOps with Kubernetes, Cloud Native Development with Azure, and AWS Cloud Engineering Guide.

I would like to thank my wife, Geethashri, and my parents, for the patience and support they provided during my review of this book.

Preface

Free Benefits with Your Book

Part 1: GenAI and Kubernetes Foundation

1 Generative AI Fundamentals

Artificial Intelligence versus GenAI

Evolution of machine learning

Transformer architecture

GenAI project life cycle

GenAI deployment stack

GenAI use cases

Summary

Appendix 1A – RNNs

Appendix 1B – Transformer mathematical models for the self-attention mechanism

Understanding the temperature parameter for GenAI use cases

2 Kubernetes – Introduction and Integration with GenAI

Understanding containers

Container terminology

Creating a container image

Why containers for GenAI models?

Building a GenAI container image

What is Kubernetes (K8s)?

Kubernetes architecture

Why K8s is a great fit for GenAI models

Summary

Appendix

3 Getting Started with Kubernetes in the Cloud

Advantages of running K8s in the cloud

Setting up a K8s cluster in the cloud

Prerequisites

Provisioning the Amazon EKS cluster

Deploying our first GenAI model in the K8s cluster

Summary

Part 2: Productionalizing GenAI Workloads Using K8s

4 GenAI Model Optimization for Domain-Specific Use Cases

Technical requirements

The need for domain-specific optimization

LLM model selection

The LangChain framework

Understanding RAG

How RAG works

Running a query

Model fine-tuning

Fine-tuning example

Summary

5 Working with GenAI on K8s: Chatbot Example

Technical requirements

GenAI use cases for e-commerce

Experimentation using JupyterHub

Fine-tuning Llama 3 in K8s

Data preparation

Creating a container image

Deploying the fine-tuning job

Deploying the fine-tuned model on K8s

Deploy a RAG application on K8s

Deploying a chatbot on K8s

Summary

6 Scaling GenAI Applications on Kubernetes

Scaling metrics

Conventional metrics

Custom metrics

HorizontalPodAutoscaler (HPA)

VerticalPodAutoscaler (VPA)

Combining HPA and VPA

KEDA

Cluster Autoscaler (CA)

Karpenter

Summary

7 Cost Optimization of GenAI Applications on Kubernetes

Understanding the key cost components

Kubecost

Cost optimization techniques

Compute best practices

Networking best practices

Storage best practices

Summary

8 Networking Best Practices for Deploying GenAI on K8s

Understanding the Kubernetes networking model

Selecting the CNI networking mode for GenAI applications

Service implementation in K8s

Service health checks

Advanced traffic management with a service mesh

Securing GenAI workloads with Kubernetes’ network policies

Implementing network policies in a chatbot application

Service mesh versus K8s network policies

Optimizing network performance for GenAI

Kube-Proxy – IPTables versus IPVS

eBPF and SR-IOV

CoreDNS

Network latency and throughput enhancements

Summary

9 Security Best Practices for Deploying GenAI on Kubernetes

Technical requirements

Defense in depth

K8s security considerations

Supply chain security

Host security

Container runtime security

Network security

Secrets management

Additional considerations for GenAI apps

Data privacy and compliance

Secure model endpoints

Implementing security best practices in a chatbot app

Summary

10 Optimizing GPU Resources for GenAI Applications in Kubernetes

Technical requirements

GPUs and custom accelerators

Allocating GPU resources in K8s

Understanding GPU utilization

NVIDIA Data Center GPU Manager (DCGM)

GPU utilization challenges

Techniques for partitioning and sharing GPUs

NVIDIA MIG

NVIDIA MPS

GPU time-slicing

Scaling and optimization considerations

NVIDIA NIM

Summary

Part 3: Operating GenAI Workloads on K8s

11 GenAIOps: Data Management and the GenAI Automation Pipeline

Technical requirements

Overview of GenAI pipelines

GenAIOps on K8s

KubeFlow

MLflow

Argo Workflows

Ray

Deploying KubeRay on a K8s cluster

Comparing KubeFlow, MLflow, and Ray

Data privacy, model bias, and drift monitoring

Methods to test bias and variance

Summary

12 Observability – Getting Visibility into GenAI on K8s

Observability key concepts

Logs

Metrics

Traces

Monitoring tools in K8s

Fluentd and Fluent Bit

Loki

OpenTelemetry

Prometheus

Visualization and debugging

Grafana

LangChain observability

LangFuse

Summary

13 High Availability and Disaster Recovery for GenAI Applications

Designing for HA and DR

Resiliency in K8s

DR strategies in K8s

Additional K8s DR considerations

Summary

14 Wrapping Up: GenAI Coding Assistants and Further Reading

Technical requirements

GenAI-powered coding assistants

GenAI-powered observability and optimization

Amazon Q Developer walk-through with EKS

References for further reading

Summary

15 Unlock Your Exclusive Benefits

Index

Other Books You May Enjoy

Part 1: GenAI and Kubernetes Foundation

This section introduces Generative AI (GenAI) fundamentals, tracing its evolution from traditional neural networks to transformers, and outlines the complete GenAI project life cycle. It explores how containers and Kubernetes address the challenges of GenAI workloads, and provides a guide to getting started with Kubernetes in the cloud.

This part has the following chapters:

Chapter 1, Generative AI Fundamentals

Chapter 2, Kubernetes – Introduction and Integration with GenAI

Chapter 3, Getting Started with Kubernetes in the Cloud
