As AI agents evolve to take on complex tasks and operate autonomously, you need to learn how to build these next-generation systems. Author Valentina Alto brings practical, industry-grounded expertise in AI Agents in Practice to help you go beyond simple chatbots and create AI agents that plan, reason, collaborate, and solve real-world problems using large language models (LLMs) and the latest open-source frameworks.
In this book, you'll get a comparative tour of leading AI agent frameworks such as LangChain and LangGraph, covering each tool's strengths, ideal use cases, and how to apply them in real-world projects. Through step-by-step examples, you’ll learn how to construct single-agent and multi-agent architectures using proven design patterns to orchestrate AI agents working together. Case studies across industries will show you how AI agents drive value in real-world scenarios, while guidance on responsible AI will help you implement ethical guardrails from day one. The chapters also set the stage with a brief history of AI agents, from early rule-based systems to today's LLM-driven autonomous agents, so you understand how we got here and where the field is headed.
By the end of this book, you'll have the practical skills, design insights, and ethical foresight to build and deploy AI agents that truly make an impact.
AI Agents in Practice
Design, implement, and scale autonomous AI systems for production
Valentina Alto
AI Agents in Practice
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing or its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Gebin George
Relationship Lead: Ali Abidi
Project Manager: Prajakta Naik
Content Engineer: Aditi Chatterjee
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Indexer: Pratik Shirodkar
Proofreader: Aditi Chatterjee
Production Designer: Alishon Falcon
Growth Lead: Nimisha Dua
First published: July 2025
Production reference: 1250725
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80580-135-1
www.packtpub.com
To my family and friends—thank you for your support, patience, and encouragement throughout this journey.
– Valentina
Valentina Alto is a technical architect specializing in AI and intelligent apps at Microsoft Innovation Hub in Dubai. During her tenure at Microsoft, she has held various roles as a solution specialist, focusing on data, AI, and application workloads within the manufacturing, pharmaceutical, and retail industries, and driving customers’ digital transformations in the era of AI. Valentina is an active tech author and speaker who contributes to books, articles, and events on AI and machine learning. Over the past two years, Valentina has published two books on generative AI and large language models, further establishing her expertise in the field.
I would like to thank my family and friends for their unwavering support, patience, and understanding throughout this process. Your encouragement has been invaluable.
I am also grateful to my colleagues and peers in the AI and technology community for the insightful discussions, feedback, and inspiration that have shaped my understanding of generative AI. Your contributions continue to push the boundaries of innovation.
A special thanks to Ali Abidi for giving me the opportunity to write this book in such an exciting moment in the era of AI agents. Special thanks to Prajakta Naik, Aditi Chatterjee, and Alishon Falcon for their valuable input and time reviewing this book, and to the entire Packt team for their support during the course of writing this book.
Amey Ramakant Mhadgut is a software engineer with over five years of industry experience and a master’s degree in computer science. He specializes in big data, generative AI, Python, AWS, and scalable software architecture. Amey brings both academic depth and practical insight to his reviews, offering thoughtful analysis of technical content.
Prudhvi Raj Dachapally, backed by 7+ years of industry experience, is a senior applied scientist at eBay, where he builds innovative AI solutions to enhance the recommendation experience. Previously, he led the AI team at Cyndx, developing semantic search engines and AI assistants powered by fine-tuned large language models for financial B2B products. He has 180+ citations and publications in top conferences, including EMNLP and CogSci. He holds a master’s degree from Indiana University Bloomington and remains active through mentorship and alumni engagement.
I would like to thank my wife, Sri Vyshnavi, for her constant support during this review process, and my parents, Subramanyam and Sri Neela, whose sacrifices and belief form the foundation of every opportunity I have today.
New frameworks, evolving architectures, research drops, production breakdowns—AI_Distilled filters the noise into a weekly briefing for engineers and researchers working hands-on with LLMs and GenAI systems. Subscribe now and receive a free eBook, along with weekly insights that help you stay focused and informed.
Subscribe at https://packt.link/TRO5B or scan the QR code below.
Contributors
Subscribe for a Free eBook
Preface
Who this book is for
What this book covers
To get the most out of this book
Disclaimer on AI usage
Get in touch
Join our Discord and Reddit space
Your Book Comes with Exclusive Perks – Here’s How to Unlock Them
Unlock Your Book’s Exclusive Benefits
How to unlock these benefits in three easy steps
Part 1: Foundations of AI Workflows and the Rise of AI Agents
Evolution of GenAI Workflows
Technical requirements
Understanding foundation models and the rise of LLMs
From narrow AI to foundation models
Under the hood of an LLM
How do we consume LLMs?
Latest significant breakthroughs
Small language models and fine-tuning
Model distillation
Reasoning models
DeepSeek
Road to AI agents
Text generation
Chat with your data
Multimodality
The need for an additional layer of intelligence: introducing AI agents
Summary
References
The Rise of AI Agents
Technical requirements
Evolution of agents from RPA to AI agents
Components of an AI agent
Different types of AI agents
Retrieval agents
Task agents
Autonomous agents
Summary
References
Subscribe for a Free eBook
Part 2: Designing, Building, and Scaling AI Agents
The Need for an AI Orchestrator
Introduction to AI orchestrators
Autonomy
Abstraction and modularity
Core components of an AI orchestrator
Workflow management
Memory and context handling
Tool and API integration
Error handling and monitoring
Security and compliance
Overview of the most popular AI orchestrators in the market
How to choose the right orchestrator for your AI agent
Summary
References
The Need for Memory and Context Management
Different types of memory
Short-term memory
Long-term memory
Semantic memory
Episodic memory
Procedural memory
In between short-term memory and long-term memory – the role of semantic caches
Managing context windows
Storing, retrieving, and refreshing memory
Temporal and spatial reasoning in AI agents
Popular tools to manage memory
LangMem
Mem0
Letta (formerly MemGPT)
Summary
References
Subscribe for a Free eBook
The Need for Tools and External Integrations
Technical requirements
The anatomy of an AI agent’s tools
Hardcoded and semantic functions
Hardcoded functions
Semantic functions
APIs and web services
Web APIs
Internal or enterprise APIs
Backend function APIs (service mesh or microservices)
Serverless functions/Lightweight APIs
Databases and knowledge bases
Structured data
Unstructured data
Synchronous versus asynchronous calls
Summary
References
Building Your First AI Agent with LangChain
Technical requirements
Introduction to the LangChain ecosystem
Build – the architectural foundation
Run – the operational layer
Manage – observability and iteration
Overview of out-of-the-box components
Use case – e-commerce AI agent
Scenario description
AskMamma’s building blocks
Developing the agent
Observability, traceability, and evaluation
Infusing the AI agent in the mobile app
Summary
References
Subscribe for a Free eBook
Multi-Agent Applications
Technical requirements
Introduction to multi-agent systems
Understanding and designing different workflows for your multi-agent system
Overview of multi-agent orchestrators
AutoGen
TaskWeaver
OpenAI Agents SDK
LangGraph
Building your first multi-agent application with LangGraph
Summary
References
Part 3: Road to an Open, Agentic Ecosystem
Orchestrating Intelligence: Blueprint for Next-Gen Agent Protocols
Technical requirements
What is a protocol?
Understanding the Model Context Protocol
Agent2Agent
Agent Commerce Protocol
Toward an agentic web
From the traditional web to the agentic web
Key components of NLWeb
Current progress and applications
Summary
References
Subscribe for a Free eBook
Navigating Ethical Challenges in Real-World AI
Ethical challenges in AI – fairness, transparency, privacy, and accountability
Fairness and bias
Transparency and explainability
Privacy and data protection
Accountability and liability
Safety and reliability
Agentic AI autonomy and its unique ethical challenges
Autonomy versus human control
Deception and manipulation
Unintended consequences and liability for agent actions
Responsible AI principles and practices
From principles to practice
Guardrails for safe and ethical AI
What are AI guardrails?
Content filtering and moderation in AI systems
Why content filtering is needed
How content filtering works
Ethical considerations in content moderation
Addressing the challenges: governance, regulations, and collaboration
Organizational governance and culture
Industry collaboration and self-regulation
Government regulations and policy
Summary
References
Other Books You May Enjoy
Join our Discord and Reddit space
Index
In Part 1 of this book, we explore the evolution of AI development since the rise of large language models (LLMs) in late 2022, leading up to the emergence of AI agents as a new architectural paradigm.
This part begins by tracing how AI workflows have shifted—from simple API calls to more dynamic systems such as retrieval-augmented generation (RAG)—and highlights key technical breakthroughs such as fine-tuning, model distillation, and reinforcement learning from human feedback (RLHF). It introduces the concept of agentic behavior as the next frontier in building intelligent, goal-driven systems.
We then dive into what makes an AI agent: how it differs from traditional automation, the components that enable it (LLMs, tools, memory, and knowledge), and why the field is rapidly converging on more autonomous, interactive systems.
You’ll also gain an understanding of how AI agents build on the foundation of LLMs while incorporating new layers of intelligence and orchestration, setting the stage for more personalized, persistent, and task-oriented AI applications.
This part contains the following chapters:
Chapter 1, Evolution of GenAI Workflows
Chapter 2, The Rise of AI Agents

Over the past two years, large language models (LLMs) have reshaped the landscape of AI. From simple prompt-based interactions to complex applications across industries, LLMs have evolved rapidly, fueled by breakthroughs in architecture, training techniques, and fine-tuning strategies. As these capabilities have matured, the shift from ChatGPT to today’s agentic systems (as of April 2025) marks a natural progression, with the addition of reasoning, planning, and action-taking capabilities representing a major technological leap.
This chapter explores the foundations of LLMs, how they’re built and consumed, and the differences between pre-trained and fine-tuned models. Most importantly, it sets the stage for the next leap forward: the emergence of AI agents.
In this chapter, we will cover the following topics:
Understanding foundation models and the rise of LLMs
Latest significant breakthroughs
Road to AI agents
The need for an additional layer of intelligence: introducing AI agents

By the end of this chapter, you’ll have a clear understanding of how LLMs evolved, how they’re trained and deployed, and why the road to truly intelligent systems inevitably leads to the emergence of AI agents.
You can access the complete code for this chapter in the book’s accompanying GitHub repository at https://github.com/PacktPublishing/AI-Agents-in-Practice.
AI has undergone a fundamental transformation thanks to the emergence of foundation models—versatile, general-purpose models that can be adapted across a wide range of tasks. Among them, LLMs have taken center stage, redefining how we interact with machines through natural language.
Before the rise of foundation models, the field of AI was dominated by narrow AI—systems built to perform one specific task and nothing else. Each use case required a custom pipeline: a unique dataset, a dedicated model architecture, and a specialized training routine. If you wanted to classify emails as spam or not spam, you’d build a spam filter. If you needed to extract names and places from documents, you’d create a named entity recognizer. Want to summarize a news article? That would mean yet another bespoke model.
This fragmented approach had several drawbacks. Models were brittle—performing well only within the narrow domain they were trained for—and expensive to maintain. Any shift in the task or data distribution often meant retraining from scratch.
The introduction of foundation models marked a fundamental shift in how we build and think about AI systems. These models are trained on vast and diverse datasets that span multiple domains and tasks. The idea is to teach a single model a general understanding of the world—its language, structure, and patterns—during a large-scale pretraining phase. Once this general knowledge is embedded, the model can be adapted to specific tasks with minimal additional data and compute.
For example, instead of building a separate model for translating French to English, we can now take a pre-trained foundation model and fine-tune it on a smaller translation dataset. The pre-trained model already understands language syntax, grammar, and meaning. Fine-tuning simply aligns this understanding to a specific objective.
The key innovation behind foundation models is transfer learning. Rather than learning from scratch, these models transfer knowledge gained from general training to specific problems. This dramatically improves efficiency, reduces the amount of labeled data required, and leads to more robust and flexible AI systems.
Moreover, foundation models aren’t just about language. They apply across modalities: some models can process and generate not only text but also images, audio, or code.
In essence, foundation models act as a “base brain” for AI—trained once, then repurposed many times over. This scalability and adaptability have unlocked entirely new possibilities in how we build intelligent systems, setting the stage for more autonomous and interactive applications such as AI agents.
Now, we mentioned that foundation models are capable of handling a variety of data formats. Within the family of foundation models, we also find modality-specific models that focus on a single data type—as is the case with LLMs.
Figure 1.1: Features of LLMs
LLMs are, in essence, the language-specialized version of foundation models. They’re built on deep neural network architectures—particularly transformers—and are trained to predict the next word in a sequence. But this seemingly simple objective unlocks surprising emergent behaviors. LLMs can carry on conversations, answer intricate questions, write code, and even simulate reasoning.
Definition
Emergent behaviors are complex capabilities that arise unexpectedly when a system reaches a certain scale, even though those capabilities weren’t explicitly programmed or anticipated. In the context of LLMs, these behaviors surface when models are scaled up in terms of data, parameters, and training time—unlocking new abilities that were absent in smaller versions.
As models scale, they begin to exhibit emergent properties, including the following:
In-context learning: LLMs can learn to perform a task simply by being shown a few examples in the prompt—without any fine-tuning. This was not seen in smaller models (a short prompt sketch follows this list).
Chain-of-thought reasoning: By generating intermediate reasoning steps, LLMs can solve multi-step problems such as math word problems or logical puzzles—something they previously struggled with.
Analogical reasoning: They can solve analogy problems (e.g., “cat is to kitten as dog is to...”) in a way that resembles human cognitive processing.
Arithmetic and logic: At scale, LLMs develop the ability to handle tasks such as multi-digit arithmetic or logic puzzles, even if these tasks weren’t part of their training objective.
Understanding metaphors and humor: Advanced LLMs can interpret new metaphors or even attempt jokes—demonstrating an abstract grasp of language and nuance.
Multi-task generalization: Rather than being trained for one specific task, they can simultaneously handle translation, summarization, question answering, and more—without task-specific training.

These capabilities represent more than just better performance—they are qualitatively new behaviors that “emerge” only at scale, giving LLMs a surprisingly broad range of skills with real-world implications across domains.
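As a quick illustration of in-context learning, here is a minimal sketch of a few-shot prompt. The task, the example reviews, and the labels are invented for illustration; the point is that the examples live entirely inside the prompt, with no fine-tuning involved:

```python
# A hypothetical few-shot prompt: the model infers the task purely from the examples in context.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "Absolutely loved it, would buy again."
Sentiment: Positive

Review: "Broke after two days, waste of money."
Sentiment: Negative

Review: "The packaging was nice, but the product never worked."
Sentiment:"""

# Sending this prompt to a sufficiently large LLM typically yields "Negative",
# even though the model was never explicitly trained on this classification task.
```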
At the core of every LLM is a powerful neural network architecture—most commonly, a transformer. These networks are built to process and understand patterns in data, particularly human language, by learning statistical relationships across billions of text examples. Though loosely inspired by the structure of the human brain, LLMs function purely through mathematics, passing information through interconnected layers that adapt as the model trains.
To make language computable, the first step is to convert it into numbers because neural networks can’t process raw text. This happens through two key steps—tokenization and embedding:
Tokenization breaks down sentences into smaller chunks called tokens. These could be full words or parts of words, depending on the model. For example, “The cat sat on the mat” might be split into individual words or smaller subword units, depending on the tokenizer used.
Embedding takes these tokens and maps each one to a high-dimensional vector—a string of numbers that encodes its meaning and relationship to other words. These embeddings are learned during training so that similar words end up in similar regions of the model’s “semantic space.” This helps the model understand context and word usage, such as how “Paris” and “London” relate as cities. A short sketch of both steps follows the figure.

Figure 1.2: An example of embedding
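The following is a minimal sketch of both steps using the Hugging Face transformers library; the bert-base-uncased checkpoint is only an illustrative choice, and any tokenizer/model pair behaves similarly:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; any tokenizer/model pair works the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "The cat sat on the mat"

# Step 1: tokenization - the sentence becomes a list of tokens (words or subwords).
print(tokenizer.tokenize(sentence))

# Step 2: embedding - each token is mapped to a high-dimensional vector by the model.
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, number_of_tokens, hidden_size): one vector per token
```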
Once the input is tokenized and embedded, it moves through the transformer network itself. Unlike traditional neural networks with just a few hidden layers, LLMs use dozens—or even hundreds—of stacked layers, each containing mechanisms called attention heads. These attention layers help the model decide which parts of the input are most relevant to a given prediction. For instance, when completing a sentence, the model learns to focus more on specific previous words that influence what should come next.
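To make the attention mechanism a little more concrete, here is a toy scaled dot-product attention computation in PyTorch. The tensors are random placeholders standing in for token representations; a real transformer adds learned projections, multiple heads, and causal masking:

```python
import math
import torch

# Toy scaled dot-product attention over a sequence of 6 tokens with head dimension 64.
Q = torch.randn(1, 6, 64)  # queries
K = torch.randn(1, 6, 64)  # keys
V = torch.randn(1, 6, 64)  # values

scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))  # relevance of every token to every other token
weights = torch.softmax(scores, dim=-1)                   # attention weights, summing to 1 per query
context = weights @ V                                     # each token becomes a weighted mix of value vectors

print(weights[0, -1])  # how strongly the last token attends to each token in the sequence
```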
Training an LLM means teaching it to make better predictions over time. This is done through a method called backpropagation, where the model compares its predicted word to the correct one, calculates how far off it was, and then updates its internal parameters to reduce future errors.
Definition
Backpropagation is the core learning algorithm used to train neural networks. It works by comparing the model’s prediction to the correct answer, calculating the error (called the loss), and then adjusting the network’s internal parameters (weights) to reduce that error. This adjustment happens by “propagating” the error backward through the layers of the network—hence the name. Over time, this process helps the model make increasingly accurate predictions.
Let’s say you type The cat is on the.... The model predicts the next word by assigning probabilities to possible continuations such as mat, roof, or sofa. It doesn’t guess randomly—it relies on patterns it has seen during training.
This process is repeated across vast amounts of data—millions or billions of sentences—enabling the model to gradually capture the structure and rhythm of language. The result is a system capable of not just completing sentences but also holding conversations, solving problems, and responding with context-aware, often remarkably fluent language.
Once the training stage of an LLM is concluded, we need to understand how the model is used to predict the next token on new input—a process called inference.
In the context of machine learning and AI, inference refers to the process of running a trained model on new input data to generate predictions or responses. In LLMs, inference involves processing a prompt and producing text-based output, typically requiring significant computational resources, especially for large models.
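As a concrete sketch of inference, the snippet below runs the earlier “The cat is on the” prompt through a small open source model (GPT-2 via Hugging Face transformers) and prints the most probable next tokens. The model choice is illustrative; any causal language model exposes the same pattern:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is used here only because it is small and freely available.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The cat is on the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # one score per vocabulary token, per position

probs = torch.softmax(logits[0, -1], dim=-1)  # probability distribution for the next token
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")
```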
LLMs are typically accessed through APIs, allowing developers to use them without managing complex infrastructure. This approach simplifies integration, making AI-powered applications more scalable and cost-effective.
LLM providers such as OpenAI, Azure AI, and Hugging Face offer APIs that handle requests and return responses in real time. The process usually involves the following:
Authentication: Developers use API keys or OAuth tokens for secure access.

Definition
Authentication is how developers prove that their application has permission to access an external service. This is usually done using either API keys or OAuth tokens. An API key is a unique string provided by the service—similar to a password—that identifies the app. OAuth, on the other hand, is a more flexible system that allows users to grant specific permissions to apps, issuing temporary access tokens in return. Both methods ensure that only authorized users or systems can make requests, helping protect sensitive data and resources.
Figure 1.3: An example of an HTTP request to the LLM API
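To complement the figure, here is a sketch of what such a request can look like when issued from Python with the requests library. The endpoint, model name, and payload follow the OpenAI chat completions format and are illustrative rather than taken from the figure; other providers use slightly different URLs and schemas:

```python
import requests

response = requests.post(
    "https://api.openai.com/v1/chat/completions",  # illustrative endpoint (OpenAI chat completions)
    headers={
        "Authorization": "Bearer <YOUR_API_KEY>",  # API-key authentication
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o-mini",  # illustrative model name
        "messages": [{"role": "user", "content": "Summarize what an AI agent is in one sentence."}],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```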
Note
Some LLM APIs support streaming responses, where the model outputs tokens incrementally rather than waiting to generate the full response before sending it. This approach helps mitigate the high latency often associated with large models. By delivering the first chunk of text quickly, streaming reduces perceived latency—the time the user waits before seeing any output—leading to a smoother, more responsive experience.
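Here is a minimal sketch of a streaming call using the OpenAI Python SDK; the API key is read from the OPENAI_API_KEY environment variable, and the model name is illustrative:

```python
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Explain streaming responses in two sentences."}],
    stream=True,          # tokens arrive incrementally instead of in one final payload
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # print each partial piece as soon as it arrives
```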
Now, a legitimate question might be: what if I want to run my model on my local computer? To answer this question, we first need to distinguish between the following:
Private LLMs: These are proprietary models developed by companies such as OpenAI, Anthropic, or Google. They’re closed source, meaning you can’t see or modify their underlying code. These models are typically accessible only via APIs, and they come with a pay-per-use, token-based cost.
Open source LLMs: Open source models, such as Meta’s Llama, Mistral, and Falcon, are freely available for anyone to download, modify, and deploy. This means developers can access the underlying trained parameters, run them on private infrastructure, and even use the underlying architecture to retrain the model from scratch.

Even for open source LLMs, however, many developers opt to access these models via APIs provided by platforms such as Azure AI Foundry and Hugging Face Hub.
This approach offers several advantages:
Reduced infrastructure costs: Running LLMs independently demands significant computational resources, which can be cost-prohibitive. Utilizing APIs shifts this burden to the service provider, allowing developers to leverage powerful models without investing in expensive hardware.
Scalability: API services can dynamically scale to handle varying workloads, ensuring consistent performance without manual intervention.
Security and compliance: Platforms such as Azure AI Foundry offer enterprise-grade security features, helping organizations meet compliance requirements and protect sensitive data.

In the context of AI agents and, more broadly, AI-powered apps, the most adopted path is that of consuming LLMs via APIs. Exceptions might be related to disconnected scenarios (e.g., running LLMs at offshore sites or remote locations) or regulatory constraints in terms of data residency (the LLM must reside in a specific country where there is no public cloud available).
The field of GenAI has experienced rapid advancements over the past few years, with breakthroughs that pushed the boundaries of efficiency, adaptability, and reasoning capabilities. In the following sections, we are going to explore some of the latest techniques that significantly enhance GenAI models’ performance while reducing computational demands.
Small language models (SLMs) are becoming increasingly relevant as organizations seek efficient, cost-effective alternatives to large-scale AI systems.
SLMs are a streamlined category of GenAI models designed to efficiently process and generate natural language while using fewer computational resources than their larger counterparts. Unlike LLMs, which can have hundreds of billions of parameters, SLMs typically contain only a few million to a few billion parameters.
The reduced size of SLMs allows them to be deployed in environments with limited hardware capabilities, such as mobile devices, edge computing systems, and offline applications. By focusing on domain-specific tasks, SLMs can deliver performance comparable to LLMs within their specialized areas while being more cost-effective and energy-efficient.
SLMs can be designed as domain-specific models from the pre-training stage onward, or they can be adjusted and tailored after their initial training (which, as with LLMs, is still general-purpose). The process of further specializing a model on a specific domain is called fine-tuning.
The fine-tuning process involves using smaller, task-specific datasets to customize the foundation models for particular applications.
This approach differs from domain-specific pre-training because, with fine-tuning, the parameters of the pre-trained model are altered and optimized toward the specific task. This is done by training the model on a smaller labeled dataset that is specific to the new task. The key idea behind fine-tuning is to leverage the knowledge learned from the pre-trained model and adapt it to the new task, rather than training a model from scratch.
Figure 1.4: An illustration of the process of fine-tuning
In the preceding figure, you can see a schema of how fine-tuning works on OpenAI pre-built models. The idea is that you have a pre-trained model available with general-purpose weights or parameters. Then, you feed your model with custom data, typically in the form of “key-value” prompts and completions. In practice, you are providing your model with a set of examples of how it should respond (completions) to specific questions (prompts).
Here, you can see an example of how these key-value pairs might look:
{"prompt": "<prompt text>", "completion": "<ideal generated text>"} {"prompt": "<prompt text>", "completion": "<ideal generated text>"} {"prompt": "<prompt text>", "completion": "<ideal generated text>"} ...Once the training is done, you will have a customized model that is particularly suited for a given task, for example, the classification of your company’s documentation.
The major benefit of fine-tuning is that you can make pre-built models tailored to your use cases, without the need to retrain them from scratch, yet leveraging smaller training datasets and hence less training time and compute. At the same time, the model keeps its generative power and accuracy learned via the original training, the one that occurred on the massive dataset.
Fine-tuning is particularly valuable for SLMs, as it enables them to achieve high performance while maintaining their efficiency.
Several advanced fine-tuning techniques have been developed to optimize the process, especially for SLMs:
Low-rank adaptation (LoRA): This method inserts trainable low-rank matrices into the model’s layers, allowing adaptation to new tasks with minimal computational overhead. LoRA is highly efficient in terms of memory usage and is widely used to fine-tune large models on limited hardware.
Adapter tuning: Instead of modifying the entire model, small neural network modules called adapters are added to each layer. During fine-tuning, only these adapters are updated, significantly reducing the number of trainable parameters while preserving the model’s pre-trained knowledge.
Prefix tuning and prompt tuning: These techniques guide the model’s output by attaching