AI Agents in Practice - Valentina Alto - E-Book

Description

As AI agents evolve to take on complex tasks and operate autonomously, you need to learn how to build these next-generation systems. Author Valentina Alto brings practical, industry-grounded expertise in AI Agents in Practice to help you go beyond simple chatbots and create AI agents that plan, reason, collaborate, and solve real-world problems using large language models (LLMs) and the latest open-source frameworks.
In this book, you'll get a comparative tour of leading AI agent frameworks such as LangChain and LangGraph, covering each tool's strengths, ideal use cases, and how to apply them in real-world projects. Through step-by-step examples, you’ll learn how to construct single-agent and multi-agent architectures using proven design patterns to orchestrate AI agents working together. Case studies across industries will show you how AI agents drive value in real-world scenarios, while guidance on responsible AI will help you implement ethical guardrails from day one. The chapters also set the stage with a brief history of AI agents, from early rule-based systems to today's LLM-driven autonomous agents, so you understand how we got here and where the field is headed.
By the end of this book, you'll have the practical skills, design insights, and ethical foresight to build and deploy AI agents that truly make an impact.

You can read this e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 328

Year of publication: 2025




AI Agents in Practice

Design, implement, and scale autonomous AI systems for production

Valentina Alto

AI Agents in Practice

Copyright © 2025 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing or its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Portfolio Director: Gebin George

Relationship Lead: Ali Abidi

Project Manager: Prajakta Naik

Content Engineer: Aditi Chatterjee

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Indexer: Pratik Shirodkar

Proofreader: Aditi Chatterjee

Production Designer: Alishon Falcon

Growth Lead: Nimisha Dua

First published: July 2025

Production reference: 1250725

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-80580-135-1

www.packtpub.com

To my family and friends—thank you for your support, patience, and encouragement throughout this journey.

– Valentina

Contributors

About the author

Valentina Alto is a technical architect specializing in AI and intelligent apps at the Microsoft Innovation Hub in Dubai. During her tenure at Microsoft, she has held different roles as a solution specialist, focusing on data, AI, and application workloads within the manufacturing, pharmaceutical, and retail industries, and driving customers’ digital transformations in the era of AI. Valentina is an active tech author and speaker who contributes to books, articles, and events on AI and machine learning. Over the past two years, she has published two books on generative AI and large language models, further establishing her expertise in the field.

I would like to thank my family and friends for their unwavering support, patience, and understanding throughout this process. Your encouragement has been invaluable.

I am also grateful to my colleagues and peers in the AI and technology community for the insightful discussions, feedback, and inspiration that have shaped my understanding of generative AI. Your contributions continue to push the boundaries of innovation.

A special thanks to Ali Abidi for giving me the opportunity to write this book in such an exciting moment in the era of AI agents. Special thanks to Prajakta Naik, Aditi Chatterjee, and Alishon Falcon for their valuable input and time reviewing this book, and to the entire Packt team for their support during the course of writing this book.

About the reviewers

Amey Ramakant Mhadgut is a software engineer with over five years of industry experience and a master’s degree in computer science. He specializes in big data, generative AI, Python, AWS, and scalable software architecture. Amey brings both academic depth and practical insight to his reviews, offering thoughtful analysis of technical content.

Prudhvi Raj Dachapally, backed by 7+ years of industry experience, is a senior applied scientist at eBay, where he builds innovative AI solutions to enhance the recommendation experience. Previously, he led the AI team at Cyndx, developing semantic search engines and AI assistants powered by fine-tuned large language models for financial B2B products. He has 180+ citations and publications in top conferences, including EMNLP and CogSci. He holds a master’s degree from Indiana University Bloomington and remains active through mentorship and alumni engagement.

I would like to thank my wife, Sri Vyshnavi, for her constant support during this review process, and my parents, Subramanyam and Sri Neela, whose sacrifices and belief form the foundation of every opportunity I have today.

Subscribe for a Free eBook

New frameworks, evolving architectures, research drops, production breakdowns—AI_Distilled filters the noise into a weekly briefing for engineers and researchers working hands-on with LLMs and GenAI systems. Subscribe now and receive a free eBook, along with weekly insights that help you stay focused and informed.

Subscribe at https://packt.link/TRO5B or scan the QR code below.

Contents

Contributors

Subscribe for a Free eBook

Preface

Who this book is for

What this book covers

To get the most out of this book

Disclaimer on AI usage

Get in touch

Join our Discord and Reddit space

Your Book Comes with Exclusive Perks – Here’s How to Unlock Them

Unlock Your Book’s Exclusive Benefits

How to unlock these benefits in three easy steps

Part 1: Foundations of AI Workflows and the Rise of AI Agents

Evolution of GenAI Workflows

Technical requirements

Understanding foundation models and the rise of LLMs

From narrow AI to foundation models

Under the hood of an LLM

How do we consume LLMs?

Latest significant breakthroughs

Small language models and fine-tuning

Model distillation

Reasoning models

DeepSeek

Road to AI agents

Text generation

Chat with your data

Multimodality

The need for an additional layer of intelligence: introducing AI agents

Summary

References

The Rise of AI Agents

Technical requirements

Evolution of agents from RPA to AI agents

Components of an AI agent

Different types of AI agents

Retrieval agents

Task agents

Autonomous agents

Summary

References

Subscribe for a Free eBook

Part 2: Designing, Building, and Scaling AI Agents

The Need for an AI Orchestrator

Introduction to AI orchestrators

Autonomy

Abstraction and modularity

Core components of an AI orchestrator

Workflow management

Memory and context handling

Tool and API integration

Error handling and monitoring

Security and compliance

Overview of the most popular AI orchestrators in the market

How to choose the right orchestrator for your AI agent

Summary

References

The Need for Memory and Context Management

Different types of memory

Short-term memory

Long-term memory

Semantic memory

Episodic memory

Procedural memory

In between short-term memory and long-term memory – the role of semantic caches

Managing context windows

Storing, retrieving, and refreshing memory

Temporal and spatial reasoning in AI agents

Popular tools to manage memory

LangMem

Mem0

Letta (formerly MemGPT)

Summary

References

Subscribe for a Free eBook

The Need for Tools and External Integrations

Technical requirements

The anatomy of an AI agent’s tools

Hardcoded and semantic functions

Hardcoded functions

Semantic functions

APIs and web services

Web APIs

Internal or enterprise APIs

Backend function APIs (service mesh or microservices)

Serverless functions/Lightweight APIs

Databases and knowledge bases

Structured data

Unstructured data

Synchronous versus asynchronous calls

Summary

References

Building Your First AI Agent with LangChain

Technical requirements

Introduction to the LangChain ecosystem

Build – the architectural foundation

Run – the operational layer

Manage – observability and iteration

Overview of out-of-the-box components

Use case – e-commerce AI agent

Scenario description

AskMamma’s building blocks

Developing the agent

Observability, traceability, and evaluation

Infusing the AI agent in the mobile app

Summary

References

Subscribe for a Free eBook

Multi-Agent Applications

Technical requirements

Introduction to multi-agent systems

Understanding and designing different workflows for your multi-agent system

Overview of multi-agent orchestrators

AutoGen

TaskWeaver

OpenAI Agents SDK

LangGraph

Building your first multi-agent application with LangGraph

Summary

References

Part 3: Road to an Open, Agentic Ecosystem

Orchestrating Intelligence: Blueprint for Next-Gen Agent Protocols

Technical requirements

What is a protocol?

Understanding the Model Context Protocol

Agent2Agent

Agent Commerce Protocol

Toward an agentic web

From the traditional web to the agentic web

Key components of NLWeb

Current progress and applications

Summary

References

Subscribe for a Free eBook

Navigating Ethical Challenges in Real-World AI

Ethical challenges in AI – fairness, transparency, privacy, and accountability

Fairness and bias

Transparency and explainability

Privacy and data protection

Accountability and liability

Safety and reliability

Agentic AI autonomy and its unique ethical challenges

Autonomy versus human control

Deception and manipulation

Unintended consequences and liability for agent actions

Responsible AI principles and practices

From principles to practice

Guardrails for safe and ethical AI

What are AI guardrails?

Content filtering and moderation in AI systems

Why content filtering is needed

How content filtering works

Ethical considerations in content moderation

Addressing the challenges: governance, regulations, and collaboration

Organizational governance and culture

Industry collaboration and self-regulation

Government regulations and policy

Summary

References

Other Books You May Enjoy

Join our Discord and Reddit space

Index


Part 1

Foundations of AI Workflows and the Rise of AI Agents

In Part 1 of this book, we explore the evolution of AI development since the rise of large language models (LLMs) in late 2022, leading up to the emergence of AI agents as a new architectural paradigm.

This part begins by tracing how AI workflows have shifted—from simple API calls to more dynamic systems such as retrieval-augmented generation (RAG)—and highlights key technical breakthroughs such as fine-tuning, model distillation, and reinforcement learning from human feedback (RLHF). It introduces the concept of agentic behavior as the next frontier in building intelligent, goal-driven systems.

We then dive into what makes an AI agent: how it differs from traditional automation, the components that enable it (LLMs, tools, memory, and knowledge), and why the field is rapidly converging on more autonomous, interactive systems.

You’ll also gain an understanding of how AI agents build on the foundation of LLMs while incorporating new layers of intelligence and orchestration, setting the stage for more personalized, persistent, and task-oriented AI applications.

This part contains the following chapters:

Chapter 1, Evolution of GenAI Workflows
Chapter 2, The Rise of AI Agents

1

Evolution of GenAI Workflows

Over the past two years, large language models (LLMs) have reshaped the landscape of AI. From simple prompt-based interactions to complex applications across industries, LLMs have evolved rapidly, fueled by breakthroughs in architecture, training techniques, and fine-tuning strategies. As their capabilities have grown, the shift from ChatGPT to the agentic systems we see today (as of April 2025) marks a natural progression, with the addition of reasoning, planning, and action-taking capabilities representing a major technological leap.

This chapter explores the foundations of LLMs, how they’re built and consumed, and the differences between pre-trained and fine-tuned models. Most importantly, it sets the stage for the next leap forward: the emergence of AI agents.

In this chapter, we will cover the following topics:

Understanding foundation models and the rise of LLMs
Latest significant breakthroughs
Road to AI agents
The need for an additional layer of intelligence: introducing AI agents

By the end of this chapter, you’ll have a clear understanding of how LLMs evolved, how they’re trained and deployed, and why the road to truly intelligent systems inevitably leads to the emergence of AI agents.

Technical requirements

You can access the complete code for this chapter in the book’s accompanying GitHub repository at https://github.com/PacktPublishing/AI-Agents-in-Practice.

Understanding foundation models and the rise of LLMs

AI has undergone a fundamental transformation thanks to the emergence of foundation models—versatile, general-purpose models that can be adapted across a wide range of tasks. Among them, LLMs have taken center stage, redefining how we interact with machines through natural language.

From narrow AI to foundation models

Before the rise of foundation models, the field of AI was dominated by narrow AI—systems built to perform one specific task and nothing else. Each use case required a custom pipeline: a unique dataset, a dedicated model architecture, and a specialized training routine. If you wanted to classify emails as spam or not spam, you’d build a spam filter. If you needed to extract names and places from documents, you’d create a named entity recognizer. Want to summarize a news article? That would mean yet another bespoke model.

This fragmented approach had several drawbacks. Models were brittle—performing well only within the narrow domain they were trained for—and expensive to maintain. Any shift in the task or data distribution often meant retraining from scratch.

The introduction of foundation models marked a fundamental shift in how we build and think about AI systems. These models are trained on vast and diverse datasets that span multiple domains and tasks. The idea is to teach a single model a general understanding of the world—its language, structure, and patterns—during a large-scale pretraining phase. Once this general knowledge is embedded, the model can be adapted to specific tasks with minimal additional data and compute.

For example, instead of building a separate model for translating French to English, we can now take a pre-trained foundation model and fine-tune it on a smaller translation dataset. The pre-trained model already understands language syntax, grammar, and meaning. Fine-tuning simply aligns this understanding to a specific objective.

The key innovation behind foundation models is transfer learning. Rather than learning from scratch, these models transfer knowledge gained from general training to specific problems. This dramatically improves efficiency, reduces the amount of labeled data required, and leads to more robust and flexible AI systems.

Moreover, foundation models aren’t just about language. They apply across modalities: some models can process and generate not only text but also images, audio, or code.

In essence, foundation models act as a “base brain” for AI—trained once, then repurposed many times over. This scalability and adaptability have unlocked entirely new possibilities in how we build intelligent systems, setting the stage for more autonomous and interactive applications such as AI agents.

We mentioned that foundation models can handle a variety of data formats. Within the family of foundation models, we also find data-specific models that focus on a single data type; LLMs are the language-focused case.

Figure 1.1: Features of LLMs

LLMs are, in essence, the language-specialized version of foundation models. They’re built on deep neural network architectures—particularly transformers—and are trained to predict the next word in a sequence. But this seemingly simple goal unlocks surprising emergent behaviors. LLMs can carry on conversations, answer intricate questions, write code, and even simulate reasoning.

Definition

Emergent behaviors are complex capabilities that arise unexpectedly when a system reaches a certain scale, even though those capabilities weren’t explicitly programmed or anticipated. In the context of LLMs, these behaviors surface when models are scaled up in terms of data, parameters, and training time—unlocking new abilities that were absent in smaller versions.

As models scale, they begin to exhibit emergent properties, including the following:

In-context learning: LLMs can learn to perform a task simply by being shown a few examples in the prompt—without any fine-tuning. This was not seen in smaller models.
Chain-of-thought reasoning: By generating intermediate reasoning steps, LLMs can solve multi-step problems such as math word problems or logical puzzles—something they previously struggled with.
Analogical reasoning: They can solve analogy problems (e.g., “cat is to kitten as dog is to...”) in a way that resembles human cognitive processing.
Arithmetic and logic: At scale, LLMs develop the ability to handle tasks such as multi-digit arithmetic or logic puzzles, even if these tasks weren’t part of their training objective.
Understanding metaphors and humor: Advanced LLMs can interpret new metaphors or even attempt jokes—demonstrating an abstract grasp of language and nuance.
Multi-task generalization: Rather than being trained for one specific task, they can simultaneously handle translation, summarization, question answering, and more—without task-specific training.

These capabilities represent more than just better performance—they are qualitatively new behaviors that “emerge” only at scale, giving LLMs a surprisingly broad range of skills with real-world implications across domains.
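To make the first of these capabilities concrete, here is a minimal few-shot prompt (the task and the examples are purely illustrative). Given nothing but the two labeled examples, a sufficiently large model will typically complete the last line with Positive, without any fine-tuning:

Classify the sentiment of each review as Positive or Negative.
Review: "The battery lasts all day." -> Positive
Review: "The screen cracked after a week." -> Negative
Review: "Setup was quick and painless." ->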

Under the hood of an LLM

At the core of every LLM is a powerful neural network architecture—most commonly, a transformer. These networks are built to process and understand patterns in data, particularly human language, by learning statistical relationships across billions of text examples. Though loosely inspired by the structure of the human brain, LLMs function purely through mathematics, passing information through interconnected layers that adapt as the model trains.

To make language computable, the first step is to convert it into numbers because neural networks can’t process raw text. This happens through two key steps—tokenization and embedding:

Tokenization breaks down sentences into smaller chunks called tokens. These could be full words or parts of words, depending on the model. For example, “The cat sat on the mat” might be split into individual words or smaller subword units, depending on the tokenizer used.
Embedding takes these tokens and maps each one to a high-dimensional vector—a string of numbers that encodes its meaning and relationship to other words. These embeddings are learned during training so that similar words end up in similar regions of the model’s “semantic space.” This helps the model understand context and word usage, such as how “Paris” and “London” relate as cities.

Figure 1.2: An example of embedding
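To see tokenization in action, the following minimal sketch uses the tiktoken library (an assumption on our part; any tokenizer behaves similarly) to split a sentence into token IDs and map each ID back to its text chunk:

# A minimal tokenization sketch; assumes the tiktoken library is installed (pip install tiktoken)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")     # a widely used byte-pair encoding
text = "The cat sat on the mat"

token_ids = enc.encode(text)                   # list of integer token IDs
tokens = [enc.decode([t]) for t in token_ids]  # each ID mapped back to its text chunk

print(token_ids)
print(tokens)

The embedding layer of an LLM then looks up each of these IDs in a learned matrix to produce the high-dimensional vector that the rest of the network operates on.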

Once the input is tokenized and embedded, it moves through the transformer network itself. Unlike traditional neural networks with just a few hidden layers, LLMs use dozens—or even hundreds—of stacked layers, each containing mechanisms called attention heads. These attention layers help the model decide which parts of the input are most relevant to a given prediction. For instance, when completing a sentence, the model learns to focus more on specific previous words that influence what should come next.

Training an LLM means teaching it to make better predictions over time. This is done through a method called backpropagation, where the model compares its predicted word to the correct one, calculates how far off it was, and then updates its internal parameters to reduce future errors.

Definition

Backpropagation is the core learning algorithm used to train neural networks. It works by comparing the model’s prediction to the correct answer, calculating the error (called the loss), and then adjusting the network’s internal parameters (weights) to reduce that error. This adjustment happens by “propagating” the error backward through the layers of the network—hence the name. Over time, this process helps the model make increasingly accurate predictions.
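To make the mechanics tangible, here is a deliberately oversimplified sketch in plain Python: a one-weight “model” predicts y = w * x, and a single training example is used to repeatedly compute the loss, the gradient, and the weight update that backpropagation performs at a vastly larger scale:

# Toy gradient-descent loop for a single weight (a stand-in for billions of parameters)
w = 0.0                  # initial weight
x, y_true = 2.0, 6.0     # one training example: input and correct answer
lr = 0.1                 # learning rate

for step in range(20):
    y_pred = w * x                     # forward pass: the model's prediction
    loss = (y_pred - y_true) ** 2      # squared error between prediction and target
    grad = 2 * (y_pred - y_true) * x   # gradient of the loss with respect to w
    w -= lr * grad                     # adjust the weight to reduce the error

print(round(w, 3))   # converges toward 3.0, since 3.0 * 2.0 = 6.0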

Let’s say you type The cat is on the.... The model predicts the next word by assigning probabilities to possible continuations such as mat, roof, or sofa. It doesn’t guess randomly—it relies on patterns it has seen during training.
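As a minimal sketch of that final step (the logit values below are made up for illustration), the softmax function is what turns the model’s raw scores into the probabilities it samples the next token from:

# Turning raw next-token scores (logits) into probabilities with softmax
import math

logits = {"mat": 4.1, "roof": 2.3, "sofa": 1.8}   # hypothetical scores for three candidates

exp_scores = {tok: math.exp(s) for tok, s in logits.items()}
total = sum(exp_scores.values())
probs = {tok: round(v / total, 3) for tok, v in exp_scores.items()}

print(probs)   # roughly {'mat': 0.79, 'roof': 0.13, 'sofa': 0.08}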

This process is repeated across vast amounts of data—millions or billions of sentences—enabling the model to gradually capture the structure and rhythm of language. The result is a system capable of not just completing sentences but also holding conversations, solving problems, and responding with context-aware, often remarkably fluent language.

How do we consume LLMs?

Once the training stage of an LLM is concluded, we need a way to use the model to predict the next token on new inputs; this process is called inference.

In the context of machine learning and AI, inference refers to the process of running a trained model on new input data to generate predictions or responses. In LLMs, inference involves processing a prompt and producing text-based output, typically requiring significant computational resources, especially for large models.

LLMs can typically be accessed through APIs, allowing developers to use them without managing complex infrastructure. This approach simplifies integration, making AI-powered applications more scalable and cost-effective.

LLM providers such as OpenAI, Azure AI, and Hugging Face offer APIs that handle requests and return responses in real time. The process usually involves the following:

Authentication: Developers use API keys or OAuth tokens for secure access.

Definition

Authentication is how developers prove that their application has permission to access an external service. This is usually done using either API keys or OAuth tokens. An API key is a unique string provided by the service—much like a password—that identifies the app. OAuth, on the other hand, is a more flexible system that allows users to grant specific permissions to apps, issuing temporary access tokens in return. Both methods ensure that only authorized users or systems can make requests, helping protect sensitive data and resources.

Sending a request: A structured JSON request includes the model name, prompt, and parameters such as temperature (for randomness).
Receiving a response: The API returns a generated text output along with metadata such as token usage.

Figure 1.3: An example of an HTTP request to the LLM API
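As a sketch of what such a request can look like in code (assuming the OpenAI Chat Completions endpoint and the requests library; other providers follow a very similar pattern, and the model name is only an example):

# Minimal chat-completion request over HTTP; assumes the requests library and an OpenAI API key
import os
import requests

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",  # authentication via API key
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o-mini",                                     # example model name
        "messages": [{"role": "user", "content": "The cat is on the..."}],
        "temperature": 0.7,                                         # controls randomness
    },
)

data = response.json()
print(data["choices"][0]["message"]["content"])   # the generated text
print(data["usage"])                              # token-usage metadata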

Note

Some LLM APIs support streaming responses, where the model outputs tokens incrementally rather than waiting to generate the full response before sending it. This approach helps mitigate the high latency often associated with large models. By delivering the first chunk of text quickly, streaming reduces perceived latency—the time the user waits before seeing any output—leading to a smoother, more responsive experience.
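As a sketch of how streaming typically looks with the official openai Python client (an assumption; parameter names and response shapes vary across providers):

# Printing tokens as they arrive; assumes the openai Python client (v1+) and OPENAI_API_KEY set
from openai import OpenAI

client = OpenAI()   # reads the API key from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",                                            # example model name
    messages=[{"role": "user", "content": "Explain streaming in one sentence."}],
    stream=True,                                                    # request incremental output
)

for chunk in stream:
    delta = chunk.choices[0].delta.content   # the newly generated piece of text, if any
    if delta:
        print(delta, end="", flush=True)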

Now, a legitimate question might be: what if I want to run my model on my local computer? To answer this question, we first need to distinguish between the following:

Private LLMs: These are proprietary models developed by companies such as OpenAI, Anthropic, or Google. They’re closed source, meaning you can’t see or modify their underlying code. These models are typically only accessible via APIs, and they come with a pay-per-use, token-based cost.
Open source LLMs: Open source models, such as Meta’s Llama, Mistral, and Falcon, are freely available for anyone to download, modify, and deploy. This means developers can access the underlying trained parameters, run them on private infrastructure (as the sketch after this list illustrates), and even use the underlying architecture to retrain the model from scratch.
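A minimal local-inference sketch with the Hugging Face transformers library might look like the following (the checkpoint name is an example, and running a 7B-parameter model locally assumes a reasonably capable GPU):

# Running an open source model locally; assumes the transformers library is installed
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",   # example open source checkpoint
)

output = generator("The cat is on the", max_new_tokens=20)
print(output[0]["generated_text"])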

Even for open source LLMs, however, many developers opt to access these models via APIs provided by platforms such as Azure AI Foundry and Hugging Face Hub.

This approach offers several advantages:

Reduced infrastructure costs: Running LLMs independently demands significant computational resources, which can be cost-prohibitive. Utilizing APIs shifts this burden to the service provider, allowing developers to leverage powerful models without investing in expensive hardware.
Scalability: API services can dynamically scale to handle varying workloads, ensuring consistent performance without manual intervention.
Security and compliance: Platforms such as Azure AI Foundry offer enterprise-grade security features, helping organizations meet compliance requirements and protect sensitive data.

In the context of AI agents and, more broadly, AI-powered apps, the most adopted path is that of consuming LLMs via APIs. Exceptions might be related to disconnected scenarios (e.g., running LLMs at offshore sites or remote locations) or regulatory constraints in terms of data residency (the LLM must reside in a specific country where there is no public cloud available).

Latest significant breakthroughs

The field of GenAI has experienced rapid advancements over the past few years, with breakthroughs that pushed the boundaries of efficiency, adaptability, and reasoning capabilities. In the following sections, we are going to explore some of the latest techniques that significantly enhance GenAI models’ performance while reducing computational demands.

Small language models and fine-tuning

Small language models (SLMs) are becoming increasingly relevant as organizations seek efficient, cost-effective alternatives to large-scale AI systems.

SLMs are a streamlined category of GenAI models designed to efficiently process and generate natural language while using fewer computational resources than their larger counterparts. Unlike LLMs, which can have hundreds of billions of parameters, SLMs typically contain only a few million to a few billion parameters.

The reduced size of SLMs allows them to be deployed in environments with limited hardware capabilities, such as mobile devices, edge computing systems, and offline applications. By focusing on domain-specific tasks, SLMs can deliver performance comparable to LLMs within their specialized areas while being more cost-effective and energy-efficient.

SLMs can be designed as domain-specific models from the pre-training stage onward, or they can be adjusted and tailored after their initial training (which, as with LLMs, is still general-purpose). The process of further specializing a model in a specific domain is called fine-tuning.

The fine-tuning process involves using smaller, task-specific datasets to customize the foundation models for particular applications.

This approach differs from domain-specific pre-training because, with fine-tuning, the parameters of the pre-trained model are altered and optimized toward the specific task. This is done by training the model on a smaller labeled dataset that is specific to the new task. The key idea behind fine-tuning is to leverage the knowledge learned by the pre-trained model and adapt it to the new task, rather than training a model from scratch.

Figure 1.4: An illustration of the process of fine-tuning

The preceding figure shows a schema of how fine-tuning works on OpenAI’s pre-built models. The idea is that you start with a pre-trained model with general-purpose weights or parameters. You then feed the model custom data, typically in the form of key-value pairs of prompts and completions. In practice, you are providing the model with a set of examples of how it should answer (completions) specific questions (prompts).

Here, you can see an example of how these key-value pairs might look:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"} {"prompt": "<prompt text>", "completion": "<ideal generated text>"} {"prompt": "<prompt text>", "completion": "<ideal generated text>"} ...

Once the training is done, you will have a customized model that is particularly suited for a given task, for example, the classification of your company’s documentation.
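As a sketch of how this workflow can look in code with the openai Python client (an assumption; the file name and base model are placeholders, and the exact JSONL schema depends on the provider and the base model you fine-tune), you upload the JSONL file of examples and start a fine-tuning job:

# Uploading training data and launching a fine-tuning job; assumes the openai Python client (v1+)
from openai import OpenAI

client = OpenAI()

# 1. Upload the JSONL file containing the example pairs
training_file = client.files.create(
    file=open("finetune_examples.jsonl", "rb"),   # hypothetical file name
    purpose="fine-tune",
)

# 2. Start the fine-tuning job on top of a pre-trained base model
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",               # example base model
)

print(job.id, job.status)   # poll the job until it completes, then call the resulting model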

The major benefit of fine-tuning is that you can tailor pre-built models to your use cases without retraining them from scratch, using smaller training datasets and hence less training time and compute. At the same time, the model keeps the generative power and accuracy it learned from the original training on the massive dataset.

Fine-tuning is particularly valuable for SLMs, as it enables them to achieve high performance while maintaining their efficiency.

Several advanced fine-tuning techniques have been developed to optimize the process, especially for SLMs:

Low-rank adaptation (LoRA): This method inserts trainable low-rank matrices into the model’s layers, allowing adaptation to new tasks with minimal computational overhead. LoRA is highly efficient in terms of memory usage and is widely used to fine-tune large models on limited hardware.
Adapter tuning: Instead of modifying the entire model, small neural network modules called adapters are added to each layer. During fine-tuning, only these adapters are updated, significantly reducing the number of trainable parameters while preserving the model’s pre-trained knowledge.
Prefix tuning and prompt tuning: These techniques guide the model’s output by attaching