Explore the transformative potential of GenAI in the application development lifecycle. Through concrete examples, you will go through the process of ideation and integration, understanding the tradeoffs and decision points along the way.
With recent advances in models like Google Gemini, Anthropic Claude, DALL-E and GPT-4o, this timely resource will help you harness these technologies through proven design patterns.
We then delve into the practical applications of GenAI, identifying common use cases and applying design patterns to address real-world challenges. From summarization and metadata extraction to intent classification and question answering, each chapter offers practical examples and blueprints for leveraging GenAI across diverse domains and tasks. You will learn how to fine-tune models for specific applications, progressing from basic prompting to sophisticated strategies such as retrieval augmented generation (RAG) and chain of thought.
Additionally, we provide end-to-end guidance on operationalizing models, including data prep, training, deployment, and monitoring. We also focus on responsible and ethical development techniques for transparency, auditing, and governance as crucial design patterns.
Generative AI Application Integration Patterns
Integrate large language models into your applications
Juan Pablo Bustos
Luis Lopez Soria
Generative AI Application Integration Patterns
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Senior Publishing Product Manager: Tushar Gupta
Acquisition Editor – Peer Reviews: Tejas Mhasvekar
Project Editor: Meenakshi Vijay
Content Development Editor: Shazeen Iqbal
Copy Editor: Safis Editing
Technical Editor: Gaurav Gavas
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Presentation Designer: Rajesh Shirsath
Developer Relations Marketing Executive: Maran Fernandes
First published: August 2024
Production reference: 1290824
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83588-760-8
www.packt.com
The field of Artificial Intelligence is in the midst of bringing about a profound transformation in business. One of the pivotal areas of this transformation is the integration of AI with the applications that are already running businesses, adding value to them. This book, Generative AI Application Integration Patterns, explores these recurrent themes as patterns of integration with generative AI. It serves as a timely guide for navigating the nuanced landscape of integrating GenAI into existing business applications. It peers into the fog surrounding GenAI’s immense potential and provides practical clarity that may help you revolutionize key competitive aspects of your business operations, from enhancing customer experiences to streamlining internal processes. By focusing on the practical aspects of integration, the authors equip readers with the background knowledge and tools they need to leverage this transformative technology effectively.
Juan and Luis delve into the practical underpinnings of generative AI and, in doing so, provide a solid foundation for understanding and actualizing its capabilities and navigating its limitations. They explore the different architectural integration patterns that can be employed for seamless adoption, and consider how factors such as scalability, performance, and security should be taken into account. The case studies presented throughout the book showcase how successful implementations of GenAI can be realized across industries, serving as blueprints that demonstrate how businesses can leverage this technology to achieve tangible outcomes.
In addition, this book addresses some of the critical considerations of responsible AI development and deployment. It emphasizes the importance of ethical considerations, data privacy, and bias mitigation, helping ensure that GenAI is utilized in a manner that aligns with ethical principles and societal values. This holistic approach helps readers not only gain technical expertise but also develop a deeper appreciation of the ethical challenges and implications of their work.
Generative AI Application Integration Patterns is a well-written, engaging, and very relevant set of blueprints that technology and business leaders, as well as developers, should be aware of as they seek to integrate the promise of GenAI into their applications. I encourage you to dive deep into the examples, reflect on the concepts presented in this book, and embark on the exciting journey of discovery and innovation in harnessing the potential of GenAI. The future of business is being shaped by AI, and this book is an essential companion on that path.
Dr. Ali Arsanjani
Director of Applied AI Engineering, Google
Juan Pablo Bustos is a seasoned technology professional specializing in artificial intelligence and machine learning. With a background in computer science, Juan has held leadership positions at major tech companies including Google, Stripe, and Amazon Web Services. His expertise spans AI services, solution architecture, and cloud computing. Juan is passionate about helping organizations leverage cutting-edge technologies to drive innovation and deliver value.
I’m deeply grateful to my wife Cinthia for her constant support, encouragement, and for being my sounding board for even my craziest ideas. Thanks to Penny and Andrew, my kids, for their patience. I’d like to acknowledge my father, Dr. Sergio Bustos, for encouraging me to pursue computer science. I’m indebted to Dr. Ali Arsanjani, Robert Love, and Todd Reagan for their invaluable mentorship and friendship. A special thanks to my friend Luis, my partner in crime for this book. Finally, I’d like to recognize Gemini, Claude, and ChatGPT for their invaluable help and for democratizing access to GenAI.
Luis Lopez Soria is an experienced software architect specializing in AI/ML. He has gained practical experience from top firms across heavily regulated industries like healthcare and finance, as well as big tech firms like AWS and Google. He brings a blended approach from his experience managing global partnerships, AI product development, and customer-facing roles. Luis is passionate about learning new technologies and using these to create business value.
I want to thank my parents and sister. Your unwavering support, willingness to lend an ear, and readiness to brainstorm ideas have been invaluable. To my grandfather Felix and uncle Ricardo: your presence and support by my side made this dream a reality.
A special thanks goes to Chris K. and Juan B., whose early influence on my career cannot be overstated. Your constant push for excellence and valuable input have left an indelible mark on my professional growth and, by extension, on this book.
To all of you, and the many others who have contributed in ways both big and small, I offer my heartfelt gratitude. This book stands as a testament to your belief in me and your ongoing support.
Aditi Khare holds 8+ years of experience in the AI research and product engineering space.
She is passionate about AI research, open source, and building production-grade AI products. She has worked for Fortune 50 product companies. She has completed a big data analytics course at the Indian Institute of Management, Ahmedabad, and a master’s in computer applications at K. J. Somaiya Institute of Management, Mumbai. In her spare time, she enjoys reading AI-related research papers and publishing research paper summaries through her LinkedIn newsletter. For more information about her, visit https://www.linkedin.com/in/aditi-khare-5840977b/.
I’d like to dedicate my contribution to this book to the loving memory of my beloved mom, the late Mrs. Shashi Khare, who has always been my inspiration and the reason for my achievements.
I’d like to thank my father Mr. Alok Khare and my brother Ayush Khare for being very supportive and acting as a guiding force in all my achievements.
Join our community’s Discord space for discussions with the authors and other readers:
https://packt.link/genpat
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Introduction to Generative AI Patterns
From AI predictions to generative AI
Predictive AI vs generative AI use case ideation
A change in the paradigm
Predictive AI use case development – simplified lifecycle
Generative AI use case development – simplified lifecycle
General generative AI concepts
Generative AI model architectures
Techniques available to optimize foundational models
Techniques to augment your foundational model responses
Constant evolution across the generative AI space
Introducing generative AI integration patterns
Summary
Identifying Generative AI Use Cases
When to consider generative AI
Realizing business value
Identifying Generative AI use cases
Potential business-focused use cases
Generative AI deployment and hosting options
Summary
Designing Patterns for Interacting with Generative AI
Defining an integration framework
Entry point
Prompt pre-processing
Inference
Results post-processing
Selecting from amongst multiple outputs
Refining generated outputs
Results presentation
Logging
Summary
Generative AI Batch and Real-Time Integration Patterns
Batch and real-time integration patterns
Different pipeline architectures
Application integration patterns in the integration framework
Entry point
Prompt pre-processing
Inference
Result post-processing
Result presentation
Use case example – search enhanced by GenAI
Batch integration – document ingestion
Real-time integration – search
Summary
Integration Pattern: Batch Metadata Extraction
Use case definition
Architecture
Entry point
Prompt pre-processing
Inference
Result post-processing
Result presentation
Summary
Integration Pattern: Batch Summarization
Use case definition
Architecture
Entry point
Prompt pre-processing
Inference
Result post-processing
Result presentation
Summary
Integration Pattern: Real-Time Intent Classification
Use case definition
Architecture
Entry point
Prompt pre-processing
Inference
Result post-processing
Result presentation
Logging and monitoring
Summary
Integration Pattern: Real-Time Retrieval Augmented Generation
Use case definition
Architecture
Entry point
Prompt pre-processing
Inference
Result post-processing
Result presentation
Use case demo
The Gradio app
Summary
Operationalizing Generative AI Integration Patterns
Operationalization framework
Data layer
A real-world example: Part 1
Training layer
A real-world example: Part 2
Inference layer
A real-world example: Part 3
Operations layer
CI/CD and MLOps
Monitoring and observability
Evaluation and monitoring
Alerting
Distributed tracing
Logging
Cost optimization
Summary
Embedding Responsible AI into Your GenAI Applications
Introduction to responsible AI
Fairness in GenAI applications
Interpretability and explainability
Privacy and data protection
Safety and security in GenAI systems
Google’s approach to responsible AI
Google’s Secure AI Framework (SAIF)
Google’s Red Teaming approach
Anthropic’s approach to responsible AI
Summary
Other Books You May Enjoy
Index
Once you’ve read Generative AI Application Integration Patterns, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781835887608
Submit your proof of purchase.

That’s it! We’ll send your free PDF and other benefits to your email directly.

This chapter provides an overview of key concepts, techniques, and integration patterns related to generative AI that will empower you to harness these capabilities in real-world applications.
We will provide an overview of generative AI architectures, such as transformers and diffusion models, which underpin these models’ ability to produce text, images, audio, and more. You’ll get a brief introduction to specialized techniques, like pre-training and prompt engineering, that turn base language models into creative powerhouses.
Understanding the relentless pace of innovation in this space is critical, as new models and ethical considerations emerge constantly. We’ll introduce strategies for experimenting rapidly while ensuring responsible, transparent development.
The chapter also introduces common integration patterns for connecting generative AI to practical workflows. Whether you are crafting chatbots that leverage models in real time or performing batch enrichment of data, we will introduce prototyping blueprints to jumpstart building AI-powered systems.
By the end, you will have a one-thousand-foot view of which generative AI models are available, why experimentation is important, and how these integration patterns can help create value for your organization leveraging generative AI.
In a nutshell, the following main topics will be covered:
Interacting with AI
Predictive AI vs generative AI use case ideation
A change in the paradigm
General generative AI concepts
Introduction to generative AI integration patterns

From AI predictions to generative AI

The intent of this section is to provide a brief overview of artificial intelligence, highlighting our initial experiences with it. In the early 2000s, AI started to become more tangible for consumers. For example, in 2001, Google introduced the “Did you mean?” feature (https://blog.google/intl/en-mena/product-updates/explore-get-answers/25-biggest-moments-in-search-from-helpful-images-to-ai/), which suggests spelling corrections. This was one of Google’s first applications of machine learning and one of the early AI features that the general public got to experience on a large scale.
Over the following years, AI systems became more sophisticated, especially in areas like computer vision, speech-to-text conversion, and text-to-speech synthesis. Working in the telecom industry allowed me to witness firsthand the innovation driven by speech-to-text in particular. Integrating speech-to-text capabilities into interactive voice response (IVR) systems led to better user experiences by allowing people to speak their requests rather than punch numbers into a keypad. For example, you could call a bank and be welcomed by a message asking you to say “balance” to check your balance, “open account” to open an account, and so on. Nowadays we are seeing more and more implementations of AI, simplifying ever more complex and time-consuming tasks.
The exponential increase in available computing power, paired with the massive datasets needed to train machine learning models, unleashed new AI capabilities. In the 2010s, AI started matching and even surpassing human performance on certain tightly defined tasks like image classification.
The advent of generative AI has reignited interest and innovation in the AI field, introducing new approaches for exploring use cases and system integration. Models like Gemini, PaLM, Claude, DALL-E, OpenAI GPT, and Stable Diffusion showcase the ability of AI systems to generate synthetic text, images, and other media. The outputs exhibit creativity and imagination that capture the public’s attention. However, the powerful capabilities of generative models also highlight new challenges around system design and responsible deployment. There is a need to rethink integration patterns and architecture to support safe, robust, and cost-effective implementations. Specifically, issues around security, bias, toxicity, and misinformation must be addressed through techniques like dataset filtering, human-in-the-loop systems, enhanced monitoring, and immediate remediation.
As generative AI continues maturing, best practices and governance frameworks must evolve in tandem. Industry leaders have formed partnerships like the Content Authenticity Initiative to develop technical standards and policy guidance around the responsible development of the next iteration of AI. This technology’s incredible potential, from accelerating drug discovery to envisioning new products, can only be realized through a commitment to transparency, ethics, and human rights. Constructive collaboration that balances innovation with caution is imperative.
Generative AI marks an inflection point for the field. The ripples from this groundswell of creative possibility are just beginning to reach organizations and communities. Maintaining an open, evidence-driven dialogue around not just capabilities but also challenges lays a foundation for AI deployment that empowers people, unlocks new utility, and earns widespread trust.
We are witnessing an unprecedented democratization of generative AI capabilities through publicly accessible APIs from established companies like Google, Meta, and Amazon, and startups such as Anthropic, Mistral AI, Stability AI, and OpenAI. The table below summarizes several leading models that provide versatile foundations for natural language and image generation.
Just a few years ago, developing with generative AI required specialized expertise in deep learning and access to vast computational resources. Now, models like Gemini, Claude, GPT-4, DALL-E, and Stable Diffusion can be accessed via simple API calls at near-zero cost. The bar for experimentation has never been lower.
This commoditization has sparked an explosion of new applications leveraging these pre-trained models – from creative tools for content generation to process automation solutions infused with AI. Expect integrations with generative foundations across all industries in the coming months and years.
Models are becoming more knowledgeable, with broader capabilities and reasoning that will reduce hallucinations and increase accuracy across model responses. Multimodality is also gaining traction, with models able to ingest and generate content across text, images, audio, video, and 3D scenes. In terms of scalability, model size and context windows continue expanding exponentially; for example, Google’s Gemini 1.5 now supports a context window of 1 million tokens.
Overall, the outlook points to a future where generative AI will become deeply integrated into most technologies. These models introduce new efficiencies and automation potential and inspire creativity across nearly every industry imaginable.
The table below highlights some of the most popular LLMs and their providers. The purpose of the table is to highlight the vast number of options available on the market at the time of writing this book. We expect this table to quickly become outdated by the time of publication and highly encourage readers to dive deep into the model providers’ websites to stay up to date with any new launches.
| Model | Provider | Landing Page |
| --- | --- | --- |
| Gemini | Google | https://deepmind.google/technologies/gemini |
| Claude | Anthropic | https://claude.ai/ |
| ChatGPT | OpenAI | https://openai.com/blog/chatgpt |
| Stable Diffusion | Stability AI | https://stability.ai/ |
| Mistral | Mistral AI | https://mistral.ai/ |
| LLaMA | Meta | https://llama.meta.com/ |
Table 1.1: Overview of popular LLMs and their providers
Predictive AI refers to systems that analyze data to identify patterns and make forecasts or classifications about future events. In contrast, generative AI models create new synthetic content like images, text, or code based on the patterns gleaned from their training data. For example, with predictive AI, you can confidently identify if an image contains a cat or not, whereas with generative AI you can create an image of a cat from a text prompt, modify an existing image to include a cat where there was none, or generate a creative text blurb about a cat.
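To make the contrast concrete, here is a minimal sketch in Python. Both clients are hypothetical stand-ins rather than any specific vendor’s API: a predictive classifier maps a fixed input to a fixed output, while a generative client produces new content from an open-ended prompt.

```python
# Illustrative sketch only: `classifier` and `gen_client` are hypothetical
# stand-ins for whatever vision-classification and image-generation
# services you use.

def contains_cat(classifier, image_bytes: bytes) -> bool:
    """Predictive AI: a fixed input (an image) maps to a fixed output (a label)."""
    scores = classifier.classify(image_bytes)  # e.g. {"cat": 0.97, "dog": 0.02}
    return scores.get("cat", 0.0) > 0.5

def create_cat_image(gen_client) -> bytes:
    """Generative AI: an open-ended prompt produces new synthetic content."""
    return gen_client.generate_image(
        prompt="A watercolor painting of a cat reading a newspaper"
    )
```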
Product innovation focused on AI involves various phases of the product development lifecycle. With the emergence of generative AI, the paradigm has shifted away from initially needing to compile training data to train traditional ML models and toward leveraging flexible pre-trained models.
Foundational models like Google’s PaLM 2 and Gemini, OpenAI’s GPT and DALL-E, and Stable Diffusion provide broad foundations enabling rapid prototype development. Their versatile capabilities lower the barrier for experimenting with novel AI applications.
Where previously data curation and model training from scratch could take months before assessing viability, now proof-of-concept generation is possible within days without the need to fine-tune a foundation model.
This generative approach facilitates more iterative concept validation. After quickly building an initial prototype powered by the baseline model, developers can then collect niche training data and perform knowledge transfer via techniques like distillation to customize later versions; we will dive deep into the concept of distillation later in the book. The foundation model already encodes patterns that are useful both for kickstarting new models and for iterating on them.
In contrast, the predictive modeling approach requires upfront data gathering and training before any application testing. This more linear progression limits early-stage flexibility. However, predictive systems can efficiently learn specialized correlations and achieve high-confidence inference once substantial data exists.
Leveraging versatile generative foundations supports rapid prototyping and use case exploration, while custom predictive modeling can later boost performance on narrow tasks once sufficient data exists. Blending these AI approaches capitalizes on their complementary strengths throughout the model deployment lifecycle.
Beyond the basic use of a foundational model through prompt engineering, several more complex auxiliary techniques can enhance its capabilities. Examples include Chain-of-Thought (CoT) and ReAct, which empower the model to not only reason about a situation but also define and evaluate a course of action.
ReAct, presented in the paper ReAct: Synergizing Reasoning and Acting in Language Models (https://arxiv.org/abs/2210.03629), addresses the current disconnect between LLMs’ language understanding and their ability to make decisions. While LLMs excel at tasks like comprehension and question answering, their reasoning and action-taking skills (for example, generating action plans or adapting to unforeseen situations) are often treated separately.
ReAct bridges this gap by prompting LLMs to generate both “reasoning traces,” detailing the model’s thought process, and task-specific actions in an interleaved manner. This tight coupling allows the model to leverage reasoning for planning, execution monitoring, and error handling, while simultaneously using actions to gather additional information from external sources like knowledge bases or environments. This integrated approach demonstrably improves LLM performance in both language and decision-making tasks.
For example, in question-answering and fact-verification tasks, ReAct combats common issues like hallucination and error propagation by utilizing a simple Wikipedia API. This interaction allows the model to generate more transparent and trustworthy solutions compared to methods lacking reasoning or action components. LLM hallucinations are defined as generated content that seems plausible yet is factually unsupported. Various papers aim to address this phenomenon. For example, A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions dives deep into approaches to not only identify but also mitigate hallucinations. Another good example of a mitigation technique is covered in the paper Chain-of-Verification Reduces Hallucination in Large Language Models (https://arxiv.org/pdf/2309.11495.pdf). At the time of writing this book, hallucination mitigation is a rapidly evolving field.
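To illustrate the idea behind Chain-of-Verification, here is a minimal sketch of its four stages. This is a simplification under assumptions, not the paper’s exact prompts: `llm` is a hypothetical prompt-in, text-out helper standing in for whichever model API you use.

```python
# A minimal Chain-of-Verification (CoVe) sketch. `llm` is a hypothetical
# function (prompt string in, completion text out); the prompts below are
# simplified stand-ins for the paper's templates.

def chain_of_verification(llm, question: str) -> str:
    # 1. Draft a baseline answer, which may contain hallucinations.
    baseline = llm(f"Answer the question: {question}")

    # 2. Plan verification questions that fact-check the draft.
    plan = llm(
        "List short fact-checking questions, one per line, for this answer.\n"
        f"Question: {question}\nAnswer: {baseline}"
    )

    # 3. Answer each verification question independently, so errors in the
    #    draft do not condition, and thus propagate into, the checks.
    checks = [f"Q: {q}\nA: {llm(q)}" for q in plan.splitlines() if q.strip()]

    # 4. Produce a final answer revised in light of the checks.
    return llm(
        f"Original question: {question}\nDraft answer: {baseline}\n"
        "Verification results:\n" + "\n".join(checks) +
        "\nWrite a corrected final answer."
    )
```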
Both CoT and ReAct rely on prompting: feeding the LLM with carefully crafted instructions that guide its thought process. CoT, as presented in the paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (https://arxiv.org/abs/2201.11903), focuses on building a chain of reasoning steps, mimicking human thinking. Imagine prompting the model with: “I want to bake a cake. First, I need flour. Where can I find some?” The model responds with a potential source, like your pantry. This back-and-forth continues, building a logical chain of actions and decisions.
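In code, a chain-of-thought prompt can be as simple as including one worked example with explicit reasoning steps. The sketch below is illustrative: `llm` is again a hypothetical prompt-in, text-out helper, and the arithmetic example is ours rather than one from the paper.

```python
# A minimal few-shot chain-of-thought prompt: the worked example's explicit
# reasoning steps nudge the model to reason step by step on the new problem.

cot_prompt = """\
Q: A baker has 3 trays with 12 cookies each and sells 10 cookies.
   How many cookies are left?
A: Let's think step by step.
   3 trays x 12 cookies = 36 cookies.
   36 - 10 sold = 26 cookies.
   The answer is 26.

Q: I want to bake a cake. The recipe needs 500 g of flour and the store
   sells flour in 2 kg bags. How many bags do I need?
A: Let's think step by step.
"""

answer = llm(cot_prompt)  # the model continues the reasoning chain
```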
ReAct takes things a step further, integrating action into the reasoning loop. Think of it as a dynamic dance between thought and action. The LLM not only reasons about the situation but also interacts with the world, fetching information or taking concrete steps, and then updates its reasoning based on the results. It’s like the model simultaneously planning a trip and checking maps to adjust the route if it hits a roadblock.
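The sketch below shows the shape of such a reasoning-and-acting loop, loosely modeled on the paper’s Wikipedia-based question answering. It is a sketch under assumptions: `llm` is a hypothetical prompt-in, text-out helper, and the Search[...]/Finish[...] action format is a simplified stand-in for the paper’s actual prompt templates.

```python
# A minimal ReAct-style loop: the model interleaves Thought/Action steps,
# and we execute Search actions against Wikipedia, feeding the result back
# in as an Observation before the next model call.
import requests
from urllib.parse import quote

def wiki_lookup(topic: str) -> str:
    """Fetches a short summary of a topic from Wikipedia's REST API."""
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{quote(topic)}"
    resp = requests.get(url, timeout=10)
    return resp.json().get("extract", "No article found.")

REACT_PROMPT = """Answer the question by interleaving Thought, Action, and
Observation steps. Actions take the form Search[topic]. When you know the
answer, respond with Finish[answer].

Question: {question}
{trace}"""

def react(llm, question: str, max_steps: int = 5) -> str:
    trace = ""
    for _ in range(max_steps):
        step = llm(REACT_PROMPT.format(question=question, trace=trace))
        trace += step + "\n"
        if "Finish[" in step:  # the model decided it has the answer
            return step.split("Finish[")[1].split("]")[0]
        if "Search[" in step:  # the model asked for an external lookup
            topic = step.split("Search[")[1].split("]")[0]
            trace += f"Observation: {wiki_lookup(topic)}\n"
    return "No answer found within the step budget."
```

The key design point is the feedback loop: each Observation re-enters the prompt, so the model can revise its plan based on what the action actually returned.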
This powerful synergy between reasoning and action unlocks a new realm of possibilities for LLMs. CoT and ReAct tackle challenges like error propagation (jumping to the wrong conclusions based on faulty assumptions) by allowing the model to trace its logic and correct course. They also improve transparency, making the LLM’s thought process clear and understandable.
In other words, LLMs are like brilliant linguists, adept at understanding and generating text. But when it comes to real-world tasks demanding reasoning and action, they often stumble. Here’s where techniques like CoT and ReAct enter the scene, transforming LLMs into reasoning powerhouses.
Imagine an LLM helping diagnose a complex disease. CoT could guide it through a logical chain of symptoms and examinations, while ReAct could prompt it to consult medical databases or run simulations. This not only leads to more accurate diagnoses but also enables doctors to understand the LLM’s reasoning, fostering trust and collaboration.
These futuristic applications are what drive us to keep building and investing in this technology, which is very exciting. Before we dive deep into the patterns that are needed to leverage generative AI technology to generate business value, let’s take a step back and look at some initial concepts.
It feels like eons ago in tech years, but let’s rewind just a couple of years, back to when you couldn’t default to utilizing a pre-trained model through the web or a managed endpoint to solve an AI problem. The process was meticulous – you’d have to first clearly define the specific use case, identify what data you had available and could collect to train a custom model, select the appropriate algorithm and model architecture, train the model using specialized hardware and software, and validate whether the outputs would actually help solve the task at hand. If all went well, you would have a model that would take a predefined input and provide a predefined output.
The paradigm profoundly shifted with the advent of LLMs and large multimodal models. Suddenly, you could access a pre-trained model with billions of parameters and start experimenting right off the bat with versatile foundational models whose inputs and outputs are dynamic in nature. After tinkering around, you’d then evaluate whether any fine-tuning was necessary to adapt the model to your needs, rather than pre-training an entire model from scratch. And spoiler alert – in most cases, chances are you won’t even need to fine-tune a foundational model.
Another key shift relates to the early belief that one model would outperform all others and solve all tasks. However, the model itself is just the engine; you still need an entire ecosystem packaged together to provide a complete solution. Foundational models have certainly demonstrated some incredible capabilities beyond initial expectations. But we also observe that certain models are better suited for certain tasks. And running the same prompt through other models can produce very different outputs depending on the underlying model’s training datasets and architecture.
So, the new experimental path often focuses first on prompt engineering, response evaluation, and then fine-tuning the foundational model if gaps exist. This contrasts sharply with the previous flow of data prep, training, and experimentation before you could get your hands dirty. The bar to start creating with AI has never been lower.
In the following sections, we will explore the difference between the development lifecycle of predictive AI and generative AI use cases. In each section, we have provided a high-level visual representation of a simplified development lifecycle and an explanation of the thought process behind each approach.
Figure 1.1: Predictive AI use case development simplified lifecycle
Let’s dive into the process of developing a predictive AI model first. Everything starts with a good use case, and ROI (return on investment) is top of mind when evaluating AI use cases. Think about pain points in your business or industry that could be solved by predicting an outcome. It is very important to always keep an eye on feasibility – for example, whether you can procure the data you need.
Once you’ve landed on a compelling value-driven use case, next up is picking algorithms. You’ve got endless options here – decision trees, neural nets, regressions, random forests, and on and on. It is very important not to be swayed by a bias toward the latest and greatest; focus on the core requirements of your data and use case to narrow the options down. You can always switch things up or add experiments as you iterate through your testing.