Generative artificial intelligence technologies and services, including ChatGPT, are transforming our work, life, and communication landscapes. To thrive in this new era, harnessing the full potential of these technologies is crucial. Generative AI for Cloud Solutions is a comprehensive guide to understanding and using Generative AI within cloud platforms.
This book covers the basics of cloud computing and Generative AI/ChatGPT, addressing scaling strategies and security concerns. With its help, you’ll be able to apply responsible AI practices and other methods such as fine-tuning, RAG, autonomous agents, LLMOps, and Assistants APIs. As you progress, you’ll learn how to design and implement secure and scalable ChatGPT solutions on the cloud, while also gaining insights into the foundations of building conversational AI, such as chatbots. This process will help you customize your AI applications to suit your specific requirements.
By the end of this book, you’ll have gained a solid understanding of the capabilities of Generative AI and cloud computing, empowering you to develop efficient and ethical AI solutions for a variety of applications and services.
Generative AI for Cloud Solutions
Architect modern AI LLMs in secure, scalable, and ethical cloud environments
Paul Singh
Anurag Karuparti
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Niranjan Naikwadi
Publishing Product Manager: Nitin Nainani
Book Project Manager: Shambhavi Mishra
Senior Editor: Sushma Reddy
Technical Editor: Seemanjay Ameriya
Copy Editor: Safis Editing
Proofreader: Sushma Reddy
Indexer: Tejal Daruwale Soni
Production Designer: Gokul Raj S.T
DevRel Marketing Coordinator: Vinisha Kalra
First published: April 2024
Production reference: 1100424
Published by
Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83508-478-6
www.packtpub.com
In loving memory of my late father, Jagtar Singh Tumber, to whom I owe all my eternal gratitude for supporting me regardless of life’s circumstances. You were my rock and always will be. And to my late father-in-law, Ramon Davila, for expecting the best out of us and always caring for our family. You were always our true patriarch! You are both loved, forever in our hearts, and never forgotten.
I would like to thank my family for their unwavering love and support during the development of this book, including my amazing wife, Mayra, and my children, Anthony and Alyssa – you both have a bright future ahead of you, so keep making intelligent choices in life! And to my beautiful mother, Kamla Devi, and three older brothers and sister, along with their respective spouses – you all define the epitome of a family and I truly appreciate your endearing love!
I would like to extend my deepest gratitude to Anurag for agreeing to help me coauthor this book. Without you, I am sure I would have had a difficult time crossing the finish line – you have helped not only with amazing content and ideas, but you have also helped keep things on track with this book project by helping us meet deadlines and managing many aspects of book authoring for us.
I also wanted to express my sincerest thanks to my manager at Microsoft, Dheepa Iyer, for giving me the initial push I needed at the start of this journey. It was the moment you stated, “If anyone can successfully do this, I know you can, Paul.” That was enough for me to hear, and the rest is history… Without your encouraging words, this book would likely not have transpired. You inspire everyone you manage to do the best work they can!
Finally, I want to thank my many colleagues and peers at Microsoft. There are too many to list here, but here are just some of the most amazing folks I have had the pleasure of working with: John O. Sullivan and Christopher Tucci, you both are great at what you do, and it is an honor to have you as my colleagues. Last, but certainly not least, a thank you to my other amazing Microsoft colleagues Matthew Thanakit, Yi Yang, and Ram Dorairaj for your friendship, numerous collaborations, and making working at Microsoft that much more enjoyable!
– Paul Singh
I’d like to extend my heartfelt gratitude to my beautiful wife, Catherine, for her unwavering support and encouragement. Her incredible understanding and sacrifice have provided me with the time to write this book during countless weekends and late-night sessions, a gift for which I am eternally grateful. To my family, my parents Narayana and Sreelakshmi, my brother, Srinivas, and my sister-in-law Ramya, and to my in-laws, Tom and Lynn: your love and belief in me have been the bedrock of my journey. Each of you has played an integral role in bringing this book to fruition, and for that, I thank you from the bottom of my heart.
Special thanks to my mentor, Paul, for giving me an incredible opportunity to coauthor this work. His guidance, support, and partnership have been pivotal in realizing this achievement. Paul’s wisdom and encouragement have not only shaped this project but have also profoundly influenced my personal and professional growth. This collaboration stands as a testament to the power of mentorship and shared vision in bringing ideas to life.
My sincere appreciation extends to the leadership team and my colleagues at Microsoft for their unwavering support and collaboration. I am especially grateful to my fantastic colleagues, including those Paul has mentioned above, Vishnu Pamula, and Nadeem Ahmed – your expertise and dedication are truly remarkable. Working alongside such talented individuals has not only inspired me but also significantly contributed to my professional growth. Thank you all for being such an integral part of my journey.
– Anurag Karuparti
We would both also like to extend our heartfelt thanks to John Maeda, Microsoft Vice President, Design and AI, for his willingness to support us. You were always extremely insightful and enlightening in our sessions, so you were always our only choice to write our book’s foreword.
And to Svetlana Reznik, Microsoft CSA Director, for your support, encouragement, and guidance for us in our daily lives at Microsoft. In our eyes, you are our perpetual “Manager of the Year”!
Picture this: my 88-year-old mother, a vibrant soul who can text like a teenager, recently found herself in a tussle with AI-driven autocorrect on iMessage. As parents tend to do, treating us as their personal tech-support line, she called me up and demanded that I turn off this feature. Why? Because it was doing things that she didn’t want it to do, such as inserting words that weren’t her own. As my queen, I did her bidding and she was so happy to have it gone. And yet a couple of weeks later she was asking for it back because she missed the convenience it offered for typing those long, repetitive words. It’s a story that beautifully encapsulates the delicate dance we all perform with AI technology: a journey of resistance, adaptation, and, ultimately, acceptance grounded in the benefits we gain versus the learning required to adapt.
Diving into this book by Singh and Karuparti, you’ll find yourself on a similar journey, exploring the vast landscapes where cloud computing meets the sophisticated capabilities of GPT-fueled AI. This isn’t just a tale of technical marvels; it’s a narrative about unlocking human potential, making space for creativity amidst the mundane, and reimagining what productivity looks like when we’re freed from the drudgery of routine tasks. Each chapter, handcrafted and wittily illustrated with grounded examples from Paul and Anurag’s experiences, rich with insights and foresight, invites you to dream, to ponder, and to engage with a future shaped by the confluence of AI and our deepest human aspirations.
What truly stands out is the paradigm shift in computer programming driven by AI and cloud computing. This isn’t just a refresh of tools or techniques; it’s a complete overhaul of our foundational approach, shedding light on new terminology and systems that can easily seem enigmatic. For developers, this presents an exhilarating challenge: to learn, adapt, and innovate in ways that were unimaginable just a short while ago. In many ways, we’re all like my mother, on a roller coaster ride with this new AI revolution – but without our sons or daughters to make a technical support call. Luckily, however, we now have Paul and Anurag’s copious lessons to draw upon in Generative AI for Cloud Solutions <3. The authors, through their meticulous work, offer not just a guide but a companion for this journey, providing the insights and encouragement needed to navigate the complexities of this new frontier.
Accepting Paul and Anurag’s generous invitation to developers throughout Generative AI for Cloud Solutions, I feel that we’re all better equipped to navigate the AI and cloud computing revolution with their 10,000 hours of practice as our convenient guide. Their book promises not just a deep dive into technical mastery but an inspiring journey toward embracing change, reminiscent of Eric Shinseki’s words: “If you don’t like change, you’re going to like irrelevance even less.”
As you venture through the many fun-filled chapters in this book, try to embrace the challenges and opportunities with the same open-hearted adaptability my mother showed toward autocorrect. This isn’t just about keeping pace with technology – it’s about thriving in a future where our human creativity and AI’s capabilities are inextricably linked. There’s an entire emergent chain of tooling and processes that are wonderfully demystified within this book, and I for one certainly feel better prepared for what comes next. I wish this feeling of confidence to you, as a fellow practitioner who has also begun this path to becoming an AI engineer.
John Maeda, PhD/MBA
Microsoft Vice President, Design and AI
Redmond, WA
Paul Singh is currently a Principal Cloud Solution Architect (CSA) and has been working at Microsoft for over 10 years. Having been selected as one of the very first 10 CSAs when the role was created, Paul has helped shape the role ever since, including serving on the national hiring committee(s) and helping create the very first Azure Architecture exam. Paul has earned many honors and awards along the way, while also gaining over 30 different technical certifications and helping some of the largest cloud customers with complex scenarios and solutions.
Anurag Sirish Karuparti is a seasoned senior cloud solution architect specializing in AI at Microsoft’s Azure practice. Anurag holds a master’s degree in information management (data science) from Syracuse University and has a background in computer engineering. With over 10 years of experience in the industry, Anurag has become a trusted expert in the fields of the cloud, data, and advanced analytics. Anurag holds multiple Azure certifications and is certified across major cloud platforms. Throughout his career, he has successfully designed and implemented cutting-edge solutions, leveraging the power of artificial intelligence to drive innovation and transform businesses. Prior to joining Microsoft, Anurag gained valuable experience working as a manager in the emerging technologies practices of renowned consulting firms such as EY and PwC.
Soumo Chakraborty is an associate director and solutions architect for data and AI practice at Kyndryl. He has 17 years of experience in leading transformation projects such as platform and data migration, AIOps, MLOps, and now, generative AI. His technical breadth has evolved from the days of on-premises IT infrastructure to cutting-edge technologies using artificial intelligence and machine learning, which makes him a trusted client partner. He leads the solutioning of complex data and machine learning deals, provides consultation to first-of-a-kind generative AI proposals, and delivers innovation to clients. He advocates ethical AI practices and applies them to business use cases. Soumo holds one patent in the area of machine learning.
Manoj Palaniswamy Vasanth is a principal architect and director with over 20 years of experience in the areas of enterprise data analytics and management, data and AI strategy, SAP data analytics, generative AI, LLMOps, and hybrid and cloud IT infrastructure. He has led many cross-cultural technical teams across the globe in developing and deploying scalable data and AI solutions, driving transformative change and facilitating data-driven decision-making. At Kyndryl, Manoj currently plays a technical leadership role within the Global Apps Data and AI practice and is responsible for helping customers modernize their data platforms and realize the value of data for their business. He holds two patents in the area of machine learning and workload optimization on VMs.
Reeta Patil is a skilled professional with extensive experience in object-oriented programming, full stack development, software development life cycle (SDLC), and database management. She possesses a strong understanding of cloud infrastructure and excels in maintaining systems on platforms such as AWS and OCI. Reeta is proficient in web application development, with expertise in JavaScript, React, and Angular. Additionally, she has a background in research, specifically in data analysis, machine learning, and natural language processing. Currently, Reeta works at Oracle, bringing her diverse skill set to contribute to innovative projects and solutions.
The authors acknowledge the use of cutting-edge AI, such as ChatGPT, with the sole aim of enhancing the language and clarity within the book, thereby ensuring a smooth reading experience for readers. It’s important to note that the content itself has been crafted by the authors and edited by a professional publishing team.
Generative AI – the world has been buzzing about this profound concept recently. Everywhere you turn, whether you are watching the nightly news, listening to some of the brightest business leaders adopting technology, or following the global markets, generative AI is at the forefront of these conversations. This revolutionary technology is being interwoven with all industries, economies, businesses, and organizations at an unprecedented rate.
While the concepts of both artificial intelligence and, more recently, generative AI have been around for quite some time, both entered mainstream knowledge with the introduction of an extremely powerful conversational chatbot, known as ChatGPT.
ChatGPT, introduced in late 2022, interacts with a user or application in a conversational way, at a level of precision society has not seen before. We have had chatbots for a very long time; however, ChatGPT broke the mold and essentially catapulted humanity into the “age of AI.”
The technology behind ChatGPT, and thus generative AI, which we will cover in this book, makes it possible to accomplish profound things never seen before – such as answering follow-up questions, admitting its mistakes, challenging incorrect thoughts and suggestions, and even rejecting inappropriate requests, to help protect us. ChatGPT has grabbed the attention of everyone – even those not in the technology industry – due to its powerful knowledge capabilities and the speed and precision of its responses.
Generative AI is already touching the lives of many of us, often without our even knowing it. And this growth trend will not slow down any time soon. In fact, we expect almost all future careers and jobs to require core experience with, and a working knowledge of, AI, with AI/GenAI implementation skills as a bonus. This book will serve as your fundamental guide to prepare you for today and for tomorrow.
In this book, we begin in areas where you’ll gain basic knowledge of generative AI and subsequently what it takes to build a successful cloud solution around this AI technology. We’ll use the Microsoft Azure AI Cloud and OpenAI lens for our examples, due both to their market leadership and to the fact that we are both currently employed at Microsoft. We do, however, take a holistic, industry-wide approach, where the knowledge and concepts can be applied across any cloud solution provider/vendor.
We hope you enjoy reading this book as much as we have had the pleasure of writing it! And please note, although generative AI can create content such as books, this book was created and written by us, the authors, not by the very technology we cover, generative AI (with the exception of the fun, generated comic strips in each chapter!).
This book is primarily aimed at technologists or general readers who would like to get a better understanding of generative AI and how to apply it to a cloud environment.
This book assumes you have little to no knowledge of generative AI, as we build from a basic understanding to some of the more complex concepts and patterns that a cloud environment may present.
The target audience of this content is as follows:
Technologists, including solution architects, cloud developers, data scientists, technology managers, and technical business leaders who want to understand the broader picture of generative AI as well as strategies for an effective, robust, and scalable generative AI solution/service
Businesses and organizations that want to make the most of AI/generative AI
Casual readers who want to learn more about generative AI and ChatGPT

The book offers a structured narrative, starting with an introduction to generative AI and its integration with cloud computing. This is followed by an exploration of the model layer, diving deeper into the intricacies of Large Language Models (LLMs), including the evolution of Natural Language Processing (NLP) and the advent of transformer models. It discusses techniques such as fine-tuning and Retrieval-Augmented Generation (RAG) for augmenting model knowledge. The book then discusses prompt engineering methods. Moving on to the application level, it covers the development framework and strategies, emphasizing scaling, security, safety, and compliance with responsible AI principles. The concluding section provides foresight into the future trajectory of generative AI. Here is the outline of the chapters in this book:
Chapter 1, Cloud Computing Meets Generative AI: Bridging Infinite Impossibilities, introduces the concept of LLMs, what ChatGPT is based on, and their significance in conversational and generative AI. It examines the generative capabilities of LLMs, such as text generation and creative writing. The chapter concludes by exploring the practical applications of LLMs and their future directions in virtual assistants, content creation, and beyond.
Chapter 2, NLP Evolution and Transformers: Exploring NLPs and LLMs, takes you on a journey through the evolution of transformers – the heart of LLMs – from the preceding technology known as Natural Language Processing (NLP) to the powerful new paradigm that has now been created using NLP and LLMs.
Chapter 3, Fine-Tuning: Building Domain-Specific LLM Applications, talks about the benefits of fine-tuning, different techniques of fine-tuning, how to align models to human values with RLHF, evaluating fine-tuned models, and real-life examples of fine-tuning success.
Chapter 4, RAGs to Riches: Elevating AI with External Data, discusses the fundamentals of vector databases and how they play a critical role in building a Retrieval-Augmented Generation (RAG)-based application. We will also explore chunking strategies and evaluation techniques, along with a real-life case study.
Chapter 5, Effective Prompt Engineering Strategies: Unlocking Wisdom Through AI, takes a look at prompt engineering with ChatGPT and some techniques to not only make prompts more effective but also understand some of the ethical dimensions of prompting.
Chapter 6, Developing and Operationalizing LLM-Based Cloud Applications: Exploring Dev Frameworks and LLMOps, uses a software application developer lens to focus on areas that support developer activities, such as programmatic application development frameworks, allowing for AI-enabled applications. We will also look at the lifecycle management of generative AI models in addition to operationalizing the management of generative AI models, along with exciting topics such as agents, autonomous agents, and Assistants APIs.
Chapter 7, Deploying ChatGPT in the Cloud: Architecture Design and Scaling Strategies, explores how to scale a large deployment of a generative AI cloud solution. You’ll gain an understanding of limits, design patterns, and error handling while taking a look at areas and categories that ensure a large-scale generative AI application or service will be robust enough to handle a large number of prompts.
Chapter 8, Security and Privacy Considerations for Gen AI: Building Safe and Secure LLMs, uncovers existing and emerging security threats related to GenAI models, and how to mitigate them, by applying security controls or other techniques to ensure a safe, secure environment. We will also cover a concept known as red-teaming, as well as auditing and reporting.
Chapter 9, Responsible Development of AI Solutions: Building with Integrity and Care, delves into the essential components required to construct a secure generative AI solution, emphasizing the key principles of responsible AI and addressing the challenges of LLMs through these principles. It also explores the escalating concern over deepfakes, their harmful impacts on society, and strategies for developing applications with a responsible AI-first approach. Additionally, it examines the current global regulatory trends and the burgeoning start-up ecosystem in this domain.
Chapter 10, The Future of Generative AI: Trends and Emerging Use Cases, is one of the most exciting chapters in this book, discussing the future of generative AI solutions, highlighting hot emerging trends such as the rise of small language models, offering predictions, exploring the integration of LLMs on edge devices, and examining the impact of quantum computing and the path to AGI.
While knowledge of Artificial Intelligence (AI) or Generative AI (GenAI) is not required, having some familiarity with either will help grasp some of the concepts covered in this book.
You should have a basic understanding of cloud computing and related technologies. While we focus on the Microsoft Azure cloud platform, due to its market leadership in this space, many of the concepts also draw on open source ideas or can be adapted for other cloud service providers.
Software/hardware covered in the book: Access to GitHub repository; Microsoft Azure cloud subscription
Operating system requirements: Any modern device with internet access
To help go into depth on some of the more intricate concepts of this book, we have created additional hands-on labs on a GitHub site (details follow). While access to GitHub and, subsequently, the Azure cloud is not required for this book, it may be helpful for some, especially those who would like to apply their knowledge.
If you are using the digital version of this book, we advise you to type the code yourself. Doing so will help you avoid any potential errors related to the copying and pasting of code.
You can download the hands-on labs and example code files for this book from GitHub at https://github.com/PacktPublishing/Generative-AI-for-Cloud-Solutions. If there are any updates to the hands-on labs or any updates to any code, this will be updated in the GitHub repository referenced above.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “The term foundation models was coined by Stanford in 2021 in the paper “On the Opportunities and Risks of Foundation Models” (https://arxiv.org/pdf/2108.07258.pdf). ”
A block of code is set as follows:
from langchain.text_splitter import (
    RecursiveCharacterTextSplitter,
    Language,
)

Any command-line input or output is written as follows:
['Ladies and Gentlemen, esteemed colleagues, and honored guests. Esteemed leaders and distinguished members', 'emed leaders and distinguished members of the community. Esteemed judges and advisors. My fellow citizens.', '. My fellow citizens. Last year, unprecedented challenges divided us. This year, we stand united,', ', we stand united, ready to move forward together']

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “There are already countless transformer models, such as GPT, Llama 2, Dolly, BERT, BART, T5, and so on.”
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you’ve read Generative AI for Cloud Solutions, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there – you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below:
https://packt.link/free-ebook/978-1-83508-478-6
Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly.

This part introduces Generative AI through the lens of Large Language Models (LLMs), highlighting the substantial impetus this domain has received from advancements in cloud computing. The progressive evolution of Natural Language Processing (NLP) culminated in the development of the Transformer architecture, a pivotal foundation for LLMs. We will detail its innovative mechanisms and core principles. Additionally, we will explore the journey of turning visionary AI ideas into tangible realities.
This part contains the following chapters:
Chapter 1, Cloud Computing Meets Generative AI: Bridging Infinite Impossibilities
Chapter 2, NLP Evolution and Transformers: Exploring NLPs and LLMs

During the last few decades, we have seen unprecedented progress in the world of artificial intelligence (AI) and machine learning (ML) due to the rise of computing, especially cloud computing, and the massive influx of data from the digital revolution. In 2022, the subset of AI known as generative AI emerged as a significant turning point. We have passed an inflection point in AI, and we believe this will drive incredible productivity and growth in society in the coming years. This is the field of conversational AI powered by large language models (LLMs), a fascinating paradigm where computers learn and generate human-like text, images, audio, and video, engaging with us in increasingly interactive and intelligent ways. The transformative potential of LLMs, epitomized by models such as OpenAI’s GPT-based ChatGPT, marks a major shift in how we interact with technology. Generative AI models now have improved accuracy and effectiveness. Use cases that were unattainable for non-technical users in businesses a couple of years ago are now readily implementable. Additionally, the easy availability of open source models, which can be tailored to specific business requirements, coupled with access to high-performance GPUs via cloud computing, has played a crucial role in propelling the advancement of generative AI.
This chapter aims to provide a comprehensive introduction to conversational and generative AI and delve into its fundamentals and powerful capabilities. ChatGPT, a very powerful conversational AI agent, is built on an LLM; hence, to fully understand how ChatGPT works and to learn how to implement it in your applications or services to harness its power, it’s necessary to understand the evolution of conversational AI systems and the broader context of LLMs.
We will cover the following main topics in this chapter:
Evolution of conversational AI
Introduction to generative AI
Trending models and business applications
Deep dive: open source vs closed source models
Cloud computing for scalability, cost optimization, and automation
From vision to value: navigating the journey to production

Understanding the evolution of conversational AI is crucial for learning generative AI as it provides foundational knowledge and context. This historical perspective reveals how AI technologies have progressed from simple, rule-based systems to complex machine learning and deep learning models that are core to both conversational and generative AI.
This section explores the evolution of conversational AI, culminating in an in-depth look at LLMs, the technological backbone of contemporary chatbots.
Conversational AI refers to technologies that enable machines to engage in human-like dialogue, comprehend complex commands, and respond intelligently. This is achieved through machine learning and natural language processing capabilities, enabling the system to learn, understand, and improve over time. The following figure demonstrates one such conversation:
Figure 1.1 – Conversations with Alexa
For instance, a customer interacts with a conversational AI to book a flight. They might say, “I’d like a flight to New York next Friday.” The system comprehends the request, asks for any further specific details (such as departure city or preferred time), and delivers the results, all without human intervention.
Some popular conversational AI systems include Microsoft’s Cortana, Amazon Alexa, Apple’s Siri, and Google Assistant, all of which can comprehend complex commands and respond intelligently.
Exploring the evolution of conversational AI, from rule-based chatbots to AI-powered systems, is vital as it offers historical context, highlights the technological advancements from the 1960s and the historical challenges, and sets the stage for understanding how LLMs have revolutionized natural language interactions. The following figure depicts the conversational AI timeline:
Figure 1.2 – Timeline showing the evolution of chatbots
Chatbots that were initially developed during the 1960s operated on a rule-based system. Eliza, the first chatbot software, was created by Joseph Weizenbaum at MIT’s Artificial Intelligence Laboratory in 1966. It used pattern matching and substitution technology. Users interacted with Eliza through a text-based platform, with the chatbot’s responses being based on scripted templates. Like Eliza, the first-generation chatbots were rule-based. They utilized pattern-matching techniques to align user inputs with predetermined responses. The chatbot’s conversation flows were mapped out by developers who decided how it should respond to anticipated customer inquiries. Responses were formulated based on predefined rules and written in languages such as artificial intelligence markup language (AIML), Rivescript, Chatscript, and others. These chatbots, typically used as FAQ agents, could answer simple questions or common queries about a specific situation.
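To make this concrete, here is a minimal, illustrative Python sketch of the pattern-matching-and-substitution style that Eliza popularized. First-generation bots typically expressed such rules in languages such as AIML rather than Python, and these particular patterns and responses are invented for illustration:

import re

# Illustrative Eliza-style script: each rule pairs a regex pattern with a
# response template; \1 substitutes the text captured from the user.
RULES = [
    (re.compile(r"i need (.*)", re.I), r"Why do you need \1?"),
    (re.compile(r"i am (.*)", re.I), r"How long have you been \1?"),
    (re.compile(r"\b(hello|hi)\b", re.I), "Hello! How can I help you today?"),
]

def respond(user_input: str) -> str:
    """Return the first matching scripted response, or a canned fallback."""
    for pattern, template in RULES:
        match = pattern.search(user_input)
        if match:
            return match.expand(template)
    # No rule matched: a rule-based bot cannot improvise a new answer.
    return "Please tell me more."

print(respond("I need a refund"))    # Why do you need a refund?
print(respond("My parcel is lost"))  # Please tell me more.

Everything such a bot can say must be scripted in advance, which leads directly to the limitations discussed next.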
However, rule-based systems had significant limitations:
Rule-based systems required manual design, forcing developers to program each response
They were effective only in the scenarios for which they were specifically trained
It was difficult for developers to anticipate and program all possible responses
These chatbots were unable to identify grammatical or syntactic errors in user inputs, often resulting in misunderstandings
They were unable to learn from interactions or generate new responses, limiting their adaptability and intelligence
Despite their speed, the inability to understand context or user intents made interactions feel mechanical rather than conversational
This mechanical interaction often led to user frustration with systems that failed to accurately understand and meet their needs
Over time, there has been a significant increase in demand for intelligent, real-time, and personalized interactions in customer support services. As a result, rule-based chatbots have evolved into AI-powered chatbots that offer advanced features such as human-like voice, intent extraction, sentiment analysis, contextual semantic search, grammatical analysis, learning over time, and scalability to allow for seamless integration with more demanding applications and services.
In contrast to rule-based systems, AI-based systems utilize natural language processing to facilitate natural conversations and extract context from user inputs. They can also learn from past interactions (that is, context). Recently, deep learning has significantly advanced conversational AI, even surpassing human performance in some tasks, thanks to its powerful reasoning capabilities. This has decreased the reliance on extensive linguistic knowledge and rule-based techniques when building language services. As a result, AI-based systems have seen widespread adoption across various industries, including media, entertainment, telecommunications, finance, healthcare, and retail, to name a few.
Current conversational AI systems, leveraging LLMs such as GPT-4-Turbo, differ significantly from traditional rule-based systems in their approach and capabilities:
While rule-based systems rely on predefined rules and responses, limiting them to specific, anticipated interactions, LLMs harness extensive datasets and advanced reasoning abilities to produce responses that are not only natural and varied but also highly context-aware
They are also multimodal, which means they can understand and respond to multiple forms of communication such as text, voice, image, or video
These exceptional reasoning abilities enable them to handle tasks with increased efficiency and sophistication, leading to conversations that closely mimic human interaction and understanding
Let’s take the scenario of a customer service interaction as an example to highlight the differences between traditional rule-based systems and modern conversational AI systems that use LLMs, such as GPT-4.
The following is a rule-based system example:
Customer: "I want to return a gift I received without a receipt. Can you help me?" Rule-Based Chatbot: "Please enter your order number to proceed with a return."In this case, the rule-based chatbot is programmed to ask for an order number as a part of its return process script. It can’t handle the nuance of the customer’s situation where they don’t have a receipt. It’s stuck in its predefined rules and can’t adapt to the unexpected scenario.
The following is an LLM-powered conversational AI example:
Customer: "I want to return a gift I received without a receipt. Can you help me?" LLM-Powered Chatbot: "Certainly! Gifts can often be returned without a receipt by verifying the purchaser's details or using a gift return code. Do you have the purchaser's name or email, or a gift return code?"The LLM-powered chatbot, on the other hand, understands the context of not having a receipt and offers alternative methods for returning the item. It does not require the customer to stick to a strict script but instead adapts to the context of the conversation and provides a helpful response. This showcases the advanced reasoning capabilities of LLMs, allowing for more natural, flexible, and human-like conversations.
LLM-powered chatbots also possess inherent limitations, including difficulties in generating accurate up-to-date information, a tendency to hallucinate, and the reproduction of biases present in their training data. We explore these limitations throughout this book, along with strategies to mitigate and eliminate them.
GenAI-based chatbots can also execute tasks or actions with the help of agents. LLM agents are programs that enhance standard LLMs by connecting to external tools, such as APIs and plugins, and assist in planning and executing tasks. They often interact with other software and databases for complex tasks, such as a chatbot scheduling meetings, which requires access to calendars and email. When a user requests a meeting, the chatbot, utilizing its LLM, comprehends the request’s specifics, such as time, participants, and purpose. It then autonomously interacts with the employees’ digital calendars and email systems to find a suitable time slot, considering everyone’s availability. Once it identifies an appropriate time, the chatbot schedules the meeting and sends invites via email, managing the entire process without human intervention. This showcases the chatbot’s ability to perform complex, multi-step tasks efficiently, blending language understanding and reasoning with practical action in a business environment. We will learn more about LLM agents in Chapter 6.
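For now, here is a minimal, hypothetical Python sketch of that plan-and-act loop. The llm() stub returns a canned plan so the example runs without a model, and find_free_slot/send_invite are invented stand-ins for real calendar and email APIs:

import json

def llm(prompt: str) -> str:
    """Hypothetical LLM call. A real agent would query a hosted model here;
    we return a canned tool-use plan so the sketch runs on its own."""
    return json.dumps({"tool": "find_free_slot",
                       "args": {"participants": ["alice", "bob"],
                                "duration_minutes": 30}})

# Hypothetical "tools" the agent is allowed to invoke.
def find_free_slot(participants, duration_minutes):
    return "2024-05-06T10:00"  # stand-in for a real calendar API lookup

def send_invite(participants, slot):
    print(f"Invite sent to {participants} for {slot}")

TOOLS = {"find_free_slot": find_free_slot}

def run_agent(user_request: str) -> None:
    # 1. The LLM turns the natural-language request into a structured tool call.
    plan = json.loads(llm(f"Plan a tool call for: {user_request}"))
    # 2. The agent executes the chosen tool with the LLM-supplied arguments.
    slot = TOOLS[plan["tool"]](**plan["args"])
    # 3. The result feeds the next action (here, sending the invites).
    send_invite(plan["args"]["participants"], slot)

run_agent("Schedule a 30-minute sync with Alice and Bob next week.")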
ChatGPT, launched in November 2022 by OpenAI, attracted 100 million users within just two months due to its advanced language capabilities and broad applicability across various tasks.
In the upcoming section, we will delve into the fundamentals of LLMs as the driving force behind modern chatbots and their significance.
Generative AI refers to a field of AI that focuses on creating or generating new content, such as images, text, music, video, code, 3D objects, or synthetic data, that is not directly copied or replicated from existing data. It involves training deep learning models to understand patterns and relationships within a given dataset and then using that knowledge to generate novel and unique content. The following is a visualization of what generative AI is:
Figure 1.3 – What is generative AI?
It is a broad field whose primary function is to generate novel content. Examples of generative AI models include image generation models such as DALL-E and Midjourney, text generation models such as GPT-4, PaLM, and Claude, code generation models such as Codex, audio generation tools such as MusicLM, and video generation models such as Sora.
Generative AI has reached an inflection point in recent times, and this can be attributed to three key factors:
Size and variety of datasets: The surge in available data due to the digital revolution has been crucial for training AI models to generate human-like content.
Innovative deep learning models: Advancements in model architectures such as generative adversarial networks (GANs) and transformer-based models facilitate the learning of complex patterns, resulting in high-quality AI-generated outputs. The research paper “Attention Is All You Need” (https://arxiv.org/abs/1706.03762) introduced the transformer architecture, enabling significantly more efficient and powerful models for natural language processing, which became foundational for the development of advanced generative AI models. Progress has also been significantly fueled by the availability of open source state-of-the-art pre-trained models via platforms such as the Hugging Face Community.
Powerful computing: Advancements in hardware such as Nvidia GPUs and access to computing through cloud computing have enabled the training of complex AI models, driving advancements in generative AI.

There are various types of generative AI models with different underlying architectures. Among them, VAEs, diffusion models, GANs, and autoregressive models are particularly popular. We won’t delve into every model architecture extensively, as that is outside the scope of this book; in Chapter 2, we will focus on a more detailed discussion of ChatGPT’s LLM architecture, which utilizes an autoregressive transformer architecture.
Moving from the topic of generative AI, we now turn our attention to foundation models. Often used interchangeably with LLMs, these models are the driving force behind the success and possibilities of generative AI. The remarkable strides made in foundation models have been instrumental in propelling the advancements we witness today in generative AI applications. Their development has not only enabled more sophisticated AI capabilities but has also set the stage for a new era of innovation and possibilities in AI.
The term foundation models was coined by Stanford in 2021 in the paper “On the Opportunities and Risks of Foundation Models” (https://arxiv.org/pdf/2108.07258.pdf). Foundation models are a class of large-scale models that are pre-trained on vast amounts of data across various domains and tasks. They serve as a base for further fine-tuning or adaptation to a wide range of downstream tasks, not limited to language but including vision, sound, and other modalities. The term foundation signifies that these models provide a foundational layer of understanding and capabilities upon which specialized models can be built. They are characterized by their ability to learn and generalize from the training data to a variety of applications, sometimes with little to no additional training data. The following figure illustrates foundation models:
Figure 1.4 – Foundation models
LLMs, on the other hand, are a subset of foundation models that specifically deal with natural language processing tasks. They are trained on large text corpora and are designed to understand, generate, and translate language at a scale and sophistication that closely resembles human language understanding. LLMs are trained on massive amounts of data, such as books, articles, and the internet. For example, ChatGPT’s base model was trained on 45 TB of data.
LLMs such as GPTs use transformer architecture to process text sequences, training themselves to predict the next word in a given sequence. Through exposure to vast amounts of text, these models adjust their internal weights based on the difference between predicted and actual words, a process known as backpropagation. Over time, by repeatedly refining these weights across multiple layers of attention mechanisms, they capture intricate statistical patterns and dependencies in the language, enabling them to generate contextually relevant text. In Chapter 2, we will delve deeper into the transformer architecture of LLMs that enables the ChatGPT application.
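As a toy illustration of that training objective, the sketch below (assuming PyTorch) uses a single embedding plus a linear layer standing in for the many stacked attention layers of a real transformer, and trains it on one context/next-word pair:

import torch
import torch.nn as nn

vocab = {"sunny": 0, "weather": 1, "is": 2, "nice": 3}
context = torch.tensor([vocab["sunny"]])    # input word
target = torch.tensor([vocab["weather"]])   # the actual next word

# Toy "language model": an embedding plus a linear head over the vocabulary.
# A real LLM stacks many transformer (attention) layers in between.
model = nn.Sequential(nn.Embedding(len(vocab), 8), nn.Linear(8, len(vocab)))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):
    logits = model(context)                             # predicted next-word scores
    loss = nn.functional.cross_entropy(logits, target)  # predicted vs. actual gap
    optimizer.zero_grad()
    loss.backward()                                     # backpropagation
    optimizer.step()                                    # adjust internal weights

probs = torch.softmax(model(context), dim=-1)
print(f"P(weather | sunny) = {probs[0, vocab['weather']]:.2f}")

After a few dozen updates, the probability assigned to “weather” after “sunny” rises toward 1; the same weight-adjustment process, scaled to billions of parameters and vastly more text, is what LLM training amounts to.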
While LLMs traditionally refer to models that handle large-scale language tasks, the principles and architecture underlying them can be, and are being, extended to other domains, such as image generation. This expansion of capabilities reflects the versatility and adaptability of the transformer-based models that power both LLMs and their multimodal counterparts.
Models such as DALL-E, for instance, are sometimes referred to as LLMs due to their foundation in transformer architecture, which was originally developed for language tasks. However, DALL-E is more accurately described as a multimodal AI model because it understands both text and images and can generate images from textual descriptions.
In the process of creating LLM-based AI applications, it is crucial to understand the core attributes of LLMs, such as model parameters, licensing model, privacy, cost, quality, and latency. It is important to note that there isn’t a flawless model, and making tradeoffs might be necessary to align with the specific business requirements of the application. The following content concentrates only on vital considerations when designing LLM applications.
In the context of LLMs, model parameters are akin to internal notes that guide predictions based on learned data patterns. For example, if an LLM frequently encounters the phrase “sunny weather” during training, it adjusts its parameters to strengthen the connection between “sunny” and “weather.” These adjustments are like turning knobs to increase the likelihood of predicting “weather” after “sunny” in new sentences. Thus, the model’s parameters encode relationships between words, enabling it to generate contextually relevant text based on its training.
The number of parameters indicates the model’s size and complexity, with larger models generally capable of capturing more complex patterns and nuances in language but requiring more computational resources
Understanding the parameters in LLMs is crucial for interpreting model behavior, customizing and adapting the model, and evaluating and comparing different models
Smaller models are more fine-tunable because of their lower number of parameters compared to larger models
While designing applications, it’s crucial to understand whether a smaller model can fulfill the needs of a particular use case by means of fine-tuning/in-context learning or whether a larger model is necessary. For example, smaller models such as GPT-3.5 and FLAN-T5 typically come with lower costs compared to GPT-4 and often prove highly efficient with fine-tuning or in-context learning, especially in specific tasks such as conversation summarization.

The core attributes mentioned provide an excellent starting point for shortlisting models based on business requirements. However, it’s important to understand that some LLMs may exhibit more biases and a higher tendency to hallucinate. In Chapter 3, we discuss industry-leading benchmarks that will help you make informed decisions considering these limitations.
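To make “number of parameters” tangible, the short snippet below counts the trainable weights of a small pre-trained model using the Hugging Face transformers library; the model name is just an example, and the same counting lines work for any PyTorch model:

from transformers import AutoModel  # pip install transformers

# Load a small pre-trained model and count its trainable parameters.
model = AutoModel.from_pretrained("distilbert-base-uncased")
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"distilbert-base-uncased: ~{n_params / 1e6:.0f}M parameters")
# For scale: GPT-3 has roughly 175 billion parameters, which is why larger
# models demand far more compute to train, fine-tune, and serve.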
Generative AI broadly refers to AI systems that can create new content, such as text, image, audio, or video. Foundation models are a subset of generative AI, characterized by their large scale and versatility across multiple tasks, often trained on extensive and diverse datasets. LLMs, a type of foundation model, specifically focus on understanding and generating human language, exemplified by systems such as GPT-3.5-Turbo and Llama 2.
Foundation models can be applied to a variety of AI tasks beyond language, such as image recognition, whereas LLMs are specifically focused on language-related tasks.
In practice, the terms can sometimes be used interchangeably when the context is clearly about language tasks, but it’s important to know that the concept of foundation models was originally intended to be broader and to encompass a wider range of AI capabilities.
However, as LLMs such as GPT-4 Turbo extend to multimodal capabilities, this difference between foundation models and LLMs has been narrowing.
Generative AI encompasses a wide array
