Building AI Intensive Python Applications - Rachelle Palmer - E-Book

Description

The era of generative AI is upon us, and this book serves as a roadmap to harness its full potential. With its help, you’ll learn the core components of the AI stack: large language models (LLMs), vector databases, and Python frameworks, and see how these technologies work together to create intelligent applications.
The chapters will help you discover best practices for data preparation, model selection, and fine-tuning, and teach you advanced techniques such as retrieval-augmented generation (RAG) to overcome common challenges, such as hallucinations and data leakage. You’ll get a solid understanding of vector databases, implement effective vector search strategies, refine models for accuracy, and optimize performance to achieve impactful results. You’ll also identify and address AI failures to ensure your applications deliver reliable and valuable results. By evaluating and improving the output of LLMs, you’ll be able to enhance their performance and relevance.
By the end of this book, you’ll be well-equipped to build sophisticated AI applications that deliver real-world value.

You can read this e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 422

Publication year: 2024




Building AI Intensive Python Applications

Create intelligent apps with LLMs and vector databases

Rachelle Palmer

Ben Perlmutter

Ashwin Gangadhar

Nicholas Larew

Sigfrido Narváez

Thomas Rueckstiess

Henry Weller

Richmond Alake

Shubham Ranjan

Building AI Intensive Python Applications First Edition

First Edition

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publisher: Vishal Bodwani

Acquisition Editor: Sathya Mohan

Lead Development Editor: Siddhant Jain

Development Editor: Asma Khan

Copy Editor: Safis Editing

Associate Project Manager: Yash Basil

Proofreader: Safis Editing

Production Designer: Deepak Chavan

Production reference: 1060924

Published by Packt Publishing Ltd.

Grosvenor House, 11 St Paul’s Square, Birmingham, B3 1RB, UK.

ISBN 978-1-83620-725-2

www.packtpub.com

Contributors

About the authors

Rachelle Palmer is the Product Leader for Developer Database Experience and Developer Education at MongoDB, overseeing the driver client libraries, documentation, framework integrations, and MongoDB University. She has built sample applications for MongoDB in Java, PHP, Rust, Python, Node.js, and Ruby. Rachelle joined MongoDB in 2013 and was previously the Director of the Technical Services Engineering team, creating and managing the team that provided support and CloudOps to MongoDB Atlas.

Ben Perlmutter is a Senior Engineer on the Education AI team at MongoDB. He applies AI technologies such as LLMs, embedding models, and vector databases to improve MongoDB’s educational experience. His team built the MongoDB AI chatbot, which uses retrieval-augmented generation to help thousands of users a week learn about MongoDB. Ben formerly worked as a technical writer specializing in developer-focused documentation.

Ashwin Gangadhar is a Senior Solutions Architect at MongoDB with over a decade of experience in data-driven solutions for e-commerce, HR analytics, and finance. He holds a master’s in controls and signal processing and specializes in search relevancy, computer vision, and NLP. Passionate about continuous learning, Ashwin explores new technologies and innovative solutions. Born and raised in Bengaluru, India, he enjoys traveling, exploring cultures through cuisine, and playing the guitar.

Nicholas Larew is a Senior Engineer on MongoDB’s Education AI team. He works on MongoDB’s AI chatbot, including the open-source framework that powers it, and MongoDB’s content generation and dataset curation efforts. Before working in AI, Nicholas wrote and maintained documentation and sample applications for MongoDB’s developer-facing products.

Sigfrido Narváez is an Executive Solution Architect at MongoDB where he works on AI projects, database migration, and app modernization. His customers span the Americas and LATAM for entertainment, gaming, financial, and other verticals. Named a MongoDB Master in 2015, he speaks at conferences such as GDC, QCon, and re:Invent, sharing the sample apps he has built in Python and other languages using MongoDB Atlas and leading AI technologies.

Thomas Rueckstiess is a Senior Staff Research Scientist and Head of the Machine Learning Research Group at MongoDB. Thomas holds a PhD in machine learning, specializing in neural networks and reinforcement learning, transformers, and structured data modeling. He joined MongoDB in 2012 and was previously the Lead Engineer for MongoDB Compass and Atlas Charts.

Henry Weller is the dedicated Product Manager for Atlas Vector Search, focusing on the query features and scalability of the service, as well as developing best practices for users. He helped launch Atlas Vector Search from public preview into general availability in 2023 and continues to lead the delivery of core features for the service. Henry joined MongoDB in 2022 and was previously a data engineer and backend robotics software engineer.

Richmond Alake is an AI/ML Developer Advocate at MongoDB, creating technical learning content for developers building AI applications. His background includes machine learning architecture, optimizing data pipelines, and developing mobile experiences with deep learning. Richmond specializes in generative AI and computer vision, focusing on practical applications and efficient implementations across AI domains. He guides developers on best practices for AI solutions.

Shubham Ranjan is a Product Manager at MongoDB for Python and a core contributing member to AI initiatives at MongoDB. He is also a Python developer and has published over 700 technical articles on topics ranging from data science and machine learning to competitive programming. Since joining MongoDB in 2019, Shubham has held several roles, progressing from a Software Engineer to a Product Manager for multiple products.

About the reviewers

Arek Borucki, a recognized MongoDB Champion and certified MongoDB SME, has been working with MongoDB technology since 2016. As a principal SRE engineer, he works closely with technologies such as MongoDB, Elasticsearch, PostgreSQL, Kafka, Kubernetes, Terraform, AWS, and GCP. He has worked with renowned companies such as Amadeus, Deutsche Bank, IBM, Nokia, and Beamery. Arek is also a certified Kubernetes administrator and developer, an active speaker at international conferences, a co-author of questions for the MongoDB Associate DBA exam, a MongoDB data modeler, and a MongoDB Atlas administrator. He is also the co-author of the book Mastering MongoDB 7.0.

Chris Bush is the Director of Engineering in Education at MongoDB, where he leads the AI team responsible for building the docs chatbot using ChatGPT and Atlas Vector Search. Originally from Canada and now based in New York, his lifelong passion for language and technology has guided him through a diverse career in software development and technical writing. He has the notable achievement of winning the Best Writer award in 4th grade.

Colleen Day is a Curriculum Designer at MongoDB. She has been a contributor to a variety of learning content for MongoDB University, most recently Introduction to MongoDB and Atlas Search. She holds a master’s degree in English literature from NYU and is passionate about using writing as a vehicle to teach. Colleen has spent her career focused on educational publishing and technical content development. Prior to MongoDB, she was Senior Managing Editor for boot camp courses on data science and fintech, creating courses for developers of all levels.

Robin Taconet is a seasoned Product Leader in the tech industry with over 10 years of experience in leading roles at distinguished organizations, such as MongoDB, Meta, and Salesforce. He has successfully spearheaded the development of solutions for billions of users globally. At MongoDB, he also serves as a cybersecurity SME for the US Public Sector. His areas of expertise include cybersecurity, AI, and the cloud. Robin holds three master’s degrees, including an MSc in computer science with a focus on cybersecurity from Telecom Paris. He is an active public speaker, judge, technical reviewer, and advisor in the AI and cybersecurity community. You can follow and connect with Robin on LinkedIn (https://www.linkedin.com/in/robin-taconet).

Note from Author

I want to be free. I want to be independent. I want to be powerful. I want to be creative. I want to be alive.

—Bing Chat to The New York Times

Have you ever wondered how Siri can understand (almost) anything you ask, or how a Tesla keeps itself from veering off the road? While it may seem like magic, there is tried-and-true science behind it: machine learning (ML) and artificial intelligence (AI).

Based on the May 2023 Stack Overflow survey (https://survey.stackoverflow.co/2023/), 44% of developers use AI tools and 56% have used Copilot, which was launched in October 2021. A mind-blowing 83% have used ChatGPT, which was launched in November 2022. ChatGPT reached an estimated 100 million users within two months of launch, making it the fastest-adopted consumer technology in history at the time.

While authoring this book, I found myself becoming surprisingly philosophical. What is thought? What is consciousness? What, even, is a brain? Strange for someone who considers themselves immune to anything other than data. I want this book to exclusively offer how-to guidance on building whatever you prefer, so we will not talk about what AI means for us as a society, or us as engineers. But, the questions persist, and I confess that prior to now I had mostly ignored them. Lately, however, I have found myself staring blankly into space and pondering whether machine sentience is much different than that of humans.

While AI initially gained traction within the developer community, it is now spreading rapidly to non-technical users across various fields, including business, finance, and marketing. AI is widely recognized as a game-changer for business operations and will transform how organizations function, impacting every department from HR and IT to legal, content creation, and marketing.

AI is compelling not just because of its potential, but because of its user experience. For example, ChatGPT has a clean interface that requires no training, provides instantaneous results, and improves productivity with minimal input. Tasks that would take hours of research are now accomplished in moments.

Like any rapidly evolving ecosystem, there is high growth in the field of AI/ML applications. Almost daily, there are new tools, integrations, and insights. Because of the high demand, it is shockingly easy to enter this groundbreaking field, even for beginners. If you are that novice, just embarking on your learning journey, welcome! For the more seasoned engineer, who knows that all that glitters is not gold… it is still (nevertheless) very exciting—I have not personally seen this amount of innovation and enthusiasm since mobile phones were able to finally have apps on them!

As is the case with any technology, AI is not without its downsides nor is it without repercussions. Particularly for engineers like us, AI has inaugurated a time of intense learning and tremendous change. I will give an example that I think we can all identify with: Stack Overflow.

For the past 15+ years, the developer forums of Stack Overflow, coupled with specialized documentation, have been fundamental resources for learning how to code or use available technologies. Developers would read the advice and try to replicate it, learning as they went and consuming the opinions of many individuals along the way. How we learned—by reading and tinkering—fundamentally shaped the sort of developers we became.

With tools such as ChatGPT, Copilot, Gemini, and so on, we now have a different, faster way to find and consume the information we need. We no longer must parse through dozens of pieces of advice and knit them together into something workable. In the long term, this trend may result in developers becoming less likely to access the official documentation, code examples, and troubleshooting guides… or rely on the knowledge of other developers around the world. More likely, they will seek guidance and consume information generated by AI. Today, the AI applications we can access are only referencing all this other material and summarizing it, but there’s no reason this will always be true, especially once AI is creating the code it references. This has all sorts of outcomes, many of them good. But the one that most comes to mind for me personally is about teaching.

In my career, I have been lucky to have many teachers. Whether that teacher was sitting right next to me with a keyboard during code review or speaking in front of me in a YouTube video, I have learned primarily from you, other developers. And for that, I am eternally grateful. I do not know what the future will be like if my teachers are only machines. I hope, if nothing else, they come with sarcasm and memes. Surely, they won’t stay up late playing CATAN after teaching me the perils of squash and merging in Git. Or maybe they will.

My point is, my developer experience was fundamentally a human one, imperfect and uneven, but the bumps along the way had merit. It is not always pointless to fail on your first (or fifth) attempt. Once you’ve built an AI application, it will alter your users’ experiences and their behavior. It may help them use your product faster, but it may also mean that they understand things less deeply, because they didn’t have to take the long, bumpy route to comprehension.

In this book, you will learn about generative AI (GenAI), then how to build a GenAI application using Python. We will cover not just how to build an application but also how to improve, manipulate, and monitor it. Though suitable for beginners, this book will have insights for those already building GenAI applications, particularly in operations and security. We will approach AI as both a remarkable technology and a potential risk, acknowledging its benefits and challenges.

Finally, at the end of this book, we provide a long list of links and resources from our research as well as articles you may find useful and interesting as you begin to understand this fascinating technology. Remember, with great power comes great responsibility. Let’s dive in.

Rachelle Palmer

Director, Product Management

MongoDB, Inc.

Table of Contents

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Download a free PDF copy of this book

1

Getting Started with Generative AI

Technical requirements

Defining the terminology

The generative AI stack

Python and GenAI

OpenAI API

MongoDB with Vector Search

Important features of generative AI

Why use generative AI?

The ethics and risks of GenAI

Summary

2

Building Blocks of Intelligent Applications

Technical requirements

Defining intelligent applications

The building blocks of intelligent applications

LLMs – reasoning engines for intelligent apps

Use cases for LLM reasoning engines

Diverse capabilities of LLMs

Multi-modal language models

A paradigm shift in AI development

Embedding models and vector databases – semantic long-term memory

Embedding models

Vector databases

Model hosting

Your (soon-to-be) intelligent app

Sample application – RAG chatbot

Implications of intelligent applications for software engineering

Summary

Part 1

Foundations of AI: LLMs, Embedding Models, Vector Databases, and Application Design

3

Large Language Models

Technical requirements

Probabilistic framework

n-gram language models

Machine learning for language modeling

Artificial neural networks

Training an artificial neural network

ANNs for natural language processing

Tokenization

Embedding

Predicting probability distributions

Dealing with sequential data

Recurrent neural networks

Transformer architecture

LLMs in practice

The evolving field of LLMs

Prompting, fine-tuning, and RAG

Summary

4

Embedding Models

Technical requirements

What is an embedding model?

How do embedding models differ from LLMs?

When to use embedding models versus LLMs

Types of embedding models

Choosing embedding models

Task requirements

Dataset characteristics

Computational resources

Vector representations

Embedding model leaderboards

Embedding models overview

Do you always need an embedding model?

Executing code from LangChain

Best practices

Summary

5

Vector Databases

Technical requirements

What is a vector embedding?

Vector similarity

Exact versus approximate search

Measuring search

Graph connectivity

Navigable small worlds

How to search a navigable small world

Hierarchical navigable small worlds

The need for vector databases

How vector search enhances AI models

Case studies and real-world applications

Okta – natural language access request (semantic search)

One AI – language-based AI (RAG over business data)

Novo Nordisk – automatic clinical study generation (advanced RAG/RPA)

Vector search best practices

Data modeling

Deployment

Summary

6

AI/ML Application Design

Technical requirements

Data modeling

Enriching data with embeddings

Considering search use cases

Data storage

Determining the type of database cluster

Determining IOPS

Determining RAM

Final cluster configuration

Performance and availability versus cost

Data flow

Handling static data sources

Storing operational data enriched with vector embeddings

Freshness and retention

Real-time updates

Data lifecycle

Adopting new embedding models

Security and RBAC

Best practices for AI/ML application design

Summary

Part 2

Building Your Python Application: Frameworks, Libraries, APIs, and Vector Search

7

Useful Frameworks, Libraries, and APIs

Technical requirements

Python for AI/ML

AI/ML frameworks

LangChain

LangChain semantic search with score

Semantic search with pre-filtering

Implementing a basic RAG solution with LangChain

LangChain prompt templates and chains

Key Python libraries

pandas

PyMongoArrow

PyTorch

AI/ML APIs

OpenAI API

Hugging Face

Summary

8

Implementing Vector Search in AI Applications

Technical requirements

Information retrieval with MongoDB Atlas Vector Search

Vector search tutorial in Python

Vector Search tutorial with LangChain

Building RAG architecture systems

Chunking or document-splitting strategies

Simple RAG

Advanced RAG

Summary

Part 3

Optimizing AI Applications: Scaling, Fine-Tuning, Troubleshooting, Monitoring, and Analytics

9

LLM Output Evaluation

Technical requirements

What is LLM evaluation?

Component and end-to-end evaluations

Model benchmarking

Evaluation datasets

Defining a baseline

User feedback

Synthetic data

Evaluation metrics

Assertion-based metrics

Statistical metrics

LLM-as-a-judge evaluations

RAG metrics

Human review

Evaluations as guardrails

Summary

10

Refining the Semantic Data Model to Improve Accuracy

Technical requirements

Embeddings

Experimenting with different embedding models

Fine-tuning embedding models

Embedding metadata

Formatting metadata

Including static metadata

Extracting metadata programmatically

Generating metadata with LLMs

Including metadata with query embedding and ingested content embeddings

Optimizing retrieval-augmented generation

Query mutation

Extracting query metadata for pre-filtering

Formatting ingested data

Advanced retrieval systems

Summary

11

Common Failures of Generative AI

Technical requirements

Hallucinations

Causes of hallucinations

Implications of hallucinations

Sycophancy

Causes of sycophancy

Implications of sycophancy

Data leakage

Causes of data leakage

Implications of data leakage

Cost

Types of costs

Tokens

Performance issues in generative AI applications

Computational load

Model serving strategies

High I/O operations

Summary

12

Correcting and Optimizing Your Generative AI Application

Technical requirements

Baselining

Training and evaluation datasets

Few-shot prompting

Retrieval and reranking

Late interaction strategies

Query rewriting

Testing and red teaming

Testing

Red teaming

Information post-processing

Other remedies

Summary

Appendix: Further Reading

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Download a free PDF copy of this book

1

Getting Started with Generative AI

There are a plethora of options for building generative AI (GenAI) applications. The landscape is, quite frankly, overwhelming to navigate, and many of the tools that satisfy one criterion may fall short in another. GenAI applications evolve so quickly that within weeks of this book being published, some of the new AI companies might no longer exist. Therefore, this chapter focuses on long-lived, high-level concepts related to technologies that are used to create GenAI applications.

You will learn about the ways in which your next development project might benefit from GenAI. This chapter examines not just what these techniques are but how they work, giving you a broader understanding and perspective of GenAI. That should help you decide when and how to use GenAI, and make the applications you create more accurate.

By the end of this chapter, you will have a good understanding of the benefits that individual AI/ML stack components bring to a development project, how they relate to each other, and why GenAI technologies are a revolution in software—both in terms of the data handled and desired functionalities.

This chapter gives an introduction to GenAI and provides a quick overview of the following topics:

Definitions for common terminology
A GenAI stack of choice
Python and GenAI
The OpenAI API
An introduction to MongoDB Vector Search
Important features of GenAI
Why use GenAI?
The ethics and risks of GenAI

Technical requirements

This book has sample code for a basic Python application. To recreate it, it is recommended that you have the following:

The latest version of Python
A local development environment on your device for your application server
A MongoDB Atlas cloud account to host your database. You can register for one at https://www.mongodb.com/cloud/atlas/register
VS Code or an IDE of your choice
An OpenAI API key
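Before moving on, it can save time to sanity-check your setup. The following is a small optional script, not part of the book's code repository; the minimum Python version and the OPENAI_API_KEY variable name are assumptions based on common convention (OpenAI's client libraries read that variable by default):

```python
import os
import sys

def check_environment(env: dict) -> list[str]:
    """Return a list of setup problems; an empty list means you are ready to go."""
    problems = []
    # Assumed minimum; the book only says "the latest version of Python".
    if sys.version_info < (3, 9):
        problems.append("Python 3.9+ recommended")
    # OPENAI_API_KEY is the conventional variable name; adjust if yours differs.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems

if __name__ == "__main__":
    for problem in check_environment(dict(os.environ)):
        print("warning:", problem)
```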

Defining the terminology

For the true beginner, let’s start with defining some key terms: AI, ML, and GenAI. You will come across these terms repeatedly in this book, so it helps to have a strong conceptual foundation of these terms:

Artificial intelligence (AI) refers to the ability of machines to perform tasks that would normally require human intelligence. This includes tasks such as perception, reasoning, learning, and decision making. The journey of AI has evolved significantly from early speculative ideas to the sophisticated technologies of today. Figure 1.1 shows a timeline of the development of AI.

Figure 1.1: A timeline of AI

Machine learning (ML) is a subset of AI that involves the use of algorithms to automatically learn from data and improve over time. Essentially, it's a way for machines to learn and adapt without being explicitly programmed. Most often applied in fields that require advanced analysis of thousands of data points, ML is especially useful in medical diagnostics, market analysis, and military intelligence. Effectively, ML identifies hidden or complex patterns in data that would be impossible for a human to see and can then suggest next steps or actions.

Generative AI (GenAI) is the ability to create text, images, audio, video, and other content in response to a user prompt. It powers chatbots, virtual assistants, language translators, and other similar services. These systems use algorithms trained on vast amounts of data, such as text and images from the internet, to learn patterns and relationships. This enables them to generate new content that is similar but not identical to the underlying training data. For instance, large language models (LLMs) use training data to learn patterns in written language. GenAI can then use these models to emulate a human writing style.

The generative AI stack

A stack combines tools, libraries, software, and solutions to create a unified and integrated approach. The GenAI stack includes programming languages, LLM providers, frameworks, databases, and deployment solutions. Though the GenAI stack is relatively new, it already has many variations and options for engineers to choose from.

Let’s discuss what you need to build a functional GenAI application. The bare minimum requirements are the following, as also shown in Figure 1.2:

An operating system: Usually, this is Unix/Linux based.
A storage layer: An SQL or NoSQL database. This book uses MongoDB.
A vector database capable of storing embeddings: This book uses MongoDB, which stores its embeddings within your data or content, rather than in a separate database.
A web server: Apache and Nginx are quite popular.
A development environment: This could be Node.js/JavaScript, .NET, Java, or Python. This book uses Python throughout the examples, with a bit of JavaScript where needed.

Figure 1.2: A basic GenAI stack

If you want to learn more about the AI stack, you can find detailed information at www.mongodb.com/resources/basics/ai-stack.

Python and GenAI

Python was conceived in the late 1980s by Guido van Rossum and officially released in 1991. Over the decades, Python has evolved into a versatile language, beloved by developers for its robust functionality and a clean syntax that is easy to understand, making it an ideal choice for beginner developers.

Although it is not entirely clear why, fairly early on, the Python ecosystem began introducing more libraries and frameworks that were tailored to ML and data science. Libraries and frameworks such as TensorFlow, Keras, PyTorch, and scikit-learn provided powerful tools for developers in these fields. Analysts who were less technical were still able to get started with Python with relative ease. Due to its interoperability, Python seamlessly integrated with other programming languages and technologies, making it easier to integrate with data pipelines and web applications.

GenAI, with its demands for high computational power and sophisticated algorithms, finds a perfect partner in Python. Here are some examples that readily come to mind:

Libraries such as Pandas and NumPy allow efficient manipulation and analysis of large datasets, a fundamental step in training generative modelsFrameworks such as TensorFlow and PyTorch offer pre-built components to design and train complex neural networksTools such as Matplotlib and Seaborn enable detailed visualization of data and model outputs, aiding in understanding and refining AI modelsFrameworks such as Flask and FastAPI make deploying your GenAI models as scalable web services straightforward

Python has a rich ecosystem that is easy to use and allows you to quickly get started, making it an ideal programming language for GenAI projects. Now, let’s talk more about the other pieces of technology you’ll be using throughout the rest of the book.

OpenAI API

The first, and most important, tool of this book is the OpenAI API. In the following chapters, you’ll learn more about each component of the GenAI stack—and the most critical to be familiar with is OpenAI. While we’ll cover other LLM providers, the one used in our examples and code repository will be OpenAI.

The OpenAI API, launched in mid-2020, provides developers with access to OpenAI's powerful models, allowing integration of advanced NLP capabilities into applications. Through this API, developers gain access to some of the most advanced AI models in existence, such as GPT-4. These models are trained on vast datasets and possess unparalleled capabilities in natural language understanding and response generation.

Moreover, OpenAI’s infrastructure is built to scale. As your project grows and demands more computational power, OpenAI ensures that you can scale effortlessly without worrying about the underlying hardware or system architecture. OpenAI’s models excel at NLP tasks, including text generation, summarization, translation, and sentiment analysis. This can be invaluable for creating content, chatbots, virtual assistants, and more.

Much of the world's data, from internet content to internal conversations and documentation, is unstructured. OpenAI, as a company, has used that data to train an LLM, and then offered that LLM as a service, making it possible for you to create interactive GenAI applications without hosting or training your own LLM. You'll learn more about LLMs in Chapter 3, Large Language Models.
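To make the API's request shape concrete, here is a sketch of the JSON body a chat completion call sends. The model name and prompt are placeholders rather than values from the book's repository, so check OpenAI's API reference for current model names:

```python
import json

# Shape of a chat completion request to OpenAI's REST API
# (POST https://api.openai.com/v1/chat/completions). The model name
# and prompt here are placeholders; consult OpenAI's docs for current models.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "In one sentence, what is a vector database?"},
    ],
    "temperature": 0.2,  # lower values make output more deterministic
}

# The request body is plain JSON, sent with an "Authorization: Bearer <key>" header.
print(json.dumps(payload, indent=2))
```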

MongoDB with Vector Search

Much has been said about how MongoDB serves the use case of unstructured data but that the world’s data is fundamentally relational. It can be argued that no data is meaningful until humans deem it so, and that the relationships and structure of that data are determined by humans as well. For example, several years ago, a researcher at a leading space exploration company made this memorable comment in a meeting:

“We scraped text content from websites and PDF documents primarily, and we realized it didn’t really make sense to try and cram that data into a table.”

MongoDB thrives with the messy, unstructured content that characterizes the real world—.txt files, Markdown, PDFs, HTML, and so on. MongoDB is flexible enough to have the structure that engineers deem is best suited for purpose, and because of that flexibility, it is a great fit for GenAI use cases.

For that reason, it is much easier to use a document database for GenAI than it is to use a SQL database.

Another reason to use MongoDB is its vector search capabilities. With vector search, when you store a phrase in MongoDB, an embedding model first converts that data into an array of numbers. This array is called a vector, and a vector produced this way is known as an embedding. Vectors are numerical representations of data and their context, as shown in Figure 1.3. The length of the array is its number of dimensions; embeddings with more dimensions can capture more nuance about the data, though at a higher storage and computational cost.

Figure 1.3: Example of a vector

After you’ve created embeddings for a piece of data, a mathematical process will identify which vectors are closest or nearest to each other, and you can then infer that the data is related. This allows you to return related words instead of only exact matches. For instance, if you are looking for pets, you could find cats, dogs, parakeets, and hamsters—even though those terms are not the exact word pets. Vectors are what allow you to receive results that are related in meaning or context or are alike, without being an exact match.
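That "closest vectors" idea can be illustrated with cosine similarity, one common measure of how alike two vectors are. The three-dimensional vectors below are invented for demonstration; real embeddings have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Tiny made-up "embeddings" for three terms.
pets = [0.9, 0.8, 0.1]
cats = [0.85, 0.75, 0.15]
cars = [0.1, 0.2, 0.95]

print(cosine_similarity(pets, cats))  # close to 1: related meaning
print(cosine_similarity(pets, cars))  # much lower: unrelated
```

Because "cats" lands near "pets" in this space, a vector search for pets would surface it even though the literal word never matches.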

MongoDB stores your data embeddings alongside the data itself. Storing the embeddings together makes subsequent queries faster. It is easiest to visualize vector search via an example with explanations of how it works along the way. You will learn more about vector search in Chapter 8, Implementing Vector Search in AI Applications.
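As a sketch of what "embeddings alongside the data" looks like in practice, the following builds a hypothetical MongoDB document as a plain Python dict, with the original text and its embedding in the same record. The field names (text, embedding) and the four-value vector are illustrative assumptions, not a required schema.

```python
# A hypothetical document combining source text and its embedding.
# Field names and vector values are illustrative, not a required schema.
doc = {
    "title": "Support article 42",
    "text": "Orders can be modified within 24 hours of purchase.",
    # In a real application this array would come from an embedding model
    # and typically contain hundreds or thousands of floats.
    "embedding": [0.12, -0.57, 0.33, 0.91],
}

# With a driver such as PyMongo, the same dict could be inserted directly,
# e.g. collection.insert_one(doc), and the "embedding" field indexed for
# vector search so that text and vector travel together.
print(sorted(doc.keys()))
```

Because the embedding lives in the same document as the text it represents, a vector search can return the human-readable content in the same query that matched the vector.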

Important features of generative AI

When asked to list the most important capability of GenAI applications, ChatGPT, which is arguably the most popular GenAI application in existence, said the following:

Content Creation: Generative AI can craft text, images, music, and even videos. It can pen articles, generate realistic images from textual descriptions, compose music, and create video content, opening endless possibilities for creative industries.

That response took 1.5 seconds to generate, and most people would agree with it. GenAI applications can create content for you and your users with lightning speed. Whether it’s text, video, images, artwork, or even Java code, GenAI is able to easily draft foundational content that can then be edited by professionals.

But there are other key features of GenAI applications that merit calling out as well:

Language translation: With remarkable proficiency, GenAI can translate languages in real time, preserving context and nuance, and facilitating seamless communication across linguistic barriers.

Personalization: In the realm of marketing and customer service, GenAI can tailor experiences and content to individual users. When given proper context, it can analyze preferences and behaviors to deliver personalized recommendations, emails, and customer interactions.

Simulation and modeling: In scientific research and engineering, GenAI can simulate complex systems and phenomena. It aids in predicting molecular behaviors, climate patterns, and even economic trends by generating realistic models based on vast datasets.

Data augmentation: For ML, GenAI can produce synthetic data to augment training sets. This is invaluable in scenarios where real data is scarce or biased, allowing for the creation of diverse and balanced datasets to improve model performance. This is incredibly useful for testing purposes, particularly in software testing.

And perhaps most importantly, it can accept prompting in natural language (such as English) to do these tasks. This makes tasks you previously found difficult incredibly easy. You may use GenAI to accomplish many varied tasks in a day, such as reviewing a pull request, guiding you through tasks in Golang, and generating illustrations for the interior artwork of a book.

Why use generative AI?

Each of the preceding abilities is compelling and important, and when used correctly and in combination, revolutionary. Put simply, there is no industry where GenAI cannot play a role. By rapidly aggregating and summarizing a wide range of content and simplifying searching, GenAI improves the user experience of finding ideas and building knowledge. It can help gather new information, summarize it, and recraft it into content. It can help speed up or even automate administrative tasks, and exponentially increase output.

But beyond all of that, the experience of using GenAI is an order of magnitude better than what is available today. Consider, for example, a customer service bot. Many of you will be familiar with this flow:

1. The customer first encounters a long menu of options: If you want to talk to sales or support, press 1. For billing, press 2. For administration, press 3. For orders, press 4. When the customer has a question that does not neatly fit into any category, they may press 4 anyway.

2. Upon pressing 4, they are then routed to a support page that does not have the answer they seek. They click a button that says, No, this did not answer my question.

3. They search the knowledge base themselves, perhaps never finding the answer and finally reaching out via phone.

Imagine being able to type what you wanted and the bot responding in a natural way—not routing you to a page but just giving you the answer. Imagine even further that the user can then chat with the bot to say they want to modify the address on their order, and the bot is able to do that from within the chat window, having a multi-step dialogue with the user to confirm and record their new information.

It is a wholly new, more pleasing experience for the customer!

The ethics and risks of GenAI

Despite those benefits, there are risks and concerns about the use of AI. In some fields, the outcry against AI is substantial and has merit. Art generated by AI, for example, flooded the internet’s marketplaces, displacing artists and illustrators who make their living off their craft. There are questions about whether using AI to write a book gives a person the right to call themselves an author. There are no clear-cut answers here; from our own experience, the authors of this book believe that GenAI accelerates, rather than replaces, the existing paradigms of work done today. But that may not always remain true. As AI improves, it may be more likely to replace the humans who are using it.

The risks of GenAI are considerable, and some of them are not well understood. Even the ones that are well understood, such as hallucinations, are difficult to identify for users, and harder still to combat. You can read more about the challenges of GenAI in Chapter 11, Common Failures of Generative AI, along with recommendations on how to mitigate them in Chapter 12, Correcting and Optimizing Your Generative AI Application.

Summary

This chapter laid the background for the GenAI application, from describing the role of each component to their strengths. You learned some key definitions and were introduced to the basics of the AI stack. By now, you also know why Python is a great choice for building GenAI applications and why you will be using the OpenAI API and MongoDB with Vector Search to build your GenAI application. Finally, you also saw some significant use cases for GenAI and learned why you should use GenAI in the first place, while also remaining mindful of the ethics and risks of using it. Since you’re reading this, I’ll assume that the case was compelling—that you’re still interested and ready to explore.

In the next chapter, you will get a fast-paced, concise, and actionable overview of the building blocks of GenAI applications in more detail and learn how to get started.

2

Building Blocks of Intelligent Applications

In the rapidly evolving landscape of software development, a new class of applications is emerging: intelligent applications. Intelligent applications are a superset of traditional full-stack applications. These applications use artificial intelligence (AI) to deliver highly personalized, context-aware experiences that go beyond the capabilities of traditional software.

Intelligent applications understand complex, unstructured data and use this understanding to make decisions and create natural, adaptive interactions.

The goal of this chapter is to provide you with an overview of the logical and technical building blocks of intelligent applications. The chapter explores how intelligent applications extend the capability of traditional full-stack applications, the core structures that define them, and how these components function to create dynamic, context-aware experiences. By the end of this chapter, you will understand how these components fit together to form an intelligent application.

This chapter covers the following topics:

The building blocks of intelligent applications
LLMs as reasoning engines for intelligent applications
Vector embedding models and vector databases as semantic long-term memory
Model hosting infrastructure

Technical requirements

This chapter is theoretical. It covers the logical components of intelligent applications and how they fit together.

This chapter assumes fundamental knowledge of traditional full-stack application development components, such as servers, clients, databases, and APIs.

Defining intelligent applications

Traditional applications typically consist of a client-side user interface, a server-side backend, and a database for data storage and retrieval. They perform tasks following a strict set of instructions. Intelligent applications require a client, server, and database as well, but they augment the traditional stack with AI components.

Intelligent applications stand out by understanding complex, unstructured data to enable natural, adaptive interactions and decision-making. Intelligent applications can engage in open-ended interactions, generate novel content, and make autonomous decisions.

Examples of intelligent applications include the following:

Chatbots that provide natural language responses based on external data using retrieval-augmented generation (RAG). For example, Perplexity.ai (https://www.perplexity.ai/) is an AI-powered search engine and chatbot that provides users with AI-generated answers to their queries based on sources retrieved from the web.

Content generators that let you use natural language prompts to create media such as images, video, and audio. There are a variety of intelligent content generators focusing on different media types, such as Suno (https://suno.com/) for text-to-song, Midjourney (https://www.midjourney.com/home) for text-to-image, and Runway (https://runwayml.com/) for text-to-video.

Recommendation systems that use customer data to provide personalized suggestions based on their preferences and history. These suggestions can be augmented with natural language to further personalize the customer experience. An example of this is Spotify’s AI DJ (https://support.spotify.com/us/article/dj/), which creates a personalized radio station, including LLM-generated DJ interludes, based on your listening history.

These examples are a few early glances at the new categories of intelligent applications that developers have only started to build. In the next section, you will learn more about the core components of intelligent applications.

The building blocks of intelligent applications

At the heart of intelligent applications are two key building blocks:

The reasoning engine: The reasoning engine is the brain of an intelligent application, responsible for understanding user input, generating appropriate responses, and making decisions based on available information. The reasoning engine is typically powered by large language models (LLMs)—AI models that perform text completion. LLMs can understand user intent, generate human-like responses, and perform complex cognitive tasks.

Semantic memory: Semantic memory refers to the application’s ability to store and retrieve information in a way that preserves its meaning and relationships, enabling the reasoning engine to access relevant context as needed.

Semantic memory consists of two core components:

AI vector embedding model: AI vector embedding models represent the semantic meaning of unstructured data, such as text or images, in large arrays of numbers.

Vector database: Vector databases efficiently store and retrieve vectors to support semantic search and context retrieval.

The reasoning engine can retrieve and store relevant information from the semantic memory, using unstructured data to inform its outputs.
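The interplay between the reasoning engine and semantic memory can be sketched as a simple retrieve-then-generate loop. The retrieve and generate functions below are stand-ins (a real application would call a vector database and an LLM API, respectively); the point is the shape of the data flow, not the implementations.

```python
# Stand-in semantic memory: a few stored facts.
MEMORY = [
    "The support desk is open 9am-5pm on weekdays.",
    "Orders can be modified within 24 hours of purchase.",
]

def retrieve(query: str, memory: list[str]) -> list[str]:
    """Stub retrieval: return entries sharing a word with the query.
    A real implementation would run a vector similarity search."""
    query_words = set(query.lower().split())
    return [doc for doc in memory if query_words & set(doc.lower().split())]

def generate(prompt: str) -> str:
    """Stub reasoning engine. A real implementation would call an LLM."""
    return f"[LLM response based on prompt of {len(prompt)} characters]"

def answer(question: str) -> str:
    # 1. Pull relevant context out of semantic memory.
    context = retrieve(question, MEMORY)
    # 2. Inject that context into the prompt for the reasoning engine.
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return generate(prompt)

print(answer("Can I change my orders?"))
```

Swapping the stubs for a real vector search and a real LLM call turns this skeleton into the retrieval-augmented generation pattern discussed later in the book.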

The LLMs and embedding models that power intelligent applications have different hardware requirements than traditional applications, especially at scale. Intelligent applications require specialized model hosting infrastructure that can handle the unique hardware and scalability requirements of AI workloads. Intelligent applications also incorporate continuous learning, safety monitoring, and human feedback to ensure quality and integrity.

LLMs are a vital organ of intelligent applications. The next section will provide a deeper understanding of the role of LLMs in intelligent applications.

LLMs – reasoning engines for intelligent apps

LLMs are the key technology of intelligent applications, unlocking whole new classes of AI-powered systems. These models are trained on vast amounts of text data to understand language, generate human-like text, answer questions, and engage in dialogue.

LLMs undergo continuous improvement with the release of new models featuring billions or trillions of parameters and enhanced reasoning, memory, and multi-modal capabilities.

Use cases for LLM reasoning engines

LLMs have emerged as a powerful general-purpose technology for AI systems, analogous to the central processing unit (CPU) in traditional computing. Much as CPUs serve as general-purpose computational engines that can be programmed for many tasks, LLMs play a similar role for language-based reasoning and generation. This general-purpose nature lets developers apply the capabilities of LLMs to a wide range of reasoning tasks.

A crop of techniques for leveraging the diverse abilities of LLMs has emerged, such as the following:

Prompt engineering: Using carefully crafted prompts, developers can steer LLMs to perform a wide range of language tasks. A key advantage of prompt engineering is its iterative nature. Since prompts are fundamentally just text, it’s easy to rapidly experiment with different prompts and see the results. Advanced prompt engineering techniques, such as chain-of-thought prompting (which encourages the model to break down its reasoning into a series of steps) and multi-shot prompting (which provides the model with example input/output pairs), can further enhance the quality and reliability of LLM-generated text.

Fine-tuning: Fine-tuning involves starting with a pre-trained general-purpose model and further training it on a smaller dataset relevant to the target task. This can yield better results than prompt engineering alone, but it comes with certain caveats, such as being more expensive and time-consuming. You should only fine-tune after exhausting what you can achieve through prompt engineering.

Retrieval augmentation: Retrieval augmentation interfaces LLMs with external knowledge, allowing them to draw on up-to-date, domain-specific information. In this approach, relevant information is retrieved from a knowledge base and injected into the prompt, enabling the LLM to generate contextually relevant outputs. Retrieval augmentation mitigates the limitations of the static pre-training of LLMs, keeping their knowledge updated and reducing the likelihood of the model hallucinating incorrect information.
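As an illustration of the prompt engineering techniques above, the following assembles a multi-shot classification prompt from example input/output pairs. The prompt layout and the example reviews are assumptions for the sketch; any LLM chat or completion API could consume the resulting string.

```python
# Hypothetical few-shot examples for a sentiment classification task.
EXAMPLES = [
    ("The checkout flow was effortless.", "positive"),
    ("My package arrived crushed.", "negative"),
]

def build_multishot_prompt(examples, new_input: str) -> str:
    """Assemble a multi-shot prompt: an instruction, worked examples,
    then the new input awaiting a label."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {new_input}")
    lines.append("Sentiment:")  # the model completes this line
    return "\n".join(lines)

prompt = build_multishot_prompt(EXAMPLES, "Support resolved my issue in minutes.")
print(prompt)
# The finished prompt would then be sent to an LLM completion endpoint.
```

Because the prompt is just a string, iterating is cheap: you can add, remove, or reword examples and immediately observe how the model's completions change.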

With these techniques, you can use LLMs for a diverse array of tasks. The next section explores current use cases for LLMs.

Diverse capabilities of LLMs

While fundamentally just language models, LLMs have shown surprising emergent capabilities (https://arxiv.org/pdf/2307.06435). As of this writing in spring 2024, state-of-the-art language models are capable of performing tasks in the following categories:

Text generation and completion: Given a prompt, LLMs can generate coherent continuations, making them useful for tasks such as content creation, text summarization, and code completion.

Open-ended dialogue and chat: LLMs can engage in back-and-forth conversations, maintaining context and handling open-ended user queries and follow-up questions. This capability is foundational for chatbots, virtual assistants, tutoring systems, and similar applications.

Question answering: LLMs can provide direct answers to user questions, perform research, and synthesize information to address queries.

Classification and sentiment analysis: LLMs can classify text into predefined categories and assess sentiment, emotion, and opinion. This enables applications such as content moderation and customer feedback analysis.

Data transformation and extraction: LLMs can map unstructured text into structured formats and extract key information, such as named entities, relationships, and events. This makes LLMs valuable for tasks such as data mining, knowledge graph construction, and robotic process automation (RPA).
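For the data transformation and extraction category, a common pattern is to ask the model for JSON and then validate what comes back. The model reply below is a hard-coded stand-in so the sketch is self-contained; in practice the string would come from an LLM call, and the field names shown are illustrative assumptions.

```python
import json

# Stand-in for a model reply to a prompt like:
# "Extract the person and city from this sentence as JSON."
model_reply = '{"person": "Ada Lovelace", "city": "London"}'

def parse_extraction(reply: str, required_fields: tuple) -> dict:
    """Parse the model's JSON reply and check that expected fields exist.
    Raises ValueError if the reply is malformed or incomplete."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc
    missing = [field for field in required_fields if field not in data]
    if missing:
        raise ValueError(f"Missing fields in model reply: {missing}")
    return data

record = parse_extraction(model_reply, ("person", "city"))
print(record["person"])  # -> Ada Lovelace
```

Validating the model's output in code like this is important because, as later chapters discuss, LLMs do not always return exactly the structure they were asked for.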

As LLMs continue to grow in scale and sophistication, new capabilities are constantly emerging, often in surprising ways that were not directly intended by the original training objective.

For example, the ability of GPT-3 to generate functioning code was an unexpected discovery. With advancements in the field of LLMs, we can expect to see more impressive and versatile capabilities emerge, further expanding the potential of intelligent applications.

Multi-modal language models

Multi-modal language models hold particular promise for expanding the capabilities of language models. Multi-modal models can process and generate images, speech, and video in addition to text, and have become an important component of intelligent applications.

Examples of new application categories made possible with multi-modal models include the following:

Creating content based on multiple input types, such as a chatbot where users can provide both images and text as inputs.

Advanced data analysis, such as a medical diagnosis tool that analyzes X-rays along with medical records.