
Description

Discover the immense potential of Generative AI and Large Language Models (LLMs) with this comprehensive guide. Learn to overcome LLM limitations, such as contextual memory constraints, prompt size issues, real-time data gaps, and occasional ‘hallucinations’. Follow practical examples to personalize and launch your LlamaIndex projects, mastering skills in ingesting, indexing, querying, and connecting dynamic knowledge bases. From fundamental LLM concepts to LlamaIndex deployment and customization, this book provides a holistic grasp of LlamaIndex's capabilities and applications. By the end, you'll be able to resolve LLM challenges and build interactive AI-driven applications using best practices in prompt engineering and troubleshooting Generative AI projects.




Building Data-Driven Applications with LlamaIndex

A practical guide to retrieval-augmented generation (RAG) to enhance LLM applications

Andrei Gheorghiu

Building Data-Driven Applications with LlamaIndex

As this ebook edition doesn't have fixed pagination, the page numbers below are hyperlinked for reference only, based on the printed edition of this book.

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Niranjan Naikwadi

Publishing Product Manager: Nitin Nainani

Book Project Manager: Aparna Ravikumar Nair

Content Development Editor: Priyanka Soam

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Shankar Kalbhor

DevRel Marketing Coordinator: Vinishka Kalra

First published: May 2024

Production reference: 1150424

Published by

Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK

ISBN 978-1-83508-950-7

www.packtpub.com

For the past six months, the focus required to create this book has sadly kept me away from the people I love. To my family and friends, your understanding and support have been my harbor in the storm of long hours and endless revisions.

Andreea, your love has been the gentle beacon guiding me through this journey. To my daughter Carla and every young reader out there: never stop learning! Life is a journey with so many possible destinations. Make sure you are the one choosing yours. My dear friends at ITAcademy, you guys rock! Thanks for supporting me along the way. Also, finalizing this book would not have been possible without the dedicated efforts and unwavering commitment of the Packt team. I extend my heartfelt gratitude to everyone involved in this project.

– Andrei Gheorghiu

Contributors

About the author

Andrei Gheorghiu is a seasoned IT professional and accomplished trainer at ITAcademy with over two decades of experience as a trainer, consultant, and auditor. With an impressive array of certifications, including ITIL Master, CISA, ISO 27001 Lead Auditor, and CISSP, Andrei has trained thousands of students in IT service management, information security, IT governance, and audit. His consulting experience spans the implementation of ERP and CRM systems, as well as conducting security assessments and audits for various organizations. Andrei’s passion for groundbreaking innovations drives him to share his vast knowledge and offer practical advice on leveraging technology to solve real-world challenges, particularly in the wake of recent advancements in the field. As a forward-thinking educator, his main goal is to help people upskill and reskill in order to increase their productivity and remain relevant in the age of AI.

About the reviewers

Rajesh Chettiar specializes in AI and ML and brings over 13 years of experience in machine learning, Generative AI, automation, and ERP solutions. He is passionate about keeping up with cutting-edge advancements in AI and is committed to improving his skills to foster innovation.

Rajesh resides in Pune with his parents, his wife, Pushpa, and his son, Nishith. In his free time, he likes to play with his son, watch movies with his family, and go on road trips. He also has a fondness for listening to Bollywood music.

Elliot helped write some of the LlamaIndexTS (TypeScript version of LlamaIndex) codebase. He is actively looking to take on new generative AI projects (as of early 2024) and is available on GitHub and LinkedIn.

I thank the Lord for everything. Thank you, Dad, Mom, and twin sister for your amazing support. Thank you to my friends who gave me their honest opinions and helped me grow. Thank you, Yi Ding at LlamaIndex, for helping me start this GenAI journey, and Yujian Tang for introducing me to Yi and always being supportive of open source. Finally, thank you to everyone who has reached out to talk about generative AI; I learn new things every day from each of you.

Srikannan Balakrishnan is an experienced AI/ML professional and a technical writer with a passion for translating complex information into simpler forms. He has a background in data science, including AI/ML, which fuels his ability to understand the intricacies of the subject matter and present it in a way that is accessible to both technical and non-technical audiences. He also has experience in Generative AI and has worked with different clients to solve their business problems with the power of data and AI. Beyond his technical expertise, he is a skilled communicator with a keen eye for detail. He is dedicated to crafting user-friendly documentation that empowers readers to grasp new concepts and navigate complex systems with confidence.

Arijit Das is an experienced Data Scientist with over 5 years of commercial experience, providing data-driven solutions to Fortune 500 clients across the US, UK, and EU. With expertise in Finance, Banking, Logistics, and HR management, Arijit excels in the Data Science lifecycle, from data extraction to model deployment and MLOps. Proficient in Supervised and Unsupervised ML techniques, including NLP, Arijit is currently focused on implementing cutting-edge ML practices at Citi globally.

Table of Contents

Preface

Part 1: Introduction to Generative AI and LlamaIndex

1

Understanding Large Language Models

Introducing GenAI and LLMs

What is GenAI?

What is an LLM?

Understanding the role of LLMs in modern technology

Exploring challenges with LLMs

Augmenting LLMs with RAG

Summary

2

LlamaIndex: The Hidden Jewel - An Introduction to the LlamaIndex Ecosystem

Technical requirements

Optimizing language models – the symbiosis of fine-tuning, RAG, and LlamaIndex

Is RAG the only possible solution?

What LlamaIndex does

Discovering the advantages of progressively disclosing complexity

An important aspect to consider

Introducing PITS – our LlamaIndex hands-on project

Here’s how it will work

Preparing our coding environment

Installing Python

Installing Git

Installing LlamaIndex

Signing up for an OpenAI API key

Discovering Streamlit – the perfect tool for rapid building and deployment!

Installing Streamlit

Finishing up

One final check

Familiarizing ourselves with the structure of the LlamaIndex code repository

Summary

Part 2: Starting Your First LlamaIndex Project

3

Kickstarting Your Journey with LlamaIndex

Technical requirements

Uncovering the essential building blocks of LlamaIndex – documents, nodes, and indexes

Documents

Nodes

Manually creating the Node objects

Automatically extracting Nodes from Documents using splitters

Nodes don’t like to be alone – they crave relationships

Why are relationships important?

Indexes

Are we there yet?

How does this actually work under the hood?

A quick recap of the key concepts

Building our first interactive, augmented LLM application

Using the logging features of LlamaIndex to understand the logic and debug our applications

Customizing the LLM used by LlamaIndex

Easy as 1-2-3

The temperature parameter

Understanding how Settings can be used for customization

Starting our PITS project – hands-on exercise

Let’s have a look at the source code

Summary

4

Ingesting Data into Our RAG Workflow

Technical requirements

Ingesting data via LlamaHub

An overview of LlamaHub

Using the LlamaHub data loaders to ingest content

Ingesting data from a web page

Ingesting data from a database

Bulk-ingesting data from sources with multiple file formats

Parsing the documents into nodes

Understanding the simple text splitters

Using more advanced node parsers

Using relational parsers

Confused about node parsers and text splitters?

Understanding chunk_size and chunk_overlap

Including relationships with include_prev_next_rel

Practical ways of using these node creation models

Working with metadata to improve the context

SummaryExtractor

QuestionsAnsweredExtractor

TitleExtractor

EntityExtractor

KeywordExtractor

PydanticProgramExtractor

MarvinMetadataExtractor

Defining your custom extractor

Is having all that metadata always a good thing?

Estimating the potential cost of using metadata extractors

Follow these simple best practices to minimize your costs

Estimate your maximal costs before running the actual extractors

Preserving privacy with metadata extractors, and not only

Scrubbing personal data and other sensitive information

Using the ingestion pipeline to increase efficiency

Handling documents that contain a mix of text and tabular data

Hands-on – ingesting study materials into our PITS

Summary

5

Indexing with LlamaIndex

Technical requirements

Indexing data – a bird’s-eye view

Common features of all Index types

Understanding the VectorStoreIndex

A simple usage example for the VectorStoreIndex

Understanding embeddings

Understanding similarity search

OK, but how does LlamaIndex generate these embeddings?

How do I decide which embedding model I should use?

Persisting and reusing Indexes

Understanding the StorageContext

The difference between vector stores and vector databases

Exploring other index types in LlamaIndex

The SummaryIndex

The DocumentSummaryIndex

The KeywordTableIndex

The TreeIndex

The KnowledgeGraphIndex

Building Indexes on top of other Indexes with ComposableGraph

How to use the ComposableGraph

A more detailed description of this concept

Estimating the potential cost of building and querying Indexes

Indexing our PITS study materials – hands-on

Summary

Part 3: Retrieving and Working with Indexed Data

6

Querying Our Data, Part 1 – Context Retrieval

Technical requirements

Learning about query mechanics – an overview

Understanding the basic retrievers

The VectorStoreIndex retrievers

The DocumentSummaryIndex retrievers

The TreeIndex retrievers

The KnowledgeGraphIndex retrievers

Common characteristics shared by all retrievers

Efficient use of retrieval mechanisms – asynchronous operation

Building more advanced retrieval mechanisms

The naive retrieval method

Implementing metadata filters

Using selectors for more advanced decision logic

Understanding tools

Transforming and rewriting queries

Creating more specific sub-queries

Understanding the concepts of dense and sparse retrieval

Dense retrieval

Sparse retrieval

Implementing sparse retrieval in LlamaIndex

Discovering other advanced retrieval methods

Summary

7

Querying Our Data, Part 2 – Postprocessing and Response Synthesis

Technical requirements

Re-ranking, transforming, and filtering nodes using postprocessors

Exploring how postprocessors filter, transform, and re-rank nodes

SimilarityPostprocessor

KeywordNodePostprocessor

PrevNextNodePostprocessor

LongContextReorder

PIINodePostprocessor and NERPIINodePostprocessor

MetadataReplacementPostprocessor

SentenceEmbeddingOptimizer

Time-based postprocessors

Re-ranking postprocessors

Final thoughts about node postprocessors

Understanding response synthesizers

Implementing output parsing techniques

Extracting structured outputs using output parsers

Extracting structured outputs using Pydantic programs

Building and using query engines

Exploring different methods of building query engines

Advanced uses of the QueryEngine interface

Hands-on – building quizzes in PITS

Summary

8

Building Chatbots and Agents with LlamaIndex

Technical requirements

Understanding chatbots and agents

Discovering ChatEngine

Understanding the different chat modes

Implementing agentic strategies in our apps

Building tools and ToolSpec classes for our agents

Understanding reasoning loops

OpenAIAgent

ReActAgent

How do we interact with agents?

Enhancing our agents with the help of utility tools

Using the LLMCompiler agent for more advanced scenarios

Using the low-level Agent Protocol API

Hands-on – implementing conversation tracking for PITS

Summary

Part 4: Customization, Prompt Engineering, and Final Words

9

Customizing and Deploying Our LlamaIndex Project

Technical requirements

Customizing our RAG components

How LLaMA and LLaMA 2 changed the open source landscape

Running a local LLM using LM Studio

Routing between LLMs using services such as Neutrino or OpenRouter

What about customizing embedding models?

Leveraging the Plug and Play convenience of using Llama Packs

Using the Llama CLI

Using advanced tracing and evaluation techniques

Tracing our RAG workflows using Phoenix

Evaluating our RAG system

Introduction to deployment with Streamlit

Hands-on – a step-by-step deployment guide

Deploying our PITS project on Streamlit Community Cloud

Summary

10

Prompt Engineering Guidelines and Best Practices

Technical requirements

Why prompts are your secret weapon

Understanding how LlamaIndex uses prompts

Customizing default prompts

Using advanced prompting techniques in LlamaIndex

The golden rules of prompt engineering

Accuracy and clarity in expression

Directiveness

Context quality

Context quantity

Required output format

Inference cost

Overall system latency

Choosing the right LLM for the task

Common methods used for creating effective prompts

Summary

11

Conclusion and Additional Resources

Other projects and further learning

The LlamaIndex examples collection

Moving forward – Replit bounties

The power of many – the LlamaIndex community

Key takeaways, final words, and encouragement

On the future of RAG in the larger context of generative AI

A small philosophical nugget for you to consider

Summary

Index

Other Books You May Enjoy

Part 1: Introduction to Generative AI and LlamaIndex


This first part begins by introducing generative AI and Large Language Models (LLMs), discussing their ability to produce human-like text, their limitations, and how Retrieval-Augmented Generation (RAG) can address these issues by enhancing accuracy, reasoning, and relevance. We then progress to understanding how LlamaIndex leverages RAG to bridge the gap between LLMs’ extensive knowledge and proprietary data, elevating the potential of interactive AI applications.

This part has the following chapters:

Chapter 1, Understanding Large Language Models

Chapter 2, LlamaIndex: The Hidden Jewel - An Introduction to the LlamaIndex Ecosystem

1

Understanding Large Language Models

If you are reading this book, you have probably explored the realm of large language models (LLMs) and already recognize their potential applications as well as their pitfalls. This book aims to address the challenges LLMs face and provides a practical guide to building data-driven LLM applications with LlamaIndex, taking developers from foundational concepts to advanced techniques for implementing retrieval-augmented generation (RAG) to create high-performance interactive artificial intelligence (AI) systems augmented by external data.

This chapter introduces generative AI (GenAI) and LLMs. It explains how LLMs generate human-like text after training on massive datasets. We’ll also give an overview of LLM capabilities and limitations, such as outdated knowledge, the potential for false information, and a lack of reasoning. You’ll be introduced to RAG as a potential solution, combining retrieval models that use indexed data with generative models to increase factual accuracy, logical reasoning, and context relevance. Overall, you’ll gain a basic understanding of LLMs and learn about RAG as a way to overcome some LLM weaknesses, setting the stage for using LLMs practically.

In this chapter, we will cover the following main topics:

Introducing GenAI and LLMs

Understanding the role of LLMs in modern technology

Exploring challenges with LLMs

Augmenting LLMs with RAG

Introducing GenAI and LLMs

Introductions are sometimes boring, but here, it is important for us to set the context and help you familiarize yourself with GenAI and LLMs before we dive deep into LlamaIndex. I will try to be as concise as possible and, if you are already familiar with this information, I apologize for the brief digression.

What is GenAI?

GenAI refers to systems that are capable of generating new content such as text, images, audio, or video. Unlike more specialized AI systems that are designed for specific tasks such as image classification or speech recognition, GenAI models can create completely new assets that are often very difficult – if not impossible – to distinguish from human-created content.

These systems use machine learning (ML) techniques such as neural networks (NNs) that are trained on vast amounts of data. By learning patterns and structures within the training data, generative models can model the underlying probability distribution of the data and sample from this distribution to generate new examples. In other words, they act as big prediction machines.

We will now discuss LLMs, which are one of the most popular fields in GenAI.

What is an LLM?

One of the most prominent and rapidly advancing branches of GenAI is natural language generation (NLG) through LLMs (Figure 1.1):

Figure 1.1 – LLMs are a sub-branch of GenAI

LLMs are NNs that are specifically designed and optimized to understand and generate human language. They are large in the sense that they are trained on massive amounts of text containing billions or even trillions of words scraped from the internet and other sources. Larger models show increased performance on benchmarks, better generalization, and new emergent abilities. In contrast with earlier, rule-based generation systems, the main distinguishing feature of an LLM is that it can produce novel, original text that reads naturally.

By learning patterns from many sources, LLMs acquire various language skills found in their training data – from nuanced grammar to topic knowledge and even basic common-sense reasoning. These learned patterns allow LLMs to extend human-written text in contextually relevant ways. As they keep improving, LLMs create new possibilities for automatically generating natural language (NL) content at scale.

During the training process, LLMs gradually learn probabilistic relationships between words and rules that govern language structure from their huge dataset of training data. Once trained, they are able to generate remarkably human-like text by predicting the probability of the next word in a sequence, based on the previous words. In many cases, the text they generate is so natural that it makes you wonder: aren’t we humans just a similar but more sophisticated prediction machine? But that’s a topic for another book.
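To make this next-word prediction loop concrete, here is a deliberately tiny sketch in plain Python (no ML libraries): it counts word bigrams in a toy corpus, turns the counts into probabilities, and then samples one word at a time, feeding each prediction back in as context. Real LLMs use deep neural networks over subword tokens rather than bigram counts, so treat this purely as an illustration of the underlying idea, with a made-up corpus.

```python
import random
from collections import Counter, defaultdict

# A deliberately tiny "training corpus". Real models train on billions of words.
corpus = "the llama eats grass . the llama sleeps . the farmer feeds the llama".split()

# Learn which words tend to follow which: a table of bigram counts.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def sample_next(word: str) -> str:
    """Sample the next word in proportion to how often it followed `word` in training."""
    followers = bigrams[word]
    words, weights = zip(*followers.items())
    return random.choices(words, weights=weights)[0]

# Autoregressive generation: every prediction is appended and becomes the new context.
text = ["the"]
for _ in range(8):
    text.append(sample_next(text[-1]))

print(" ".join(text))
```

Run it a few times and you will get different, mostly plausible word sequences. The same stochastic, next-token behavior, at a vastly larger scale, is what makes LLM output fluent without being guaranteed to be factual.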

One of the key architectural innovations is the transformer (that is the T in GPT), which uses an attention mechanism to learn contextual relationships between words. Attention allows the model to learn long-range dependencies in text. It’s much like listening carefully in a conversation: you pay attention to the context to understand the full meaning. This means the model understands not just words that are close together but also how words that are far apart in a sentence or paragraph relate to each other.

Attention allows the model to selectively focus on relevant parts of the input sequence when making predictions, thus capturing complex patterns and dependencies within the data. This feature makes it possible for particularly large transformer models (with many parameters and trained on massive datasets) to demonstrate surprising new abilities such as in-context learning, where they can perform tasks with just a few examples in their prompt. To learn more about transformers and Generative Pre-trained Transformer (GPT), you can refer to Improving Language Understanding with Unsupervised Learning – Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever (https://openai.com/research/language-unsupervised).
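As a rough illustration of the mechanism, the following NumPy sketch implements single-head scaled dot-product attention, the core operation inside a transformer: each token’s query is compared against every token’s key, the scores are normalized with a softmax, and the output is a weighted mix of the values. The random matrices stand in for learned projections; this is a minimal sketch, not a full transformer layer.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q @ K^T / sqrt(d_k)) @ V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights                     # blended values + the attention map

# Four toy token embeddings of dimension 8, projected by random stand-ins for learned matrices.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

output, attention_map = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(attention_map.round(2))  # row i shows how strongly token i attends to each token
```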

The best-performing LLMs, such as GPT-4, Claude 2.1, and Llama 2, contain hundreds of billions or even trillions of parameters and have been trained on internet-scale datasets using advanced deep learning (DL) techniques. The resulting models have an extensive vocabulary and a broad knowledge of language structure – grammar and syntax – as well as of the world in general. Thanks to these traits, LLMs are able to generate text that is coherent, grammatically correct, and semantically relevant. The outputs they produce may not always be completely logical or factually accurate, but they usually read convincingly, as if written by a human. But it’s not all about size. The quality of the data and the training algorithms – among other factors – also plays a huge role in the resulting performance of a particular model.

Many models feature a user interface that allows for response generation through prompts. Additionally, some offer an API for developers to access the model programmatically. This method will be our primary focus in the upcoming chapters of our book.
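For example, a minimal programmatic call to a hosted LLM might look like the following snippet. It uses the openai Python package (version 1.x or later) and assumes an OPENAI_API_KEY environment variable is set; the model name is only an example, and any chat-capable model works the same way.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model name; pick whichever chat model you have access to
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain, in one sentence, what a large language model is."},
    ],
)

print(response.choices[0].message.content)
```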

Next up, we’ll talk about how LLMs are making big changes in tech. They’re helping not just big companies but everyone. Curious? Let’s keep reading.

Understanding the role of LLMs in modern technology

Oh! What good times we are living in. There has never been a more favorable era for small businesses and entrepreneurs. Given the enormous potential of this technology, it’s a real miracle that, instead of ending up strictly under the control of large corporations or governments, it is literally within everyone’s reach. Now, it’s truly possible for almost anyone – even a non-technical person – to realize their ideas and solve problems that until now seemed impossible to solve without a huge amount of resources.

The disruptive potential that LLMs have – in almost all industries – is enormous.

It’s true: there are concerns that this technology could replace us. However, technology’s role is to make our lives easier by taking over repetitive activities. As before, we’ll likely keep doing the same things, only much more efficiently and better, with LLMs’ help. We will do more with less.

I would dare say that LLMs have become the foundation of NLG technology. They can already power chatbots, search engines, coding assistants, text summarization tools, and other applications that synthesize written text interactively or automatically. And their capabilities keep advancing rapidly with bigger datasets and models.

And then, there are also the agents. These automated wonders are capable of perceiving and interpreting stimuli from the digital environment – and not just digital – to make decisions and act accordingly. Backed by the power of an LLM, intelligent agents can solve complex problems and fundamentally change the way we interact with technology. We’ll cover this topic in more detail throughout Chapter 8, Building Chatbots and Agents with LlamaIndex.

Despite their relatively short existence, LLMs have already proven to be remarkably versatile and powerful. With the right techniques and prompts, their output can be steered in useful directions at scale. LLMs are driving innovation in numerous fields as their generative powers continue to evolve. Their capabilities keep expanding from nuanced dialog to multimodal intelligence. And, at the moment, the LLM-powered wave of innovation across industries and technologies shows no signs of slowing down.

The Gartner Hype Cycle model serves as a strategic guide for technology leaders, helping them evaluate new technologies not just on their merits but also in the context of their organization’s specific needs and goals (https://www.gartner.com/en/research/methodologies/gartner-hype-cycle).

Judging by current adoption levels, LLMs are well into the Slope of Enlightenment stage, ready to move into the Plateau of Productivity – where mainstream adoption really takes off (Figure 1.2). Companies are becoming more pragmatic about their application, focusing on specialized use cases where they offer the most value:

Figure 1.2 – The Gartner Hype Cycle

But, unlike other more specific technologies, LLMs are rather a new form of infrastructure – a kind of ecosystem where new concepts will be able to manifest and, undoubtedly, revolutionary applications will be born.

This is their true potential, and this is the ideal time to learn how to take advantage of the opportunities they offer.

Before we jump into innovative solutions that could maximize LLMs’ capabilities, let’s take a step back and look at some challenges and limitations.

Exploring challenges with LLMs

Not all the news is good, however. It’s time to also discuss the darker side of LLMs.

These models do have important limitations and some collateral effects too. Here is a list of the most important ones, but please consider it non-exhaustive. There may be others not included here, and the order is arbitrarily chosen:

They lack access to real-time data. LLMs are trained on a static dataset, meaning that the information they have is only as up to date as the data they were trained on, which might not include the latest news, scientific discoveries, or social trends.

This limitation can be critical when users seek real-time or recent information, as the LLMs might provide outdated or irrelevant responses. Furthermore, even if they cite data or statistics, these numbers are likely to have changed or evolved, leading to potential misinformation.

Note

While recent features introduced by OpenAI, for example, allow the underlying LLM to integrate with Bing to retrieve fresh context from the internet, that’s not an inherent feature of the LLM but rather an augmentation provided by the ChatGPT interface.

This lack of real-time updating also means that LLMs – by themselves – are not suited for tasks such as live customer service queries that may require real-time access to user data, inventory levels, or system statuses, for example.

They have no intrinsic way of distinguishing factual truth from falsehoods. Without proper monitoring, they can generate convincing misinformation. And trust me – they don’t do it on purpose. In very simple terms, LLMs are basically just looking for words that fit together.

Check out Figure 1.3 for an example of how one of the previous versions of the GPT-3.5 model would produce false information:

Figure 1.3 – Screenshot from a GPT 3.5-turbo-instruct playground

As these models stochastically (randomly) generate text, their outputs are not guaranteed to be completely logical, factual, or harmless. Also, the training data inherently biases the model, and LLMs may generate toxic, incorrect, or nonsensical text without warning. Since this data sometimes includes unsavory elements of online discourse, LLMs risk amplifying harmful biases and toxic content present in their training data.

Note

While this kind of result may be easily achieved in a playground environment, using an older AI model, OpenAI’s ChatGPT interface uses newer models and employs additional guardrails, thus making these kinds of responses much less probable.

They also cannot maintain context and memory over long documents.

An interaction with a vanilla-flavor, standard LLM can prove to be a charm for simple topics or a quick question-and-answer session. But go beyond the context window limit of the model, and you’ll soon experience its limitations as it struggles to maintain coherence and may lose important details from earlier parts of the conversation or document. This can result in fragmented or incomplete responses that may not fully address the complexities of a long-form interaction or in-depth analysis, just like a human suffering from short-term memory loss.

Note

Although recently released AI models such as Anthropic’s Claude 2.1 and Google’s Gemini Pro 1.5 have dramatically raised the bar in terms of context window limit, ingesting an entire book and running inference on such a large context may prove to be prohibitive from a cost perspective.

LLMs also exhibit unpredictable failures in reasoning and fact retention. Take a look at Figure 1.4 for a typical logic reasoning problem that proves to be challenging even for newer models such as GPT-4:

Figure 1.4 – Screenshot from a GPT-4 playground

In this example, the answer is wrong because the only scenario that fits is if Emily is the one telling the truth. The treasure would then be neither in the attic nor in the basement. Their capabilities beyond fluent text generation remain inconsistent and limited. Blindly trusting their output without skepticism invites errors.

The complexity of massive LLMs also reduces transparency into their functioning. The lack of interpretability makes it hard to audit for issues or understand exactly when and why they fail. All you get is the output, but there’s no easy way of knowing the actual decision process that led to that output or the documented fact in which that particular output is grounded. As such, LLMs still require careful governance to mitigate risks from biased, false, or dangerous outputs.

As with many other things out there, it turns out we cannot really call them sustainable. At least not yet. Their massive scale makes them expensive to train and environmentally costly due to huge computing requirements. And it’s not just the training itself but also their usage. According to some estimates, “the water consumption of ChatGPT has been estimated at 500 milliliters for a session of 20-50 queries” – AMPLIFY, VOL. 36, NO. 8: Arthur D. Little’s Greg Smith, Michael Bateman, Remy Gillet, and Eystein Thanisch (https://www.cutter.com/article/environmental-impact-large-language-models). This is not negligible by any means. Think about the countless failed attempts to get an answer from an LLM, then multiply that by the countless users exercising their prompt engineering skills every minute.

And here’s some more bad news: as models advance in complexity and training techniques, LLMs are rapidly becoming a huge source of machine-generated text. So huge, in fact, that according to predictions, it will end up almost entirely replacing human-generated text (Brown, Tom B. et al. (2020). Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL]. https://arxiv.org/abs/2005.14165). In a way, this means they may become the victims of their own success. As more and more data is generated by AI, it gradually contaminates the training of new models, decreasing their capabilities. As in biology, any ecosystem that cannot maintain a healthy diversity in its genetic pool will gradually degrade.

I saved the good news for last.

What if I told you there is at least one solution that can partially address almost all these problems?

In many ways, a language model is very similar to an operating system. It provides a foundational layer upon which applications can be built. Just as an operating system manages hardware resources and provides services for computer programs, LLMs manage linguistic resources and provide services for various NL processing (NLP) tasks. Using prompts to interact with them is much like writing code in assembly language. It’s a low-level interaction. But, as you’ll soon find out, there are more sophisticated and practical ways of using LLMs to their full potential.

It’s time to talk about RAG.

Augmenting LLMs with RAG

Coined for the first time in a 2020 paper by several researchers from Meta – Lewis, Patrick et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401 [cs.CL] (https://arxiv.org/abs/2005.11401) – RAG is a technique that combines the powers of retrieval methods and generative models to answer user questions. The idea is to first retrieve relevant information from an indexed data source containing proprietary knowledge and then use that retrieved information to generate a more informed, context-rich response using a generative model (Figure 1.5):

Figure 1.5 – A RAG model
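To make the retrieve-then-generate flow tangible, here is a minimal, framework-free sketch: a toy retriever ranks a hard-coded knowledge base by word overlap with the question (a stand-in for the vector search a real RAG system would use), and the winning passage is pasted into the prompt before it is sent to the LLM. The knowledge base and question are made up for illustration, and the final generation call is left as a placeholder, since any of the APIs shown earlier would do.

```python
def retrieve(question: str, passages: list[str], top_k: int = 1) -> list[str]:
    """Toy retriever: rank passages by word overlap with the question.
    A real RAG system would use embeddings and a vector index instead."""
    question_words = set(question.lower().split())
    ranked = sorted(passages,
                    key=lambda p: len(question_words & set(p.lower().split())),
                    reverse=True)
    return ranked[:top_k]

# A tiny, hard-coded "knowledge base" standing in for your indexed proprietary data.
knowledge_base = [
    "LlamaIndex connects large language models to external, proprietary data sources.",
    "Retrieval-augmented generation retrieves relevant context before the model answers.",
    "Transformers use attention to relate words across an entire sequence.",
]

question = "What does LlamaIndex connect language models to?"
context = "\n".join(retrieve(question, knowledge_base))

# The retrieved context is injected into the prompt before generation.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\nAnswer:"
)

print(prompt)  # in a real system, this prompt would now be sent to the LLM
```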

Let’s have a look at what this means in practice:

Much better fact retention: One of the advantages of using RAG is its ability to pull from specific data sources, which can improve fact retention. Instead of relying solely on the generative model’s own knowledge – which is mostly generic – it refers to external documents to construct its answers, increasing the chances that the information is accurate.

Improved reasoning: The retrieval step allows RAG models to pull in information that is specifically related to the question. In general, this results in more logical and coherent reasoning. This could help overcome limitations in reasoning that many LLMs face.

Context relevance: Because it pulls information from external sources based on the query, RAG can be more contextually accurate than a standalone generative model, which has to rely only on its training data and might not have the most up-to-date or contextually relevant information. Not only that, but you can also get an actual quote from the model regarding the source of the knowledge used in the answer.

Reduced trust issues: While not foolproof, the hybrid approach means that RAG could, in principle, be less prone to generating completely false or nonsensical answers. That means an increased probability of receiving a valid output.

Validation: It’s often easier to validate the reliability of the retrieved documents in a RAG setup by setting up a mechanism that references the original information used for generating a response. This could be a step toward more transparent and trustworthy model behavior.

A word of caution

Even if RAG makes LLMs better and more reliable, it doesn’t completely fix the issue of them sometimes giving wrong or confusing answers. There is no silver bullet that will completely eliminate all the issues mentioned previously. It’s still a good idea to double-check and evaluate their outputs, and we’ll talk about ways of doing that later in the book. Because, as you may already know or have probably guessed by now, LlamaIndex is one of the many ways of augmenting LLM-based applications using RAG. And a very effective one, I should add.

While some LLM providers have started introducing RAG components into their APIs, such as OpenAI’s Assistants feature, using a standalone framework such as LlamaIndex provides many more customization options. It also supports local models, enabling self-hosted solutions and greatly reducing the costs and privacy concerns associated with a hosted model.
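As a small preview of what this looks like in practice, here is roughly the shape of an end-to-end RAG flow in LlamaIndex, assuming a recent llama-index release (where the core classes live under llama_index.core) and an OpenAI API key for the default LLM and embedding model. The ./data folder is a placeholder for your own documents, and each of these steps is unpacked in the chapters that follow.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Ingest: load your own files (./data is just an example path).
documents = SimpleDirectoryReader("./data").load_data()

# Index: split the documents into nodes and embed them into a vector index.
index = VectorStoreIndex.from_documents(documents)

# Query: retrieve the most relevant chunks and let the LLM answer using that context.
query_engine = index.as_query_engine()
response = query_engine.query("What do these documents say about LlamaIndex?")
print(response)
```

This handful of lines already covers ingestion, indexing, retrieval, and response synthesis – the stages we will take apart and customize one by one throughout the book.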

Summary

In this chapter, we covered a quick introduction to GenAI and LLMs. You learned how LLMs such as GPT work and some of their capabilities and limitations. A key takeaway is that while powerful, LLMs have weaknesses – such as the potential for false information and lack of reasoning – that require mitigation techniques. We discussed RAG as one method to overcome some LLM limitations.

These lessons provide useful background on how to approach LLMs practically while being aware of their risks. At the same time, you learned the importance of techniques such as RAG to address LLMs’ potential downsides.

With this introductory foundation in place, we are now ready to dive into the next chapter where we will explore the LlamaIndex ecosystem. LlamaIndex offers an effective RAG framework to augment LLMs with indexed data for more accurate, logical outputs. Learning to leverage LlamaIndex tools will be the natural next step to harness the power of LLMs in a proficient way.