


Generative AI with Amazon Bedrock

Build, scale, and secure generative AI applications using Amazon Bedrock

Shikhar Kwatra

Bunny Kaushik

Generative AI with Amazon Bedrock

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

The authors acknowledge the use of cutting-edge AI, such as ChatGPT, with the sole aim of enhancing the language and clarity within the book, thereby ensuring a smooth reading experience for readers. It’s important to note that the content itself has been crafted by the author and edited by a professional publishing team.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Niranjan Naikwadi

Publishing Product Manager: Tejashwini R

Book Project Manager: Neil D’mello

Senior Editor: Mudita S

Technical Editor: Nithik Cheruvakodan

Copy Editor: Safis Editing

Proofreader: Mudita S

Indexer: Manju Arasan

Production Designer: Alishon Mendonca

DevRel Marketing Coordinator: Vinishka Kalra

First published: July 2024

Production reference: 1120724

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-80324-728-1

www.packtpub.com

To my amazing parents, Beenu and Man Mohan Kwatra, whose consistent support and endless encouragement have been the bedrock of my journey. I am truly blessed to have you by my side.

To my beloved wife, Avani Bajaj, who has been my confidante, and my partner in every sense of the word. This work is a testament to your enduring faith in me.

To my brother, Sidharth, who has always stood by me through the most challenging times. Thank you for always being there for me.

- Shikhar Kwatra

To my wife, Titiksha, whose unwavering love and encouragement have been the pillars of my success. Thank you for believing in me and being by my side.

To my sister, a constant source of joy and laughter, and a guiding light.

To my parents, who have been a constant support throughout my life. Your blessings and faith in me have been my greatest strength.

- Bunny Kaushik

Contributors

About the authors

Shikhar Kwatra, a senior AI/ML solutions architect at Amazon Web Services, holds the distinction of being the world’s Youngest Master Inventor with over 500 patents in AI/ML and IoT domains. He serves as a technical journal reviewer, book author, and educator. Shikhar earned his Master’s in Electrical Engineering from Columbia University. With over a decade of experience spanning startups to large-scale enterprises, he specializes in architecting scalable, cost-efficient cloud environments and supports GSI partners in developing strategic industry solutions. Beyond his professional pursuits, Shikhar finds joy in playing the guitar, composing music, and practicing mindfulness.

A heartfelt thank you to all who have supported me on this journey, particularly my family and friends. Your belief in me has made all the difference, turning challenges into opportunities and dreams into reality. I am deeply grateful for your presence in my life and the role you’ve played in my success. Thank you for being my constant source of inspiration and strength.

Bunny Kaushik is an AWS solution architect and ML specialist who loves to build solutions and help customers innovate on the AWS platform. He is an Amazon SageMaker SME, generative AI hero, and ML thought leader within AWS. He has over 10 years of experience working as an ML specialist and managing projects across different teams and organizations. Outside of work, he enjoys swimming, playing volleyball, rock climbing, and exploring new places.

I want to thank the people who have been close to me and supported me, especially my wife, Titiksha, my sister, and my parents.

About the reviewers

David Bounds is a serial hobbyist and an ongoing technologist. With more than 25 (!!) years working in tech, David has worked in startups, global enterprises, and everything in between. David has a strong focus on DevOps practices and brings those to machine learning via both MLOps and FMOps, combining people, process, and technology to operationalize workloads. David fixes more things than they break.

For Michelle. Everything I have been able to do has been because of her.

Jordan Fields is a software engineer at Amazon Web Services. He is a founding member of the team responsible for Guardrails on Amazon Bedrock, where he innovates solutions that enable customers to tailor foundation models to their specific business needs. With a Master’s degree in Data Science and a Bachelor’s in Mathematics, Jordan combines deep technical expertise with a passion for shaping the future of artificial intelligence for enterprise applications. His expertise in machine learning and software development, highlighted by his previous work on Amazon Lex where he worked on the automatic speech recognition team, has empowered him to enhance and safeguard complex AI systems, ensuring their effective and ethical implementation.

I extend my deepest gratitude to my parents, who have always encouraged me to pursue my passions. Thank you for your unwavering support and belief in my journey.

Mitesh Mangaonkar is at the forefront of data engineering and data platforms, spearheading transformative generative AI applications. As a tech leader at Airbnb, he architects innovative data pipelines using advanced technologies, fueling trust and safety initiatives. His tenure at AWS saw him guide Fortune-500 firms through cloud data warehouse migrations and craft robust, scalable data platforms for enterprise analytical and machine learning systems.

An innovator with a rich blend of deep data engineering knowledge and AI enthusiasm, Mitesh is driving the evolution of next-gen data solutions. As a pivotal influencer in data engineering and governance, he has shared his insights at several data analytics and AI conferences.

Table of Contents

Preface

Part 1: Amazon Bedrock Foundations

1

Exploring Amazon Bedrock

Understanding the generative AI landscape

What are FMs?

What is Amazon Bedrock?

FMs in Amazon Bedrock

Amazon Titan FMs

AI21 Labs – Jurassic-2

Anthropic Claude

Cohere

Meta Llama 2 and Llama 3

Mistral AI

Stability AI – Stable Diffusion

Evaluating and selecting the right FM

Generative AI capabilities of Amazon

Amazon SageMaker

Amazon Q

Generative AI use cases with Amazon Bedrock

Summary

2

Accessing and Utilizing Models in Amazon Bedrock

Technical requirements

Accessing Amazon Bedrock

Chat playground

Text playground

Image playground

API-based approach

Using Amazon Bedrock APIs

ListFoundationModels

GetFoundationModel

InvokeModel

InvokeModelWithResponseStream

Converse API

Amazon Bedrock integration points

Amazon Bedrock with LangChain

Creating a LangChain custom prompt template

PartyRock

Summary

3

Engineering Prompts for Effective Model Usage

Technical requirements

What is prompt engineering?

Components of prompts

Prompt engineering applications

Unlocking prompt engineering techniques

Zero-shot prompting

Few-shot prompting

Chain-of-thought prompting

ReAct prompting

Designing prompts for Amazon Bedrock models

Prompting Anthropic Claude 3

Prompting Mistral models

Prompt guidance for Amazon Titan text models

AI21 Labs – instruct models

Prompting Meta Llama models

Prompt guidance for Stability AI – Stable Diffusion

Understanding best practices in prompt engineering

Summary

4

Customizing Models for Enhanced Performance

Technical requirements

Why is customizing FMs important?

Understanding model customization

PEFT

Hyperparameter tuning

Preparing the data

Creating a custom model

Components of model customization

APIs

Analyzing the results

Metrics for training and validation

Inference

Guidelines and best practices

Summary

5

Harnessing the Power of RAG

Technical requirements

Decoding RAG

What is RAG?

Importance of RAG

Key applications

How does RAG work?

Components of RAG

Implementing RAG with Amazon Bedrock

Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Base setup

API calls

Implementing RAG with other methods

Using LangChain

Other GenAI systems

Advanced RAG techniques

Query handler – query reformulation and expansion

Hybrid search and retrieval

Embedding and index optimization

Retrieval re-ranking and filtering

Limitations and future directions

Summary

Part 2: Amazon Bedrock Architecture Patterns

6

Generating and Summarizing Text with Amazon Bedrock

Technical requirements

Generating text

Text generation applications

Text generation systems with Amazon Bedrock

Generating text using prompt engineering

Summarizing text

Summarization of small files

Summarization of large files

Creating a secure serverless solution

Summary

7

Building Question Answering Systems and Conversational Interfaces

Technical requirements

QA overview

Potential QA applications

QA systems with Amazon Bedrock

Document ingestion with Amazon Bedrock

QA on small documents

QA for large documents on knowledge bases

QA implementation patterns with Amazon Bedrock

Conversational interfaces

Chatbot using Amazon Bedrock

Empowering chatbot development with Amazon Bedrock and the LangChain framework

Crafting context-aware conversational interfaces – the fundamental pillars

A context-aware chatbot architectural flow

Summary

8

Extracting Entities and Generating Code with Amazon Bedrock

Technical requirements

Entity extraction – a comprehensive exploration

Deep learning approaches

Rule-based systems

Hybrid approaches

Industrial use cases of entity extraction – unleashing the power of unstructured data

Entity extraction with Amazon Bedrock

Structuring prompts for entity extraction

Incorporating context and domain knowledge

Leveraging few-shot learning

Iterative refinement and evaluation

Code generation with LLMs – unleashing the power of AI-driven development

The code generation process

Benefits of code generation with Amazon Bedrock

Limitations and considerations

Use cases and examples

Prompt engineering examples with Amazon Bedrock

Summary

9

Generating and Transforming Images Using Amazon Bedrock

Technical requirements

Image generation overview

What are GANs and VAEs?

Real-world applications

Multimodal models

Stable Diffusion

Titan Image Generator G1

Titan Multimodal Embeddings

Anthropic Claude 3 – Sonnet, Haiku, and Opus

Multimodal design patterns

Text-to-image

Image search

Image understanding

Image-to-image patterns

Ethical considerations and safeguards

Summary

10

Developing Intelligent Agents with Amazon Bedrock

Technical requirements

What are Agents?

Features of agents

Practical applications of Agents – unleashing the potential

GenAI agent personas, roles, and use-case scenarios

Amazon Bedrock integration with LangChain Agents

Agents for Amazon Bedrock

Unveiling the inner workings of GenAI agents with Amazon Bedrock

Advancing reasoning capabilities with GenAI – a primer on ReAct

Practical use case and functioning with Amazon Bedrock Agents

Deploying an Agent for Amazon Bedrock

Summary

Part 3: Model Management and Security Considerations

11

Evaluating and Monitoring Models with Amazon Bedrock

Technical requirements

Evaluating models

Using Amazon Bedrock

Automatic model evaluation

Model evaluation results

Using human evaluation

Monitoring Amazon Bedrock

Amazon CloudWatch

Bedrock metrics

Model invocation logging

AWS CloudTrail

EventBridge

Summary

12

Ensuring Security and Privacy in Amazon Bedrock

Technical requirements

Security and privacy overview

Data encryption

AWS IAM

Deny access

Principle of least privilege

Model customization

Securing the network

Network flow

On-demand architecture

Provisioned throughput architecture

Model customization architecture

Ethical practices

Veracity

Intellectual property

Safety and toxicity

Guardrails for Amazon Bedrock

How does Guardrails for Amazon Bedrock work?

Content filters

Denied topics

Word filters

Sensitive information filters

Blocked messaging

Testing and deploying guardrails

Using guardrails

Summary

Index

Other Books You May Enjoy

Preface

Generative AI has been on everyone’s mind since the release of ChatGPT. People across the globe are amazed by its potential and industries are looking to innovate and solve business problems using Generative AI.

In April 2023, Amazon officially announced its new Generative AI service called Amazon Bedrock, which simplifies the building and scaling of Generative AI applications without managing the infrastructure.

This book takes you on a journey of Generative AI with Amazon Bedrock and empowers you to accelerate the development and integration of several Generative AI use cases in a seamless manner. You will explore techniques such as prompt engineering, retrieval augmentation, fine-tuning generative models, and orchestrating tasks using agents. The latter part of the book covers how to effectively monitor and ensure security and privacy in Amazon Bedrock. The book progresses from intermediate to advanced topics, and every effort has been made to keep it easy to follow, with practical examples throughout.

By the end of this book, you will have a solid understanding of building and scaling Generative AI applications with Amazon Bedrock, and you will understand several architecture patterns and security best practices that will help you solve business problems and drive innovation in your organization.

Who this book is for

This book is targeted toward generalist application engineers, solution engineers and architects, technical managers, Machine Learning (ML) advocates, data engineers, and data scientists who are looking to either innovate in their organization or solve business use cases using Generative AI. You are expected to have a basic understanding of AWS APIs and core AWS services for ML.

What this book covers

Chapter 1, Exploring Amazon Bedrock, provides an introduction to Amazon Bedrock, starting with exploring the Generative AI landscape, foundation models offered by Amazon Bedrock, guidelines for selecting the right model, additional Generative AI capabilities, and potential use cases.

Chapter 2, Accessing and Utilizing Models in Amazon Bedrock, provides different ways to access and utilize Amazon Bedrock and its capabilities, covering different interfaces, core APIs, code snippets, Bedrock’s integration with LangChain to build customized pipelines, chaining multiple models, and insights into Amazon Bedrock’s playground called PartyRock.

Chapter 3, Engineering Prompts for Effective Model Usage, explores the art of prompt engineering, its various techniques, and best practices for crafting effective prompts to harness the power of Generative AI models on Amazon Bedrock. It equips you with a comprehensive understanding of prompt engineering principles, enabling you to design prompts that elicit desired outcomes from Bedrock’s models.

Chapter 4, Customizing Models for Enhanced Performance, provides a comprehensive guide on customizing foundation models using Amazon Bedrock to enhance their performance for domain-specific use cases. It covers the rationale behind model customization, data preparation techniques, the process of creating custom models, analyzing results, and best practices for successful model customization.

Chapter 5, Harnessing the Power of RAG, explores the Retrieval Augmented Generation (RAG) approach, which enhances language models by incorporating external data sources to mitigate hallucination issues. It dives into the integration of RAG with Amazon Bedrock, including the implementation of knowledge bases, and provides hands-on examples of using RAG APIs and real-world scenarios. Additionally, the chapter covers alternative methods for implementing RAG, such as using LangChain orchestration and other Generative AI systems, and discusses the current limitations and future research directions with Amazon Bedrock in the context of RAG.

Chapter 6, Generating and Summarizing Text with Amazon Bedrock, dives into the architecture patterns, where you will learn how to leverage Amazon Bedrock’s capabilities for generating high-quality text content and summarizing lengthy documents, and explores various real-world use cases.

Chapter 7, Building Question Answering Systems and Conversational Interfaces, covers architectural patterns for question answering on small and large documents, conversation memory, embeddings, prompt engineering techniques, and contextual awareness techniques to build intelligent and engaging chatbots and question-answering systems.

Chapter 8, Extracting Entities and Generating Code with Amazon Bedrock, explores the applications of entity extraction across various domains, provides insights into implementing it using Amazon Bedrock, and investigates the underlying principles and methodologies behind Generative AI for code generation, empowering developers to streamline their workflows and enhance productivity.

Chapter 9, Generating and Transforming Images Using Amazon Bedrock, dives into the world of image generation using Generative AI models available on Amazon Bedrock. It explores real-world applications of image generation, multimodal models available within Amazon Bedrock, design patterns for multimodal systems, and ethical considerations and safeguards provided by Amazon Bedrock.

Chapter 10, Developing Intelligent Agents with Amazon Bedrock, provides you with a comprehensive understanding of agents, their benefits, and how to leverage tools such as LangChain to build and deploy agents tailored for Amazon Bedrock, enabling you to harness the power of Generative AI in real-world industrial use cases.

Chapter 11, Evaluating and Monitoring Models with Amazon Bedrock, provides guidance on how to effectively evaluate and monitor the Generative AI models of Amazon Bedrock. It covers automatic and human evaluation methods, open source tools for model evaluation, and leveraging services such as CloudWatch, CloudTrail, and EventBridge for real-time monitoring, auditing, and automation of the Generative AI lifecycle.

Chapter 12, Ensuring Security and Privacy in Amazon Bedrock, explores robust security and privacy measures implemented by Amazon Bedrock, ensuring the protection of your data and enabling responsible AI practices. It covers topics such as data localization, isolation, encryption, access control through AWS Identity and Access Management (IAM), and the implementation of guardrails for content filtering and safeguarding against misuse and aligning with safe and responsible AI policies.

To get the most out of this book

You will need to have basic knowledge of Python and AWS. Having a basic understanding of Generative AI and the ML workflow would be an advantage.

Software/hardware covered in the book: Python, Amazon Web Services, and Jupyter-based notebooks, such as Amazon SageMaker

Operating system requirements: Linux-based OS

This book requires you to have access to an Amazon Web Services (AWS) account. If you don’t have it already, you can go to https://aws.amazon.com/getting-started/ and create an AWS account.

Secondly, you will need to install and configure AWS Command-Line Interface (CLI) (https://aws.amazon.com/cli/) after you create an account, which will be needed to access Amazon Bedrock foundation models from your local machine.

Thirdly, since the majority of code cells that we will execute are based in Python, setting up the AWS Python SDK (Boto3) (https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html) will be required. You can carry out the Python setup in the following ways: install it on your local machine, use AWS Cloud9, utilize AWS Lambda, or leverage Amazon SageMaker.
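As a quick sanity check, the following minimal sketch (assuming Boto3 is installed and your credentials are configured, and using us-east-1 as an example Region) creates the two Bedrock clients you will encounter throughout this book:

#Hedged setup check – assumes `pip install boto3` and configured AWS credentials
import boto3

#The "bedrock" client covers control-plane operations (such as listing models),
#while "bedrock-runtime" is the data-plane client used to invoke models
bedrock = boto3.client("bedrock", region_name="us-east-1")
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
print(bedrock.meta.region_name)  #should print your chosen Region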

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Generative-AI-with-Amazon-Bedrock. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter/X handles. Here is an example: “You can specify the chunking strategy in the ChunkingConfiguration object”.

A block of code is set as follows:

#import the main packages and libraries
import os
import boto3
import botocore

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

Entity Types: Company, Product, Location

Any command-line input or output is written as follows:

[Person: Michael Jordan], [Organization: Chicago Bulls], [Location: NBA]

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “In the Models tab, you can select Create Fine-tuning job.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Generative AI with Amazon Bedrock, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there; you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781803247281

Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly.

Part 1: Amazon Bedrock Foundations

This part establishes the fundamental principles and practices for effectively leveraging Amazon Bedrock. We begin by exploring the suite of foundation models offered by Amazon Bedrock, providing insights into their capabilities and optimal use cases. The book then progresses to advanced techniques in prompt engineering, a critical skill for maximizing the potential of large language models. We explore strategies for model customization, allowing users to tailor these tools to their specific needs and domains. We also examine the implementation of RAG, a cutting-edge approach that significantly enhances model performance by integrating external knowledge sources.

This part contains the following chapters:

Chapter 1, Exploring Amazon Bedrock
Chapter 2, Accessing and Utilizing Models in Amazon Bedrock
Chapter 3, Engineering Prompts for Effective Model Usage
Chapter 4, Customizing Models for Enhanced Performance
Chapter 5, Harnessing the Power of RAG

1

Exploring Amazon Bedrock

People across the globe have been amazed by the potential of generative AI, and industries everywhere are looking to innovate in their organizations and solve business use cases with it.

This chapter will introduce you to a powerful generative AI service known as Amazon Bedrock. We’ll begin by providing an overview of the generative AI landscape. Then, we’ll examine the challenges industries face with generative AI and how Amazon Bedrock addresses those challenges effectively. After that, we’ll explore the various foundation models (FMs) currently offered by Amazon Bedrock and help you assess which model suits specific scenarios. We’ll also cover some of Amazon’s generative AI capabilities beyond FMs. By the end of this chapter, you will have a solid understanding of Amazon Bedrock’s generative AI offerings, model selection criteria, and the broader generative AI capabilities available from Amazon.

The following topics will be covered in the chapter:

Understanding the generative AI landscape
What are FMs?
What is Amazon Bedrock?
FMs in Amazon Bedrock
Evaluating and selecting the right FM
Generative AI capabilities of Amazon
Generative AI use cases with Amazon Bedrock

Understanding the generative AI landscape

Since the advent of ChatGPT, organizations across the globe have explored a plethora of use cases that generative AI can solve for them. They have built several innovation teams and teams of data scientists to build and explore various use cases, including summarizing long documents, extracting information from documents, and performing sentiment analysis to gauge satisfaction or discontent toward a product or service. If you have been working in the machine learning (ML) or natural language processing (NLP) field, you may be familiar with how a language model works – by understanding the relationship between the words in documents. The main objective of these language models is to predict the next probable word in a sentence.

If you look at the sentence John loves to eat, a natural language model is trying to predict what the next word or token in the sequence will be. Here, the next probable word seems to be ice-cream, with a 9.4% chance, as shown in Figure 1.1:

Figure 1.1 – Sentence sequencing prediction

Language models can do this by converting every word into a numerical vector, also known as an embedding. Similar words will be closer in the vector space, while dissimilar words will be positioned spatially distant from each other. For instance, the word phone will be far apart from the word eat since the semantic meanings of these words are different.
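To make this concrete, here is a small, hedged illustration with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions); cosine similarity is a common way to measure how close two word vectors are:

#Toy example with hypothetical 3-D embeddings – for illustration only
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

embeddings = {
    "eat": np.array([0.9, 0.1, 0.0]),
    "food": np.array([0.8, 0.2, 0.1]),
    "phone": np.array([0.0, 0.2, 0.9]),
}
print(cosine_similarity(embeddings["eat"], embeddings["food"]))   #high – related meanings
print(cosine_similarity(embeddings["eat"], embeddings["phone"]))  #low – unrelated meanings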

Early NLP techniques such as bag-of-words models with Term Frequency - Inverse Document Frequency (TF-IDF) scoring and n-gram analysis had some limitations for language modeling tasks. TF-IDF, which determines word importance based on frequency, does not account for semantic context within sentences. N-grams, representing adjacent words or characters, do not generalize well for out-of-vocabulary terms. What was needed to advance language modeling was a method of representing words in a way that captures semantic meaning and relationships between words.

In neural networks, a word embedding model known as Word2Vec was able to learn associations from a large corpus of text. However, Word2Vec struggled to perform well with out-of-vocabulary words. Since the 2010s, researchers have been experimenting with more advanced sequence modeling techniques to address this limitation, such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks. These models have memory cells that allow them to consider the context of previous words in a sentence when predicting the next word, so they can capture longer-range dependencies than models such as Word2Vec. While powerful for modeling word sequences, RNNs and LSTMs are also more computationally and memory-intensive, which means they can hold only limited context depending on how much data is fed to the model. As a result, these models do not perform well when given a whole document spanning several pages.

In 2017, researchers at Google and the University of Toronto published a paper called Attention Is All You Need (https://arxiv.org/abs/1706.03762). This paper introduced the transformer architecture, which is based on a self-attention mechanism rather than recurrent or convolutional layers used in previous models. This self-attention mechanism allows the model to learn contextual relationships between all words (or a set of tokens) in the input simultaneously. It does this by calculating the importance of each word concerning other words in the sequence. This attention is applied to derive contextual representations for downstream tasks such as language modeling or machine translation. One major benefit of the transformer architecture is its ability to perform parallel computation with a long sequence of words. This enabled transformers to be effectively applied to much longer texts and documents compared to previous recurrent models.
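The core computation is compact enough to sketch. The following hedged NumPy example implements scaled dot-product self-attention for a single head, with random weights standing in for learned projections:

#Scaled dot-product self-attention – a single-head sketch with random weights
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    #X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projection matrices
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            #pairwise token importance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     #softmax over each row
    return weights @ V                                 #context-aware representations

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                            #4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)             #(4, 8)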

Language models based on the transformer architecture exhibit state-of-the-art (SOTA) and near-human-level performance. Since the advent of the transformer architecture, various models have been developed. This breakthrough paved the way for modern large language models (LLMs), including Bidirectional Encoder Representations from Transformers (BERT), Generative Pre-trained Transformer (GPT), Text-To-Text Transfer Transformer (T5), BLOOM, and Anthropic Claude.

Now, let’s dive into some LLMs that are powering a substantial change in the generative AI domain.

What are FMs?

Most of the generative AI models today are powered by the transformer-based architecture. In general, these generative AI models, also widely known as FMs, employ transformers due to their ability to process text one token at a time or entire sequences of text at once using self-attention. FMs are trained on massive amounts of data with millions or billions of parameters, allowing them to understand relationships between words in context to predict subsequent sequences. While models based on the transformer architecture currently dominate the field, not all FMs rely on this architecture. Some models are built using alternative techniques, such as generative adversarial networks (GANs) or variational autoencoders.

GANs utilize two neural networks pitted against each other in competition. The first network is known as the generator and is tasked with generating synthetic samples that mimic real data. For example, the generator could produce new images, texts, or audio clips. The second network is called the discriminator. Its role is to analyze examples, both real and synthetic, to classify which ones are genuine and which have been artificially generated.

Through this adversarial process, the generator learns to produce increasingly convincing fakes that can fool the discriminator. Meanwhile, the discriminator becomes better at detecting subtle anomalies that reveal the synthetic samples. Their competing goals drive both networks to continuously improve. An example of a GAN can be found at https://thispersondoesnotexist.com/. By refreshing the page endlessly, users are presented with an endless stream of novel human faces. However, none are real – all are synthetic portraits created solely by a GAN trained on vast databases of real human images. The site offers a glimpse into how GANs can synthesize highly realistic outputs across many domains.
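To make the adversarial loop concrete, here is a minimal, hedged sketch (assuming PyTorch is installed) in which a generator learns to mimic a toy one-dimensional distribution; image GANs follow the same pattern at a vastly larger scale:

#Minimal GAN training loop on toy 1-D data – illustrative only
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))               #generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid()) #discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0          #"real" data drawn from N(3, 0.5)
    fake = G(torch.randn(64, 8))                   #synthetic samples from noise

    #Train the discriminator to tell real from fake
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    #Train the generator to fool the discriminator
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()

print(G(torch.randn(5, 8)).detach().squeeze())     #samples should cluster near 3.0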

Variational autoencoders are simpler-to-train generative AI algorithms that also utilize two neural networks – an encoder and a decoder. Encoders learn the patterns in the data by mapping it into lower-dimensional latent space, while decoders use these patterns from the latent space and generate realistic samples.

These FMs – whether based on transformers, GANs, or variational autoencoders – are trained on massive datasets, which sets them apart from traditional ML models such as logistic regression, support vector machines (SVMs), decision trees, and others. The term foundation models was coined by researchers at the Stanford Institute for Human-Centered Artificial Intelligence to differentiate them from other ML models. Traditional ML models are trained on labeled data and are only capable of performing narrowly defined tasks. For example, there will be one model for text generation, another model for summarization, and so on.

In contrast, FMs learn patterns in language by analyzing the relationships between words and sentences while training on massive datasets, with models containing millions or billions of parameters. Due to their enormous pre-training datasets, FMs tend to generalize well and understand contextual meaning, which allows them to solve various use cases, such as text generation, summarization, entity extraction, image generation, and others. Their pre-training enables them to serve as a highly adaptable starting point for many different applications. Figure 1.2 highlights some of the differences between traditional ML models and FMs:

Figure 1.2 – Traditional ML models versus FMs

Despite the range of FMs available, organizations face several challenges when adopting these models at scale:

No single model solution: There is no single model that is optimized for all tasks, and models are constantly improving with new advances in technology. To address multiple use cases, organizations may need to assemble several models that work with each other. This can take significant time and resources.

Security concerns: Security and privacy pose a major concern, as organizations want to protect their data and valuable intellectual property, and they also want control over how their data is shared and used by these models.

Time and resource management: For applications such as document summarization and virtual assistants, specific model configuration is needed. This includes defining tasks, granting access to internal data sources, and developing APIs for the model to take action. This requires a multi-step process and complex coding.

Lack of seamless integration: Being able to seamlessly integrate into existing applications is important to avoid managing large computational infrastructures or incurring high costs. Organizations want models to work behind the scenes without any heavy lifting or expense.

Addressing these technical, operational, security, and privacy challenges is key for organizations to successfully adopt and deploy FMs at an enterprise scale.

These are the very problems that Amazon Bedrock is designed to solve.

What is Amazon Bedrock?

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs via a single API. Fully managed implies that users do not have to worry about creating, deploying, and operating the backend infrastructure, as that is taken care of by Amazon. So, from within your application or code, you can invoke a model on Bedrock with a single API call containing your prompt. One of the key advantages of Amazon Bedrock is that it provides a wide choice of leading FMs from Amazon and top AI companies such as Anthropic, AI21 Labs, Cohere, Meta, Stability AI, and Mistral AI.
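The “single API” idea is easiest to see in code. The following hedged sketch uses the Converse API (covered in Chapter 2), where only the model ID would need to change to target a different provider; the model ID and Region here are examples:

#One API across providers – hedged Converse example
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
response = runtime.converse(
    modelId="amazon.titan-text-express-v1",   #example model ID – swap in another FM here
    messages=[{"role": "user", "content": [{"text": "Tell me the recipe for chocolate cake"}]}],
)
print(response["output"]["message"]["content"][0]["text"])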

Once you’ve defined your use case, the next step is to choose an FM. Amazon Bedrock provides a playground experience (a web interface for rapid experimentation) where you can experiment with different models and prompts. Additionally, there are certain techniques and suitability criteria you need to employ to choose the best-fit model for your use case. We will learn how to evaluate LLMs in the upcoming sections.

Once you have evaluated and identified the FM for your use case, the focus turns to enhancing its predictive capabilities. Amazon Bedrock provides the following key capabilities for refining model performance:

Prompt engineering: Prompt engineering and design is a critical first step when interacting with FMs. Taking the time to craft clear, nuanced prompts is important for establishing the proper context and for the model to provide a reliable outcome. Prompts can be as simple as Tell me the recipe for chocolate cake or can be detailed prompts with multiple examples, depending on the use case you are trying to solve. With its playground experience, Amazon Bedrock lets you effectively design and formulate prompts through rapid experimentation. We will discuss some of these techniques and the practical aspects of prompt engineering in Chapter 3.

Easy fine-tuning: Amazon Bedrock allows you to easily customize FMs with your dataset. This process is called fine-tuning the model and involves training the model further with your domain dataset, improving its accuracy for domain-specific tasks. Fine-tuning can be done directly from the Amazon Bedrock console or through APIs, by providing your datasets in an Amazon Simple Storage Service (Amazon S3) bucket. We will discuss fine-tuning Amazon Bedrock FMs in detail in Chapter 4.

Native support for RAG: Retrieval augmented generation (RAG) is a powerful technique for fetching data from outside the language model, such as from internal knowledge bases or external sources, to provide accurate responses for domain-specific use cases. This technique is useful when the information needed lies in large documents beyond the context window of the model. Amazon Bedrock provides native support for RAG, so you can connect your data source for retrieval augmentation. We will discuss RAG in greater detail in Chapter 5.

Furthermore, there are additional capabilities provided by Amazon Bedrock, such as the ability to build intelligent Agents to orchestrate and carry out multiple tasks on your behalf. Agents can call various internal and external data sources, connect to applications, and run complex tasks in multiple steps. We will dive deep into building intelligent Agents in Chapter 10.

Security, privacy, and observability are some of the key capabilities of Amazon Bedrock. The data that you provide when you invoke FMs, including prompts and context, isn’t used to retrain any of the FMs. In addition, all the AWS security and governance capabilities, including data encryption, IAM authentication and permission policies, VPC configuration, and others, apply to Amazon Bedrock. Hence, you can encrypt your data at rest and in transit. You can tell Amazon Bedrock to use a Virtual Private Cloud (VPC) so that the traffic between AWS-hosted system components does not flow through the internet. Also, via Identity and Access Management (IAM), you can grant certain users access to specific resources. Furthermore, metrics, logs, and API calls are pushed to Amazon CloudWatch and AWS CloudTrail, so you can have visibility into and monitor the usage of Amazon Bedrock models. In Part 3 of the book, we will cover model evaluation, monitoring, security, privacy, and ensuring safe and responsible AI practices.
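For example, a least-privilege IAM policy for Bedrock might look like the following sketch; the Region and model ARN are placeholders you would adjust for your own account:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-express-v1"
        }
    ]
}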

For now, let’s look at the different FMs offered by Amazon Bedrock.

FMs in Amazon Bedrock

With Amazon Bedrock, you have access to FMs from Amazon and leading AI companies – AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI – as depicted in Figure 1.3. Amazon Bedrock might add access to more FMs in the future:

Figure 1.3 – FMs available on Amazon Bedrock
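You can also enumerate the models available to your account programmatically. Here is a hedged sketch using the Bedrock control-plane client (the Region is an example):

#List the FMs accessible in a Region
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["providerName"], "-", model["modelId"])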

Now, let’s discuss each of these models in detail.

Amazon Titan FMs

The Amazon Titan FMs represent a suite of powerful, multipurpose models developed by AWS through extensive pretraining on vast datasets, endowing them with broad applicability across diverse domains. These FMs support use cases such as text generation, question answering, summarization, RAG, personalization, image generation, and more. A simple example would be generating an article or blog post, or writing an email.

Three types of Amazon Titan models are currently available on Amazon Bedrock: Titan Text Generation, Titan Image Generator, and Titan Embeddings.

Titan Text Generation

Titan Text Generation is an LLM that’s designed for use cases such as generating texts, summarization, and more. Let’s assume that John has to write an email to the customer support team of his telephone operator, asking them to fix the billing issue he has been facing. We can provide a prompt to the Titan Text Generation model. The response will be generated alongside the subject, as shown in Figure 1.4:

Figure 1.4 – Response generated by the Titan Text G1- Express model

At the time of writing, Titan Text Generation is available in three different flavors – Titan Text G1 Lite, Titan Text G1 Express, and Titan Text G1 Premier. The main differences are that Lite is a smaller, more cost-effective model that supports up to 4,000 tokens; Express is a larger model that supports up to 8,000 tokens and is designed for complex use cases; and Premier is the most advanced Titan model, supporting up to 32,000 tokens and designed to provide exceptional performance.
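As a hedged preview of what Chapter 2 covers in depth, invoking a Titan Text model looks like the following; the model ID, Region, and inference parameters are examples:

#Invoke Titan Text with basic inference parameters
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
body = {
    "inputText": "Write a short email to customer support about a billing issue.",
    "textGenerationConfig": {
        "maxTokenCount": 512,   #cap the response length
        "temperature": 0.5,     #lower values give more deterministic output
        "topP": 0.9,
    },
}
response = runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",   #example model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])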

Titan Image Generator

Titan Image Generator is designed to generate a variety of images from texts, edit images, perform in-painting and out-painting, and more. The Image Generator model, known as Titan Image Generator G1, currently supports up to 77,000 tokens with a maximum image size of 25 MB. For example, we can ask the model to Generate an image of a Bunny skiing in the Swiss Alps. Once the images have been generated, we can create variations of a single image, or even edit the image, as demonstrated in Figure 1.5:

Figure 1.5 – Titan Image Generator and its configurations

In Chapter 9, we will learn more about how image generation works and dive into various use cases.

Titan Embeddings

The main function of the Titan Embeddings model is to convert texts (or images) into numeric vectors. These vectors represent words mathematically so that similar words have similar vectors. You can store these embeddings in vector databases such as OpenSearch, Aurora pgvector, Amazon Kendra, or Pinecone, which can then be used to compare the relationships between texts.
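To make this concrete, here is a hedged sketch of generating one such vector with the Titan Text Embeddings model (the model ID and Region are examples); the resulting list of floats is what you would store in a vector database:

#Generate an embedding vector with Titan Text Embeddings
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
response = runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",   #example model ID
    body=json.dumps({"inputText": "John loves to eat ice-cream"}),
)
vector = json.loads(response["body"].read())["embedding"]
print(len(vector))   #dimensionality of the embedding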

At the time of writing, the Titan Embeddings model is available in two variations – Titan Text Embeddings and Titan Multimodal Embeddings. The main difference is that Titan Text Embeddings converts text into embeddings, which makes the model a suitable fit for use cases such as RAG and clustering, while Titan Multimodal Embeddings can convert a combination of text and images into embeddings, which makes it apt for use cases such as searching within images and providing recommendations.

While Titan Text Embeddings supports up to 8,000 tokens and over 25 languages, Titan Multimodal Embeddings can support up to 128 tokens with a maximum image size of 25 MB. Here, English is the only supported language.

In the next chapter, we will learn how to invoke these models and their input configuration parameters. For now, let’s learn about some other FMs provided by Amazon Bedrock.

AI21 Labs – Jurassic-2

AI21 Labs has built several FMs and task-specific models. However, at the time of writing, Amazon Bedrock provides access to the Jamba-Instruct, Jurassic-2 Ultra, and Jurassic-2 Mid FMs.

Jamba-Instruct supports only English, whereas the Jurassic-2 models support multiple languages and use cases such as advanced text generation, comprehension, open-book Q&A, summarization, and others.

Jamba-Instruct supports a context length of 256K tokens, whereas Jurassic-2 Ultra and Jurassic-2 Mid both support a context length of 8,192 tokens.

An example would be the prompt Give me pointers on how I should grow vegetables at home. The output is depicted in Figure 1.6:

Figure 1.6 – Prompting the Jurassic-2 model

Anthropic Claude

Anthropic focuses on safe and responsible AI and provides a group of Claude models. These models support use cases such as Q&A, removing personally identifiable information (PII), content generation, roleplay dialogues, and more. One major benefit of using Anthropic Claude is its ability to process longer sequences of text as prompts. With a maximum context window of 200,000 tokens to date, Claude can understand and respond to much more extensive prompts. This larger context allows Claude to engage in deeper discussions, understand longer narratives or documents, and generate more coherent multi-paragraph responses.

Amazon Bedrock currently offers access to five versions of Anthropic’s Claude language model:

Anthropic Claude 3.5 Sonnet: This sets new industry standards for superior intelligence, outperforming its predecessors and other top AI models in various benchmarks. Claude 3.5 Sonnet excels in areas such as visual processing, content generation, customer support, data analysis, and coding. Remarkably, it achieves this enhanced performance while being 80% more cost-effective than previous Anthropic models, making it an attractive choice for businesses seeking advanced AI capabilities at a lower price point. The following link highlights the benchmarks and comparisons with other models on different tasks: https://aws.amazon.com/blogs/aws/anthropics-claude-3-5-sonnet-model-now-available-in-amazon-bedrock-the-most-intelligent-claude-model-yet/.

Anthropic Claude 3: This has three model variants – Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku. They are the most recent and advanced family of Anthropic models available on Amazon Bedrock. All these models have multimodal capabilities and can perceive and analyze images (jpeg, png), as well as other file types, such as .csv, .doc, .docx, .html, .md, .pdf, .txt, .xls, .xlsx, and .gif, alongside text input, with a 200K context window:

Claude 3 Opus: This is Anthropic’s most capable model to date, with 175 billion parameters. Opus has advanced few-shot learning capabilities, allowing it to quickly adapt to a wide variety of tasks using just a few examples.

Claude 3 Sonnet: A 60-billion-parameter multimodal AI model, Sonnet has strong few-shot learning abilities. Its parameter-efficient architecture allows it to handle complex inputs such as long documents while being more computationally efficient than Opus.

Claude 3 Haiku: At 7 billion parameters, Haiku is Anthropic’s most compact and lightweight model. It is optimized for efficiency, providing high performance for its size. Its low computational requirements make inference very fast.

Anthropic Claude 2.1 and Claude 2: These are also advanced additions to Anthropic’s Claude family. They provide performant reasoning capabilities and high accuracy with lower hallucination rates. They perform well on use cases such as dialogue, creative writing, information, roleplay, summarization, and others. In terms of context length, Claude 2.1 supports up to 200,000 tokens and Claude 2 supports up to 100,000 tokens.

Anthropic Claude 1.3: This is an earlier release with capabilities typical of LLMs at that time. It demonstrated strong performance on tasks involving factual responses, summarization, and basic question answering. In terms of context length, Claude 1.3 supports up to 100,000 tokens.

Anthropic Claude Instant 1.2: This offers a faster and more cost-effective option compared to other Claude models. The latency of the Claude Instant model is greatly reduced at the cost of some performance. However, Claude Instant still demonstrates strong language skills for many common NLP applications that do not require the highest levels of reasoning or nuanced responses, and where speed or cost is a higher priority than absolute highest performance. In terms of context length, Claude Instant 1.2 supports up to 100,000 tokens.
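As a hedged preview, Claude 3 models on Bedrock use Anthropic’s Messages request format; the model ID, Region, and prompt below are examples:

#Invoke Claude 3 via the Anthropic Messages format
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Summarize why the sky is blue in two sentences."}]}
    ],
}
response = runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",   #example model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["content"][0]["text"])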

We will walk through some examples of leveraging Anthropic Claude with Bedrock in the next chapter.

Cohere

Amazon Bedrock offers multiple models from Cohere: Command, Command R+, Command R, Command Light, Embed English, and Embed Multilingual. Cohere Command, trained with 52 billion parameters, is an LLM useful for more complex language understanding. Command Light, with 6 billion parameters, is cost-effective and faster, making it a good option for those who need a lighter model for their applications. Command R+, trained with 104 billion parameters, is the most powerful Cohere model at the time of writing and is designed for tasks with a context window of 128K tokens. Command R, trained with 35 billion parameters, is also designed for tasks with a longer context window of 128K tokens.

Cohere Embed provides a set of models that have been trained to generate high-quality embeddings, which we already know are representations of text documents in a numerical format in vector space. Cohere offers Embed English, which has only been trained on English text, as well as Embed Multilingual, which can handle multiple (more than 100) languages. Embed models support a maximum token length of 512. These embedding models open a wide range of downstream applications, such as semantic search to find related documents, RAG, text clustering, classification, and more.

Take note of the following figure, which highlights a text generation example for summarizing a conversation using the Cohere Command model within Amazon Bedrock’s text playground:

Figure 1.7 – Cohere Command text generation example in Amazon Bedrock’s text playground

Meta Llama 2 and Llama 3

Meta offers several pre-trained LLMs under its Llama 2 and Llama 3 series for chatbot applications. The base Llama 2 model is pre-trained on over 2 trillion tokens from publicly available online data sources and then fine-tuned with over 1 million human-annotated examples.

Four variants of Llama 2 have been made available through Amazon Bedrock: Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 13B, and Llama 2 70B. The 13B model contains 13 billion parameters, and its training process took 368,640 GPU hours to complete. One of the key advantages of the Llama 2 13B model is its ability to process long input sequences, making it well suited for tasks that require long documents or web pages to be analyzed. The larger 70B variant contains 70 billion parameters, and its training process took 1,720,320 GPU hours to complete. The 70B model can be used for multitask learning, meaning it is well suited to performing multiple tasks simultaneously, such as image classification, speech recognition, and NLP. It has been shown to achieve improved performance on several tasks compared to the 13B model, likely due to its relatively larger size and greater computational resources.

Along with Llama 2, Meta Llama 3 variants are also available on Amazon Bedrock, namely Llama 3 8B Instruct and Llama 3 70B Instruct. The Llama 3 8B Instruct model is optimized for scenarios with limited computational resources, making it well suited for edge devices and applications. It demonstrates strong performance in tasks such as text summarization, text classification, language translation, and sentiment analysis. The Llama 3 70B Instruct model is tailored for content creation, conversational AI systems, language understanding applications, and enterprise solutions. It excels in areas such as accurate text summarization, nuanced text classification, sophisticated sentiment analysis and reasoning, language modeling, dialogue systems, code generation, and following complex instructions.

For developers looking to utilize these models, Meta has created an open source GitHub repository called llama-recipes (https://github.com/facebookresearch/llama-recipes/tree/main) that includes demo code and examples of integrating the Llama 2 models into chatbots and virtual assistants. This provides a starting point for researchers and practitioners to experiment with Llama 2 and adapt it for their own conversational AI applications.

Figure 1.8 demonstrates an entity extraction example using the Meta Llama 2 Chat 13B model in Amazon Bedrock’s text playground:

Figure 1.8 – Entity extraction with the Llama 2 Chat 13B model in Amazon Bedrock’s text playground

Mistral AI

Mistral AI focuses on building compute-efficient, trustworthy, and powerful AI models. These are currently available in four variants on Amazon Bedrock – Mistral 7B Instruct, Mixtral 8X7B Instruct, Mistral Large, and Mistral Small:

Mistral 7B Instruct: This is a 7-billion-parameter dense transformer language model designed for instructional tasks. It offers a compelling balance of performance and efficiency, delivering robust capabilities suitable for a wide range of use cases despite its relatively compact size. Mistral 7B Instruct supports processing English natural language and code inputs, with an extended 32,000-token context window. While more limited than larger models, Mistral 7B Instruct provides high-quality language understanding, generation, and task execution tailored for instructional applications at a lower computational cost.

Mixtral 8X7B Instruct: This is a sparse Mixture-of-Experts language model built from eight 7-billion-parameter experts, employing a highly parameter-efficient architecture. Despite its relatively compact total size, it utilizes around 12 billion active parameters for any given input, enabling stronger language understanding and generation capabilities compared to similarly sized dense models such as Mistral 7B. This sparse model supports processing inputs across multiple natural languages, as well as coding languages, catering to a wide range of multilingual and programming use cases. Additionally, Mixtral 8X7B maintains an extended context window of 32,000 tokens, allowing it to effectively model long-range dependencies within lengthy inputs.

Mistral Large: This model is capable of complex reasoning, analysis, text generation, and code generation, and excels at handling intricate multilingual tasks across English, French, Italian, German, and Spanish. Mistral Large supports a maximum context window of 32,000 tokens, enabling it to process long-form inputs while delivering SOTA performance on language understanding, content creation, and coding applications demanding sophisticated multilingual capabilities.

Mistral Small: This is an advanced language model designed for efficiency and affordability. It excels at handling high-volume, low-latency language tasks swiftly and cost-effectively. With its specialized capabilities, Mistral Small seamlessly tackles coding challenges and operates fluently across multiple languages, including English, French, German, Spanish, and Italian. Mistral Small supports a maximum context window of 32,000 tokens.

Figure 1.9 illustrates the Mistral Large model handling a reasoning scenario in Amazon Bedrock’s text playground:

Figure 1.9 – Mistral Large in Amazon Bedrock’s text playground

Stability AI – Stable Diffusion

Stable Diffusion was developed by Stability AI to generate highly realistic images using diffusion models trained on large datasets. The core technique behind Stable Diffusion is called latent diffusion, which involves using a forward diffusion process to add noise to data over time, and a reverse diffusion process to gradually remove noise and reconstruct the original data. In the case of image generation, this allows the model to generate new images conditioned on text or image prompts provided by the user.
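
To make the forward process concrete, here is a toy sketch (with an illustrative noise schedule) that repeatedly adds Gaussian noise to a data vector; this is the essence of forward diffusion. Real latent diffusion applies this process in a compressed latent space and trains a neural network to run it in reverse:

import numpy as np

def forward_diffusion(x0, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Gradually noise a sample x0; returns the fully noised result."""
    betas = np.linspace(beta_start, beta_end, num_steps)  # noise schedule
    x = x0
    for beta in betas:
        noise = np.random.randn(*x.shape)
        # Each step slightly shrinks the signal and mixes in Gaussian noise
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

x0 = np.random.rand(8)    # stand-in for a latent image vector
xT = forward_diffusion(x0)
print(xT)                 # approximately pure Gaussian noise after enough steps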

Amazon Bedrock provides the SDXL 0.8 and SDXL 1.0 Stable Diffusion models from Stability AI, which generate highly realistic images from a text or image prompt. SDXL 1.0 is particularly impressive due to its large model size: its base model contains over 3.5 billion parameters, while its ensemble pipeline uses two models totaling 6.6 billion parameters. By aggregating results from multiple models, the ensemble approach generates even higher-quality images.

Through Amazon Bedrock, developers can leverage Stable Diffusion for a variety of image generation tasks. This includes generating images from text descriptions (text-to-image), generating new images based on existing images (image-to-image), as well as filling in missing areas (inpainting) or extending existing images (outpainting). We will look at these in detail in Chapter 9.

Let’s run through a simple example of the Stable Diffusion model in Amazon Bedrock’s image playground by using this prompt: a dog wearing sunglasses, riding a bike on mars.

Figure 1.10 – Image generation with the Stable Diffusion model
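
The same image can be generated programmatically. The sketch below calls SDXL 1.0 through the Bedrock Runtime API and decodes the base64-encoded image returned in the response; the model ID and request fields follow the Stability request schema as we understand it, so confirm them in the current documentation:

import json
import base64
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# SDXL 1.0 model ID on Bedrock (assumed; confirm in your Region)
model_id = "stability.stable-diffusion-xl-v1"

body = json.dumps({
    "text_prompts": [{"text": "a dog wearing sunglasses, riding a bike on mars"}],
    "cfg_scale": 7,   # how closely the image should follow the prompt
    "steps": 30,      # number of denoising steps
    "seed": 42,
})

response = bedrock_runtime.invoke_model(modelId=model_id, body=body)
result = json.loads(response["body"].read())

# The generated image is returned base64-encoded in the artifacts list
image_bytes = base64.b64decode(result["artifacts"][0]["base64"])
with open("dog_on_mars.png", "wb") as f:
    f.write(image_bytes)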

The ability to automatically create visual content has many applications across industries such as advertising, media and entertainment, and gaming. In Chapter 9, we will explore how Stable Diffusion works under the hood. We will also discuss best practices and architecture patterns for leveraging image generation models in your applications.

Evaluating and selecting the right FM

Now that we understand the different types of FMs available in Amazon Bedrock, how do we determine which one is best suited to our specific project needs? This section will help you evaluate model fit for your use case.

The first step is to clearly define the problem you’re trying to solve or the use case you want to build. Get as specific as possible about the inputs, outputs, tasks involved, and any other requirements. With a well-defined use case in hand, you can research which models have demonstrated capabilities relevant to your needs. Narrowing the options upfront based on capabilities will streamline the evaluation process.

Once you’ve identified some potential candidate models, the next step is to examine their performance across standardized benchmarks and use cases. Amazon Bedrock provides a capability to evaluate FMs through model evaluation jobs. With these jobs, you can choose either automatic model evaluation or evaluation by a human workforce. We will cover Amazon Bedrock’s model evaluation in more detail in the upcoming chapters.

In addition, several leaderboards and benchmarks exist today that can help with this evaluation, such as the following:

Stanford HELM leaderboard for LLMs
HuggingFace’s Open LLM Leaderboard
GLUE (https://gluebenchmark.com/)
SuperGLUE (https://super.gluebenchmark.com/)
MMLU (https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu)
BIG-bench (https://github.com/google/BIG-bench)

Reviewing where each model ranks on tasks related to your use case provides an objective measure of its abilities.

Beyond benchmark performance, you should also weigh each model’s cost per query, processing latency, the training effort required if fine-tuning is needed, and any other non-functional requirements. The right model needs to not only achieve your technical objectives but also fit within your cost and timeline constraints.

No evaluation is complete without hands-on testing. Take advantage of Amazon Bedrock’s text playground or Amazon PartyRock to try out candidates on sample prompts, text generation tasks, or other interactions representative of your intended use case. More details regarding Amazon Bedrock’s text playground and Amazon PartyRock will be covered in the next chapter. This kind of testing allows a more qualitative assessment of generated language quality, the ability to maintain context, interpretability of responses, and the overall feel of interacting with each model.
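
One lightweight way to script such a side-by-side test is Bedrock’s Converse API, which provides a uniform request shape across model providers. The following sketch runs the same prompt against two candidate models; the model IDs are illustrative, and you should substitute whichever candidates are enabled in your account:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Candidate model IDs (illustrative; enable and verify in your account)
candidates = [
    "meta.llama3-8b-instruct-v1:0",
    "mistral.mistral-large-2402-v1:0",
]

prompt = "Summarize the trade-offs between model size and response latency."

for model_id in candidates:
    # The Converse API normalizes requests and responses across providers
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    text = response["output"]["message"]["content"][0]["text"]
    print(f"--- {model_id} ---\n{text}\n")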

By thoroughly researching capabilities, performance, and requirements, as well as testing multiple options, you’ll be well-equipped to select the right FM that provides the best overall fit and solution for your project needs. The right choice will help ensure your project’s success.

Generative AI capabilities of Amazon

This book is primarily focused on Amazon Bedrock, but we want to highlight a few other generative AI capabilities offered by Amazon that enterprises use to accelerate developer productivity, innovate faster, and solve their use cases with ease.

Amazon SageMaker

Amazon SageMaker is Amazon’s fully managed ML platform for building, training, and deploying ML models at scale. One of the most powerful features of SageMaker is SageMaker JumpStart, which provides a catalog of pre-trained open source FMs that are ready to be deployed and used.

Some examples of FMs available in SageMaker JumpStart include FLAN-T5 XL, a fine-tuned XL version of the T5 transformer model optimized for natural language understanding. Additional models, such as Meta Llama 2, AI21 Jurassic-2 Ultra, and Stable Diffusion models, are also available in SageMaker JumpStart.

In addition to deploying these pre-trained FMs directly, SageMaker JumpStart provides tools for customizing and fine-tuning select models for specific use cases. For instance, users can perform prompt engineering to better control model responses by adjusting text prompts. Some models also support reasoning augmentation to improve the common-sense reasoning ability of LLMs through question-answering tasks. Fine-tuning capabilities allow you to adapt the language models to domain-specific datasets.
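
To give a sense of how little code deployment takes, here is a minimal sketch using the SageMaker Python SDK’s JumpStart interface to deploy FLAN-T5 XL to a real-time endpoint. The model ID, instance type, and request payload are assumptions to verify against the JumpStart catalog, and deploying creates billable resources:

from sagemaker.jumpstart.model import JumpStartModel

# JumpStart model ID for FLAN-T5 XL (assumed; browse JumpStart for exact IDs)
model = JumpStartModel(model_id="huggingface-text2text-flan-t5-xl")

# Deploy the pre-trained model to a real-time endpoint (billable resources)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

# Query the endpoint; the payload schema varies by model and is assumed here
response = predictor.predict({"text_inputs": "Translate to German: How are you?"})
print(response)

# Delete the endpoint when finished to stop incurring charges
predictor.delete_endpoint()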

This enables engineers and researchers to leverage the power of these generative AI models directly from JumpStart so that they can build novel applications without requiring deep expertise in model training. The SageMaker platform handles all the heavy lifting of deploying, scaling, and managing ML models. When you open SageMaker JumpStart within the SageMaker Studio UI, you will see models offered by different model providers, as shown in Figure 1.11:

Figure 1.11 – SageMaker JumpStart

You can choose the model you would like to work with based on your use case and deploy it directly to a SageMaker endpoint, or you can fine-tune it on a custom dataset. Figure 1.12 shows several open source models offered by HuggingFace on SageMaker JumpStart, illustrating how simple it is to search for models suited to a particular task using the search bar or Filters options:

Figure 1.12 – SageMaker JumpStart HuggingFace models

Amazon Q

Amazon Q is a generative AI-powered assistant built on top of Amazon Bedrock, designed to enhance productivity and accelerate decision-making across various domains. It can assist users with a multitude of tasks, ranging from software development to data analysis and decision-making.

Here is an overview of key offerings available with Amazon Q.

Amazon Q for Business

Amazon Q for Business is an enterprise-grade, generative AI-powered assistant designed to streamline operations and enhance productivity within organizations. With this tool, you can access and interact with your company’s data repositories, provided you have the required permissions, simplifying tasks and accelerating problem-solving. Here are some key features of Amazon Q for Business:

Comprehensive Data Integration: Amazon Q for Business seamlessly connects to over 40 popular enterprise data sources, including Amazon S3, Microsoft 365, and Salesforce. It ensures secure access to content based on existing user permissions and credentials, leveraging single sign-on for a seamless experience.

Intelligent Query Handling: You can ask questions in natural language, and Amazon Q for Business will search across all connected data sources, summarize relevant information logically, analyze trends, and engage in interactive dialogue. This empowers users to obtain accurate and comprehensive answers, eliminating the need for time-consuming manual data searches.

Customizable and Secure: Organizations can tailor Amazon Q for Business to their specific needs by configuring administrative guardrails, document enrichment, and relevance tuning. This ensures that responses align with company guidelines while maintaining robust security and access controls.

Task Automation: Amazon Q for Business allows users to streamline routine tasks, such as employee onboarding requests or expense reporting, through simple, natural language prompts. Additionally, users can create and share task automation applications, further enhancing efficiency and productivity.

You can set up an Amazon Q for Business application in a few clicks, as shown in Figure 1.13.

Figure 1.13 – Setting up Amazon Q for Business

For more details on setting up an Amazon Q for Business application, see https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/getting-started.html.

Once the application is set up, you can customize the web experience for the Amazon Q for Business application, as shown in Figure 1.14.

Figure 1.14 – Customize web experience for Amazon Q for Business

Let us now look at another offering, Amazon Q for QuickSight.

Amazon Q for QuickSight

Amazon Q for QuickSight is built for business users and analysts to unlock insights from their data more efficiently. It leverages generative AI to streamline the process of data analysis and visualization. Here are some key features of Amazon Q for QuickSight:

Intuitive Storytelling