Find out what makes Azure OpenAI a robust platform for building AI-driven solutions that can transform how businesses operate. Written by seasoned experts from Microsoft, this book will guide you in understanding Azure OpenAI from fundamentals through to advanced concepts and best practices.
The book begins with an introduction to large language models (LLMs) and the Azure OpenAI Service, detailing how to access, use, and optimize its models. You'll learn how to design and implement AI-driven solutions, such as question-answering systems, contact center analytics, and GPT-powered search applications. Additionally, the chapters walk you through advanced concepts, including embeddings, fine-tuning models, prompt engineering, and building custom AI applications using LangChain and Semantic Kernel. You'll explore real-world use cases such as QnA systems, document summarizers, and SQLGPT for database querying, as well as gain insights into securing and operationalizing these solutions in enterprises.
By the end of this book, you'll be ready to design, develop, and deploy scalable AI solutions, ensuring business success through intelligent automation and data-driven insights.
Azure OpenAI Essentials
A practical guide to unlocking generative AI-powered innovation with Azure OpenAI
Amit Mukherjee
Adithya Saladi
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Gebin George
Relationship Lead: Deepesh Patel
Program Manager: Prajakta Naik
Content Engineer: Joseph Sunil
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Proofreader: Tanya D’cruz
Indexer: Hemangini Bari
Production Designer: Vijay Kamble
Marketing Coordinator: Vignesh Raju
First published: February 2025
Production reference: 1200225
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK
ISBN 978-1-80512-506-8
www.packtpub.com
To the Almighty, for showering me with grace, wisdom, and strength throughout this journey.
To my beloved parents, the late Ashutosh Mukherjee and Pratima Mukherjee, for their endless love, unwavering support, and sacrifices. You have always been my guiding lights and have made me who I am today.
To my wonderful wife, Sujata, for her constant love, patience, and support. She’s not only my partner in life but also the one who keeps everything running smoothly, even when my mind is occupied with complex algorithms and endless ideas. Without her, I’d be like an AI without proper training data—disconnected and incomplete. Your patience, love, and constant encouragement make all the difference, Sujata.
To my two beautiful daughters, Aditri and Adrija, who bring joy, laughter, and an infinite amount of curiosity into my life. You remind me that the greatest learning comes from the most unexpected places. To my in-laws, Madhusudan Som and Tapati Som, for their love and encouragement, which have always made our family stronger.
To my lifelong friends, Indraneel Mitra, Avik Biswas, and Biswajit Mondal, for their constant friendship and support through everything.
A heartfelt thanks to the Microsoft Healthcare & Life Sciences Technical Specialist community and the incredible leaders who have mentored me: Tyler Bryson, Patty Carrolo, Joyce Gottbetter, Austin Walsh, Carl Bender, Ian Morrison, Camile Whicker, Laura Robinson, Annita McDonald, and Gene Buckley. Your knowledge, passion, and leadership have inspired me to reach new heights and push beyond limits.
This book is the result of all the love, support, and guidance I have received along the way. Thank you to everyone who has been part of this journey.
— Amit Mukherjee
I would like to begin by expressing my heartfelt gratitude to my parents, Krishna Priya Saladi and Rama Mohan Saladi, for their sacrifices in providing my sister and me with the best education and lifestyle. Their constant support, guidance in correcting my mistakes, and the inspiration they gave me have been instrumental in my journey. Without them, I would not be where I am today. I also want to thank my sister, Yogitha Saladi, for always supporting me through all challenges. I am thankful to my cousins and extended family for their continuous support. A special thank you to my college friends Praveen, Mouni, Neha, Lokesh, Tara, and Likitha for being there and supporting me during both good and tough times.
I owe a great deal to my first mentor, Trinath Mallavarapu from Keka, who taught me the invaluable lesson: “Do it right even if you are doing it for the first time.” This advice has shaped my approach to work and life. I would also like to extend my deepest thanks to Michael Stockerl, my mentor during my time at DPS by UnternehmerTUM in Germany. His trust and belief in me gave me opportunities to learn and grow. I am also thankful to Thomas Zeller for supporting me on this journey. I am also immensely grateful to Ashok Reddy, the Founder of Grabon, for bringing me into the company and for always supporting me in various situations. Grabon provided me with a unique perspective on the startup world, and Ashok’s encouragement to explore new ideas has been invaluable.
At Microsoft, I first want to thank my manager, Pavani Anne, for being my guide in this new phase of my career. Coming from a startup background, transitioning to a large organization like Microsoft was a big leap, and Pavani’s mentorship has made all the difference. I am also thankful to my current managers, Amit Aggarwal and Durga Prasad Rai, for their continuous support. A special thanks to my leads, Subhendu De and Ajay Kumar S, for their guidance and direction. I would also like to thank Tajeshwar Singh, Prasant Kraleti, Vivek Adholia, and all my colleagues at Microsoft for their daily inspiration and encouragement.
As an ambivert, I’ve often felt excluded and tend to stay quiet until I’m truly comfortable. I dedicate this book to those who made me feel included, showed me love and trust, and helped me find my voice. A special thanks to my parents, whose sacrifices and unwavering support have shaped who I am today. Thank you to everyone who has been a part of this journey.
— Adithya Saladi
In the rapidly evolving landscape of artificial intelligence, the integration of advanced AI models into practical applications has become a cornerstone of innovation. As VP Products of Azure AI and AI Futurist at Microsoft, I have witnessed firsthand the transformative power of AI technologies in reshaping industries, enhancing productivity, and driving new levels of creativity.
This book, a comprehensive guide to leveraging Azure OpenAI, serves as an invaluable resource for both seasoned professionals and newcomers to the field. It meticulously covers the foundational concepts of LLMs, the intricacies of embedding models, and the practical applications of AI in various domains. Amit has done an exceptional job in demystifying complex topics and presenting them in a manner that is both accessible and deeply informative.
One of the standout features of this book is its focus on real-world applications. From content generation and customer support to healthcare and cybersecurity, the examples provided illustrate the vast potential of AI to solve complex problems and create new opportunities. The detailed exploration of Azure OpenAI’s capabilities, including the innovative “Azure OpenAI On Your Data” feature, highlights how businesses can harness AI to gain insights from their proprietary data, enhancing decision-making and operational efficiency.
The book delves into advanced topics such as fine-tuning models, integrating AI with external systems through function calling, and utilizing multimodal models for tasks that require both text and image processing. These sections are useful for developers looking to push the boundaries of what AI can achieve.
As we stand on the brink of a new era in AI, it is essential to equip ourselves with the knowledge and tools to navigate this exciting frontier. This book is a testament to the authors’ expertise and dedication to advancing the field of AI. It is a must-read for anyone looking to understand and leverage the power of Azure OpenAI to drive innovation and achieve remarkable outcomes.
I am confident that this book will inspire and empower its readers to explore the limitless possibilities of AI, fostering a future where technology and human ingenuity work hand in hand to create a better world.
Marco Casalaina
Vice President of Products and AI Futurist, Azure AI, Microsoft
Amit Mukherjee is a GenAI Technical Specialist in healthcare at Microsoft, where he leverages AI to enhance patient care and drive innovation. With a strong foundation in ML and AI technologies, he designs AI systems that extract valuable insights from a wide range of data sources. This enables healthcare professionals to make informed, data-driven decisions that improve patient outcomes.
He has led cross-functional teams in developing generative AI solutions tailored for diagnostics, treatment planning, and patient engagement across various healthcare domains. Through close collaboration with stakeholders, he ensures that AI strategies align with organizational objectives for effective and meaningful implementation. As a thought leader in this field, he remains at the forefront of emerging advancements, dedicated to unlocking the transformative potential of generative AI in healthcare.
Adithya Saladi is a seasoned software development consultant with over 8 years of experience, currently working as a Software Engineer in the Azure Reliability team at Microsoft’s C+AI organization. He specializes in building high-scale, high-quality software solutions leveraging Azure services. He has a proven track record of collaborating with architects and cross-functional teams to deliver innovative, business-critical solutions within tight deadlines. His extensive experience spans both enterprise-level applications and startup environments, enabling him to quickly adapt to diverse challenges.
In addition to his technical expertise, Adithya is passionate about mentoring junior developers and actively contributes to the tech community. He is also the founder of GreenOccasion, a non-profit startup focused on developing technology solutions aimed at reducing carbon emissions and contributing to a sustainable future.
Chalamayya Batchu is a distinguished Enterprise Architect specializing in AI, ML, and Data Strategy. He excels in designing enterprise-scale AI architectures and crafting strategies to extract actionable insights from complex datasets using cloud infrastructure and advanced technologies. As a trusted advisor to CXOs, he aligns analytics strategies with business goals, develops robust data governance frameworks, and integrates AI technologies like NLP, GenAI, and CV. Recognized with awards like the International Achievers’ Award, he has authored works in IEEE and Springer, reviewed books for Manning, and been featured in SAS, SiliconIndia, and TechBullion. His leadership inspires innovation in the data-driven era.
Dr. Daniel J. Dean has over 15 years of experience in machine learning, cloud-based application development, cloud data pipeline development, and generative AI solution development. Throughout his career, Daniel has worked at large tech companies, such as Microsoft and IBM, as well as at small startups. His responsibilities have included distributed systems research, cloud-native software engineering, rapid solution prototyping, and intellectual property generation. Daniel has been published in leading computer science conferences, holds three patents, and is a reviewer for industry-track submissions in several computer science publication venues.
Preetish Kakkar is a senior computer graphics engineer and AI enthusiast with over 15 years of experience in advanced rendering engines, computer vision, and AI-driven innovation. He specializes in integrating 3D graphics with AI to enhance AR/VR/XR experiences, combining cutting-edge technologies in physically based rendering, ray tracing, and machine learning. His contributions include advancing visual fidelity and performance in industry-leading tools at Adobe, and his expertise bridges the domains of graphics, vision, and neural processing. As the author of The Modern Vulkan Cookbook and a reviewer of leading journals and books, Preetish actively promotes cross-disciplinary advancements.
This part serves as a comprehensive foundation for understanding the transformative impact of Large Language Models (LLMs) and their enterprise-ready applications. We begin by introducing LLMs, exploring their rapid adoption, and setting the stage for understanding their broader implications in business and technology. Next, we dive into the fundamentals of Azure OpenAI Service, detailing its partnership with OpenAI, model deployment processes, and pricing aspects. Finally, we progress to advanced concepts such as embedding models, multimodal capabilities, and fine-tuning, equipping you with a deeper understanding of Azure OpenAI’s extensive capabilities.
This part has the following chapters:
Chapter 1, Introduction to Large Language Models
Chapter 2, Azure OpenAI Fundamentals
Chapter 3, Azure OpenAI Advanced Topics
In the world of technology, Chat Generative Pre-trained Transformer (ChatGPT) is a large language model (LLM)-based chatbot that was launched by OpenAI on November 30, 2022. In ChatGPT's first week alone, over a million people started using it. This was an important moment because it showed how regular people now use generative artificial intelligence (AI) in their daily lives. By January 2023, ChatGPT had over 100 million users, making it the fastest-growing application in history and valuing OpenAI, the company behind it, at $29 billion.
In this introductory chapter, we’ll establish the basic concepts behind LLMs, look at some examples, understand the concept of foundation models, and provide various business use cases where LLMs can be applied to solve complex problems.
This chapter covers the following topics:
What are LLMs?
LLM examples
The concept of foundation models
LLM use cases
LLMs are a modern breakthrough in deep learning focused on human languages. They've proven useful in many ways, such as content creation, customer support, coding assistance, education and tutoring, medical diagnosis, sentiment analysis, legal assistance, and more. Simply put, an LLM is a smart computer program that can understand and create text much as humans do, using large transformer models under the hood. The Transformer architecture enables models to understand context and relationships within data more effectively, making it particularly powerful for tasks involving human language and sequential data.
For humans, text is a bunch of words put together. We read sentences, sentences make up paragraphs, and paragraphs make up chapters in a document. But for computers, text is just a series of letters and symbols. To make computers understand text, we can create a model using something called recurrent neural networks. This model goes through the text one word or character at a time and gives an answer when it finishes reading everything. This model is good, but sometimes, when it gets to the end of a block of text, it has trouble recalling the text from the beginning of that block. This is where the Transformer architecture shines. The key innovation of the Transformer architecture was its use of the self-attention mechanism, which allowed it to capture relationships between different parts of a sequence more effectively than previous models.
Back in 2017, Ashish Vaswani and his colleagues wrote a paper called Attention is All You Need (https://arxiv.org/pdf/1706.03762.pdf) introducing a new model called the Transformer. This model relies on a mechanism called attention. Unlike recurrent neural networks, which process text one word at a time, attention lets the model look at a whole sentence or even a whole paragraph at once. This added context helps the transformer better "understand" each word. Nowadays, many of the best LLMs are built on transformers.
When you want a Transformer model to understand a piece of text, you must break it down into separate words or parts called tokens. These tokens are then turned into numbers and mapped to special codes called embeddings, which are like special maps that store the semantic meaning of the tokens. Finally, the transformer’s encoder takes these embeddings and turns them into a representation. This “representation” is a vector that captures the contextual meaning of the input tokens, allowing the model to understand and process the input more effectively. In simple terms, you can think of it as putting all the pieces together to understand the whole story.
Here’s an example of a text string, its tokenization, and its vector embedding. Note that tokenization can turn words into subwords. For example, the word “generative” can be tokenized into “gener” and “ative.”
Let’s look at the input text:
Generative AI, a groundbreaking technology fueled by intricate algorithms and machine learning, possesses the remarkable ability to independently craft content across diverse domains. By meticulously analyzing vast datasets and discerning intricate patterns, it generates textual compositions, artistic creations, and a myriad of other outputs that mirror human ingenuity. This innovative capability is reshaping industries far and wide, driving unprecedented advancements in fields such as language generation, creative arts, and data synthesis.
Here's the tokenized text:
['Gener', 'ative', ' AI', ',', ' a', ' groundbreaking', ' technology', ' fueled', ' by', ' intricate', ' algorithms', ' and', ' machine', ' learning', ',', ' possesses', ' the', ' remarkable', ' ability', ' to', ' independently', ' craft', ' content', ' across', ' diverse', ' domains', '.', ' By', ' meticulously', ' analyzing', ' vast', ' datasets', ' and', ' discern', 'ing', ' intricate', ' patterns', ',', ' it', ' generates', ' textual', ' compositions', ',', ' artistic', ' creations', ',', ' and', ' a', ' myriad', ' of', ' other', ' outputs', ' that', ' mirror', ' human', ' ing', 'enuity', '.', ' This', ' innovative', ' capability', ' is', ' resh', 'aping', ' industries', ' far', ' and', ' wide', ',', ' driving', ' unprecedented', ' advancements', ' in', ' fields', ' such', ' as', ' language', ' generation', ',', ' creative', ' arts', ',', ' and', ' data', ' synthesis', '.']
Now, let's look at the embeddings:
[-0.02477909065783024, -0.013280253857374191, 0.014264720492064953, 0.002092828741297126, 0.008900381624698639, 0.017131058499217033, 0.04224500060081482, 0.012088178656995296, -0.028958052396774292, 0.04128062725067139, 0.020171519368886948, 0.034369271248579025, 0.005337550304830074, -0.011920752003788948, 0.0027072832453995943, 0.008103433065116405, 0.035440798848867416, 0.015430007129907608, … … -0.02812761813402176, -0.009549995884299278, 0.02203330025076866, 0.015215701423585415, 0.02339949831366539, -0.008967352099716663, 0.01867138035595417, -0.01762663945555687, 0.01278467196971178, 0.029922427609562874, -0.0002689284738153219, -0.010213003493845463]
Think of the context vector as the heart of the input information. It enables the transformer's decoder to determine what to say next. For example, by providing the decoder with a starting sentence as a hint, it can suggest the next word that makes sense. This process repeats, with each new suggestion becoming part of the hint, allowing the decoder to generate a naturally flowing paragraph from an initial sentence.
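If you want to reproduce this kind of output yourself, the following is a minimal Python sketch that tokenizes a string with the open source tiktoken library and requests an embedding from Azure OpenAI. The endpoint, key, and deployment name shown here are placeholders, and the sketch assumes you have an embedding model deployed under the name text-embedding-ada-002:

import tiktoken
from openai import AzureOpenAI

text = "Generative AI, a groundbreaking technology fueled by intricate algorithms..."

# Tokenize: note how "Generative" splits into the subwords "Gener" and "ative"
encoding = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI models
tokens = encoding.encode(text)
print([encoding.decode([t]) for t in tokens])

# Embed: turn the whole string into a single semantic vector
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_key="<your-api-key>",                                    # placeholder
    api_version="2024-02-01",
)
response = client.embeddings.create(
    model="text-embedding-ada-002",  # assumes a deployment with this name
    input=text,
)
print(response.data[0].embedding[:8])  # first few of the 1,536 dimensions

Note that different model families use different tokenizers, so the exact token boundaries and counts can vary from the example shown above.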
Decoder-based content generation is like a game where each move builds on the previous one, and you end up with a complete story. This method of content generation is called "autoregressive generation" (see the toy sketch after this paragraph). Broadly speaking, this is how LLMs work. Autoregressive models can handle long input texts while maintaining a context vector big enough to deal with complicated ideas. They also stack many layers in the decoder, making them highly sophisticated. Such a model is typically too big to run on a single computer and instead runs on a cluster of nodes working together, which is why it's offered as a service through an application programming interface (API). As you might have guessed, this enormous model is trained on a massive amount of text until it understands how language works, including the patterns and structures of sentences.
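Here is a toy, self-contained Python sketch of that loop, using a hand-built bigram lookup table in place of a real transformer. This is purely illustrative: a real LLM computes the next-token distribution with billions of parameters, but the feed-the-output-back-in loop is the same:

# A toy autoregressive loop: at each step, the text generated so far is fed
# back in and the most likely next word is appended to it.
bigram_probs = {
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"on": 1.0},
    "on":  {"the": 1.0},
    "mat": {".": 1.0},
}

def generate(prompt, max_new_tokens=6):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        dist = bigram_probs.get(tokens[-1])
        if dist is None:                        # no known continuation: stop
            break
        next_token = max(dist, key=dist.get)    # greedy pick of the likeliest word
        tokens.append(next_token)               # the pick becomes part of the hint
    return " ".join(tokens)

print(generate("the"))  # "the cat sat on the cat sat"

Notice how the greedy loop repeats itself once it re-enters a familiar context; real systems mitigate this with sampling strategies rather than always picking the single likeliest token. Now, let's understand the main structure of LLMs.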
An LLM’s structure (see Figure 1.1) is mainly made up of different layers of neural networks, such as recurrent layers, feedforward layers, embedding layers, and attention layers. These layers collaborate to handle input text and make predictions about the output. Let’s take a closer look:
The embedding layer changes each word in the input text into a special kind of detailed description, kind of like a unique fingerprint. These descriptions hold crucial details about the words and their meanings, which helps the model understand the bigger picture.
The feedforward layers in LLMs consist of many connected layers that process the detailed descriptions created in the embedding layer. These layers perform complex transformations on these embeddings, which helps the model understand the more important ideas in the input text.
The recurrent layers in LLMs are designed to read the input text one step at a time. These layers have hidden memory that gets updated as they read each part of the text. This helps the model remember how the words are related to each other in a sentence.
The attention mechanism is another important part of LLMs. It's like a spotlight that the model shines on different parts of the input text. This helps the model focus on the most important parts of the text and make better predictions.
For example, when you read, you don't pay equal attention to every word; instead, you focus more on keywords and important phrases to grasp the main idea. For instance, in the sentence "The cat sat on the mat," you might emphasize "cat" and "mat" to understand what's happening. Additionally, you use context from previous sentences to make sense of the current one – if you read about a cat playing earlier, you understand why the cat is now sitting on the mat. As you continue reading, you adjust your focus based on what's important for comprehension, revisiting or paying more attention to crucial sections that help you understand the overall plot.
In essence, just as humans read text by focusing on important words and using context to understand meaning, the attention mechanism in transformers focuses on key parts of the input and adjusts dynamically to capture the context and relationships between words:
Figure 1.1: Transformer architecture (source: https://arxiv.org/pdf/1706.03762.pdf)
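To make the attention idea concrete, here is a minimal, self-contained NumPy sketch of the scaled dot-product attention at the heart of Figure 1.1. This is a toy illustration of the computation softmax(QKᵀ/√d_k)V from the paper, not Azure OpenAI code:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each output row is a weighted mix of the value vectors, with weights
    # given by how similar each query is to each key.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# Self-attention over four 8-dimensional token embeddings: Q = K = V = x
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)

In a real transformer, Q, K, and V are learned linear projections of the token embeddings, and many such attention "heads" run in parallel in every layer.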
Now that we’ve learned the basic concepts behind LLMs, let’s focus on some of the top industry examples.
Cutting-edge LLMs have been developed by many companies, including OpenAI (GPT-4), Meta (Llama 3.1), Anthropic (Claude), and Google (Gemini), to name a few. OpenAI has consistently maintained a dominant role in the field of LLMs. Let’s look at the top models that are used at the time of writing:
Generative Pre-trained Transformer (GPT): OpenAI has created various GPT models, including GPT-1 (117 million parameters), GPT-2 (1.5 billion parameters), GPT-3 (175 billion parameters), GPT-3.5, GPT-4 Turbo, GPT-4o, and GPT-4o mini. GPT-4o is one of the most advanced LLMs globally. These models learn from a huge amount of text and can provide human-like answers on many subjects and questions. They also remember various parts of conversations.
Anthropic: Anthropic's Claude models are a family of advanced LLMs designed to handle complex tasks with high efficiency. The latest iteration, Claude 3, includes models such as Opus, Sonnet, and Haiku, each tailored to different performance needs. Opus is the most powerful, excelling at complex analysis and higher-order tasks; Sonnet balances speed and intelligence; and Haiku offers the fastest response times for lightweight actions. These models are built with a focus on security, reliability, and ethical AI practices.
Llama 3.1: Llama 3.1 is a cutting-edge LLM that represents a significant milestone in AI research. With its advanced architecture and massive scale, Llama 3.1 can understand and generate human-like text with unprecedented accuracy and nuance. This powerful tool has far-reaching implications for various applications, including natural language processing (NLP), text generation, and conversational AI.
Llama 2: Llama 2 is a second-generation LLM developed by Meta. It's open source and can be used to create chatbots such as ChatGPT or Google Bard. Llama 2 was trained on 40% more data than Llama 1 to produce logical and natural-sounding responses, and it's available for anyone to use for research or business. Meta says that Llama 2 understands twice as much context as Llama 1, making it a smarter language model that can give answers that sound just like what a human would provide.
Gemini: Google Gemini is a family of advanced multimodal LLMs developed by Google DeepMind. Announced on December 6, 2023, Gemini includes variants such as Gemini Ultra, Gemini Pro, Gemini Flash, and Gemini Nano. It's designed to understand and operate across different types of information seamlessly, including text, images, audio, video, and code. Positioned as a competitor to OpenAI's GPT-4, Gemini powers Google's AI chatbot and aims to boost creativity and productivity.
PaLM 2: Finally, PaLM 2 is Google's updated LLM. It's skilled at handling complex tasks such as working with code and math, categorizing and answering questions, translating languages, being proficient in multiple languages, and creating human-like sentences. It outperforms its predecessors, including the original PaLM. Google is careful about how it creates and uses AI, and PaLM 2 is part of this approach; it went through thorough evaluations to check for potential problems and biases. PaLM 2 is not just used in isolation: it also underpins other advanced models, such as Med-PaLM 2 and Sec-PaLM, and powers AI features and tools at Google, such as Bard and the PaLM API.
The evolutionary tree (see Figure 1.2) of modern LLMs illustrates how these models have evolved in recent years and highlights some of the most famous ones. Models that are closely related are shown on the same branches.
Models that use the Transformer architecture are shown in different colors: those that only decode are on the blue branch, ones that only encode are on the pink branch, and models that do both encoding and decoding are on the green branch. The position of the models on the timeline shows when they were released. Open source models are represented by filled squares, while models that are not open source are represented by empty squares. The bar chart at the bottom right displays the number of models from different companies and organizations:
Figure 1.2: The evolutionary tree of modern LLMs (source: https://arxiv.org/abs/2304.13712)
Having explored some exemplary LLM instances, let’s discuss the concept of foundation models and their advantages and disadvantages.
In recent years, there has been a huge buzz around LLMs such as ChatGPT sweeping across the world. LLMs are a subset of a broader category of models known as foundation models. Interestingly, the term “foundation models” was initially introduced by a team from Stanford. They observed a shift in the AI landscape, leading to the emergence of a new paradigm.
In the past, AI applications were constructed by training individual AI models, each tailored to a specific task using specialized data and a mostly supervised training approach, often resulting in a library of narrowly scoped models. In the new paradigm, a single, massively pre-trained model replaces that library. Its foundational capability, known as a foundation model (see Figure 1.3), becomes the driving force behind various applications and use cases. Essentially, this single model can cater to the very same applications that were once powered by distinct AI models in the traditional approach, meaning that a single model can fuel an array of diverse applications. The key here is that this model possesses the incredible ability to adapt to a multitude of tasks. What empowers this versatility is the fact that it has undergone extensive training on an immense volume of unstructured data through an unsupervised approach:
Figure 1.3: Foundational model (source: https://arxiv.org/pdf/2108.07258.pdf)
Consider a scenario where I start a sentence with “Don’t count your chickens before they’re.” Now, my goal is to guide the model in predicting the last word, which could be “hatched,” “grown,” or even “gone.” This process involves training the model to anticipate the appropriate word by analyzing the context provided by the words that come before it in the sentence. The impressive ability to generate predictions for the next word, while drawing on the context of preceding words it has encountered, positions foundation models within the realm of generative AI. In essence, these models fall under the category of generative AI because they’re capable of crafting something novel – in this case, predicting the upcoming word in a sentence.
Although these models are primarily designed to generate predictions, particularly anticipating the next word in a sentence, they offer immense capabilities. With the addition of a modest amount of labeled data, we can adjust these models to perform exceptionally well on more traditional NLP tasks. These tasks include activities such as classification or named-entity recognition, which are typically not associated with generative capabilities. This transformation is achieved through a process known as fine-tuning. When you fine-tune your foundation model with a modest dataset, you adjust its parameters so that it can excel at a specific natural language task. This way, the model evolves from being primarily generative to being a powerful tool for targeted NLP tasks.
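As a preview of what fine-tuning looks like in practice (the book covers it in depth later), here is a minimal sketch using the openai Python SDK against Azure OpenAI. The filename, endpoint, credentials, and base model name are assumptions; fine-tuning is only offered for certain models and regions, so check availability for your resource:

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_key="<your-api-key>",                                    # placeholder
    api_version="2024-02-01",
)

# Upload a JSONL file of labeled examples (one chat-formatted sample per line)
training_file = client.files.create(
    file=open("train.jsonl", "rb"),  # hypothetical file of labeled examples
    purpose="fine-tune",
)

# Kick off a fine-tuning job against a base model that supports fine-tuning
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-35-turbo-0613",  # assumes this base model is available to you
)
print(job.id, job.status)  # poll until the job succeeds, then deploy the result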
Even with limited data, foundation models can prove highly effective, especially in domains where data is scarce. Through a process known as prompting, or prompt engineering, techniques such as in-context learning, zero-shot, one-shot, and few-shot learning can be used to tackle complex downstream tasks. Let’s break down how you could prompt a model for a classification task. Imagine that you provide the model with a sentence and follow it up with the question, “Does this sentence carry a positive or negative sentiment?” The model would then work its magic, completing the sentence with generated words. The very next word it generates would serve as the answer to your classification question. Depending on where it perceives the sentiment of the sentence to lie, the model would respond with either “positive” or “negative.” This method leverages the model’s inherent ability to generate contextually relevant text to solve a specific classification challenge. We’ll cover different prompting techniques and advanced prompt engineering later in this book.
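To make that sentiment example concrete, here is a minimal zero-shot prompt against an Azure OpenAI chat model. The deployment name gpt-4o and the credentials are placeholders for your own resource values:

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_key="<your-api-key>",                                    # placeholder
    api_version="2024-02-01",
)

review = "The delivery was late and the packaging was damaged."

# Zero-shot: no labeled examples, just an instruction that frames the task
response = client.chat.completions.create(
    model="gpt-4o",  # assumes a chat deployment with this name
    messages=[
        {"role": "system",
         "content": "Classify the sentiment of the user's sentence. "
                    "Reply with exactly one word: positive or negative."},
        {"role": "user", "content": review},
    ],
)
print(response.choices[0].message.content)  # e.g., "negative"

Adding a handful of labeled examples to the messages list turns this into few-shot prompting; in either case, no model parameters change, only the prompt.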
Let’s talk about some of the key advantages of foundation models:
Performance: These models have been trained on an immense amount of content, with data volumes regularly in the terabyte range. When employed for smaller tasks, these models exhibit remarkable performance that far surpasses models trained on only a handful of data points.
Productivity gain: LLMs can boost productivity in a big way. They're like a super-efficient human for tasks that usually take a lot of time and effort. For instance, in customer service, LLMs can quickly answer common questions, freeing up human workers to handle more complex issues. In businesses, they can process and organize data far faster than people can. Using LLMs, companies can save time and money, letting them focus on important tasks; they act like a turbocharger for productivity.
However, these foundation models also have key challenges and limitations:
Cost: These models tend to be quite costly to train because of the huge data volumes needed, which often poses challenges for smaller businesses attempting to train their own foundation models. Additionally, as these models grow in size, reaching a scale of several billion parameters, they can become pricey to use for inference.
Cloud providers such as Microsoft offer a service called Azure OpenAI. This service lets businesses use these models on demand and pay only for what they use, similar to renting a powerful computer for a short time instead of buying one outright. Leveraging this service-based capability allows companies to save money on both model training and model use, especially considering the powerful, GPU-based hardware required.
To summarize, using services such as Azure OpenAI, businesses can take advantage of these advanced models without spending a ton on resources and infrastructure.
Trustworthiness: Just as data serves as a massive advantage for these models, there's a flip side to consider: LLMs are trained on vast amounts of internet-scraped language data that may contain biases, hate speech, or toxic content, compromising their reliability. Vetting all of that data would be a monumental task. Furthermore, there's the challenge of not fully knowing what the data comprises: for many models, including open source ones, the exact datasets used for training are unclear, making it difficult to assess their quality and raising concerns about the models' trustworthiness and potential biases. The sheer scale of LLM training data makes it nearly impossible for human annotators to thoroughly vet each data point, increasing the risk of unintended consequences such as perpetuating harmful biases or generating toxic content.
Big organizations are fully aware of the immense possibilities that these technologies hold. To address the trustworthiness issue, OpenAI, Microsoft, Google, and Anthropic have jointly unveiled the Frontier Model Forum (https://blogs.microsoft.com/on-the-issues/2023/07/26/anthropic-google-microsoft-openai-launch-frontier-model-forum), a fresh industry initiative aimed at ensuring the safe and responsible advancement of frontier AI models. This new collaborative entity will tap into the collective technical and operational prowess of its member companies to foster progress across the broader AI landscape. One of its core objectives involves driving technical evaluations and benchmarks forward. Additionally, the forum will strive to construct a publicly accessible repository of solutions, bolstering the adoption of industry best practices and standards throughout the AI domain.
Hallucination: Sometimes, LLMs come up with information or answers that aren't entirely accurate. This is like a dream that seems real but isn't grounded in reality. LLMs might generate text that sounds right but isn't completely true or accurate. So, while LLMs are highly intelligent, they can sometimes make mistakes or produce content that isn't real.
Applications of LLMs often require human oversight to make sure the outputs are trustworthy. However, there is a promising technique called grounding the model