Building AI Applications with Microsoft Semantic Kernel - Lucas A. Meyer - E-Book

Description

In the fast-paced world of AI, developers are constantly seeking efficient ways to integrate AI capabilities into their apps. Microsoft Semantic Kernel simplifies this process by providing unified access to the GenAI services offered by Microsoft and OpenAI.
Written by Lucas A. Meyer, a Principal Research Scientist in Microsoft’s AI for Good Lab, this book helps you get hands-on with Semantic Kernel. It begins by introducing you to different generative AI services, such as GPT-3.5 and GPT-4, demonstrating their integration with Semantic Kernel. You’ll then learn to craft prompt templates with variables that can be reused across various AI services. Next, you’ll learn how to add functionality to Semantic Kernel by creating your own plugins. The second part of the book shows you how to combine multiple plugins to execute complex actions, and how to let Semantic Kernel use its own AI to solve complex problems by calling plugins, including the ones you made. The book concludes by teaching you how to use vector databases to expand the memory of your AI services and how to help AI remember the context of earlier requests. You’ll also be guided through several real-world example applications, such as RAG and custom GPT agents.
By the end of this book, you'll have gained the knowledge you need to start using Semantic Kernel to add AI capabilities to your applications.




Building AI Applications with Microsoft Semantic Kernel

Easily integrate generative AI capabilities and copilot experiences into your applications

Lucas A. Meyer

Building AI Applications with Microsoft Semantic Kernel

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Niranjan Naikwadi

Publishing Product Manager: Tejashwini R

Book Project Manager: Neil D’mello

Senior Editor: Mark D’Souza

Technical Editor: K Bimala Singha

Copy Editor: Safis Editing

Proofreader: Mark D’Souza

Indexer: Pratik Shirodkar

Production Designer: Shankar Kalbhor

DevRel Marketing Coordinator: Vinishka Kalra

First published: June 2024

Production reference: 1060624

Published by

Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK

ISBN 978-1-83546-370-3

www.packtpub.com

Contributors

About the author

Lucas A. Meyer is a financial economist and computer scientist with an MBA and an M.Sc. in finance from the University of Washington in Seattle. Lucas works as a principal research scientist at the Microsoft AI for Good Lab, where he works on the economics of AI and using large language models to combat disinformation and improve economic outcomes. Prior to that, Lucas worked for several years in finance, pioneering several uses of natural language processing for financial analysis and automation. Lucas is passionate about AI education and is a Top Voice on LinkedIn, where he posts about new developments in the field.

To my wife, Ilanah, and my children, Thomas, Eric, and Caroline: you make it easy and worthwhile for me to face any challenge. To my mom, Ana Maria, thank you for teaching me that education is the best investment. To my colleagues at the AI for Good Lab: I learn so much from you. To the online communities of LinkedIn and Threads, thanks for all you taught me and all the support.

About the reviewer

Lucas Puskaric has driven billions of dollars in revenue as a staff software engineer and tech lead, specializing in crafting creative customer-focused solutions across the entire stack. Over his career, he has worked at every type of company from non-profit to Fortune 25. After helping vaccinate tens of millions of people, he founded the software company Interweave. When LLMs arrived on the scene, he was an early adopter who released several AI apps. He’s known for his presence on Threads, where he often shares technical knowledge and projects he’s working on. Connect with him at @lucaspuskaric on all socials.

Shoutout to the love of my life, Alex, and our four cats: Reggie, Sonny, Nori, and Figgy. Also, thanks to Threads for introducing me to Lucas Meyer and Packt for giving me the opportunity to review this masterpiece.

Table of Contents

Preface

Part 1: Introduction to Generative AI and Microsoft Semantic Kernel

1

Introducing Microsoft Semantic Kernel

Technical requirements

Obtaining an OpenAI API key

Obtaining an Azure OpenAI API key

Generative AI and how to use it

Text generation models

Understanding the difference between applications and models

Generating text using consumer applications

Generating images

Microsoft Semantic Kernel

Installing the Microsoft Semantic Kernel package

Using Semantic Kernel to connect to AI services

Connecting to OpenAI Services using Python

Connecting to OpenAI services using C#

Running a simple prompt

Running a simple prompt in Python

Running a simple prompt in C#

Using generative AI to solve simple problems

Creating semantic functions

Creating native functions

Plugins

The config.json file for the knock-knock joke function

The skprompt.txt file for the knock-knock joke function

The config.json file for the semantic function that explains jokes

The skprompt.txt file for the explain joke function

Loading the plugin from a directory into the kernel

Using a planner to run a multistep task

Calling the Function Calling Stepwise planner with Python

Summary

References

2

Creating Better Prompts

Technical requirements

A simple plugin template

The skprompt.txt file

The config.json file

Calling the plugin from Python

Calling the plugin from C#

Results

Improving the prompt to get better results

Revising the skprompt.txt file

The result

Prompts with multiple variables

Requesting a complex itinerary with Python

Requesting a complex itinerary with C#

The result of the complex itinerary

Issues when answering math problems

Multistage prompts

CoT – “Let’s think step by step”

Implementing CoT with Python

Implementing CoT with C#

Results for CoT

An ensemble of answers

Summary

References

Part 2: Creating AI Applications with Semantic Kernel

3

Extending Semantic Kernel

Technical requirements

Getting to know the core plugins

An example – Using the TimePlugin

Introducing the application – Validating grants

Directory structure of our application

Developing native plugins

The directory structure of our plugins

Checking the structure of our Excel spreadsheet

Additional checks

Evaluating the Word document

Developing semantic plugins

Evaluating the grant proposal with a semantic plugin

Summary

4

Performing Complex Actions by Chaining Functions

Technical requirements

Creating a native plugin that generates images

Writing a DALL-E 3 wrapper in Python

Writing a DALL-E 3 wrapper in C#

Using multiple steps to solve a problem

Generating an image from a clue

Chaining semantic and native functions with C#

Chaining semantic and native functions with Python

Dealing with larger, more complex chains

Preparing our directory structure

Understanding the flow of our process

Creating the native function to process a folder

Modifying the Excel native plugin

Modifying the Word native plugin

Modifying the semantic functions

Creating and calling the pipeline

Summary

References

5

Programming with Planners

Technical requirements

What is a planner?

When to use a planner

Instantiating a planner

Creating and running a plan

An example of how a planner can help

How do planners work?

Controlling home automation with the planner

Creating the native functions

Adding a semantic function to suggest movies

Invoking the planner

Summary

6

Adding Memories to Your AI Application

Technical requirements

Defining memory and embeddings

How does semantic memory work?

Embeddings in action

Using memory within chats and LLMs

Using memory with Microsoft Semantic Kernel

Using memory in chats

Reducing history size with summarization

Summary

Part 3: Real-World Use Cases

7

Real-World Use Case – Retrieval-Augmented Generation

Technical requirements

Why would you need to customize GPT models?

Retrieval-augmented generation

Creating an index

Uploading documents to the index

Using the index to find academic articles

Using RAG to create a summary of several articles on a topic

Summary

References

8

Real-World Use Case – Making Your Application Available on ChatGPT

Technical requirements

Custom GPT agents

Creating a custom GPT

Creating a custom GPT that supports actions

Creating a web API wrapper for the native function

Deploying your application to an Azure Web App

Connecting the custom GPT with your custom GPT action

Summary

Index

Other Books You May Enjoy

Part 1: Introduction to Generative AI and Microsoft Semantic Kernel

In this part, you will get an overview of generative AI and how to use it with Microsoft Semantic Kernel. You will also learn best practices for prompting that are useful not only when you’re using Semantic Kernel, but also when you’re interacting with AI on your own.

This part includes the following chapters:

Chapter 1, Introducing Microsoft Semantic Kernel

Chapter 2, Creating Better Prompts

1

Introducing Microsoft Semantic Kernel

The generative artificial intelligence (GenAI) space is evolving quickly, with dozens of new products and services launched weekly, and it is becoming hard for developers to keep up with the ever-changing features and application programming interfaces (APIs) of each service. In this book, you will learn about Microsoft Semantic Kernel, an open-source SDK that makes it much easier for you to use GenAI as a developer, making your code shorter, simpler, and more maintainable. Semantic Kernel allows you to use a single interface to connect to several different GenAI providers. Microsoft itself used Semantic Kernel to develop its copilots, such as Microsoft 365 Copilot.

Billions of people already use GenAI as consumers, and you are probably one of them. We will start this chapter by showing some examples of what you can do with GenAI as a consumer. Then, you will learn how you can start using GenAI as a developer to add AI services to your own applications.

In this chapter, you will learn the differences between using GenAI as a user and as a developer and how to create and run a simple end-to-end request with Microsoft Semantic Kernel. This will help you see how powerful and simple Semantic Kernel is and will serve as a framework for all further chapters. It will enable you to begin integrating AI into your own apps right away.

In this chapter, we’ll be covering the following topics:

Understanding the basic use of a generative AI application like ChatGPT
Installing Microsoft Semantic Kernel
Configuring Semantic Kernel to interact with AI services
Running a simple task using Semantic Kernel

Technical requirements

To complete this chapter, you will need to have a recent, supported version of your preferred Python or C# development environment:

For Python, the minimum supported version is Python 3.10, and the recommended version is Python 3.11
For C#, the minimum supported version is .NET 8

Important note

The examples are presented in C# and Python, and you can choose to only read the examples of your preferred language. Occasionally, a feature is available in only one of the languages. In such cases, we provide an alternative in the other language for how to achieve the same objectives.

In this chapter, we will call OpenAI services. Given the amount that companies spend on training these large language models (LLMs), it’s no surprise that using these services is not free. You will need an OpenAI API key, obtained either directly through OpenAI or Microsoft, via the Azure OpenAI service.

Important: Using the OpenAI services is not free

The examples that we will run throughout this book call the OpenAI API. These calls require a paid subscription, and each call incurs a cost. The cost is usually small per request (for example, GPT-4 costs up to $0.12 per 1,000 tokens), but it can add up. In addition, note that different models have different prices: GPT-3.5 costs roughly one-thirtieth as much per token as GPT-4.

OpenAI pricing information can be found here: https://openai.com/pricing

Azure OpenAI pricing information can be found here: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/
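
As a rough worked example of how these costs accumulate (using the upper-bound GPT-4 price quoted above, which may not match current rates):

# Rough cost estimate: tokens / 1,000 * price per 1,000 tokens
tokens_used = 1500
price_per_1k_usd = 0.12  # the GPT-4 upper-bound price cited above
print(f"${tokens_used / 1000 * price_per_1k_usd:.2f}")  # prints $0.18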

If you use .NET, the code for this chapter is at https://github.com/PacktPublishing/Building-AI-Applications-with-Microsoft-Semantic-Kernel/tree/main/dotnet/ch1.

If you use Python, the code for this chapter is at https://github.com/PacktPublishing/Building-AI-Applications-with-Microsoft-Semantic-Kernel/tree/main/python/ch1.

You can install the required packages by cloning the GitHub repository and running the following command: pip install -r requirements.txt.
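
To verify that the packages installed correctly, a quick sanity check such as the following sketch works in any Python environment:

# Minimal check that the semantic-kernel package is installed
from importlib.metadata import version

print(version("semantic-kernel"))  # prints the installed version string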

Obtaining an OpenAI API key

Go to the OpenAI Platform website (https://platform.openai.com).
Sign up for a new account or sign in with an existing account. You can use your email or an existing Microsoft, Google, or Apple account.
On the left sidebar menu, select API keys.
On the Project API keys screen, click the button labeled + Create new secret key (optionally, give it a name).

Important

You have to copy and save the key immediately. It will disappear as soon as you click Done. If you didn’t copy the key or if you lost it, you need to generate a new one. There’s no cost to generate a new key. Remember to delete old keys.
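
However you obtain the key, a common practice is to keep it out of your source code, for example in an environment variable. Here is a minimal Python sketch, assuming you have exported OPENAI_API_KEY in your shell:

# Read the API key from the environment instead of hardcoding it
import os

api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")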

Obtaining an Azure OpenAI API key

Currently, you need to submit an application to obtain access to the Azure OpenAI Service. To apply for access, you need to complete a form at https://aka.ms/oai/access.

The instructions to obtain an Azure OpenAI API key are available at https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource.

Generative AI and how to use it

Generative AI refers to a subset of artificial intelligence programs that are capable of creating content similar to what humans produce. These systems are trained on very large datasets to learn their patterns, styles, and structures. They can then generate entirely new content, such as synthesized images, music, and text.

Using GenAI as a consumer or end user is very easy, and as a technical person, you have probably already done it. There are many consumer-facing AI products. The most famous is OpenAI’s ChatGPT, but there are many others with hundreds of millions of daily users, such as Microsoft Copilot, Google Gemini (formerly Bard), and Midjourney. As of October 2023, Meta, the parent company of Facebook, WhatsApp, and Instagram, is making GenAI services available to all its users, increasing the number of daily GenAI users to billions.

While the concept of GenAI has existed for a while, it gained a large user base with the release of OpenAI’s ChatGPT in November 2022. The initial version of ChatGPT was based on a model called generative pre-trained transformer (GPT) version 3.5, which was far better than earlier versions at mimicking human-like writing. In addition, OpenAI made the model easy to use by adding a chatbot-like interface and making it available to the general public; this interface is called ChatGPT. With ChatGPT, users can easily initiate tasks in their own words. At its launch, ChatGPT was the product with the fastest adoption rate in history.

The GenAI concept was further popularized with the release of Midjourney, an application that allows for the generation of high-quality images from prompts submitted through Discord, a popular chat application, and Microsoft Copilot, a free web application that can generate text by using GPT-4 (the newest version of OpenAI’s GPT) and images by using an OpenAI model called DALL-E 3.

In the upcoming subsections, we will discuss text and image generation using GenAI applications and explain the differences between generating them using applications such as ChatGPT and with an API as a developer.

Text generation models

The initial use cases for GenAI were to generate text based on a simple instruction called a prompt.

The technology behind most text-based GenAI products is called a transformer, and it was introduced in the paper Attention Is All You Need [1] in 2017. The transformer immensely improved the quality of generated text, and in just a few years, it looked very similar to human-generated text. The transformer greatly improved the ability of AI to guess masked words in a phrase after being trained on a large collection of documents (a corpus). Models trained on very large corpora are called large language models (LLMs).

If LLMs are given a phrase such as “I went to the fast-food restaurant to <X>,” they can generate good options for X, such as “eat.” Applying the transformer repeatedly can generate coherent phrases and even stories. The next iteration could be “I went to the fast-food restaurant to eat <X>,” returning “a,” and then “I went to the fast-food restaurant to eat a <X>,” could return “burger,” forming the full phrase “I went to the fast-food restaurant to eat a burger.”
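
To make this loop concrete, here is a toy Python sketch of word-by-word generation; the small lookup table merely stands in for a real model:

# Toy illustration of autoregressive generation; the table stands in for an LLM
NEXT_WORD = {
    "I went to the fast-food restaurant to": "eat",
    "I went to the fast-food restaurant to eat": "a",
    "I went to the fast-food restaurant to eat a": "burger",
}

phrase = "I went to the fast-food restaurant to"
while phrase in NEXT_WORD:
    phrase = f"{phrase} {NEXT_WORD[phrase]}"  # append the model's best guess
print(phrase)  # I went to the fast-food restaurant to eat a burger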

The performance of an LLM depends on its number of parameters (roughly proportional to how many comparisons the model can make at once), its context window (the maximum size of the text it can handle at once), and the data used to train it, which is usually kept secret by the companies that create LLMs.

GPT is a model created by OpenAI that uses transformers and is good at generating text. There are many versions of GPT:

GPT-1, launched in February 2018, had 120 million parameters and a context window of 512 tokens.

GPT-2 was launched in February 2019, with the number of parameters increasing to 1.5 billion and the context window increasing to 1,024 tokens. Up to this point, while they sometimes produced interesting results, these models were mostly used by academics.

This changed with GPT-3, launched in June 2020, which came in several sizes: small, medium, large, and extra-large. The extra-large version had 175 billion parameters and a 2,048-token context window. The generated text was, in most cases, hard to distinguish from human-generated text. OpenAI followed it with GPT-3.5, released in November 2022, still with 175 billion parameters but with a context window of 4,096 tokens (since expanded to 16,384 tokens), and launched a user interface for it named ChatGPT.

ChatGPT is a web and mobile application that uses the GPT models in the background and allows users to submit prompts to the GPT models and get responses online. It was launched together with GPT-3.5, and at the time, it was the consumer product with the fastest adoption rate, reaching 100 million users in less than two months.

In February 2023, Microsoft released Bing Chat, which also uses OpenAI’s GPT models in the back end, further popularizing the usage of transformer models and AI. Recently, Microsoft renamed it to Microsoft Copilot.

Just a month later, in March 2023, OpenAI released the GPT-4 model, which was quickly incorporated into the backend of consumer products such as ChatGPT and Bing.

Not all details about the GPT-4 model have been released to the public. It is known to have a context window of up to 32,768 tokens; its number of parameters has not been made public, but it has been estimated at 1.8 trillion.

The GPT-4 model is notably good at human-like tasks related to text generation. A benchmark in the GPT-4 technical report [2] compares the performance of GPT-3.5 and GPT-4 on exams: GPT-4 could pass many high-school and college-level exams. You can read the paper at https://doi.org/10.48550/arXiv.2303.08774.

Understanding the difference between applications and models

Most people, including you, have likely used a GenAI application, such as ChatGPT, Microsoft Copilot, Bing Image Creator, Bard (now named Gemini), or Midjourney. These applications use GenAI models in their backend, but they also add a user interface and configurations that restrict and control the output of the models.

When you develop your own application, you will need to do these things yourself. You may not yet realize how much is done behind the scenes by applications such as Bing and ChatGPT.

When you submit a prompt to an application, the application may add several additional instructions to it. The most typical additions are instructions that restrict certain types of output, such as “your reply should contain no curse words.” For example, when you submit the prompt “Tell me a joke” to an application like ChatGPT, it may modify your prompt to “Tell me a joke. Your reply should contain no curse words” and pass that modified prompt to the model.
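
As a sketch, such a modification can be as simple as string concatenation; the guardrail text below is hypothetical:

# Hypothetical sketch of how an application might wrap every user prompt
GUARDRAILS = "Your reply should contain no curse words."

def build_prompt(user_prompt: str) -> str:
    return f"{user_prompt}. {GUARDRAILS}"

print(build_prompt("Tell me a joke"))
# Tell me a joke. Your reply should contain no curse words.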

The application may also add a summary of the questions that you have already submitted and the answers that were already given. For example, if you ask, “How warm is Rio de Janeiro, Brazil, in the summer?,” the answer may be, “Rio de Janeiro is typically between 90 and 100 degrees Fahrenheit (30-40 degrees Celsius) in the summer.” If you then ask the question, “How long is the flight from New York to there?,” an application such as ChatGPT will not submit “How long is the flight from New York to there?” directly to the model because the answer would be something like “I don’t know what you mean by ‘there’.”

A straightforward way to address this problem is to save everything that the user entered and all the answers that were provided and re-submit them with every new prompt. For example, when the user submits “How long is the flight from New York to there?” after asking about the temperature, the application prepends the earlier questions and answers to the prompt, and what is actually submitted to the model is: “How warm is Rio de Janeiro, Brazil, in the summer? Rio de Janeiro is typically between 90 and 100 degrees Fahrenheit (30-40 degrees Celsius) in the summer. How long is the flight from New York to there?” Now, the model knows that “there” means “Rio de Janeiro,” and the answer will be something like “Approximately 10 hours.”

The consequence of appending all earlier prompts and responses to each new prompt is that it consumes a lot of space in the context window. Therefore, some techniques have been developed to compress the information that is added to the user prompt. The simplest technique is to keep only the earlier user questions, but not the answers given by the application. In that case, for example, the modified prompt would be something like “Earlier I said: ‘How warm is Rio de Janeiro, Brazil, in the summer?’ Now answer only: ‘How long is the flight from New York to there?’”. Note that the prompt needs to tell the model to respond only to the last question submitted by the user.
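
A minimal Python sketch of both approaches might look like this; call_model is a placeholder for any real LLM call, not an actual API:

def call_model(prompt: str) -> str:
    return "..."  # placeholder: substitute any real LLM call here

history: list[tuple[str, str]] = []  # (question, answer) pairs so far

def ask_full_history(prompt: str) -> str:
    # Resubmit every earlier question and answer before the new prompt
    context = " ".join(f"{q} {a}" for q, a in history)
    answer = call_model(f"{context} {prompt}".strip())
    history.append((prompt, answer))
    return answer

def ask_questions_only(prompt: str) -> str:
    # Compressed variant: keep earlier questions, drop the answers
    earlier = " ".join(q for q, _ in history)
    return call_model(f"Earlier I said: '{earlier}'. Now answer only: '{prompt}'")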

Knowing that applications modify your prompts will be relevant for you if you test your prompts using consumer applications because the output you get from them can be substantially different from what you get when you use the model directly through an API, such as Microsoft Semantic Kernel. There’s usually no way to know how the applications are modifying your prompts, as the providers usually don’t reveal all their techniques.

Furthermore, a significant part of what you will do as an application developer will be to create prompt modifications that match your own application. Therefore, when your user submits their prompt, you will add your own prompt modifications to ensure they get an appropriate result. The techniques to modify user prompts are called prompt engineering, which we will explore briefly in the next chapter.

Generating text using consumer applications

Let’s explore the two most popular text generation applications: ChatGPT and Microsoft Copilot. If you are not familiar with the power of GenAI, trying them out will give you a sense of what can be done with them and how powerful they are. We will also briefly talk about their configuration parameters and their architecture, which can help you decide the appropriate architecture for your own applications later.

OpenAI ChatGPT

If you have never used a GenAI app, ChatGPT is likely the best place to start. ChatGPT’s default backend model is GPT-3.5, a fast and very powerful model. ChatGPT is free to use when OpenAI has available capacity on its servers. You can also buy a subscription to ChatGPT Plus for $20 per month, which gives you access to OpenAI’s most powerful model (currently GPT-4) and ensures you will always have capacity.

To use ChatGPT, go to https://chat.openai.com.

The ChatGPT interface is very simple. It lets you choose the backend model at the top, offers some prompt suggestions in the middle, and provides a text box at the bottom for entering prompts. It also includes a notice that the output produced may include incorrect information:

Figure 1.1 – Submitting requests to ChatGPT using the web interface

I will submit the following prompt in the Send a Message textbox:

How long is the flight between New York City and Rio de Janeiro?

ChatGPT, using the GPT-3.5 model, provides the following answer:

The flight duration between New York City and Rio de Janeiro can vary depending on several factors, including the specific departure and arrival airports, the flight route, and any layovers or stopovers. On average, a non-stop flight between John F. Kennedy International Airport (JFK) in New York City and Rio de Janeiro-Galeão International Airport (GIG) in Rio de Janeiro takes approximately 9 to 10 hours. Keep in mind that flight durations can vary, so it's always a good idea to check with airlines for the most up-to-date and accurate flight schedules and durations for your specific travel dates and routes. Additionally, if you have layovers or stopovers, the total travel time will be longer.

Microsoft Copilot

Another free alternative is Microsoft Copilot, formerly Bing Chat. It is available from the www.bing.com page and can also be accessed directly at https://www.bing.com/chat.

The user interface of Microsoft Copilot is similar to that of ChatGPT. It has some prompt suggestions in the middle of the screen and a text box at the bottom where the user can enter a prompt. The Microsoft Copilot UI also shows a couple of options that will be relevant when we use models programmatically.

The first is the conversation style. Copilot offers the option of being More Creative, More Balanced, or More Precise. This is related to the temperature parameter that will be passed to the underlying model. We will talk about the temperature parameter in Chapter 3, but in short, the temperature parameter determines how common the words chosen by the LLM are.

Parameters in Microsoft Copilot

While Microsoft Copilot does not reveal the exact configuration values, Precise is likely equivalent to a low temperature (maybe between 0 and 0.2), resulting in very safe guesses for the next word. For Balanced, the temperature is likely higher (maybe between 0.4 and 0.6), resulting mostly in safe guesses, but with the occasional guess being rare. Creative is the next step, likely around 0.8. Most guesses will still be safe, but more guesses will be rare words. Since LLMs guess words of a phrase in sequence, previous guesses influence subsequent guesses. When generating a phrase, each rare word makes the whole phrase more unusual.
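
As a preview of the Chapter 3 discussion, here is a sketch of setting the temperature parameter when calling a model directly with the openai Python package, version 1.x (it assumes the OPENAI_API_KEY environment variable is set; the low value mirrors the Precise guess above):

# Sketch: passing a temperature value directly to a GPT model
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
    temperature=0.2,  # low temperature: safe, predictable word choices
)
print(response.choices[0].message.content)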

Another interesting component in the UI is that the bottom right of the text box shows how many characters have been entered, giving you an idea of how much you will consume of the underlying model’s context window. Note that you cannot know for sure how much you will consume because the Copilot application will modify your prompt.

Figure 1.2 – Microsoft Copilot user interface
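
Characters are only a proxy, because models count tokens rather than characters. If you want to measure token usage yourself, the tiktoken library (a separate install, and an assumption here rather than something Copilot exposes) encodes text the way GPT models do:

# Counting tokens with tiktoken (pip install tiktoken)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-3.5/GPT-4
text = "How long is the flight from New York to Rio de Janeiro?"
print(len(enc.encode(text)))  # the number of tokens this text consumes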

On August 11, 2023, Mikhail Parakhin, Bing’s former CEO, posted on X/Twitter that Microsoft Copilot outperforms GPT-4 because it uses retrieval augmented inference (https://x.com/MParakhin/status/1689824478602424320?s=20):

Figure 1.3 – Post by Bing’s former CEO about Microsoft Copilot using RAG

We will talk more about retrieval augmented inference in Chapters 6 and 7, but for our current purposes, this means that Microsoft Copilot does not directly submit your prompt to the model. Bing has not publicly released the details of their architecture yet, but it is likely that Bing modifies your prompt (it shows the modified prompt in the UI, under Searching for), makes a regular Bing query using the modified prompt, gathers the results of that Bing query, concatenates them, and submits the concatenated results as a large prompt to a GPT model, asking it to combine the results to output a coherent answer.
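
In pseudocode form, that hypothesized flow might look like the following Python sketch; every helper function here is a placeholder, since Bing has not published its pipeline:

# Hypothetical sketch of the retrieval-augmented flow described above
def rewrite_as_search_query(prompt: str) -> str:
    return prompt  # placeholder: the rewrite shown in the UI under "Searching for"

def search_web(query: str) -> list[str]:
    return []  # placeholder: a regular web search API call

def call_model(prompt: str) -> str:
    return "..."  # placeholder: any LLM call

def answer_with_retrieval(user_prompt: str) -> str:
    query = rewrite_as_search_query(user_prompt)
    context = "\n".join(search_web(query))  # concatenate the search results
    final_prompt = (
        "Using the information below, write a coherent answer.\n"
        f"Information:\n{context}\n\n"
        f"Question: {user_prompt}"
    )
    return call_model(final_prompt)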

Using