Description

Written by a seasoned solutions architect and Microsoft AI professional with over 25 years of IT experience, Azure AI-102 Certification Essentials will help you gain the skills and knowledge needed to confidently pass the Azure AI-102 certification exam and advance your career. This comprehensive guide covers all of the exam objectives, from designing AI solutions to integrating AI models into Azure services. By combining theoretical concepts with visual examples, hands-on exercises, and real-world use cases, the chapters teach you how to effectively apply your new-found knowledge.
The book emphasizes responsible AI practices, addressing fairness, reliability, privacy, and security, while guiding you through testing AI models with diverse data and navigating legal considerations. Featuring the latest Azure AI tools and technologies, each chapter concludes with hands-on exercises to reinforce your learning, culminating in Chapter 11's comprehensive set of 45 mock questions that simulate the actual exam and help you assess your exam readiness.
By the end of this book, you'll be able to confidently design, implement, and integrate AI solutions on Azure, while achieving this highly sought-after certification.




Azure AI-102 Certification Essentials

Master the AI Engineer Associate exam with real-world case studies and full-length mock tests

Peter T. Lee

Azure AI-102 Certification Essentials

Copyright © 2025 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

The author acknowledges the use of cutting-edge AI, such as ChatGPT, with the sole aim of enhancing the language and clarity within the book, thereby ensuring a smooth reading experience for readers. It’s important to note that the content itself has been crafted by the author and edited by a professional publishing team.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Portfolio Director: Sunith Shetty

Relationship Lead: Sanjana Gupta

Project Manager: Hemangi Lotlikar

Content Engineer: Nathanya Dias

Technical Editor: Arjun Varma

Copy Editor: Safis Editing

Proofreader: Nathanya Dias

Indexer: Rekha Nair

Production Designer: Alishon Falcon

Growth Lead: Bhavesh Amin

First published: August 2025

Production reference: 1170725

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-83620-527-2

www.packtpub.com


Contributors

About the author

Peter T. Lee is a Senior Solution Architect at Microsoft, specializing in AI and data with over 25 years of IT experience spanning industries such as telecom, fintech, payments, retail, and pharmacy. Recently, his focus has been on delivering Generative AI projects, developing data extraction solutions for unstructured data, and spearheading AI initiatives in the financial, banking, insurance, and capital markets sectors. With deep expertise in cloud platforms such as Azure, AWS, and GCP, Peter excels in designing scalable and resilient architectures while enabling organizations to adopt cutting-edge AI/ML and Generative AI technologies. Holding over 18 industry certifications, he embodies a strong commitment to continuous learning and innovation.

I would like to express my deepest gratitude to my loving and patient wife, Jayeol Koo, and my son, Joshua K. Lee, for their unwavering support, boundless patience, and constant encouragement throughout the journey of writing this book.

I am deeply grateful to my parents, SeungHoo Lee and Jungsook Lee, for instilling in me strong values and making countless sacrifices that enabled me to pursue an education in the U.S. despite financial hardships. I also extend heartfelt thanks to my aunt, Jungyeon Lee—your emotional support throughout my college years at Temple University helped me stay focused and resilient.

About the reviewers

Wilson Mok is a Microsoft MVP and Databricks Champion, passionate about helping others learn and grow in data and AI. As a Senior Data Architect and Advisor, he focuses on driving digital transformation and enabling organizations to make data-driven decisions. He shares practical insights through articles, presentations, and training, contributing to user groups, industry events, and publications. His work emphasizes leadership in creating innovative solutions that leverage modern data platforms to improve operational efficiency and deliver business value. Wilson is dedicated to mentoring professionals and inspiring the next generation to build with confidence in the AI-driven future.


Rahat Yasir is one of Canada’s top 30 software developers under 30 (2018) and a ten-time Microsoft MVP Award holder in AI. With expertise in imaging, data analysis, cross-platform technologies, and enterprise-level data and AI system design, he authored Windows Phone 8.1 Complete Solution and Universal Windows Platform Complete Solution. He has contributed to AI research at P2IRC, developed early AI video upscaling tools at IDS, and built a production-grade financial AI system at Intact Financial. He has led AI initiatives at OSEDEA, CAE, and ISAAC Instruments, shaping AI for manufacturing, aviation, defense, and transportation. Currently, he is Head of Data Insights & Advanced Analytics at IATA, driving AI in global aviation data management.

Steve Miles holds a senior technology leadership role within the cloud practice of a multi-billion-turnover IT distributor. Steve is a Microsoft Azure MVP, Microsoft Certified Trainer (MCT), and an Alibaba Cloud MVP. Steve has over 25 years of Microsoft-focused technology experience, along with his previous military career in engineering, signals, and communications. Among other books, Steve is the author of the number-one Amazon best-selling AZ-900 certification book titled Microsoft Azure Fundamentals and Beyond, as well as Microsoft Azure AI Fundamentals AI-900 Exam Guide and Microsoft Certified Azure Data Fundamentals (DP-900) Exam Guide.

Table of Contents

Preface

Part 1: Foundations and Essentials of Azure AI

1

Understanding AI, ML, and Azure’s AI Services

Foundations of AI: exploring ML, LMs, and key AI capabilities

AI

ML

Deep learning

Six key AI capabilities

Exploring Azure AI services

Azure AI Search

Document Intelligence

Video Indexer

Azure OpenAI Service

Azure Vision

Azure Speech

Azure Language

Content Safety

Summary

Review questions

Further reading

2

Getting Started with Azure AI: Studios, Pipelines, and Containerization

Technical requirements

Various AI studios

Creating and configuring Azure AI services

Exercise 1: Getting started with Azure AI services

Integrating CI/CD in Azure AI and machine learning development

Traditional versus AI-based system testing and monitoring

Key considerations for CI/CD in AI and machine learning projects

Container deployment strategies

Exercise 2: Using an Azure AI services container

Summary

Review questions

Further reading

3

Managing, Monitoring, and Securing Azure AI Services

Technical requirements

Managing diagnostic logging

Exercise 1: Creating resources for diagnostic log storage

Exercise 2: Viewing log data in Azure Log Analytics

Monitoring metrics

Add metric

Adding a metric to a dashboard

Managing costs for Azure AI services

Planning costs

Viewing costs

Setting up cost alerts

Exercise 3: Setting up an alert rule

Exercise 4: Visualizing a metric

Understanding authentication

Exercise 5: Regenerating keys

Protecting keys with Azure Key Vault

Microsoft Entra ID authentication

Authenticating requests to Azure AI services

Exercise 6: Managing Azure AI services security

Configuring network security

Managing default network access rules

Granting access from a virtual network

Summary

Review questions

Further reading

Part 2: Practical Applications of Azure AI

4

Implementing Content Moderation Solutions

Planning for responsible AI principles

Recognizing the risks associated with generative AI

Innovating responsibly through iteration

Understanding built-in security and safety systems

Implementing mitigating strategies

Leveraging Azure AI Content Safety

Azure AI Content Safety overview

Content safety evaluation in Azure AI Foundry

Exercise 1: Content filtering via Azure OpenAI

Exercise 2: Create an Azure AI Content Safety resource

Exercise 3: Image content via AI Foundry

Summary

Review questions

Further reading

5

Exploring Azure AI Vision Solutions

Analyzing images

Exercise 1: Analyzing images using Azure AI Vision

Implementing model customization

Custom model types overview

Creating a custom project

Labeling and training a custom model

Exercise 2: Creating a custom model training project

Implementing the Azure AI Face service

Key features of Azure AI Face

Common use cases for Azure AI Face

Getting started with the Azure AI Face service

Exercise 3: Detecting and analyzing faces using the Azure AI Face service (Python)

Overview of OCR in Azure AI Vision

How OCR works in Azure AI Vision

Common use cases for OCR

Exercise 4: Reading text in images using Azure AI Vision OCR (Python)

Analyzing videos with Azure AI Video Indexer

Key features of video analysis in Azure AI Vision

Common use cases for video analysis

Getting started with video analysis in Azure AI Vision

Exercise 5: Analyzing video content using Azure AI Video Indexer (Python)

Summary

Review questions

Further reading

6

Implementing Natural Language Processing Solutions

Analyzing text by using Azure AI Language

Exercise 1: Text analysis with Azure AI Language

Processing speech by using Azure AI Speech

Key features

Accessing the Azure AI Speech service

Configuring audio formats and voices

Exercise 2: Recognizing and synthesizing speech

Translating text/speech with speech services

Translating speech to text using the SDK

Synthesize translations (speech-to-speech translation)

Exercise 3: Translating documents from a source language to a target language

Exercise 4: Translating speech using Azure AI Speech

Building a conversational language understanding model

Exercise 5: Building a conversational language understanding model

Creating a custom question-answering solution by using Azure AI Language

Exercise 6: Creating a question-answering solution

Developing NLP solutions

Exercise 7: Creating custom text classification

Summary

Review questions

Further reading

7

Implementing Knowledge Mining, Document Intelligence, and Content Understanding

Exploring Azure AI Search

Azure AI Search process

Exercise 1: Creating an Azure AI Search service

Understanding indexes, skillsets, and indexers in the Azure portal

Managing knowledge store projections

Exercise 2: Creating an index, skillset, indexer, custom skill, and knowledge store within VS Code

Implementing the Document Intelligence solution

Document Intelligence capabilities

Exercise 3: Document Intelligence Studio/Azure AI Foundry – UI interface and no coding

Exercise 4: Document Intelligence client libraries approach

Understanding Azure AI Content Understanding

What is Azure AI Content Understanding?

Exercise 5: Analyzing content with Azure AI Content Understanding

Summary

Review questions

Further reading

8

Working on Generative AI Solutions

Azure AI Foundry

Overview of Azure AI Foundry

Exercise 1: Creating a hub, project, and AI service in the Azure portal

Using Azure OpenAI to generate content

Exercise 2: Deploying Azure OpenAI

Advanced techniques in generative AI: DALL-E 3, the RAG pattern, prompt engineering, and fine-tuning

Exercise 3: Using DALL-E 3 to generate images

Exercise 4: Applying prompt engineering techniques

Exercise 5: The RAG pattern (using your own data)

Exercise 6: Fine-tuning models with your own data

Summary

Review questions

Further reading

Part 3: Agentic AI Solutions, Applying Real-World Use Cases, and Preparing for the AI-102 Certification

9

Implementing Agentic Solutions with Azure AI Agent Service

Understanding AI agents and their use cases

Configuring resources to build an agent

Testing, optimizing, and deploying agents

Summary

Review questions

Further reading

10

Practical AI Implementation: Industry Use Cases, Technical Patterns, and Hands-On Projects

Industry use cases and key technical patterns

Modern AI tools in enterprise

AI across industries

Learning accelerators projects on GitHub

Chat your own data

The RAG pattern with database: using function calling to access and query structured data

Document Intelligence

AI Search

Summary

Further reading

11

Preparing for the AI-102 Azure AI Engineer Associate Certification Exam

Strategies and tips for success

Master key concepts through explanation

Hands-on practice

Thoroughly practice and analyze test questions

Prioritize high-weighted topics first

Exam tips

Practice exams

Summary

Further reading

Index

Other Books You May Enjoy

Part 1: Foundations and Essentials of Azure AI

Part 1 of this book is designed to provide a comprehensive foundation for working with Azure AI services. The first chapter focuses on key concepts in Artificial Intelligence (AI) and Machine Learning (ML), introducing supervised, unsupervised, and reinforcement learning, as well as advanced topics such as deep learning and Generative AI. It also covers foundational elements such as Large Language Models (LLMs) and Small Language Models (SLMs), Natural Language Processing (NLP), and prompt engineering, offering a clear understanding of these concepts without diving too deeply into technical details, and then surveys Azure AI’s capabilities, including services such as AI Search, Document Intelligence, Azure OpenAI Service, Vision, Speech, Language, and Content Safety, along with their features and practical applications. The second chapter transitions into getting started with Azure AI, covering the various AI studios, CI/CD pipelines for AI and machine learning development, and container deployment strategies. The third chapter focuses on managing, monitoring, and securing Azure AI services, covering critical strategies such as logging, metrics, cost management, secure key handling with Azure Key Vault, and private communication with virtual networks and private endpoints. Together, these chapters provide a solid foundation for building, deploying, and maintaining robust AI solutions.

This part has the following chapters:

Chapter 1, Understanding AI, ML, and Azure’s AI Services
Chapter 2, Getting Started with Azure AI: Studios, Pipelines, and Containerization
Chapter 3, Managing, Monitoring, and Securing Azure AI Services

1

Understanding AI, ML, and Azure’s AI Services

Artificial Intelligence (AI) and Machine Learning (ML) are becoming critical drivers of technological innovation, transforming industries globally. In this chapter, we’ll cover key AI and ML concepts, including supervised, unsupervised, and reinforcement learning, and touch on advanced areas such as deep learning and Generative AI (GenAI). You’ll also be introduced to essential elements such as Large and Small Language Models (LLMs and SLMs), Natural Language Processing (NLP), and prompt engineering, which are foundational for building intelligent systems. This chapter will give you a solid understanding without delving too deeply into technical theory.

Additionally, we’ll explore Azure’s key AI services, such as AI Search, Document Intelligence, Azure OpenAI Service, Vision, Speech, Language, and Content Safety. For each, we’ll outline its core features, functionality, and practical use cases. This chapter aims to build a knowledge base that will help you better understand the concepts and tools discussed in subsequent chapters. You can refer back to it for clarity as you progress through the book.

In this chapter, you’ll explore the following key topics:

Core concepts of AI, ML, and how they relate to each other
An overview of different types of ML: supervised, unsupervised, and reinforcement learning
Introduction to deep learning and its application in real-world AI scenarios
Understanding GenAI and how it creates new content such as text and images
The role of Language Models (LMs), including LLMs and SLMs, in natural language understanding
Practical applications of NLP and the importance of prompt engineering
Six foundational AI techniques—prompt engineering, NLP, Retrieval-Augmented Generation (RAG), grounding, embedding, and tokenization—that power intelligent applications
Overview of Microsoft Azure’s key AI services, including Azure AI Search, Document Intelligence, Azure OpenAI, Vision, Speech, Language, and Content Safety
Real-world scenarios where each Azure AI service is most effective and guidance on selecting the right tools for your use case

Let’s jump in and review the key concepts!

Foundations of AI: exploring ML, LMs, and key AI capabilities

The following diagram provides a high-level overview of the relationship between AI, ML, deep learning, GenAI, and LMs. Each layer represents a subset of the previous, showcasing the evolution of AI technology. Starting with AI in 1956, ML in 1997, and deep learning in 2017, the diagram also highlights how LMs and GenAI, which emerged more recently, fit into this broader context. Further details on these technologies are discussed in the following section.

Figure 1.1 – Brief AI history

Let’s dive deeper.

AI

While AI refers to the broader goal of simulating human intelligence, ML is one of the core methods used to achieve it. ML provides the statistical techniques and models that enable AI to learn from data.

AI is like a smart assistant that can perform tasks that typically require human intelligence, such as understanding language, recognizing images, making decisions, translating, and solving problems. Imagine having a robot that can sort your photos, play chess, translate to another language, book appointments for you, or even have conversations with you—AI makes this possible.

ML

ML is a branch of data science focused on training models to make predictions or decisions based on data. Instead of being explicitly programmed for every task, ML enables systems to learn patterns from examples and improve over time.

ML is like teaching a child to recognize animals by showing them many pictures labeled with names. Over time, the child learns to identify new animals on their own. Similarly, ML allows computers to learn from past data and generalize to new, unseen situations.

ML is broadly categorized into three main types, each with distinct characteristics and use cases:

Supervised learning: This approach uses labeled data to train models to recognize patterns and make predictions. It’s used in scenarios where accuracy is critical, such as medical diagnosis or fraud detection. For example, a supervised learning model is trained on thousands of labeled X-ray images to detect whether a tumor is present.
Unsupervised learning: Here, the model identifies patterns or groupings in data without labeled outcomes. It’s useful for discovering hidden structures, such as customer segments or anomalies. For instance, a credit card company uses unsupervised learning to detect suspicious transactions that deviate from typical user behavior.
Reinforcement learning: In this type, the model learns by interacting with an environment and receiving rewards or penalties based on its actions. It’s ideal for decision-making tasks involving sequences of actions. For example, a reinforcement learning agent optimizes energy usage in a data center by adjusting cooling and power settings based on real-time conditions.

Each of these types of ML has distinct advantages and is suitable for different types of problems.
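To make the supervised case concrete, here is a minimal sketch using scikit-learn and its bundled sample dataset (both are assumptions of this illustration, not part of this book's exercises). A model is trained on labeled medical data and then evaluated on examples it has never seen:

# A minimal supervised-learning sketch: train on labeled data, score on unseen data.
# scikit-learn and its bundled dataset are used purely for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)          # features and diagnosis labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(f"Accuracy on unseen examples: {model.score(X_test, y_test):.2f}")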

Deep learning

Deep learning is a specialized subset of ML that uses artificial neural networks to model and learn complex patterns from large volumes of data. These networks are inspired by the structure of the human brain and consist of multiple interconnected layers (hence the term deep).

Deep learning models automatically learn features from raw data without the need for manual feature engineering. They excel in handling unstructured data—such as text, images, and audio—where traditional ML may struggle. For example, in NLP, deep learning enables chatbots to understand context, recognize intent, and respond naturally by learning from vast amounts of conversational data.

The impact of deep learning spans many domains, including image recognition, Text-To-Speech (TTS), language translation, recommendation systems, and autonomous vehicles. It has revolutionized industries such as healthcare, finance, retail, and digital marketing by enabling highly accurate and scalable AI solutions.

For example, a deep learning model can power a virtual assistant capable of understanding your voice commands, converting them into text, interpreting your request, and generating a human-like response—all in real time.

Deep learning’s ability to extract insights from complex, high-dimensional data has made it a cornerstone of modern AI systems.

Did you know?

Generative Pre-trained Transformers (GPTs) are deep learning models that generate natural language text. They can be customized for specific tasks and purposes, allowing users to create tailored GPTs for various applications.

GenAI

GenAI is a type of AI that can create new content, such as text, images, music, and videos. It’s like having a creative artist who, after studying many examples of art, can produce original paintings. GenAI learns from existing data and generates new, original works based on that learning.

Imagine a talented chef who not only cooks but also creates new recipes.

Figure 1.2 – An example of the GenAI process

Let’s break down these elements:

Data (ingredients and recipes): AI and ML learn from a large amount of data, similar to how a chef needs ingredients and recipes to cook.
ML (learning recipes): ML helps the AI learn from this data, improving its ability to perform tasks, much like a chef practicing recipes.
AI (chef cooking): The AI uses what it has learned to perform tasks, just like a chef cooking a meal.
GenAI (creating new recipes): GenAI takes it a step further by creating new and original content, similar to a chef inventing new recipes.
Application (delicious dishes and new creations): The result is an application that can perform intelligent tasks and create new content, providing valuable solutions and innovative creations.

Did you know?

Do you know why GenAI is so popular? Its ability to create new, original content—such as text, images, and music—is transforming content creation, design, and software development. By automating creative tasks, it enhances productivity and enables rapid innovation across industries.

LMs

LMs are a type of ML model trained to understand and generate human language. They form the foundation for many NLP tasks by analyzing vast amounts of text to learn grammar, meaning, and context.

LMs are used for a wide range of tasks such as text classification, summarization, sentiment analysis, question answering, and content generation. These models predict the next word in a sentence or evaluate the probability of a phrase, enabling them to produce coherent and contextually appropriate responses.

For example, when you ask a chatbot, “What’s the weather in London?”, an LM helps interpret your intent and generate a natural response such as “It’s currently 12°C and cloudy in London.”

Modern LMs range from small, task-specific models to LLMs such as GPT, which are capable of handling complex, multi-turn conversations and even working across modalities such as text, images, or code. These models power everyday AI experiences such as search engines, writing assistants, and virtual agents.

LLMs and SLMs

LLMs are powerful AI models trained on massive datasets that enable them to understand, generate, and reason with natural language. Their broad knowledge and contextual understanding make them ideal for tasks such as chatbots, summarization, translation, and content creation. LLMs can also operate in multimodal scenarios—processing not just text but also images, audio, or code—extending their use cases across industries. For example, an LLM can power a virtual assistant that summarizes customer emails, generates draft replies, and extracts key tasks to populate a to-do list—all within seconds.

SLMs, by contrast, offer a lightweight alternative to LLMs, delivering many of the same capabilities with fewer computational resources. They are designed for efficiency, making them suitable for running on devices with limited memory, such as laptops or mobile phones. Microsoft’s Phi model series exemplifies this, with Phi-3 and Phi-4 models offering impressive performance despite having far fewer parameters than traditional LLMs.

SLMs are especially useful when speed, cost-efficiency, and local processing are priorities. Together, LLMs and SLMs allow developers to choose the right balance of performance, size, and deployment flexibility for their AI applications. In multi-model solutions, these models can even be combined—where an SLM handles lightweight local tasks and an LLM steps in for more complex reasoning—creating a smart, efficient, and scalable AI system.

Important note

New models are continuously being introduced, offering greater power and efficiency at lower costs. Be sure to check the availability of the latest models beyond those mentioned in this book, as some versions may become outdated by the time of publication.

Six key AI capabilities

To effectively build intelligent solutions using Azure AI, it’s essential to understand six foundational capabilities that drive most modern AI applications. These capabilities—NLP, prompt engineering, RAG, grounding, embedding, and tokenization—form the building blocks for working with LMs, building chat interfaces, automating content, and retrieving relevant data. Together, these capabilities empower developers to create reliable, context-aware, and high-performing AI solutions. The following sections explain each concept with practical examples to help you connect theory to real-world application.

NLP

NLP enables AI systems to understand, interpret, and respond to human language—both spoken and written. It powers capabilities such as speech-to-text, chatbots, sentiment analysis, and language translation. For instance, when you ask a voice assistant, “What’s the weather today?”, NLP helps convert your speech to text, understand your intent, and generate a spoken response with the current forecast. Chapter 6 provides a detailed walkthrough of this topic.

Prompt engineering

This is the art of crafting clear, purposeful inputs—called prompts—that guide GenAI models to produce specific results. A well-structured prompt helps the model stay on topic and deliver accurate content. For example, prompting a model with “Summarize this email thread into key points for a meeting” can produce a concise summary, saving time and ensuring clarity. More details will be covered in the Advanced techniques in generative AI section in Chapter 8.
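As a small illustration of the idea (not an exercise from this chapter), compare a vague prompt with a purposeful one; the structured version constrains topic, format, and length, which typically yields far more predictable output:

# A vague prompt leaves the model to guess intent, format, and length.
vague_prompt = "Tell me about this email thread."

# A purposeful prompt states the role, the task, the output format, and a limit.
structured_prompt = """You are an assistant preparing meeting notes.
Summarize the email thread below into at most 5 bullet points.
Name the owner of every action item.

Email thread:
{thread}"""

print(structured_prompt.format(thread="<paste email thread here>"))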

Fine-tuning

Fine-tuning is the process of adapting a pre-trained language model to perform better on a specific task or domain by training it further on a smaller, specialized dataset. This helps the model align more closely with the unique language, tone, or structure of your target content. For example, you can fine-tune a base GPT model to draft legal contracts or respond to customer service tickets in your organization’s preferred style. Unlike prompt engineering, which controls output by adjusting the input prompt, fine-tuning adjusts the model’s internal weights, enabling it to consistently deliver tailored responses across multiple use cases. Fine-tuning is particularly useful when accuracy, consistency, or domain specificity is critical. For a deeper dive into fine-tuning, refer to Exercise 6, Fine-tuning models with your own data, in Chapter 8.

RAG

RAG combines the power of search with language generation. Instead of relying solely on what the model was trained on, RAG retrieves relevant information from external sources and provides it to the model before it responds. This leads to more accurate, up-to-date answers. For example, a chatbot using RAG can look up your company’s internal documentation to answer a policy question, even if the base model wasn’t trained on that information. More details will be covered throughout Chapter 7 and in the Chat your own data section of Chapter 10.
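The following minimal sketch shows the retrieve-then-generate flow using the azure-search-documents and openai Python SDKs. The endpoints, keys, index name ("policies"), deployment name ("gpt-4o"), API version, and the "content" field are placeholders assumed for illustration, not values from this book's exercises:

# A minimal RAG sketch: retrieve relevant documents, then ground the answer in them.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search = SearchClient("https://<search>.search.windows.net", "policies",
                      AzureKeyCredential("<search-key>"))
llm = AzureOpenAI(azure_endpoint="https://<aoai>.openai.azure.com",
                  api_key="<aoai-key>", api_version="2024-06-01")

def answer(question: str) -> str:
    # 1. Retrieve: fetch the top matching chunks for the question.
    hits = search.search(question, top=3)
    context = "\n".join(hit["content"] for hit in hits)  # "content" is an assumed field
    # 2. Generate: the model answers using only the retrieved context.
    response = llm.chat.completions.create(
        model="gpt-4o",  # your chat model *deployment* name
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("How many vacation days do new employees get?"))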

Grounding

Grounding is the process of ensuring that an AI model’s responses are based on factual, real-world information rather than relying solely on its internal training data—which may be outdated or incomplete. It connects the model to trusted external sources, such as company knowledge bases, databases, or documents, so that generated responses reflect current and contextually relevant information. For example, if a user asks about your organization’s travel policy, grounding enables the AI to retrieve and cite the latest version of that policy from an internal document rather than guessing. Grounding is essential in RAG systems and plays a key role in reducing hallucinations—responses that sound plausible but are inaccurate or fabricated.

Did you know?

Grounding significantly reduces hallucinations, which are when a model generates inaccurate or made-up responses without real-world context.

Embedding

This is the technique of converting text, images, or other types of data into numerical vectors that represent their meaning and context. These vectors allow AI systems to compare, group, and search information based on similarity rather than exact matches. This is especially useful in applications such as semantic search, recommendations, and RAG, where understanding context is more important than matching keywords.

For example, in the simplified 3D vector space shown in Figure 1.3, the word cat might be represented as [0.8, 0.2, -0.5], while dog could be [0.7, 0.1, -0.4]—close in distance, showing they’re semantically similar. In contrast, an unrelated word such as car might be [-0.3, 0.9, 0.7], positioned farther away. This spatial arrangement enables AI models to reason about meaning and relationships in language. Embeddings power advanced features in Azure AI Search, such as vector search and hybrid retrieval, making it possible to deliver highly relevant and contextual search results across large, unstructured datasets.

Figure 1.3 – Similarity embedding vector space
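You can verify this intuition with a few lines of Python. The sketch below computes cosine similarity (a standard measure of vector closeness) for the example vectors above; real embedding models produce vectors with hundreds or thousands of dimensions, but the principle is identical:

# Cosine similarity: values near 1.0 mean similar meaning; low or negative
# values mean the vectors point in unrelated directions.
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = [0.8, 0.2, -0.5]
dog = [0.7, 0.1, -0.4]
car = [-0.3, 0.9, 0.7]

print(f"cat vs dog: {cosine(cat, dog):.2f}")  # high: semantically similar
print(f"cat vs car: {cosine(cat, car):.2f}")  # low (negative): unrelated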

Next, let’s look at tokenization.

Tokenization

This is the process of breaking down text into smaller units called tokens, which are the basic building blocks that LMs understand. Tokens can be full words, parts of words, or even punctuation marks. Tokenization is the first step in training and using transformer-based models such as GPT, enabling them to analyze and generate language effectively.

For example, consider the following sentence: I heard a dog bark loudly at a cat.

To tokenize this text, you can identify each discrete word and assign token IDs to them, as in this example:

- I (1)
- heard (2)
- a (3)
- dog (4)
- bark (5)
- loudly (6)
- at (7)
- a (already tokenized as 3)
- cat (8)

The sentence can now be represented with the tokens {1 2 3 4 5 6 7 3 8}. Similarly, the sentence I heard a cat could be represented as {1 2 3 8}.

As you continue to train the model, each new token in the training text is added to the vocabulary with appropriate token IDs:

meow (9)
skateboard (10)
And so on...

With a sufficiently large set of training texts, a vocabulary of many thousands of tokens could be compiled. To explore how tokens are calculated for LLMs, you can visit https://token-calculator.net/.
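The word-level scheme above can be reproduced in a few lines of Python. This toy tokenizer is only an illustration; production models use subword tokenizers (such as the byte-pair encodings behind GPT models), but the vocabulary-building idea is the same:

# A toy word-level tokenizer mirroring the example above: each new word is
# assigned the next free ID; repeated words reuse their existing ID.
def build_vocab_and_encode(text: str, vocab: dict[str, int]) -> list[int]:
    tokens = []
    for word in text.lower().replace(".", "").split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1  # next token ID
        tokens.append(vocab[word])
    return tokens

vocab: dict[str, int] = {}
print(build_vocab_and_encode("I heard a dog bark loudly at a cat.", vocab))
# -> [1, 2, 3, 4, 5, 6, 7, 3, 8]
print(build_vocab_and_encode("I heard a cat", vocab))
# -> [1, 2, 3, 8]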

Now that you have a solid understanding of the basic AI and ML concepts, let’s explore Azure AI services in a practical way. We’ll review the available services, examine the key features of each, understand how they function, and identify when to use them effectively. This section will guide you through the services, offering insights into how they can be applied in real-world scenarios to maximize your AI solutions.

Exploring Azure AI services

Azure provides a comprehensive suite of AI services designed to accelerate the development of intelligent applications. These services cover a broad range of capabilities, including vision, language, speech, search, and GenAI. With prebuilt models, APIs, and customization options, developers can quickly integrate advanced AI features into their solutions without needing deep ML expertise.

At the core of this ecosystem is the Azure AI Foundry platform (discussed in detail in the AI Foundry section of Chapter 2)—a unified environment for building, deploying, and managing AI applications. It streamlines the development process by combining model training, data integration, and deployment workflows with enterprise-grade security and compliance features. Azure AI Foundry empowers teams to collaborate efficiently while scaling AI solutions across the organization.

Figure 1.4 – Overview of Azure AI services

The following is a breakdown of the key Azure AI services, their core features, and practical use cases.

Important note

As Azure AI services rapidly evolve, model availability, API versions, and regional support can change frequently. Before starting a project or working through the hands-on exercises in this book, it’s essential to verify that the services and models you plan to use are supported in your chosen Azure region. This step helps avoid compatibility issues and ensures a smooth deployment experience.

To help you stay up to date, the Further reading section includes direct links to the official Microsoft documentation for each service. Reviewing these resources will ensure you’re working with the most current capabilities—keeping your solutions scalable, cost-effective, and aligned with production-ready standards.

Azure AI Search

Azure AI Search (formerly Azure Cognitive Search) is a cloud-based service that enables fast, secure, and scalable information retrieval across your own data. It supports keyword, semantic, and vector-based search, making it a versatile tool for both traditional and GenAI applications.

Key features include the following:

Flexible search capabilities: Supports full-text, semantic, vector, and hybrid search across structured and unstructured content
Comprehensive indexing: Offers data chunking, vectorization, Optical Character Recognition (OCR), and built-in language analysis tools
Advanced query support: Enables fuzzy search, filters, autocomplete, faceting, geo-search, and semantic ranking
Seamless integration: Easily connects with Azure OpenAI, Azure ML, and external data pipelines

How it works

Azure AI Search functions in two stages: indexing and querying. During indexing, your content is ingested, processed (e.g., chunked, vectorized, and tokenized), and stored in search indexes. Built-in AI enrichments—such as OCR and language detection—can be applied to enhance the content. When users issue queries, the service searches across the appropriate indexes and returns ranked results. Semantic ranking and hybrid retrieval ensure highly relevant responses, especially in RAG-based applications. For an in-depth look, see Figure 7.2 in Chapter 7 and the AI Search section in Chapter 10.
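As a rough illustration of the querying stage, here is a minimal sketch using the azure-search-documents Python SDK; the endpoint, query key, index name, and document fields are placeholders, not values from this book's exercises:

# Query an existing index and print the top-ranked hits.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="docs-index",                      # assumed index name
    credential=AzureKeyCredential("<query-key>"),
)

results = client.search("quarterly revenue report", top=3)
for doc in results:
    # "title" is an assumed field; "@search.score" is the relevance score.
    print(doc["title"], doc["@search.score"])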

When to use Azure AI Search

You can use this for the following use cases:

Enterprise search portals: Enable employees to find content using natural language across large document repositories.
GenAI and RAG applications: Retrieve vectorized content for context-aware language generation.
Custom search experiences: Build search tools with autocomplete, filters, and synonyms tailored to your business.
Centralized indexing: Unify documents, structured data, and vector content under one searchable index.
Multilingual and domain-specific search: Apply linguistic rules or custom analyzers to improve accuracy across languages or specialized content domains.

Example: Implement a semantic document search tool that helps employees quickly find relevant internal reports using natural language queries.

Did you know?

OpenAI uses Azure AI Search as the vector database and retrieval system in its RAG workloads, including ChatGPT, custom GPTs, and the Assistants API. OpenAI found Azure AI Search to be aligned with its unique scale needs, highly productive, and a complete retrieval system that went beyond vectors, offering hybrid retrieval, metadata filtering, and more.

In the video at https://youtu.be/cjIE5fBInAE?si=j4FHgQ0lczRKUWO9, discover how ChatGPT, the fastest-growing consumer app in history with over 100 million weekly active users, combines RAG-powered features, OpenAI’s trusted API, and Azure AI Search to tackle today’s and tomorrow’s biggest challenges!

Document Intelligence

Azure Document Intelligence (formerly Form Recognizer, covered in detail in Chapter 7’s Implementing Document Intelligence solution section) is a cloud-based service that automates document processing by extracting structured data from forms, invoices, receipts, and other document types. It reduces manual data entry and enables scalable, accurate document workflows.

Key features include the following:

Prebuilt, custom, and composed models: Uses ready-made models for common documents or trains custom models for unique layouts
AI-powered extraction: Identifies and extracts key-value pairs, tables, selection marks, and text from scanned documents
Flexible interfaces: Supports REST APIs, SDKs, and low-code tools for easy integration

How it works

The service processes documents through OCR and ML models. Depending on the layout, it uses either prebuilt or custom-trained models to analyze and extract information such as line items, totals, and metadata. The extracted data is returned in structured formats (e.g., JSON) that can be directly integrated into downstream systems such as Enterprise Resource Planning (ERP) or databases.
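As an illustrative sketch (using the azure-ai-formrecognizer Python SDK, the predecessor package to azure-ai-documentintelligence; the endpoint and key are placeholders), extracting an invoice total with the prebuilt invoice model looks roughly like this:

# Analyze a local invoice with the prebuilt invoice model and read one field.
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient("https://<resource>.cognitiveservices.azure.com",
                                AzureKeyCredential("<key>"))

with open("invoice.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-invoice", document=f)
result = poller.result()

for document in result.documents:
    total = document.fields.get("InvoiceTotal")
    if total:
        print("Invoice total:", total.value, "confidence:", total.confidence)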

When to use Azure Document Intelligence

You can use this for the following use cases:

Invoice and receipt automation: Streamline accounts payable by extracting data from scanned or digital documents.
Custom form processing: Train custom models to handle forms with domain-specific layouts.
Archival and search: Convert paper archives into structured, searchable formats.
Regulatory and compliance workflows: Automatically detect key fields or data patterns to ensure documentation standards.

Example: Automatically extract line items from scanned invoices and upload structured data to a financial system.

Did you know?

Document Intelligence offers document field extraction features that support automatic labeling, grounding, and confidence scores, leveraging LLMs to improve accuracy. For more details, visit https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/train/custom-model?view=doc-intel-4.0.0.

Video Indexer

Azure AI Video Indexer (covered in detail in Chapter 5’s Analyzing videos with Azure AI Video Indexer section) is a video and audio analytics service that uses prebuilt AI models to extract detailed metadata from media content, such as spoken text, faces, scenes, objects, and emotions.

Key features include the following:

Automatic transcription and translation: Supports over 50 languages and generates multilingual captions
Rich media insights: Identifies topics, named entities, speaker timelines, brands, and sentiment
Custom model training: Recognizes specific people or visuals using account-trained models
Content moderation and accessibility: Detects inappropriate material and provides captioning for inclusiveness

How it works

The service ingests audio or video content and applies AI models to identify spoken words, detect objects or faces, and extract other key metadata. All metadata is indexed and made searchable via APIs or the Video Indexer portal. You can also customize recognition logic by training models to detect known individuals or brand elements.
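The sketch below shows the rough shape of the classic Video Indexer REST API using the requests library; the location, account ID, access token, and video URL are placeholders, and the exact routes, parameters, and auth flow should be verified against the current documentation:

# Upload a video by URL for indexing, then fetch its insights once processed.
import requests

LOCATION, ACCOUNT_ID, TOKEN = "<location>", "<account-id>", "<access-token>"
BASE = f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}"

upload = requests.post(f"{BASE}/Videos",
                       params={"accessToken": TOKEN, "name": "demo",
                               "videoUrl": "https://example.com/video.mp4"})
video_id = upload.json()["id"]

# Insights include the transcript, faces, topics, brands, and sentiment.
insights = requests.get(f"{BASE}/Videos/{video_id}/Index",
                        params={"accessToken": TOKEN}).json()
print(insights["name"], insights["state"])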

When to use Azure AI Video Indexer

You can use this for the following use cases:

Media libraries and archives: Make large video repositories searchable by topics, people, or scenes.
Broadcast and content platforms: Add multilingual subtitles, scene segmentation, and moderation filters.
Corporate training and compliance: Automatically summarize and tag videos to ensure regulatory compliance and improve internal training material discoverability.
Advertising and personalization: Identify product placements, brand mentions, or emotional tone.

Example: Enhance a video platform by indexing large video libraries for scene-based search and multilingual subtitles.

Azure OpenAI Service

Azure OpenAI Service (covered in detail in Chapter 8) provides secure access to advanced OpenAI models such as GPT-4, GPT-4 Turbo with Vision, and GPT-3.5. It enables enterprise-grade language capabilities such as summarization, chat, content creation, and code generation.

Important note

New models are continuously being introduced, offering greater power and efficiency at lower costs. Be sure to check the availability of the latest models beyond those mentioned in this book, as some versions may become outdated by the time of publication. For more information, visit the official documentation at https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions.

Key features include the following:

Access to powerful LMs: Includes GPT-4o, Codex, DALL-E, and embeddings models
Scalable interfaces: Uses APIs, SDKs, or the Azure OpenAI Studio for prototyping and production
Enterprise-grade controls: Integrates with Azure networking, identity, and security features
Fine-tuning and batch inference: Customizes outputs or runs large-scale processing jobs efficiently

How it works

After deploying a model in Azure, developers interact with it using prompts through REST APIs or SDKs. Prompt engineering helps shape the response. For specialized tasks, fine-tuning can adjust the model’s behavior. Azure also provides tooling to monitor usage, apply content filtering, and ensure responsible AI practices.
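A minimal chat-completion call, using the openai Python SDK's AzureOpenAI client, looks roughly like the sketch below; the deployment name, API version, endpoint, and key are assumptions you would replace with your own values:

# Send a prompt to a deployed chat model and print the response.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",
    api_key="<key>",
    api_version="2024-06-01",  # assumed; use a currently supported version
)

response = client.chat.completions.create(
    model="gpt-4o",  # the *deployment* name you created in Azure
    messages=[
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "Summarize our refund policy in two sentences."},
    ],
    temperature=0.2,  # lower = more deterministic output
    max_tokens=200,
)
print(response.choices[0].message.content)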

When to use Azure OpenAI Service

You can use this for the following use cases:

Conversational agents and copilots: Build assistants that understand context and respond naturally.
Document summarization and insights: Extract key points from contracts, reports, or support tickets.
Code generation and refactoring: Leverage Codex to write, review, or optimize code.
Image understanding (vision): Analyze and describe visual inputs alongside text in multimodal workflows.

Example: Use GPT-4 to build a customer support chatbot that generates accurate, natural responses based on internal knowledge.

Did you know?

LangChain is a popular open source AI framework used to build applications powered by LMs, such as agents, tools, and chains. Microsoft’s Semantic Kernel is a production-ready and stable SDK designed for integrating LLMs into real-world applications with reliability and scalability. Meanwhile, AutoGen is a cutting-edge research SDK from Microsoft for developing advanced, multi-agent LLM systems, ideal for exploring state-of-the-art AI coordination and reasoning.

Azure Vision

The Azure Vision service (covered in detail in Chapter 5’s Analyzing images section) provides powerful capabilities to extract, classify, and analyze visual information from images and videos using prebuilt and custom computer vision models.

Key features include the following:

Prebuilt models: Recognizes objects, text, landmarks, celebrities, and brands
Custom vision: Trains models with your labeled images for tailored recognition
OCR and spatial analysis: Extracts text and layout from scanned documents, or monitors the movement of people in physical spaces
Deployment flexibility: Runs in the cloud or exports to edge devices

How it works

You upload an image or video frame to the Vision API, which applies prebuilt or custom-trained models, depending on your needs. For example, OCR can extract text from a document, while object detection highlights specific features in a photo. Custom vision lets you build models that specialize in your specific domain data.
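As an illustrative sketch using the azure-ai-vision-imageanalysis Python SDK (the endpoint and key are placeholders), requesting a caption and OCR in a single call looks roughly like this:

# Analyze a local image for a caption and readable text.
from azure.core.credentials import AzureKeyCredential
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures

client = ImageAnalysisClient("https://<resource>.cognitiveservices.azure.com",
                             AzureKeyCredential("<key>"))

with open("photo.jpg", "rb") as f:
    result = client.analyze(
        image_data=f.read(),
        visual_features=[VisualFeatures.CAPTION, VisualFeatures.READ],
    )

if result.caption:
    print(f"Caption: {result.caption.text} ({result.caption.confidence:.2f})")
if result.read:
    for block in result.read.blocks:
        for line in block.lines:
            print("Text:", line.text)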

When to use the Azure Vision service

You can use this for the following use cases:

Manufacturing quality control: Detect visual defects or anomalies in production.
Retail and inventory: Identify products on shelves and automate cataloging.
Document digitization: Use OCR to convert paper records into structured text.
Smart spaces: Monitor foot traffic and room usage using spatial analytics.

Example: Detect product defects on a manufacturing line using a custom-trained object detection model.

Azure Speech

The Azure Speech service (covered in detail in Chapter 6’s Processing speech by using Azure AI Speech section) offers comprehensive tools to add speech capabilities to applications, including transcription, voice synthesis, and translation—all with high accuracy and natural delivery.

Key features include the following:

Speech-to-text: Convert spoken audio into text in real time or batch mode
Text-to-speech: Generate human-like speech using prebuilt or custom neural voices
Speech translation: Enable multilingual communication across more than 60 languages
Custom speech models: Improve recognition in noisy environments or for specific jargon

How it works

Audio input is sent to the Azure Speech service via an API or SDK. The service uses neural networks to generate transcriptions, translate speech into another language, or synthesize voice from text. You can fine-tune models for specialized vocabularies or dialects and deploy them across web, mobile, or IoT apps.
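A minimal speech-to-text sketch with the azure-cognitiveservices-speech Python SDK (the key and region are placeholders; this recognizes a single utterance from the default microphone) looks like this:

# Recognize one spoken utterance and print the transcription.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something...")
result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)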

When to use the Azure Speech service

You can use this for the following use cases:

Customer support automation: Convert voice calls to searchable transcripts.
Voice assistants: Create natural-sounding interactions with users in apps or devices.
Live captioning and accessibility: Provide real-time subtitles for meetings or broadcasts.
Language learning apps: Assess pronunciation and aid interactive speech practice.

Example: Create a multilingual voice assistant for a global customer service center.

Azure Language

The Azure Language service (covered in detail in Chapter 6) provides a comprehensive suite of NLP features that enable developers to build intelligent applications capable of understanding and analyzing text. This service unifies several previously available Azure AI services, including Text Analytics, QnA Maker, and Language Understanding Intelligent Service (LUIS), while introducing new capabilities such as document summarization and Personally Identifiable Information (PII) detection. Users can interact with the service through REST APIs, SDKs, or the web-based Language Studio, making it accessible and versatile for various use cases.

Key features include the following:

Text analysis: Sentiment analysis, key phrase extraction, entity recognition, and language detection
Summarization and Q&A: Automatically summarize long documents or extract answers from unstructured text
PII detection and translation: Redact sensitive information and support multilingual applications
Language Studio: No-code interface for training and testing NLP models

How it works

Text input is submitted through the API or Language Studio. Azure Language services use prebuilt or custom models to analyze the content, extract linguistic insights, and return results in structured formats. These insights can be used to power applications such as customer support chatbots, document summarization tools, and compliance workflows.
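As a rough illustration with the azure-ai-textanalytics Python SDK (the endpoint and key are placeholders), sentiment analysis over a couple of documents looks like this:

# Score the sentiment of short documents.
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient("https://<resource>.cognitiveservices.azure.com",
                             AzureKeyCredential("<key>"))

docs = ["The new dashboard is fantastic!", "Checkout keeps failing on mobile."]
for result in client.analyze_sentiment(docs):
    print(result.sentiment, result.confidence_scores)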

When to use the Azure Language service

You can use this for the following use cases:

Customer feedback analysis: Identify sentiment and trends in product reviews or surveys.
Knowledge extraction: Extract structured data, such as named entities, key phrases, and summaries, from unstructured text to support search indexing and reporting pipelines.
Privacy compliance: Detect and redact sensitive data (PII) before storing or sharing content.
Multilingual applications: Build apps that support language detection and translation across global markets.

Example: Automatically summarize customer reviews and identify trends in product feedback.

Content Safety

The Azure Content Safety service (covered in detail in Chapter 4) provides a comprehensive suite of tools designed to detect and moderate harmful user-generated and AI-generated content across various platforms and services. The service includes powerful capabilities for text and image moderation, helping businesses maintain a safe and respectful environment for their users. Developers can interact with the service via REST APIs, SDKs, or through the intuitive Content Safety Studio, making it easy to implement and manage content safety measures.

Key features include the following:

Text and image moderation: Detects hate speech, violence, sexual content, and self-harm
Multi-severity scoring: Classifies content by risk level
Custom categories: Defines moderation rules with custom filters using the Rapid API
Content Safety Studio: Visual tool for testing and refining moderation logic

How it works

Content—whether text or