Written by a seasoned solutions architect and Microsoft AI professional with over 25 years of IT experience, Azure AI-102 Certification Essentials will help you gain the skills and knowledge needed to confidently pass the Azure AI-102 certification exam and advance your career. This comprehensive guide covers all of the exam objectives, from designing AI solutions to integrating AI models into Azure services. By combining theoretical concepts with visual examples, hands-on exercises, and real-world use cases, the chapters teach you how to effectively apply your new-found knowledge.
The book emphasizes responsible AI practices, addressing fairness, reliability, privacy, and security, while guiding you through testing AI models with diverse data and navigating legal considerations. Featuring the latest Azure AI tools and technologies, each chapter concludes with hands-on exercises to reinforce your learning, culminating in Chapter 11's comprehensive set of 45 mock questions that simulate the actual exam and help you assess your exam readiness.
By the end of this book, you'll be able to confidently design, implement, and integrate AI solutions on Azure, while achieving this highly sought-after certification.
Azure AI-102 Certification Essentials
Master the AI Engineer Associate exam with real-world case studies and full-length mock tests
Peter T. Lee
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
The author acknowledges the use of cutting-edge AI, such as ChatGPT, with the sole aim of enhancing the language and clarity within the book, thereby ensuring a smooth reading experience for readers. It’s important to note that the content itself has been crafted by the author and edited by a professional publishing team.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Sunith Shetty
Relationship Lead: Sanjana Gupta
Project Manager: Hemangi Lotlikar
Content Engineer: Nathanya Dias
Technical Editor: Arjun Varma
Copy Editor: Safis Editing
Proofreader: Nathanya Dias
Indexer: Rekha Nair
Production Designer: Alishon Falcon
Growth Lead: Bhavesh Amin
First published: August 2025
Production reference: 1170725
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83620-527-2
www.packtpub.com
Peter T. Lee is a Senior Solution Architect at Microsoft, specializing in AI and data with over 25 years of IT experience spanning industries such as telecom, fintech, payments, retail, and pharmacy. Recently, his focus has been on delivering Generative AI projects, developing data extraction solutions for unstructured data, and spearheading AI initiatives in the financial, banking, insurance, and capital markets sectors. With deep expertise in cloud platforms such as Azure, AWS, and GCP, Peter excels in designing scalable and resilient architectures while enabling organizations to adopt cutting-edge AI/ML and Generative AI technologies. Holding over 18 industry certifications, he embodies a strong commitment to continuous learning and innovation.
I would like to express my deepest gratitude to my loving and patient wife, Jayeol Koo, and my son, Joshua K. Lee, for their unwavering support, boundless patience, and constant encouragement throughout the journey of writing this book.
I am deeply grateful to my parents, SeungHoo Lee and Jungsook Lee, for instilling in me strong values and making countless sacrifices that enabled me to pursue an education in the U.S. despite financial hardships. I also extend heartfelt thanks to my aunt, Jungyeon Lee—your emotional support throughout my college years at Temple University helped me stay focused and resilient.
Wilson Mok is a Microsoft MVP and Databricks Champion, passionate about helping others learn and grow in data and AI. As a Senior Data Architect and Advisor, he focuses on driving digital transformation and enabling organizations to make data-driven decisions. He shares practical insights through articles, presentations, and training, contributing to user groups, industry events, and publications. His work emphasizes leadership in creating innovative solutions that leverage modern data platforms to improve operational efficiency and deliver business value. Wilson is dedicated to mentoring professionals and inspiring the next generation to build with confidence in the AI-driven future.
Rahat Yasir is one of Canada’s top 30 software developers under 30 (2018) and a ten-time Microsoft MVP Award holder in AI. With expertise in imaging, data analysis, cross-platform technologies, and enterprise-level data and AI system design, he authored Windows Phone 8.1 Complete Solution and Universal Windows Platform Complete Solution. He has contributed to AI research at P2IRC, developed early AI video upscaling tools at IDS, and built a production-grade financial AI system at Intact Financial. He has led AI initiatives at OSEDEA, CAE, and ISAAC Instruments, shaping AI for manufacturing, aviation, defense, and transportation. Currently, he is Head of Data Insights & Advanced Analytics at IATA, driving AI in global aviation data management.
Steve Miles holds a senior technology leadership role within the cloud practice of part of a multi-billion turnover IT distributor. Steve is a Microsoft Azure MVP, Microsoft Certified Trainer (MCT), and an Alibaba Cloud MVP. Steve has over 25 years of Microsoft-focused technology experience, along with his previous military career in engineering, signals, and communications. Among other books, Steve is the author of the number 1 Amazon best-selling AZ-900 certification book titled Microsoft Azure Fundamentals and Beyond, as well as Microsoft Azure AI Fundamentals AI-900 Exam Guide and Microsoft Certified Azure Data Fundamentals (DP-900) Exam Guide.
Part 1 of this book is designed to provide a comprehensive foundation for working with Azure AI services. The first chapter focuses on key concepts in Artificial Intelligence (AI) and Machine Learning (ML), introducing supervised, unsupervised, and reinforcement learning, as well as advanced topics such as deep learning and Generative AI. It also covers foundational elements such as Large Language Models (LLMs) and Small Language Models (SLMs), Natural Language Processing (NLP), and prompt engineering, offering a clear understanding of these concepts without diving too deeply into technical details. The second chapter transitions into getting started with Azure AI, offering an overview of its capabilities, including services such as AI Search, Document Intelligence, Azure OpenAI Service, Vision, Speech, Language, and Content Safety, along with their features and practical applications. The third chapter focuses on managing, monitoring, and securing Azure AI services, covering critical strategies such as logging, metrics, cost management, secure key handling with Azure Key Vault, and private communication with virtual networks and private endpoints. Together, these chapters provide a solid foundation for building, deploying, and maintaining robust AI solutions.
This part has the following chapters:
Chapter 1, Understanding AI, ML, and Azure’s AI Services
Chapter 2, Getting Started with Azure AI: Studio, Pipelines, and Containerization
Chapter 3, Managing, Monitoring, and Securing Azure AI Services

Artificial Intelligence (AI) and Machine Learning (ML) are becoming critical drivers of technological innovation, transforming industries globally. In this chapter, we’ll cover key AI and ML concepts, including supervised, unsupervised, and reinforcement learning, and touch on advanced areas such as deep learning and Generative AI (GenAI). You’ll also be introduced to essential elements such as Large and Small Language Models (LLMs and SLMs), Natural Language Processing (NLP), and prompt engineering, which are foundational for building intelligent systems. This chapter will give you a solid understanding without delving too deeply into technical theory.
Additionally, we’ll explore Azure’s key AI services, such as AI Search, Document Intelligence, Azure OpenAI Service, Vision, Speech, Language, and Content Safety. For each, we’ll outline its core features, functionality, and practical use cases. This chapter aims to build a knowledge base that will help you better understand the concepts and tools discussed in subsequent chapters. You can refer back to it for clarity as you progress through the book.
In this chapter, you’ll explore the following key topics:
Core concepts of AI, ML, and how they relate to each other
An overview of different types of ML: supervised, unsupervised, and reinforcement learning
Introduction to deep learning and its application in real-world AI scenarios
Understanding GenAI and how it creates new content such as text and images
The role of Language Models (LMs), including LLMs and SLMs, in natural language understanding
Practical applications of NLP and the importance of prompt engineering
Six foundational AI techniques—prompt engineering, NLP, Retrieval-Augmented Generation (RAG), grounding, embedding, and tokenization—that power intelligent applications
Overview of Microsoft Azure’s key AI services, including Azure AI Search, Document Intelligence, Azure OpenAI, Vision, Speech, Language, and Content Safety
Real-world scenarios where each Azure AI service is most effective and guidance on selecting the right tools for your use case

Let’s jump in and review the key concepts!
The following diagram provides a high-level overview of the relationship between AI, ML, deep learning, GenAI, and LMs. Each layer represents a subset of the previous, showcasing the evolution of AI technology. Starting with AI in 1956, ML in 1997, and deep learning in 2017, the diagram also highlights how LMs and GenAI, which emerged more recently, fit into this broader context. Further details on these technologies are discussed in the following section.
Figure 1.1 – Brief AI history
Let’s dive deeper.
While AI refers to the broader goal of simulating human intelligence, ML is one of the core methods used to achieve it. ML provides the statistical techniques and models that enable AI to learn from data.
AI is like a smart assistant that can perform tasks that typically require human intelligence, such as understanding language, recognizing images, making decisions, translating, and solving problems. Imagine having a robot that can sort your photos, play chess, translate text into another language, book appointments for you, or even have conversations with you—AI makes this possible.
ML is a branch of data science focused on training models to make predictions or decisions based on data. Instead of being explicitly programmed for every task, ML enables systems to learn patterns from examples and improve over time.
ML is like teaching a child to recognize animals by showing them many pictures labeled with names. Over time, the child learns to identify new animals on their own. Similarly, ML allows computers to learn from past data and generalize to new, unseen situations.
ML is broadly categorized into three main types, each with distinct characteristics and use cases:
Supervised learning: This approach uses labeled data to train models to recognize patterns and make predictions. It’s used in scenarios where accuracy is critical, such as medical diagnosis or fraud detection. For example, a supervised learning model is trained on thousands of labeled X-ray images to detect whether a tumor is present.
Unsupervised learning: Here, the model identifies patterns or groupings in data without labeled outcomes. It’s useful for discovering hidden structures, such as customer segments or anomalies. For instance, a credit card company uses unsupervised learning to detect suspicious transactions that deviate from typical user behavior.
Reinforcement learning: In this type, the model learns by interacting with an environment and receiving rewards or penalties based on its actions. It’s ideal for decision-making tasks involving sequences of actions. For example, a reinforcement learning agent optimizes energy usage in a data center by adjusting cooling and power settings based on real-time conditions.

Each of these types of ML has distinct advantages and is suitable for different types of problems.
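To make the supervised case concrete, here is a minimal sketch in Python. It uses scikit-learn purely for illustration (an assumption on our part; the AI-102 exam focuses on Azure services, not any particular library), training a classifier on labeled fraud examples and predicting a label for an unseen transaction:

```python
# A minimal supervised learning sketch (scikit-learn is used purely for
# illustration and is not required for the AI-102 exam).
from sklearn.linear_model import LogisticRegression

# Labeled examples: [transaction amount, foreign transaction flag] -> fraud label
X_train = [[950.0, 1], [12.5, 0], [870.0, 1], [40.0, 0], [25.0, 0], [990.0, 1]]
y_train = [1, 0, 1, 0, 0, 1]  # 1 = fraud, 0 = legitimate

model = LogisticRegression()
model.fit(X_train, y_train)  # learn patterns from the labeled data

# Predict the label of a new, unseen transaction
print(model.predict([[900.0, 1]]))  # expected output: [1]
```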
Deep learning is a specialized subset of ML that uses artificial neural networks to model and learn complex patterns from large volumes of data. These networks are inspired by the structure of the human brain and consist of multiple interconnected layers (hence the term deep).
Deep learning models automatically learn features from raw data without the need for manual feature engineering. They excel in handling unstructured data—such as text, images, and audio—where traditional ML may struggle. For example, in NLP, deep learning enables chatbots to understand context, recognize intent, and respond naturally by learning from vast amounts of conversational data.
The impact of deep learning spans many domains, including image recognition, Text-To-Speech (TTS), language translation, recommendation systems, and autonomous vehicles. It has revolutionized industries such as healthcare, finance, retail, and digital marketing by enabling highly accurate and scalable AI solutions.
For example, a deep learning model can power a virtual assistant capable of understanding your voice commands, converting them into text, interpreting your request, and generating a human-like response—all in real time.
Deep learning’s ability to extract insights from complex, high-dimensional data has made it a cornerstone of modern AI systems.
Did you know?
Generative Pre-trained Transformers (GPTs) are deep learning models that generate natural language text. They can be customized for specific tasks and purposes, allowing users to create tailored GPTs for various applications.
GenAI is a type of AI that can create new content, such as text, images, music, and videos. It’s like having a creative artist who, after studying many examples of art, can produce original paintings. GenAI learns from existing data and generates new, original works based on that learning.
Imagine a talented chef who not only cooks but also creates new recipes.
Figure 1.2 – An example of the GenAI process
Let’s break down these elements:
Data (ingredients and recipes): AI and ML learn from a large amount of data, similar to how a chef needs ingredients and recipes to cook.
ML (learning recipes): ML helps the AI learn from this data, improving its ability to perform tasks, much like a chef practicing recipes.
AI (chef cooking): The AI uses what it has learned to perform tasks, just like a chef cooking a meal.
GenAI (creating new recipes): GenAI takes it a step further by creating new and original content, similar to a chef inventing new recipes.
Application (delicious dishes and new creations): The result is an application that can perform intelligent tasks and create new content, providing valuable solutions and innovative creations.

Did you know?
Do you know why GenAI is so popular? Its ability to create new, original content—such as text, images, and music—is transforming fields such as content creation, design, and software development. By automating creative tasks, it enhances productivity and enables rapid innovation across various fields.
LMs are a type of ML model trained to understand and generate human language. They form the foundation for many NLP tasks by analyzing vast amounts of text to learn grammar, meaning, and context.
LMs are used for a wide range of tasks such as text classification, summarization, sentiment analysis, question answering, and content generation. These models predict the next word in a sentence or evaluate the probability of a phrase, enabling them to produce coherent and contextually appropriate responses.
For example, when you ask a chatbot, “What’s the weather in London?”, an LM helps interpret your intent and generate a natural response such as “It’s currently 12°C and cloudy in London.”
Modern LMs range from small, task-specific models to LLMs such as GPT, which are capable of handling complex, multi-turn conversations and even working across modalities such as text, images, or code. These models power everyday AI experiences such as search engines, writing assistants, and virtual agents.
LLMs are powerful AI models trained on massive datasets that enable them to understand, generate, and reason with natural language. Their broad knowledge and contextual understanding make them ideal for tasks such as chatbots, summarization, translation, and content creation. LLMs can also operate in multimodal scenarios—processing not just text but also images, audio, or code—extending their use cases across industries. For example, an LLM can power a virtual assistant that summarizes customer emails, generates draft replies, and extracts key tasks to populate a to-do list—all within seconds.
SLMs, by contrast, offer a lightweight alternative to LLMs, delivering many of the same capabilities with fewer computational resources. They are designed for efficiency, making them suitable for running on devices with limited memory, such as laptops or mobile phones. Microsoft’s Phi model series exemplifies this, with Phi-3 and Phi-4 models offering impressive performance despite having far fewer parameters than traditional LLMs.
SLMs are especially useful when speed, cost-efficiency, and local processing are priorities. Together, LLMs and SLMs allow developers to choose the right balance of performance, size, and deployment flexibility for their AI applications. In multi-model solutions, these models can even be combined—where an SLM handles lightweight local tasks and an LLM steps in for more complex reasoning—creating a smart, efficient, and scalable AI system.
Important note
New models are continuously being introduced, offering greater power and efficiency at lower costs. Be sure to check the availability of the latest models beyond those mentioned in this book, as some versions may become outdated by the time of publication.
To effectively build intelligent solutions using Azure AI, it’s essential to understand six foundational capabilities that drive most modern AI applications. These capabilities—NLP, prompt engineering, RAG, grounding, embedding, and tokenization—form the building blocks for working with LMs, building chat interfaces, automating content, and retrieving relevant data. Together, these capabilities empower developers to create reliable, context-aware, and high-performing AI solutions. The following sections explain each concept with practical examples to help you connect theory to real-world application.
NLP enables AI systems to understand, interpret, and respond to human language—both spoken and written. It powers capabilities such as speech-to-text, chatbots, sentiment analysis, and language translation. For instance, when you ask a voice assistant, “What’s the weather today?”, NLP helps convert your speech to text, understand your intent, and generate a spoken response with the current forecast. Chapter 6 provides a detailed walkthrough of this topic.
This is the art of crafting clear, purposeful inputs—called prompts—that guide GenAI models to produce specific results. A well-structured prompt helps the model stay on topic and deliver accurate content. For example, prompting a model with “Summarize this email thread into key points for a meeting” can produce a concise summary, saving time and ensuring clarity. More details will be covered in the Advanced techniques in generative AI section in Chapter 8.
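As a quick illustration (a sketch of the idea, not an exercise from this book’s labs), the meeting-summary prompt above can be expressed as chat-style messages, where a system message constrains the model’s role and output format and a user message states the task. All data here is hypothetical:

```python
# A hypothetical chat-style prompt (illustrative data only). The system message
# fixes the model's role and output format; the user message states the task.
email_thread_text = (
    "Alice: Can we move the launch to May? "
    "Bob: Yes, pending QA sign-off. "
    "Alice: Then let's confirm the budget on Friday."
)

messages = [
    {"role": "system",
     "content": "You are a meeting assistant. Respond only with concise bullet points."},
    {"role": "user",
     "content": "Summarize this email thread into key points for a meeting:\n"
                + email_thread_text},
]
```

Small changes to the system message, such as its tone or format constraints, can significantly change the output; that sensitivity is what prompt engineering exploits.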
Fine-tuning is the process of adapting a pre-trained language model to perform better on a specific task or domain by training it further on a smaller, specialized dataset. This helps the model align more closely with the unique language, tone, or structure of your target content. For example, you can fine-tune a base GPT model to draft legal contracts or respond to customer service tickets in your organization’s preferred style. Unlike prompt engineering, which controls output by adjusting the input prompt, fine-tuning adjusts the model’s internal weights, enabling it to consistently deliver tailored responses across multiple use cases. Fine-tuning is particularly useful when accuracy, consistency, or domain specificity is critical. For a deeper dive into fine-tuning, refer to Exercise 5, Fine-tuning models with your own data, in Chapter 8.
RAG combines the power of search with language generation. Instead of relying solely on what the model was trained on, RAG retrieves relevant information from external sources and provides it to the model before it responds. This leads to more accurate, up-to-date answers. For example, a chatbot using RAG can look up your company’s internal documentation to answer a policy question, even if the base model wasn’t trained on that information. More details will be covered throughout Chapter 7 and in the Chat your own data section of Chapter 10.
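To make the retrieve-then-generate flow concrete, here is a minimal sketch; retrieve() and generate() are hypothetical placeholders standing in for a real search index (such as Azure AI Search, discussed in Chapter 7) and a deployed language model:

```python
# A minimal RAG sketch. retrieve() and generate() are hypothetical stand-ins
# for a real search index (e.g., Azure AI Search) and a deployed language model.
def retrieve(query: str) -> list[str]:
    # In practice: run a keyword/vector query against an index and return top passages.
    return ["HR policy v3: Employees may work remotely up to 3 days per week."]

def generate(prompt: str) -> str:
    # In practice: send the augmented prompt to a deployed model and return its reply.
    return "Per HR policy v3, remote work is allowed up to 3 days per week."

def answer(query: str) -> str:
    passages = retrieve(query)  # 1. Retrieve relevant, up-to-date context
    prompt = (
        "Answer using ONLY the context below.\n"
        "Context:\n" + "\n".join(passages) + "\n"
        "Question: " + query
    )
    return generate(prompt)  # 2. Generate a response grounded in that context

print(answer("How many days per week can I work remotely?"))
```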
Grounding is the process of ensuring that an AI model’s responses are based on factual, real-world information rather than relying solely on its internal training data—which may be outdated or incomplete. It connects the model to trusted external sources, such as company knowledge bases, databases, or documents, so that generated responses reflect current and contextually relevant information. For example, if a user asks about your organization’s travel policy, grounding enables the AI to retrieve and cite the latest version of that policy from an internal document rather than guessing. Grounding is essential in RAG systems and plays a key role in reducing hallucinations—responses that sound plausible but are inaccurate or fabricated.
Did you know?
Grounding significantly reduces hallucinations, which are when a model generates inaccurate or made-up responses without real-world context.
This is the technique of converting text, images, or other types of data into numerical vectors that represent their meaning and context. These vectors allow AI systems to compare, group, and search information based on similarity rather than exact matches. This is especially useful in applications such as semantic search, recommendations, and RAG, where understanding context is more important than matching keywords.
For example, in the simplified 3D vector space shown in Figure 1.3, the word cat might be represented as [0.8, 0.2, -0.5], while dog could be [0.7, 0.1, -0.4]—close in distance, showing they’re semantically similar. In contrast, an unrelated word such as car might be [-0.3, 0.9, 0.7], positioned farther away. This spatial arrangement enables AI models to reason about meaning and relationships in language. Embeddings power advanced features in Azure AI Search, such as vector search and hybrid retrieval, making it possible to deliver highly relevant and contextual search results across large, unstructured datasets.
Figure 1.3 – Similarity embedding vector space
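Using the toy vectors from Figure 1.3, the following short sketch shows how cosine similarity captures this closeness. Note that real embedding models return vectors with hundreds or thousands of dimensions, not three:

```python
# Cosine similarity over the toy 3D vectors from Figure 1.3. Real embedding
# models return vectors with far more dimensions.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat = [0.8, 0.2, -0.5]
dog = [0.7, 0.1, -0.4]
car = [-0.3, 0.9, 0.7]

print(cosine_similarity(cat, dog))  # close to 1.0 -> semantically similar
print(cosine_similarity(cat, car))  # much lower   -> unrelated
```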
Next, let’s look at tokenization.
This is the process of breaking down text into smaller units called tokens, which are the basic building blocks that LMs understand. Tokens can be full words, parts of words, or even punctuation marks. Tokenization is the first step in training and using transformer-based models such as GPT, enabling them to analyze and generate language effectively.
For example, consider the following sentence: I heard a dog bark loudly at a cat.
To tokenize this text, you can identify each discrete word and assign token IDs to them, as in this example:
- I (1)
- heard (2)
- a (3)
- dog (4)
- bark (5)
- loudly (6)
- at (7)
- ("a" is already tokenized as 3)
- cat (8)

The sentence can now be represented with the tokens {1 2 3 4 5 6 7 3 8}. Similarly, the sentence I heard a cat could be represented as {1 2 3 8}.
As you continue to train the model, each new token in the training text is added to the vocabulary with appropriate token IDs:
meow (9)
skateboard (10)
And so on...

With a sufficiently large set of training texts, a vocabulary of many thousands of tokens could be compiled. To explore how tokens are calculated for LLMs, you can visit https://token-calculator.net/.
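The toy word-level tokenizer sketched below reproduces the example above; keep in mind that production LMs use subword schemes such as byte-pair encoding rather than whole-word vocabularies:

```python
# A toy word-level tokenizer that reproduces the example above. Production LMs
# use subword schemes (e.g., byte-pair encoding) rather than whole words.
vocab: dict[str, int] = {}

def tokenize(text: str) -> list[int]:
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1  # assign the next available token ID
        ids.append(vocab[word])
    return ids

print(tokenize("I heard a dog bark loudly at a cat"))  # [1, 2, 3, 4, 5, 6, 7, 3, 8]
print(tokenize("I heard a cat"))                       # [1, 2, 3, 8]
```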
Now that you have a solid understanding of the basic AI and ML concepts, let’s explore Azure AI services in a practical way. We’ll review the available services, examine the key features of each, understand how they function, and identify when to use them effectively. This section will guide you through the services, offering insights into how they can be applied in real-world scenarios to maximize your AI solutions.
Azure provides a comprehensive suite of AI services designed to accelerate the development of intelligent applications. These services cover a broad range of capabilities, including vision, language, speech, search, and GenAI. With prebuilt models, APIs, and customization options, developers can quickly integrate advanced AI features into their solutions without needing deep ML expertise.
At the core of this ecosystem is the Azure AI Foundry platform (discussed in detail in the AI Foundry section of Chapter 2)—a unified environment for building, deploying, and managing AI applications. It streamlines the development process by combining model training, data integration, and deployment workflows with enterprise-grade security and compliance features. Azure AI Foundry empowers teams to collaborate efficiently while scaling AI solutions across the organization.
Figure 1.4 – Overview of Azure AI services
The following is a breakdown of the key Azure AI services, their core features, and practical use cases.
Important note
As Azure AI services rapidly evolve, model availability, API versions, and regional support can change frequently. Before starting a project or working through the hands-on exercises in this book, it’s essential to verify that the services and models you plan to use are supported in your chosen Azure region. This step helps avoid compatibility issues and ensures a smooth deployment experience.
To help you stay up to date, the Further reading section includes direct links to the official Microsoft documentation for each service. Reviewing these resources will ensure you’re working with the most current capabilities—keeping your solutions scalable, cost-effective, and aligned with production-ready standards.
Azure AI Search (formerly Azure Cognitive Search) is a cloud-based service that enables fast, secure, and scalable information retrieval across your own data. It supports keyword, semantic, and vector-based search, making it a versatile tool for both traditional and GenAI applications.
Key features include the following:
Flexible search capabilities: Supports full-text, semantic, vector, and hybrid search across structured and unstructured content
Comprehensive indexing: Offers data chunking, vectorization, Optical Character Recognition (OCR), and built-in language analysis tools
Advanced query support: Enables fuzzy search, filters, autocomplete, faceting, geo-search, and semantic ranking
Seamless integration: Easily connects with Azure OpenAI, Azure ML, and external data pipelines

Azure AI Search functions in two stages: indexing and querying. During indexing, your content is ingested, processed (e.g., chunked, vectorized, and tokenized), and stored in search indexes. Built-in AI enrichments—such as OCR and language detection—can be applied to enhance the content. When users issue queries, the service searches across the appropriate indexes and returns ranked results. Semantic ranking and hybrid retrieval ensure highly relevant responses, especially in RAG-based applications. For an in-depth look, see Figure 7.2 in Chapter 7 and the AI Search section in Chapter 10.
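As a hedged illustration of the querying stage, the following sketch uses the azure-search-documents Python SDK; the endpoint, key, index name, and field names are placeholders you would replace with your own:

```python
# A query-stage sketch using the azure-search-documents SDK
# (pip install azure-search-documents). All names below are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="internal-reports",  # hypothetical index
    credential=AzureKeyCredential("<your-query-key>"),
)

# Run a full-text query; each result is a dict of the fields in the index.
for result in client.search(search_text="quarterly revenue summary", top=3):
    print(result["@search.score"], result.get("title"))  # 'title' is a hypothetical field
```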
You can use this for the following use cases:
Enterprise search portals: Enable employees to find content using natural language across large document repositories.
GenAI and RAG applications: Retrieve vectorized content for context-aware language generation.
Custom search experiences: Build search tools with autocomplete, filters, and synonyms tailored to your business.
Centralized indexing: Unify documents, structured data, and vector content under one searchable index.
Multilingual and domain-specific search: Apply linguistic rules or custom analyzers to improve accuracy across languages or specialized content domains.

Implement a semantic document search tool that helps employees quickly find relevant internal reports using natural language queries.

Did you know?
OpenAI uses Azure AI Search as the vector database and retrieval system in its RAG workloads, including ChatGPT, custom GPTs, and the Assistants API. OpenAI found Azure AI Search to be aligned with its unique scale needs, highly productive, and a complete retrieval system that went beyond vectors, offering hybrid retrieval, metadata filtering, and more.
In the video at https://youtu.be/cjIE5fBInAE?si=j4FHgQ0lczRKUWO9, discover how ChatGPT, the fastest-growing consumer app in history with over 100 million weekly active users, combines RAG-powered features, OpenAI’s trusted API, and Azure AI Search to tackle today’s and tomorrow’s biggest challenges!
Azure Document Intelligence (formerly Form Recognizer, covered in detail in Chapter 7’s Implementing Document Intelligence solution section) is a cloud-based service that automates document processing by extracting structured data from forms, invoices, receipts, and other document types. It reduces manual data entry and enables scalable, accurate document workflows.
Key features include the following:
Prebuilt, custom, and composed models: Uses ready-made models for common documents or trains custom models for unique layouts
AI-powered extraction: Identifies and extracts key-value pairs, tables, selection marks, and text from scanned documents
Flexible interfaces: Supports REST APIs, SDKs, and low-code tools for easy integration

The service processes documents through OCR and ML models. Depending on the layout, it uses either prebuilt or custom-trained models to analyze and extract information such as line items, totals, and metadata. The extracted data is returned in structured formats (e.g., JSON) that can be directly integrated into downstream systems such as Enterprise Resource Planning (ERP) or databases.
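The following is a minimal sketch of invoice extraction using the azure-ai-formrecognizer Python SDK’s prebuilt invoice model; the endpoint, key, and file name are placeholders:

```python
# An invoice-extraction sketch with the azure-ai-formrecognizer SDK
# (pip install azure-ai-formrecognizer). Endpoint, key, and file are placeholders.
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("invoice.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-invoice", document=f)
result = poller.result()

# Each analyzed document exposes extracted fields such as vendor and total
for doc in result.documents:
    vendor = doc.fields.get("VendorName")
    total = doc.fields.get("InvoiceTotal")
    print(vendor.value if vendor else None, total.value if total else None)
```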
You can use this for the following use cases:
Invoice and receipt automation: Streamline accounts payable by extracting data from scanned or digital documents.
Custom form processing: Train custom models to handle forms with domain-specific layouts.
Archival and search: Convert paper archives into structured, searchable formats.
Regulatory and compliance workflows: Automatically detect key fields or data patterns to ensure documentation standards.

Automatically extract line items from scanned invoices and upload structured data to a financial system.

Did you know?
Document field extraction features help with automatic labeling, grounding, and confidence scoring by leveraging an LLM to improve accuracy. For more details, visit https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/train/custom-model?view=doc-intel-4.0.0.
Azure AI Video Indexer (covered in detail in Chapter 5’s Analyzing videos with Azure AI Video Indexer section) is a video and audio analytics service that uses prebuilt AI models to extract detailed metadata from media content, such as spoken text, faces, scenes, objects, and emotions.
Key features include the following:
Automatic transcription and translation: Supports over 50 languages and generates multilingual captions
Rich media insights: Identifies topics, named entities, speaker timelines, brands, and sentiment
Custom model training: Recognizes specific people or visuals using account-trained models
Content moderation and accessibility: Detects inappropriate material and provides captioning for inclusiveness

The service ingests audio or video content and applies AI models to identify spoken words, detect objects or faces, and extract other key metadata. All metadata is indexed and made searchable via APIs or the Video Indexer portal. You can also customize recognition logic by training models to detect known individuals or brand elements.
You can use this for the following use cases:
Media libraries and archives: Make large video repositories searchable by topics, people, or scenes.
Broadcast and content platforms: Add multilingual subtitles, scene segmentation, and moderation filters.
Corporate training and compliance: Automatically summarize and tag videos to ensure regulatory compliance and improve internal training material discoverability.
Advertising and personalization: Identify product placements, brand mentions, or emotional tone.

Enhance a video platform by indexing large video libraries for scene-based search and multilingual subtitles.

Azure OpenAI Service (covered in detail in Chapter 8) provides secure access to advanced OpenAI models such as GPT-4, GPT-4 Turbo with Vision, and GPT-3.5. It enables enterprise-grade language capabilities such as summarization, chat, content creation, and code generation.
Important note
New models are continuously being introduced, offering greater power and efficiency at lower costs. Be sure to check the availability of the latest models beyond those mentioned in this book, as some versions may become outdated by the time of publication. For more information, visit the official documentation at https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions.
Key features include the following:
Access to powerful LMs: Includes GPT-4o, Codex, DALL-E, and embeddings models
Scalable interfaces: Uses APIs, SDKs, or the Azure OpenAI Studio for prototyping and production
Enterprise-grade controls: Integrates with Azure networking, identity, and security features
Fine-tuning and batch inference: Customizes outputs or runs large-scale processing jobs efficiently

After deploying a model in Azure, developers interact with it using prompts through REST APIs or SDKs. Prompt engineering helps shape the response. For specialized tasks, fine-tuning can adjust the model’s behavior. Azure also provides tooling to monitor usage, apply content filtering, and ensure responsible AI practices.
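As a minimal sketch of this interaction, the following uses the openai Python SDK’s AzureOpenAI client; the endpoint, key, API version, and deployment name are placeholders for your own resource:

```python
# A chat-completion sketch with the openai SDK's AzureOpenAI client
# (pip install openai). The model argument is your deployment name.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="<your-deployment-name>",  # deployment name, not the model family
    messages=[
        {"role": "system", "content": "You are a concise customer support assistant."},
        {"role": "user", "content": "Summarize our return policy in two sentences."},
    ],
)

print(response.choices[0].message.content)
```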
You can use this for the following use cases:
Conversational agents and copilots: Build assistants that understand context and respond naturally.
Document summarization and insights: Extract key points from contracts, reports, or support tickets.
Code generation and refactoring: Leverage Codex to write, review, or optimize code.
Image understanding (vision): Analyze and describe visual inputs alongside text in multimodal workflows.

Use GPT-4 to build a customer support chatbot that generates accurate, natural responses based on internal knowledge.

Did you know?
LangChain is a popular open source AI framework used to build applications powered by LMs, such as agents, tools, and chains. Microsoft’s Semantic Kernel is a production-ready and stable SDK designed for integrating LLMs into real-world applications with reliability and scalability. Meanwhile, AutoGen is a cutting-edge research SDK from Microsoft for developing advanced, multi-agent LLM systems, ideal for exploring state-of-the-art AI coordination and reasoning.
The Azure Vision service (covered in detail in Chapter 8’s Analyzing images section) provides powerful capabilities to extract, classify, and analyze visual information from images and videos using prebuilt and custom computer vision models.
Key features include the following:
Prebuilt models: Recognizes objects, text, landmarks, celebrities, and brands
Custom vision: Trains models with your labeled images for tailored recognition
OCR and spatial analysis: Extracts text and layout from scanned documents or monitors the movement of people
Deployment flexibility: Runs in the cloud or exports to edge devices

You upload an image or video frame to the Vision API, which applies prebuilt or custom-trained models, depending on your needs. For example, OCR can extract text from a document, while object detection highlights specific features in a photo. Custom vision lets you build models that specialize in your specific domain data.
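The following sketch illustrates such a call with the azure-ai-vision-imageanalysis Python SDK, requesting a caption and OCR text for a local image; the endpoint, key, and file name are placeholders:

```python
# An image analysis sketch with the azure-ai-vision-imageanalysis SDK
# (pip install azure-ai-vision-imageanalysis). Names below are placeholders.
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("photo.jpg", "rb") as f:
    result = client.analyze(
        image_data=f.read(),
        visual_features=[VisualFeatures.CAPTION, VisualFeatures.READ],  # caption + OCR
    )

if result.caption:
    print("Caption:", result.caption.text, f"(confidence {result.caption.confidence:.2f})")
```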
You can use this for the following use cases:
Manufacturing quality control: Detect visual defects or anomalies in production.
Retail and inventory: Identify products on shelves and automate cataloging.
Document digitization: Use OCR to convert paper records into structured text.
Smart spaces: Monitor foot traffic and room usage using spatial analytics.

Detect product defects on a manufacturing line using a custom-trained object detection model.

The Azure Speech service (covered in detail in Chapter 6’s Processing speech by using Azure AI Speech section) offers comprehensive tools to add speech capabilities to applications, including transcription, voice synthesis, and translation—all with high accuracy and natural delivery.
Key features include the following:
Speech-to-text: Convert spoken audio into text in real-time or batch mode
Text-to-speech: Generate human-like speech using prebuilt or custom neural voices
Speech translation: Enable multilingual communication across more than 60 languages
Custom speech models: Improve recognition in noisy environments or for specific jargon

Audio input is sent to the Azure Speech service via an API or SDK. The service uses neural networks to generate transcriptions, translate speech into another language, or synthesize voice from text. You can fine-tune models for specialized vocabularies or dialects and deploy them across web, mobile, or IoT apps.
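Here is a minimal speech-to-text sketch using the azure-cognitiveservices-speech Python SDK; the subscription key and region are placeholders:

```python
# A speech-to-text sketch with the azure-cognitiveservices-speech SDK
# (pip install azure-cognitiveservices-speech). Key and region are placeholders.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Speak into your default microphone...")
result = recognizer.recognize_once()  # capture and transcribe a single utterance

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)
```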
You can use this for the following use cases:
Customer support automation: Convert voice calls to searchable transcripts.
Voice assistants: Create natural-sounding interactions with users in apps or devices.
Live captioning and accessibility: Provide real-time subtitles for meetings or broadcasts.
Language learning apps: Assess pronunciation and aid interactive speech practice.

Create a multilingual voice assistant for a global customer service center.

The Azure Language service (covered in detail in Chapter 6) provides a comprehensive suite of NLP features that enable developers to build intelligent applications capable of understanding and analyzing text. This service unifies several previously available Azure AI services, including Text Analytics, QnA Maker, and Language Understanding Intelligent Service (LUIS), while introducing new capabilities such as document summarization and Personally Identifiable Information (PII) detection. Users can interact with the service through REST APIs, SDKs, or the web-based Language Studio, making it accessible and versatile for various use cases.
Key features include the following:
Text analysis: Sentiment analysis, key phrase extraction, entity recognition, and language detection
Summarization and Q&A: Automatically summarize long documents or extract answers from unstructured text
PII detection and translation: Redact sensitive information and support multilingual applications
Language Studio: No-code interface for training and testing NLP models

Text input is submitted through the API or Language Studio. Azure Language services use prebuilt or custom models to analyze the content, extract linguistic insights, and return results in structured formats. These insights can be used to power applications such as customer support chatbots, document summarization tools, and compliance workflows.
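As a small sketch, the following uses the azure-ai-textanalytics Python SDK to score sentiment for a customer review; the endpoint and key are placeholders:

```python
# A sentiment analysis sketch with the azure-ai-textanalytics SDK
# (pip install azure-ai-textanalytics). Endpoint and key are placeholders.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

reviews = ["Checkout was fast and easy, but delivery took two weeks."]
for doc in client.analyze_sentiment(reviews):
    print(doc.sentiment)          # e.g., "mixed"
    print(doc.confidence_scores)  # positive/neutral/negative scores
```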
You can use this for the following use cases:
Customer feedback analysis: Identify sentiment and trends in product reviews or surveys.
Knowledge extraction: Extract structured data, such as named entities, key phrases, and summaries, from unstructured text to support search indexing and reporting pipelines.
Privacy compliance: Detect and redact sensitive data (PII) before storing or sharing content.
Multilingual applications: Build apps that support language detection and translation across global markets.

Automatically summarize customer reviews and identify trends in product feedback.

The Azure Content Safety service (covered in detail in Chapter 4) provides a comprehensive suite of tools designed to detect and moderate harmful user-generated and AI-generated content across various platforms and services. The service includes powerful capabilities for text and image moderation, helping businesses maintain a safe and respectful environment for their users. Developers can interact with the service via REST APIs, SDKs, or through the intuitive Content Safety Studio, making it easy to implement and manage content safety measures.
Key features include the following:
Text and image moderation: Detects hate speech, violence, sexual content, and self-harm
Multi-severity scoring: Classifies content by risk level
Custom categories: Defines moderation rules with custom filters using the Rapid API
Content Safety Studio: Visual tool for testing and refining moderation logic

Content—whether text or