Computer vision (CV) is a field of artificial intelligence that helps transform visual data into actionable insights to solve a wide range of business challenges. This book provides prescriptive guidance to anyone looking to learn how to approach CV problems for quickly building and deploying production-ready models.
You’ll begin by exploring the applications of CV and the features of Amazon Rekognition and Amazon Lookout for Vision. The book will then walk you through real-world use cases such as identity verification, real-time video analysis, content moderation, and detecting manufacturing defects that’ll enable you to understand how to implement AWS AI/ML services. As you make progress, you'll also use Amazon SageMaker for data annotation, training, and deploying CV models. In the concluding chapters, you'll work with practical code examples, and discover best practices and design principles for scaling, reducing cost, improving the security posture, and mitigating bias of CV workloads.
By the end of this AWS book, you'll be able to accelerate your business outcomes by building and implementing CV into your production environments with the help of AWS AI/ML services.
Build and deploy real-world CV solutions with Amazon Rekognition, Lookout for Vision, and SageMaker
Lauren Mullennex
Nate Bachmeier
Jay Rao
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Dinesh Chaudhary
Content Development Editor: Joseph Sunil
Technical Editor: Sweety Pagaria
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Ponraj Dhandapani
Marketing Coordinator: Shifa Ansari
First published: April 2023
Production reference: 1240323
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80107-868-9
www.packtpub.com
To my father, brother, mother, and Denise – there are not enough words to express how grateful I am for your support and guidance. Thank you for your encouragement and for teaching me how to persevere.
– Lauren Mullennex
To my parents, sister, and my wife – who always believed in me. Thank you for your continued love and support!
– Jay Rao
Lauren Mullennex is a senior AI/ML specialist solutions architect at AWS. She has broad experience in infrastructure, DevOps, and cloud architecture across multiple industries. She has published multiple AWS AI/ML blogs, spoken at AWS conferences, and focuses on developing solutions using CV and MLOps.
Nate Bachmeier is a principal solutions architect at AWS (PhD. CS, MBA). He nomadically explores the world one cloud integration at a time, focusing on the financial service industry.
Jay Rao is a principal solutions architect at AWS. He enjoys providing technical and strategic guidance to customers and helping them design and implement solutions.
Morteza Kiadi is a seasoned cloud computing and machine learning technical instructor with over fifteen years of expertise. Morteza pursued his Ph.D. in AI and optimization while holding several positions in enterprises and startups. As a senior technical trainer at AWS, he teaches all of AWS's machine learning courses, helping AWS customers master challenging topics and cutting-edge technology. Morteza drew on his experience in academia, industry, and teaching to make this book accessible to its readers by providing insightful and impartial reviews.
I am extremely grateful to Emily Webber for inspiring me to review this book. I am thankful to the entire Packt Publishing team for your ongoing assistance in the publishing of this book.
Computer vision (CV) transforms visual data into actionable insights to solve many business challenges. In recent years, due to the availability of increased computing power and access to vast amounts of data, CV has become more accessible. Amazon Web Services (AWS) has played an important role in democratizing CV by providing services to build, train, and deploy CV models.
In this book, you will begin by exploring the applications of CV and features of Amazon Rekognition and Amazon Lookout for Vision. Then, you’ll walk through real-world use cases such as identity verification, real-time video analysis, content moderation, and detecting manufacturing defects to understand how to implement AWS AI/ML services. You’ll also use Amazon SageMaker for data annotation, training, and deploying CV models. As you progress, you’ll learn best practices and design principles for scaling, reducing cost, improving the security posture, and mitigating the bias of CV workloads.
By the end of this book, you’ll be able to accelerate your business outcomes by building and implementing CV into your production environments with AWS AI/ML services.
If you are a machine learning engineer or a data scientist, or if you want to better understand best practices and how to build comprehensive CV solutions on AWS, this book is for you. Knowledge of AWS basics is required to grasp the concepts covered in this book more effectively. A solid understanding of ML concepts and the Python programming language will also be beneficial.
Chapter 1, Computer Vision Applications and AWS AI/ML Overview, provides an introduction to CV and summarizes use cases where CV can be applied to solve business challenges. It also includes an overview of the AWS AI/ML services.
Chapter 2, Interacting with Amazon Rekognition, provides an overview of Amazon Rekognition, details the different capabilities available, walks through the Amazon Rekognition console, and shows how to use the APIs.
Chapter 3, Creating Custom Models with Amazon Rekognition Custom Labels, provides a detailed introduction to Amazon Rekognition Custom Labels, what its benefits are, and a code example to train a custom object detection model.
Chapter 4, Using Identity Verification to Build a Contactless Hotel Check-In System, dives deep into a real-world use case using Amazon Rekognition and other AWS AI services to build applications that demonstrate how to solve business challenges using core CV capabilities. A code example is provided to build a mobile application for customers to register their faces and check into a fictional hotel kiosk system.
Chapter 5, Automating a Video Analysis Pipeline, dives deep into a real-world use case using Amazon Rekognition to build an application that demonstrates how to solve business challenges using core CV capabilities. A code example is provided to build a real-time video analysis pipeline using Amazon Rekognition Video APIs.
Chapter 6, Moderating Content with AWS AI Services, dives deep into a real-world use case using Amazon Rekognition and other AWS AI services to build applications that demonstrate how to solve business challenges using core CV capabilities. A code example is provided to build content moderation workflows.
Chapter 7, Introducing Amazon Lookout for Computer Vision, provides a detailed introduction to Amazon Lookout for Vision, what its functions are, and a code example to train a model to detect anomalies.
Chapter 8, Detecting Manufacturing Defects Using CV at the Edge, dives deeper into Amazon Lookout for Vision, covers the benefits of deploying CV at the edge, and walks through a code example to train a model to detect anomalies in manufacturing parts.
Chapter 9, Labeling Data with Amazon SageMaker Ground Truth, provides a detailed introduction to Amazon SageMaker Ground Truth, what its benefits are, and a code example to integrate a human labeling job into offline data labeling workflows.
Chapter 10, Using Amazon SageMaker for Computer Vision, dives deeper into Amazon SageMaker, covers its capabilities, and walks through a code example to train a model using a built-in image classifier.
Chapter 11, Integrating Human-in-the-Loop with Amazon Augmented AI, provides a detailed introduction to Amazon Augmented AI (Amazon A2I), what its functions are, and a code example that uses human reviewers to improve the accuracy of your CV workflows.
Chapter 12, Best Practices for Designing an End-to-End CV Pipeline, covers best practices that can be applied to CV workloads across the entire ML lifecycle, including considerations for cost optimization, scaling, security, and developing an MLOps strategy.
Chapter 13, Applying AI Governance in CV, discusses the purpose of establishing an AI governance framework, introduces Amazon SageMaker for ML governance, and provides an overview of the importance of mitigating bias.
You will need access to an AWS account, so before getting started, we recommend that you create one.
Software/hardware covered in the book        | Operating system/account creation requirements
Access to or signing up for an AWS account   | https://portal.aws.amazon.com/billing/signup
Jupyter Notebook                             | Windows/macOS
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Computer-Vision-on-AWS. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Once the model is hosted, you can start analyzing your images using the DetectAnomalies API.”
A block of code is set as follows:
{
    "SubscriptionArn": "arn:aws:sns:region:account:AmazonRekognitionPersonTrackingTopic:04877b15-7c19-4ce5-b958-969c5b9a1ecb"
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
aws sns subscribe \
    --region us-east-2 \
    --topic-arn arn:aws:sns:region:account:AmazonRekognitionPersonTrackingTopic \
    --protocol sqs \
    --notification-endpoint arn:aws:sqs:region:account:PersonTrackingQueue

Any command-line input or output is written as follows:
$ git clone https://github.com/PacktPublishing/Computer-Vision-on-AWS
$ cd Computer-Vision-on-AWS/07_LookoutForVision

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “If you’re a first-time user of the service, it will ask permission to create an S3 bucket to store your project files. Click Create S3 bucket.”
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you’ve read Computer Vision on AWS, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry; now, with every Packt book, you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there; you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781801078689
Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly.

This section helps you, as a machine learning engineer or data scientist, better understand how CV can be applied to solve business challenges, and gives a comprehensive overview of the available AWS AI/ML services.
This first part consists of three cumulative chapters that will cover the core concepts of CV, a detailed introduction to Amazon Rekognition, and how to create a custom classification model using Amazon Rekognition Custom Labels.
By the end of this part, you will understand how to apply CV to accelerate your business outcomes, which AWS AI/ML services to use for your CV workloads, and how to use Amazon Rekognition for tasks including classification and object detection.
This part comprises the following chapters:
Chapter 1, Computer Vision Applications and AWS AI/ML Overview
Chapter 2, Interacting with Amazon Rekognition
Chapter 3, Creating Custom Models with Amazon Rekognition Custom Labels

In the past decade, the field of computer vision (CV) has rapidly advanced. Research in deep learning (DL) techniques has helped computers mimic human brains to “see” content in videos and images and transform it into actionable insights. There are examples of the wide variety of applications of CV all around us, including self-driving cars, text and handwriting detection, classifying different types of skin cancer in images, industrial equipment inspection, and detecting faces and objects in videos. Despite recent advancements, the availability of vast amounts of data from disparate sources has posed challenges in creating scalable CV solutions that achieve high-quality results. Automating a production CV pipeline is a cumbersome task requiring many steps. You may be asking, “How do I get started?” and “What are the best practices?”.
If you are a machine learning (ML) engineer or data scientist or want to better understand how to build and implement comprehensive CV solutions on Amazon Web Services (AWS), this book is for you. We provide practical code examples, tips, and step-by-step explanations to help you quickly deploy and automate production CV models. We assume that you have intermediate-level knowledge of artificial intelligence (AI) and ML concepts. In this first chapter, we will introduce CV and address implementation challenges, discuss the prevalence of CV across a variety of use cases, and learn about AWS AI/ML services.
In this chapter, we will cover the following:
Understanding CV
Solving business challenges with CV
Exploring AWS AI/ML services
Setting up your AWS environment

You will need a computer with internet access to create an AWS account and set up Amazon SageMaker to run the code samples in the following chapters. The Python code and sample datasets for the solutions discussed are available at https://github.com/PacktPublishing/Computer-Vision-on-AWS.
CV is a domain within AI and ML. It enables computers to detect and understand visual inputs (videos and images) to make predictions:
Figure 1.1 – CV is a subdomain of AI and ML
Before we discuss the inner workings of a CV system, let’s summarize the different types of ML algorithms:
Supervised learning (SL)—Takes a set of labeled input data and predicts a known target value. For example, a model may be trained on a set of labeled dog images. When a new unlabeled dog image is processed by the model, the model correctly predicts that the image is a dog instead of a cat.
Unsupervised learning (UL)—Unlabeled data is provided, and patterns or structures need to be found within the data since no labeled target value is present. One example of UL is a targeted marketing campaign where customers need to be segmented into groups based on various common attributes such as demographics.
Semi-supervised learning—Consists of unlabeled and labeled data. This is beneficial for CV tasks, since it is a time-consuming process to label individual images. With this method, only some of the images in the dataset need to be labeled in order to label and classify the unlabeled images.

Now that we’ve covered the different types of ML training methods, how does this relate to CV? DL algorithms are commonly used to solve CV problems. These algorithms are composed of artificial neural networks (ANNs) containing layers of nodes, which function like a neuron in a human brain. A neural network (NN) has multiple layers, including one or more input layers, hidden layers, and output layers. Input data flows through the input layers. The nodes perform transformations of the input data in the hidden layers and produce output to the output layer. The output layer is where predictions of the input data occur. The following figure shows an example of a deep NN (DNN) architecture:
Figure 1.2 – DNN architecture
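To make the flow of data through these layers concrete, here is a minimal NumPy sketch of a forward pass through a small fully connected network. The layer sizes, random weights, and input values are illustrative only, not taken from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Hidden-layer activation: pass positive values through, zero out negatives
    return np.maximum(0, x)

def softmax(x):
    # Output-layer activation: turn raw scores into class probabilities
    e = np.exp(x - x.max())
    return e / e.sum()

# A 4-feature input flows through one hidden layer to a 3-class output layer
x = np.array([0.2, 0.8, 0.1, 0.5])                # input layer (4 features)
W1 = rng.normal(size=(4, 5)); b1 = np.zeros(5)    # input -> hidden weights
W2 = rng.normal(size=(5, 3)); b2 = np.zeros(3)    # hidden -> output weights

hidden = relu(x @ W1 + b1)          # hidden layer transforms the input
probs = softmax(hidden @ W2 + b2)   # output layer produces the prediction

print(probs)        # three class probabilities that sum to 1
```

Training a real DNN adjusts `W1`, `W2`, `b1`, and `b2` from labeled data; this sketch only shows how data flows from the input layer, through a hidden layer, to the output layer.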
How does this architecture apply to real-world applications? With CV and DL technology, you can detect patterns in images and use these patterns for classification. One type of NN that excels in classifying images is a convolutional NN (CNN). CNNs were inspired by ANNs. The way the nodes in a CNN communicate replicates how animals visualize the world. One application of CNNs is classifying X-ray images to assist doctors with medical diagnoses:
Figure 1.3 – Image classification of X-rays
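At the heart of a CNN is the convolution operation: a small filter slides over the image and responds strongly wherever its pattern appears. Here is a minimal NumPy sketch (the filter and image values are illustrative; DL frameworks implement this as cross-correlation, as shown here):

```python
import numpy as np

def convolve2d(image, kernel):
    # Valid-mode 2D convolution: slide the kernel over the image with no padding
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector: responds where pixel values change from left to right
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

# A 5x5 image with a bright left half and dark right half (a vertical edge)
image = np.array([[1, 1, 0, 0, 0]] * 5, dtype=float)

feature_map = convolve2d(image, edge_kernel)
print(feature_map)  # strong responses align with the edge columns
```

A CNN stacks many such learned filters, so early layers respond to edges and textures while deeper layers respond to higher-level patterns, which is what makes them effective at classifying images such as X-rays.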
There are multiple types of problems that CV can solve that we will highlight throughout this book. Localization locates one or more objects in an image and draws a bounding box around the object(s). Object detection uses localization and classification to identify and classify one or multiple objects in an image. These tasks are more complicated than image classification. Faster R-CNN (Regions with CNN), SSD (single shot detector), and YOLO (you only look once) are other types of DNN models that can be used for object detection tasks. These models are designed for performance, decreasing latency while increasing accuracy.
Segmentation—including instance segmentation and semantic segmentation—highlights the pixels of an image, instead of objects, and classifies them. Segmentation can also be applied to videos to detect black frames, color bars, end credits, and shot changes:
Figure 1.4 – Examples of different CV problem types
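Object detection models are commonly evaluated by comparing predicted bounding boxes against ground-truth boxes using intersection over union (IoU). The sketch below uses illustrative box coordinates:

```python
def iou(box_a, box_b):
    # Boxes are (x_min, y_min, x_max, y_max)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Overlap area; zero if the boxes do not intersect
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

predicted = (10, 10, 50, 50)
ground_truth = (20, 20, 60, 60)
print(iou(predicted, ground_truth))  # a ratio between 0 (no overlap) and 1 (exact match)
```

An IoU close to 1 means the predicted box tightly matches the ground truth; detection benchmarks typically count a prediction as correct when its IoU exceeds a threshold such as 0.5.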
Despite recent advances in CV and DL, there are still challenges within the field. CV systems are complex, there are vast amounts of data to process, and considerations need to be taken before training a model. It is important to understand the data available since a model is only as good as the quality of your data, and the steps required to prepare the data for model training.
CV deals with images and videos, which are a form of unstructured data. Unstructured data does not have a predefined data model and cannot be stored in a database row and column format. This type of data poses unique challenges compared to tabular data. More processing is required to transform the data into a usable format. A computer sees an image as a matrix of pixel values. A pixel is a set of numbers between 0-255 in the red, green, blue (RGB) system. Images vary in their resolutions, dimensions, and colors. In order to train a model, CV algorithms require that images are normalized such that they are the same size. Additional image processing techniques include resizing, rotating, enhancing the resolution, and converting from RGB to grayscale. Another technique is image masking, which allows us to focus on a region of interest. In the following photos, we apply a mask to highlight the motorcycle:
Figure 1.5 – Applying an image mask to highlight the motorcycle
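The pixel operations described above are straightforward to express with NumPy. This sketch converts a tiny RGB array to grayscale using the standard luminance weights and applies a binary mask to keep only a region of interest; the image values and mask region are illustrative:

```python
import numpy as np

# A 4x4 RGB "image": each pixel holds (R, G, B) values between 0 and 255
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(4, 4, 3)).astype(float)

# RGB -> grayscale using the common ITU-R BT.601 luminance weights
gray = image @ np.array([0.299, 0.587, 0.114])
print(gray.shape)  # (4, 4): one intensity value per pixel

# A binary mask highlighting the top-left 2x2 region of interest
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
masked = np.where(mask, gray, 0)  # pixels outside the region become black
```

In practice, you would load a real photo with a library such as Pillow or OpenCV, but the underlying representation is the same: a matrix of pixel values that you can resize, recolor, and mask before training.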
Preprocessing is important since images are often large and take up lots of storage. Resizing an image and converting it to grayscale can speed up the ML training process. However, this technique is not always optimal for the problem we’re trying to solve. For example, in medical image analysis such as skin cancer diagnosis, the colors of the samples are relevant for a proper diagnosis. This is why it’s important to have a complete understanding of the business problem you’re trying to solve before choosing how to process your data. In the following chapters, we’ll provide code examples that detail various image preprocessing steps.
Features or attributes in ML are important input data characteristics that affect the output or target variable of a model. Distinct features in an image help a model differentiate objects from one another. Determining relevant features depends on the context of your business problem. If you’re trying to identify a Golden Retriever dog in a group of images also containing cats, then height is an important feature. However, if you’re looking to classify different types of dogs, then height is not always a distinguishing feature since Golden Retrievers are similar in height to many other dog breeds. In this case, color and coat length might be more useful features.
Data annotation or data labeling is the process of labeling your input datasets. It helps derive value from your unstructured data for SL. Some of the challenges with data labeling are that it is a manual process that is time-consuming, humans have a bias for labeling an object, and it’s difficult to scale. Amazon SageMaker Ground Truth Plus (https://aws.amazon.com/sagemaker/data-labeling/) helps address these challenges by automating this process. It contains a labeling user interface (UI) and quality workflow customizations. The labeling is done by an expert workforce with domain knowledge of the ML tasks to complete. This improves the label quality and leads to better training datasets. In Chapter 9, we will cover a code example using SageMaker Ground Truth Plus.
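A labeling job produces an augmented manifest: a file with one JSON object per line that pairs each image with its label and labeling metadata. The sketch below parses two hypothetical manifest lines; the attribute name `animal-labels`, the bucket, and the metadata values are illustrative, since a real manifest uses your labeling job's name and richer metadata fields:

```python
import json

# Two example lines from a hypothetical augmented manifest file
manifest_lines = [
    '{"source-ref": "s3://my-bucket/images/dog1.jpg", "animal-labels": 0, '
    '"animal-labels-metadata": {"class-name": "dog", "confidence": 0.95}}',
    '{"source-ref": "s3://my-bucket/images/cat1.jpg", "animal-labels": 1, '
    '"animal-labels-metadata": {"class-name": "cat", "confidence": 0.88}}',
]

dataset = []
for line in manifest_lines:
    record = json.loads(line)                      # each line is standalone JSON
    meta = record["animal-labels-metadata"]
    dataset.append((record["source-ref"], meta["class-name"], meta["confidence"]))

print(dataset)  # (image location, label, labeler confidence) per image
```

Reading labels this way lets you feed the annotated images directly into a training job, filter out low-confidence labels, or audit the quality of the labeling workforce.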
Amazon Rekognition Custom Labels (https://aws.amazon.com/rekognition/custom-labels-features/) also provides a visual interface to label your images. Labels can be applied to the entire image or you can create bounding boxes to label specific objects. In the next two chapters, we will discuss Amazon Rekognition and Rekognition Custom Labels in more detail.
In this section, we discussed the architecture behind DL CV algorithms. We also covered data processing, feature engineering, and data labeling considerations to create high-quality training datasets. In the next section, we will discuss the evolution of CV and how it can be applied to many different business use cases.
CV has tremendous business value across a variety of industries and use cases. There have also been recent technological advancements that are generating excitement within the field. The first use case of CV was noted over 60 years ago when a digital scanner was used to transform images into grids of numbers. Today, vision transformers and generative AI allow us to quickly create images and videos from text prompts. The applications of CV are evident across every industry, including healthcare, manufacturing, media and entertainment, retail, agriculture, sports, education, and transportation. Deriving meaningful insights from images and videos has helped accelerate business efficiency and improved the customer experience. In this section, we will briefly cover the latest CV implementations and highlight use cases that we will be diving deeper into throughout this book.
New applications of CV
In 1961, Lawrence Roberts, who is often considered the “father” of CV, presented in his paper Machine Perception of Three-Dimensional Solids (https://dspace.mit.edu/bitstream/handle/1721.1/11589/33959125-MIT.pdf) how a computer could construct a 3D array of objects from a 2D photograph. This groundbreaking paper led researchers to explore the value of image recognition and object detection. Since the discovery of NNs and DL, the field of CV has made great strides in developing more accurate and efficient models. Earlier, we reviewed some of these models, such as CNN and YOLO. These models are widely adopted for a variety of CV tasks. Recently, a new model called vision transformers has emerged that outperforms CNN in terms of accuracy and efficiency. Before we review vision transformers in more detail, let’s summarize the idea of transformers and their relevance in CV.
In order to understand transformers, we first need to explore a DL concept that is used in natural language processing (NLP), called attention. An introduction to transformers and self-attention was first presented in the paper Attention is All You Need (https://arxiv.org/pdf/1706.03762.pdf). The attention mechanism is used in recurrent neural network (RNN) sequence-to-sequence (seq2seq) models. One example of an application of seq2seq models is language translation. This model is composed of an encoder and a decoder. The encoder processes the input sequence, and the decoder generates the transformed output. There are hidden state vectors that take the input sequence and the context vector from the encoder and send them to the decoder to predict the output sequence. The following diagram is an illustration of these concepts:
Figure 1.6 – Translating a sentence from English to German using a seq2seq model
In the preceding diagram, we pay attention to the context of the words in the input to determine the next sequence when generating the output. Another example of attention from Attention is All You Need weighs the importance of different inputs when making predictions. Here is a sentiment analysis example from the paper for a hotel service task, where the bold words are considered relevant:
Figure 1.7 – Example of attention for sentiment analysis from “Attention is All You Need”
A transformer relies on self-attention, which is defined in the paper as “an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence”. Transformers are important in the application of NLP because they capture the relationship and context of words in text. Take a look at the following sentences:
Andy Jassy is the current CEO of Amazon. He was previously the CEO of Amazon Web Services.
Using transformers, we are able to understand that “He” in the second sentence is referring to Andy Jassy. Without this context of the subject in the first sentence, it is difficult to understand the relationship between the rest of the words in the text.
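The self-attention computation at the core of a transformer can be sketched in a few lines of NumPy: each position's query is scored against every position's key, the scores are normalized with a softmax, and the resulting weights combine the values. The sequence length, embedding size, and random inputs below are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores: how strongly each position attends to every other position
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys so each row of attention weights sums to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of all value vectors
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8   # e.g., 4 tokens, each an 8-dimensional embedding
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)           # (4, 8): one context-aware vector per token
print(weights.sum(axis=-1))   # each row of attention weights sums to 1
```

Because every position attends to every other position, the output vector for “He” can draw on the vector for “Andy Jassy”, which is exactly the cross-sentence context an RNN struggles to carry.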
Now that we’ve reviewed transformers and explained their importance in NLP, how does this relate to CV? The vision transformer was introduced in a 2021 paper, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (https://arxiv.org/pdf/2010.11929v2.pdf).
