Computer Vision on AWS - Lauren Mullennex - E-Book

Description

Computer vision (CV) is a field of artificial intelligence that helps transform visual data into actionable insights to solve a wide range of business challenges. This book provides prescriptive guidance to anyone looking to learn how to approach CV problems for quickly building and deploying production-ready models.
You’ll begin by exploring the applications of CV and the features of Amazon Rekognition and Amazon Lookout for Vision. The book will then walk you through real-world use cases such as identity verification, real-time video analysis, content moderation, and detecting manufacturing defects that’ll enable you to understand how to implement AWS AI/ML services. As you make progress, you'll also use Amazon SageMaker for data annotation, training, and deploying CV models. In the concluding chapters, you'll work with practical code examples, and discover best practices and design principles for scaling, reducing cost, improving the security posture, and mitigating bias of CV workloads.
By the end of this AWS book, you'll be able to accelerate your business outcomes by building and implementing CV into your production environments with the help of AWS AI/ML services.

You can read this e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 276

Publication year: 2023




Computer Vision on AWS

Build and deploy real-world CV solutions with Amazon Rekognition, Lookout for Vision, and SageMaker

Lauren Mullennex

Nate Bachmeier

Jay Rao

BIRMINGHAM—MUMBAI

Computer Vision on AWS

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Dinesh Chaudhary

Content Development Editor: Joseph Sunil

Technical Editor: Sweety Pagaria

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Manju Arasan

Production Designer: Ponraj Dhandapani

Marketing Coordinator: Shifa Ansari

First published: April 2023

Production reference: 1240323

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80107-868-9

www.packtpub.com

To my father, brother, mother, and Denise – there are not enough words to express how grateful I am for your support and guidance. Thank you for your encouragement and for teaching me how to persevere.

– Lauren Mullennex

To my parents, sister, and my wife – who always believed in me. Thank you for your continued love and support!

– Jay Rao

Contributors

About the authors

Lauren Mullennex is a senior AI/ML specialist solutions architect at AWS. She has broad experience in infrastructure, DevOps, and cloud architecture across multiple industries. She has published multiple AWS AI/ML blogs, spoken at AWS conferences, and focuses on developing solutions using CV and MLOps.

Nate Bachmeier is a principal solutions architect at AWS (PhD. CS, MBA). He nomadically explores the world one cloud integration at a time, focusing on the financial service industry.

Jay Rao is a principal solutions architect at AWS. He enjoys providing technical and strategic guidance to customers and helping them design and implement solutions.

About the reviewer

Morteza Kiadi is a seasoned cloud computing and machine learning technical instructor with over fifteen years of expertise. Morteza pursued his Ph.D. in AI and optimization while holding several positions in enterprises and startups. As a senior technical trainer at AWS, Morteza teaches all of AWS’s machine learning courses, supporting AWS customers in mastering challenging topics and cutting-edge technology. Morteza drew on his experience in academia, industry, and teaching to help make this book accessible to its readers by providing insightful and impartial reviews.

I am extremely grateful to Emily Webber for inspiring me to review this book. I am thankful to the entire Packt Publishing team for your ongoing assistance in the publishing of this book.

Table of Contents

Preface

Part 1: Introduction to CV on AWS and Amazon Rekognition

1

Computer Vision Applications and AWS AI/ML Services Overview

Technical requirements

Understanding CV

CV architecture and applications

Data processing and feature engineering

Data labeling

Solving business challenges with CV

Contactless check-in and checkout

Video analysis

Content moderation

CV at the edge

Exploring AWS AI/ML services

AWS AI services

Amazon SageMaker

Setting up your AWS environment

Creating an Amazon SageMaker Jupyter notebook instance

Summary

2

Interacting with Amazon Rekognition

Technical requirements

The Amazon Rekognition console

Using the Label detection demo

Examining the API request

Examining the API response

Other demos

Monitoring Amazon Rekognition

Quick recap

Detecting Labels using the API

Uploading the images to S3

Initializing the boto3 client

Detect the Labels

Using the Label information

Using bounding boxes

Quick recap

Cleanup

Summary

3

Creating Custom Models with Amazon Rekognition Custom Labels

Technical requirements

Introducing Amazon Rekognition Custom Labels

Benefits of Amazon Rekognition Custom Labels

Creating a model using Rekognition Custom Labels

Deciding the model type based on your business goal

Creating a model

Improving the model

Starting your model

Analyzing an image

Stopping your model

Building a model to identify Packt’s logo

Step 1 – Collecting your images

Step 2 – Creating a project

Step 3 – Creating training and test datasets

Step 4 – Adding labels to the project

Step 5 – Drawing bounding boxes on your training and test datasets

Step 6 – Training your model

Validating that the model works

Step 1 – Starting your model

Step 2 – Analyzing an image with your model

Step 3 – Stopping your model

Summary

Part 2: Applying CV to Real-World Use Cases

4

Using Identity Verification to Build a Contactless Hotel Check-In System

Technical requirements

Prerequisites

Creating the image bucket

Uploading the sample images

Creating the profile table

Introducing collections

Creating a collection

Describing a collection

Deleting a collection

Quick recap

Describing the user journeys

Registering a new user

Authenticating a user

Registering a new user with an ID card

Updating the user profile

Implementing the solution

Checking image quality

Indexing face information

Search existing faces

Quick recap

Supporting ID cards

Reading an ID card

Using the CompareFaces API

Quick recap

Guidance for identity verification on AWS

Solution overview

Deployment process

Cleanup

Summary

5

Automating a Video Analysis Pipeline

Technical requirements

Creating the video bucket

Uploading content to Amazon S3

Creating the person-tracking topic

Subscribing a message queue to the person-tracking topic

Creating the person-tracking publishing role

Setting up IP cameras

Quick recap

Using IP cameras

Installing OpenCV

Installing additional modules

Connecting with OpenCV

Viewing the frame

Uploading the frame

Reporting frame metrics

Quick recap

Using the PersonTracking API

Uploading the video to Amazon S3

Using the StartPersonTracking API

Receiving the completion notification

Using the GetPersonTracking API

Reviewing the GetPersonTracking response

Viewing the frame

Quick recap

Summary

6

Moderating Content with AWS AI Services

Technical requirements

Moderating images

Using the DetectModerationLabels API

Using top-level categories

Using secondary-level categories

Putting it together

Quick recap

Moderating videos

Creating the supporting resources

Finding the resource ARNs

Uploading the sample video to Amazon S3

Using the StartContentModeration API

Examining the completion notification

Using the GetContentModeration API

Quick recap

Using AWS Lambda to automate the workflow

Implement the Start Analysis Handler

Implementing the Get Results Handler

Publishing function changes

Experiment with the end-to-end

Summary

Part 3: CV at the Edge

7

Introducing Amazon Lookout for Vision

Technical requirements

Introducing Amazon Lookout for Vision

The benefits of Amazon Lookout for Vision

Creating a model using Amazon Lookout for Vision

Choosing the model type based on your business goals

Creating a model

Starting your model

Analyzing an image

Stopping your model

Building a model to identify damaged pills

Step 1 – collecting your images

Step 2 – creating a project

Step 3 – creating the training and test datasets

Step 4 – verifying the dataset

Step 5 – training your model

Validating it works

Step 1 – trial detection

Step 2 – starting your model

Step 3 – analyzing an image with your model

Step 4 – stopping your model

Summary

8

Detecting Manufacturing Defects Using CV at the Edge

Technical requirements

Understanding ML at the edge

Deploying a model at the edge using Lookout for Vision and AWS IoT Greengrass

Step 1 – Launch an Amazon EC2 instance

Step 2 – Create an IAM role and attach it to an EC2 instance

Step 3 – Install AWS IoT Greengrass V2

Step 4 – Upload training and test datasets to S3

Step 5 – Create a project

Step 6 – Create training and test datasets

Step 7 – Train the model

Step 8 – Package the model

Step 9 – Configure IoT Greengrass IAM permissions

Step 10 – Deploy the model

Step 11 – Run inference on the model

Step 12 – Clean up resources

Summary

Part 4: Building CV Solutions with Amazon SageMaker

9

Labeling Data with Amazon SageMaker Ground Truth

Technical requirements

Introducing Amazon SageMaker Ground Truth

Benefits of Amazon SageMaker Ground Truth

Automated data labeling

Labeling Packt logos in images using Amazon SageMaker Ground Truth

Step 1 – collect your images

Step 2 – create a labeling job

Step 3 – specify the job details

Step 4 – specify worker details

Step 5 – providing labeling instructions

Step 6 – start labeling

Step 7 – output data

Importing the labeled data with Rekognition Custom Labels

Step 1 – create the project

Step 2 – create training and test datasets

Step 3 – model training

Summary

10

Using Amazon SageMaker for Computer Vision

Technical requirements

Fetching the LabelMe-12 dataset

Installing TensorFlow 2.0

Installing matplotlib

Using the built-in image classifier

Upload the dataset to Amazon S3

Prepare the job channels

Start the training job

Monitoring and troubleshooting

Quick recap

Handling binary metadata files

Declaring the Label class

Reading the annotations file

Declaring the Annotation class

Validate parsing the file

Restructure the files

Load the dataset

Quick recap

Summary

Part 5: Best Practices for Production-Ready CV Workloads

11

Integrating Human-in-the-Loop with Amazon Augmented AI (A2I)

Technical requirements

Introducing Amazon A2I

Core concepts of Amazon A2I

Learning how to build a human review workflow

Creating a labeling workforce

Setting up an A2I human review workflow or flow definition

Initiating a human loop

Leveraging Amazon A2I with Amazon Rekognition to review images

Step 1 – Collecting your images

Step 2 – Creating a work team

Step 3 – Creating a human review workflow

Step 4 – Starting a human loop

Step 5 – Checking the human loop status

Step 6 – Reviewing the output data

Summary

12

Best Practices for Designing an End-to-End CV Pipeline

Defining a problem that CV can solve and processing data

Developing a CV model

Training

Evaluating

Tuning

Deploying and monitoring a CV model

Shadow testing

A/B testing

Blue/Green deployment strategy

Monitoring

Developing an MLOps strategy

SageMaker MLOps features

Workflow automation tools

Using the AWS Well-Architected Framework

Cost optimization

Operational excellence

Reliability

Performance efficiency

Security

Sustainability

Summary

13

Applying AI Governance in CV

Understanding AI governance

Defining risks, documentation, and compliance

Data risks and detecting bias

Auditing, traceability, and versioning

Monitoring and visibility

MLOps

Responsibilities of business stakeholders

Applying AI governance in CV

Types of biases

Mitigating bias in identity verification workflows

Using Amazon SageMaker for governance

ML governance capabilities with Amazon SageMaker

Amazon SageMaker Clarify for explainable AI

Summary

Index

Other Books You May Enjoy

Preface

Computer vision (CV) transforms visual data into actionable insights to solve many business challenges. In recent years, due to the availability of increased computing power and access to vast amounts of data, CV has become more accessible. Amazon Web Services (AWS) has played an important role in democratizing CV by providing services to build, train, and deploy CV models.

In this book, you will begin by exploring the applications of CV and features of Amazon Rekognition and Amazon Lookout for Vision. Then, you’ll walk through real-world use cases such as identity verification, real-time video analysis, content moderation, and detecting manufacturing defects to understand how to implement AWS AI/ML services. You’ll also use Amazon SageMaker for data annotation, training, and deploying CV models. As you progress, you’ll learn best practices and design principles for scaling, reducing cost, improving the security posture, and mitigating the bias of CV workloads.

By the end of this book, you’ll be able to accelerate your business outcomes by building and implementing CV into your production environments with AWS AI/ML services.

Who this book is for

If you are a machine learning engineer or a data scientist, or if you want to better understand best practices and how to build comprehensive CV solutions on AWS, this book is for you. Knowledge of AWS basics is required to grasp the concepts covered in this book more effectively. A solid understanding of ML concepts and the Python programming language will also be beneficial.

What this book covers

Chapter 1, Computer Vision Applications and AWS AI/ML Overview, provides an introduction to CV and summarizes use cases where CV can be applied to solve business challenges. It also includes an overview of the AWS AI/ML services.

Chapter 2, Interacting with Amazon Rekognition, covers an overview of Amazon Rekognition and details the different capabilities available, including walking through the Amazon Rekognition console, and how to use the APIs.

Chapter 3, Creating Custom Models with Amazon Rekognition Custom Labels, provides a detailed introduction to Amazon Rekognition Custom Labels, what its benefits are, and a code example to train a custom object detection model.

Chapter 4, Using Identity Verification to Build a Contactless Hotel Check-In System, dives deep into a real-world use case using Amazon Rekognition and other AWS AI services to build applications that demonstrate how to solve business challenges using core CV capabilities. A code example is provided to build a mobile application for customers to register their faces and check into a fictional hotel kiosk system.

Chapter 5, Automating a Video Analysis Pipeline, dives deep into a real-world use case using Amazon Rekognition to build an application that demonstrates how to solve business challenges using core CV capabilities. A code example is provided to build a real-time video analysis pipeline using Amazon Rekognition Video APIs.

Chapter 6, Moderating Content with AWS AI Services, dives deep into a real-world use case using Amazon Rekognition and other AWS AI services to build applications that demonstrate how to solve business challenges using core CV capabilities. A code example is provided to build content moderation workflows.

Chapter 7, Introducing Amazon Lookout for Vision, provides a detailed introduction to Amazon Lookout for Vision, what its functions are, and a code example to train a model to detect anomalies.

Chapter 8, Detecting Manufacturing Defects Using CV at the Edge, dives deeper into Amazon Lookout for Vision, covers the benefits of deploying CV at the edge, and walks through a code example to train a model to detect anomalies in manufacturing parts.

Chapter 9, Labeling Data with Amazon SageMaker Ground Truth, provides a detailed introduction to Amazon SageMaker Ground Truth, what its benefits are, and a code example to integrate a human labeling job into offline data labeling workflows.

Chapter 10, Using Amazon SageMaker for Computer Vision, dives deeper into Amazon SageMaker, covers its capabilities, and walks through a code example to train a model using a built-in image classifier.

Chapter 11, Integrating Human-in-the-Loop with Amazon Augmented AI, provides a detailed introduction to Amazon Augmented AI (Amazon A2I), what its functions are, and a code example that uses human reviewers to improve the accuracy of your CV workflows.

Chapter 12, Best Practices for Designing an End-to-End CV Pipeline, covers best practices that can be applied to CV workloads across the entire ML lifecycle, including considerations for cost optimization, scaling, security, and developing an MLOps strategy.

Chapter 13, Applying AI Governance in CV, discusses the purpose of establishing an AI governance framework, introduces Amazon SageMaker for ML governance, and provides an overview of the importance of mitigating bias.

To get the most out of this book

You will need access to an AWS account, so before getting started, we recommend that you create one.

Software/hardware covered in the book | Operating system requirements/Account creation requirements

Access to or signing up for an AWS account | https://portal.aws.amazon.com/billing/signup

Jupyter Notebook | Windows/macOS

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Computer-Vision-on-AWS. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Once the model is hosted, you can start analyzing your images using the DetectAnomalies API.”

A block of code is set as follows:

{
    "SubscriptionArn": "arn:aws:sns:region:account:AmazonRekognitionPersonTrackingTopic:04877b15-7c19-4ce5-b958-969c5b9a1ecb"
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

aws sns subscribe \
  --region us-east-2 \
  --topic-arn arn:aws:sns:region:account:AmazonRekognitionPersonTrackingTopic \
  --protocol sqs \
  --notification-endpoint arn:aws:sqs:region:account:PersonTrackingQueue

Any command-line input or output is written as follows:

$ git clone https://github.com/PacktPublishing/Computer-Vision-on-AWS
$ cd Computer-Vision-on-AWS/07_LookoutForVision

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “If you’re a first-time user of the service, it will ask permission to create an S3 bucket to store your project files. Click Create S3 bucket.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Computer Vision on AWS, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there. You can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781801078689

Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly

Part 1: Introduction to CV on AWS and Amazon Rekognition

If you are a machine learning engineer or a data scientist, this section will help you better understand how CV can be applied to solve business challenges and give you a comprehensive overview of the available AWS AI/ML services.

This first part consists of three cumulative chapters that will cover the core concepts of CV, a detailed introduction to Amazon Rekognition, and how to create a custom classification model using Amazon Rekognition Custom Labels.

By the end of this part, you will understand how to apply CV to accelerate your business outcomes, which AWS AI/ML services to use for your CV workloads, and how to use Amazon Rekognition for tasks including classification and object detection.

This part comprises the following chapters:

Chapter 1, Computer Vision Applications and AWS AI/ML Overview
Chapter 2, Interacting with Amazon Rekognition
Chapter 3, Creating Custom Models with Amazon Rekognition Custom Labels

1

Computer Vision Applications and AWS AI/ML Services Overview

In the past decade, the field of computer vision (CV) has rapidly advanced. Research in deep learning (DL) techniques has helped computers mimic human brains to “see” content in videos and images and transform it into actionable insights. Examples of the wide variety of CV applications are all around us, including self-driving cars, text and handwriting detection, classifying different types of skin cancer in images, industrial equipment inspection, and detecting faces and objects in videos. Despite recent advancements, the availability of vast amounts of data from disparate sources has posed challenges in creating scalable CV solutions that achieve high-quality results. Automating a production CV pipeline is a cumbersome task requiring many steps. You may be asking, “How do I get started?” and “What are the best practices?”

If you are a machine learning (ML) engineer or data scientist, or if you want to better understand how to build and implement comprehensive CV solutions on Amazon Web Services (AWS), this book is for you. We provide practical code examples, tips, and step-by-step explanations to help you quickly deploy and automate production CV models. We assume that you have intermediate-level knowledge of artificial intelligence (AI) and ML concepts. In this first chapter, we will introduce CV and address implementation challenges, discuss the prevalence of CV across a variety of use cases, and learn about AWS AI/ML services.

In this chapter, we will cover the following:

Understanding CV
Solving business challenges with CV
Exploring AWS AI/ML services
Setting up your AWS environment

Technical requirements

You will need a computer with internet access and an AWS account so that you can set up Amazon SageMaker and run the code samples in the following chapters. The Python code and sample datasets for the solutions discussed are available at https://github.com/PacktPublishing/Computer-Vision-on-AWS.

Understanding CV

CV is a domain within AI and ML. It enables computers to detect and understand visual inputs (videos and images) to make predictions:

Figure 1.1 – CV is a subdomain of AI and ML

Before we discuss the inner workings of a CV system, let’s summarize the different types of ML algorithms:

Supervised learning (SL) – Takes a set of labeled input data and predicts a known target value. For example, a model may be trained on a set of labeled dog images. When a new unlabeled dog image is processed by the model, the model correctly predicts that the image is a dog instead of a cat.

Unsupervised learning (UL) – Unlabeled data is provided, and patterns or structures need to be found within the data since no labeled target value is present. One example of UL is a targeted marketing campaign where customers need to be segmented into groups based on common attributes such as demographics.

Semi-supervised learning – Consists of both labeled and unlabeled data. This is beneficial for CV tasks, since labeling individual images is a time-consuming process. With this method, only some of the images in the dataset need to be labeled; the model then uses them to label and classify the remaining unlabeled images.
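As a concrete illustration of the supervised case, here is a toy nearest-centroid classifier; the points, labels, and method are invented for illustration and are not part of this book’s code examples:

```python
import numpy as np

# Toy supervised learning: labeled "training" points standing in for two
# classes (0 = cat, 1 = dog). Data and method are illustrative only.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

# "Train" by computing the mean (centroid) of each labeled class.
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict_label(x):
    # Predict the class of a new, unlabeled point: the nearest centroid wins.
    distances = np.linalg.norm(centroids - np.asarray(x), axis=1)
    return int(distances.argmin())

print(predict_label([0.95, 1.0]))  # close to the second cluster → 1
```

The same data without `y_train` would be an unsupervised problem: clustering would have to discover the two groups on its own.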

CV architecture and applications

Now that we’ve covered the different types of ML training methods, how does this relate to CV? DL algorithms are commonly used to solve CV problems. These algorithms are composed of artificial neural networks (ANNs) containing layers of nodes, each of which functions like a neuron in a human brain. A neural network (NN) has multiple layers: one or more input layers, hidden layers, and output layers. Input data flows through the input layers, the nodes transform the data in the hidden layers, and the results reach the output layer, where predictions on the input data are made. The following figure shows an example of a deep NN (DNN) architecture:

Figure 1.2 – DNN architecture
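The layer-by-layer flow of data can be sketched as a toy forward pass in NumPy; the network size, random weights, and activation choices below are placeholders for illustration, not a trained model:

```python
import numpy as np

# Toy forward pass through a small fully connected network:
# 4 input features -> one hidden layer of 3 nodes -> 2 output classes.
# The weights are random placeholders; a real network learns them in training.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))
W_output = rng.normal(size=(3, 2))

def relu(x):
    # Non-linear activation applied by the hidden-layer nodes.
    return np.maximum(0.0, x)

def softmax(x):
    # Turn the output layer's raw scores into class probabilities.
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(features):
    hidden = relu(features @ W_hidden)  # input layer -> hidden layer
    logits = hidden @ W_output          # hidden layer -> output layer
    return softmax(logits)              # predictions at the output layer

probs = forward(np.array([0.5, 0.1, 0.9, 0.3]))
```

Training consists of adjusting `W_hidden` and `W_output` so that `probs` matches the labels in the training data.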

How does this architecture apply to real-world applications? With CV and DL technology, you can detect patterns in images and use these patterns for classification. One type of NN that excels in classifying images is a convolutional NN (CNN). CNNs were inspired by ANNs. The way the nodes in a CNN communicate replicates how animals visualize the world. One application of CNNs is classifying X-ray images to assist doctors with medical diagnoses:

Figure 1.3 – Image classification of X-rays

There are multiple types of problems that CV can solve that we will highlight throughout this book. Localization locates one or more objects in an image and draws a bounding box around the object(s). Object detection uses localization and classification to identify and classify one or multiple objects in an image. These tasks are more complicated than image classification. Faster R-CNN (Regions with CNN), SSD (single shot detector), and YOLO (you only look once) are other types of DNN models that can be used for object detection tasks. These models are optimized for performance, for example, by decreasing latency and increasing accuracy.
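Detection services such as Amazon Rekognition report each bounding box as ratios of the image’s width and height. A small helper like the following (`to_pixel_box` is our own hypothetical name, not an AWS API) converts such a box into pixel coordinates for drawing or cropping:

```python
# Convert a relative bounding box (fractions of the image dimensions, in the
# style returned by object detection APIs) into pixel coordinates.
def to_pixel_box(box, image_width, image_height):
    """box holds 'Left', 'Top', 'Width', 'Height' as fractions of the image."""
    return (
        int(box["Left"] * image_width),
        int(box["Top"] * image_height),
        int(box["Width"] * image_width),
        int(box["Height"] * image_height),
    )

# A detection covering the central quarter of a 640x480 frame.
box = {"Left": 0.25, "Top": 0.25, "Width": 0.5, "Height": 0.5}
print(to_pixel_box(box, 640, 480))  # (160, 120, 320, 240)
```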

Segmentation—including instance segmentation and semantic segmentation—highlights the pixels of an image, instead of objects, and classifies them. Segmentation can also be applied to videos to detect black frames, color bars, end credits, and shot changes:

Figure 1.4 – Examples of different CV problem types

Despite recent advances in CV and DL, there are still challenges within the field. CV systems are complex, there are vast amounts of data to process, and several considerations must be addressed before training a model. It is important to understand the data available and the steps required to prepare it for model training, since a model is only as good as the quality of your data.

Data processing and feature engineering

CV deals with images and videos, which are a form of unstructured data. Unstructured data does not have a predefined data model and cannot be stored in a database row and column format. This type of data poses unique challenges compared to tabular data. More processing is required to transform the data into a usable format. A computer sees an image as a matrix of pixel values. A pixel is a set of numbers between 0-255 in the red, green, blue (RGB) system. Images vary in their resolutions, dimensions, and colors. In order to train a model, CV algorithms require that images are normalized such that they are the same size. Additional image processing techniques include resizing, rotating, enhancing the resolution, and converting from RGB to grayscale. Another technique is image masking, which allows us to focus on a region of interest. In the following photos, we apply a mask to highlight the motorcycle:

Figure 1.5 – Applying an image mask to highlight the motorcycle

Preprocessing is important since images are often large and take up lots of storage. Resizing an image and converting it to grayscale can speed up the ML training process. However, this technique is not always optimal for the problem we’re trying to solve. For example, in medical image analysis such as skin cancer diagnosis, the colors of the samples are relevant for a proper diagnosis. This is why it’s important to have a complete understanding of the business problem you’re trying to solve before choosing how to process your data. In the following chapters, we’ll provide code examples that detail various image preprocessing steps.
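As a minimal sketch of a few of these steps, the following NumPy snippet converts a tiny synthetic RGB image to grayscale, normalizes it, and applies a binary mask to a region of interest; real pipelines would typically use a library such as OpenCV or Pillow, and the luminance weights shown are a common convention rather than this book’s code:

```python
import numpy as np

# A tiny 4x4 "image": each pixel is three RGB values between 0 and 255,
# exactly the matrix-of-pixels view a computer has of an image.
rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(4, 4, 3)).astype(float)

# Convert RGB to grayscale using common luminance weights.
gray = image @ np.array([0.299, 0.587, 0.114])

# Normalize pixel values into the 0-1 range many CV algorithms expect.
normalized = gray / 255.0

# Mask everything outside a 2x2 region of interest.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
masked = np.where(mask, normalized, 0.0)
```

Resizing, rotation, and resolution enhancement follow the same pattern: matrix operations over the pixel grid, usually delegated to a library.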

Features or attributes in ML are important input data characteristics that affect the output or target variable of a model. Distinct features in an image help a model differentiate objects from one another. Determining relevant features depends on the context of your business problem. If you’re trying to identify a Golden Retriever dog in a group of images also containing cats, then height is an important feature. However, if you’re looking to classify different types of dogs, then height is not always a distinguishing feature since Golden Retrievers are similar in height to many other dog breeds. In this case, color and coat length might be more useful features.

Data labeling

Data annotation or data labeling is the process of labeling your input datasets. It helps derive value from your unstructured data for SL. Data labeling presents several challenges: it is a time-consuming manual process, human labelers can introduce bias, and it is difficult to scale. Amazon SageMaker Ground Truth Plus (https://aws.amazon.com/sagemaker/data-labeling/) helps address these challenges by automating the process. It provides a labeling user interface (UI) and quality workflow customizations. The labeling is done by an expert workforce with domain knowledge of the ML tasks to complete, which improves label quality and leads to better training datasets. In Chapter 9, we will cover a code example using SageMaker Ground Truth Plus.
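As a preview of working with labeled data, the following sketch parses a Ground Truth-style augmented manifest, where each line is a JSON object pairing an image with its label output. The attribute names here ("defect-label" and its metadata) are hypothetical; the exact names depend on how the labeling job is configured:

```python
import json

# Two hypothetical manifest lines, as they might appear in a labeling job's
# output file (one JSON object per line).
manifest_lines = [
    '{"source-ref": "s3://my-bucket/img1.jpg", "defect-label": 1, '
    '"defect-label-metadata": {"class-name": "scratch", "confidence": 0.94}}',
    '{"source-ref": "s3://my-bucket/img2.jpg", "defect-label": 0, '
    '"defect-label-metadata": {"class-name": "no-defect", "confidence": 0.88}}',
]

# Build a mapping from image location to its human-assigned class name.
labels = {}
for line in manifest_lines:
    record = json.loads(line)
    labels[record["source-ref"]] = record["defect-label-metadata"]["class-name"]

print(labels)
```

A mapping like this is a convenient starting point for assembling a training dataset from labeled images.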

Amazon Rekognition Custom Labels (https://aws.amazon.com/rekognition/custom-labels-features/) also provides a visual interface to label your images. Labels can be applied to the entire image or you can create bounding boxes to label specific objects. In the next two chapters, we will discuss Amazon Rekognition and Rekognition Custom Labels in more detail.
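Once a Custom Labels model is trained and running, predictions come back as a list of labels with confidence scores. The following sketch filters such a response down to confident labels; the project version ARN, bucket, object key, and label names in the commented boto3 call are placeholders, not real resources:

```python
# Helper: keep only the labels at or above a confidence threshold from a
# response shaped like the DetectCustomLabels API output
# ("CustomLabels" entries with "Name" and "Confidence").
def filter_labels(response, min_confidence=80.0):
    return [
        (label["Name"], round(label["Confidence"], 1))
        for label in response.get("CustomLabels", [])
        if label["Confidence"] >= min_confidence
    ]

# In practice, the response would come from boto3 (sketch only; the ARN and
# S3 location below are placeholders for your own resources):
#
#   import boto3
#   client = boto3.client("rekognition")
#   response = client.detect_custom_labels(
#       ProjectVersionArn="arn:aws:rekognition:...:project/defects/version/...",
#       Image={"S3Object": {"Bucket": "my-bucket", "Name": "part-001.jpg"}},
#       MinConfidence=50.0,
#   )

# A sample response for illustration:
sample_response = {
    "CustomLabels": [
        {"Name": "scratch", "Confidence": 91.7},
        {"Name": "dent", "Confidence": 42.3},
    ]
}
print(filter_labels(sample_response))  # only "scratch" passes the 80% bar
```

Thresholding on confidence like this is a common way to trade precision against recall when acting on model predictions.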

In this section, we discussed the architecture behind DL CV algorithms. We also covered data processing, feature engineering, and data labeling considerations to create high-quality training datasets. In the next section, we will discuss the evolution of CV and how it can be applied to many different business use cases.

Solving business challenges with CV

CV has tremendous business value across a variety of industries and use cases. Recent technological advancements are also generating excitement within the field. The first use case of CV was noted over 60 years ago, when a digital scanner was used to transform images into grids of numbers. Today, vision transformers and generative AI allow us to quickly create images and videos from text prompts. The applications of CV are evident across every industry, including healthcare, manufacturing, media and entertainment, retail, agriculture, sports, education, and transportation. Deriving meaningful insights from images and videos has helped accelerate business efficiency and improve the customer experience. In this section, we will briefly cover the latest CV implementations and highlight use cases that we will dive deeper into throughout this book.

New applications of CV

In 1963, Lawrence Roberts, who is often considered the “father” of CV, showed in his paper Machine Perception of Three-Dimensional Solids (https://dspace.mit.edu/bitstream/handle/1721.1/11589/33959125-MIT.pdf) how a computer could construct a 3D array of objects from a 2D photograph. This groundbreaking paper led researchers to explore the value of image recognition and object detection. Since the advent of NNs and DL, the field of CV has made great strides in developing more accurate and efficient models. Earlier, we reviewed some of these models, such as CNNs and YOLO, which are widely adopted for a variety of CV tasks. Recently, a new architecture called the vision transformer has emerged that can outperform CNNs in terms of accuracy and efficiency. Before we review vision transformers in more detail, let’s summarize the idea of transformers and their relevance to CV.

In order to understand transformers, we first need to explore a DL concept used in natural language processing (NLP) called attention. Transformers and self-attention were first introduced in the paper Attention is All You Need (https://arxiv.org/pdf/1706.03762.pdf). The attention mechanism is used in RNN sequence-to-sequence (seq2seq) models; one application of seq2seq models is language translation. Such a model is composed of an encoder and a decoder. The encoder processes the input sequence into hidden state vectors and a context vector, which are passed to the decoder to predict the output sequence. The following diagram illustrates these concepts:

Figure 1.6 – Translating a sentence from English to German using a seq2seq model

In the example above, we pay attention to the context of the words in the input to determine the next sequence when generating the output. Attention can also weigh the importance of different inputs when making predictions, as in the following sentiment analysis example for a hotel service task, where the bold words are considered relevant:

Figure 1.7 – Example of attention for sentiment analysis from “Attention is All You Need”
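The seq2seq attention step described above can be sketched numerically: score each encoder hidden state against the current decoder state, softmax the scores into attention weights, and take the weighted sum of the encoder states as the context vector. The shapes and random values below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))  # 5 input tokens, hidden size 8
decoder_state = rng.normal(size=(8,))     # current decoder hidden state

# Dot-product score for each encoder state against the decoder state.
scores = encoder_states @ decoder_state            # shape (5,)

# Softmax the scores into attention weights that sum to 1.
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Context vector: weighted sum of the encoder hidden states.
context = weights @ encoder_states                 # shape (8,)

print(weights.round(3), context.shape)
```

The weights show how strongly the decoder "attends" to each input token at this step; tokens with higher weights contribute more to the context vector.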

A transformer relies on self-attention, which is defined in the paper as “an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence”. Transformers are important in the application of NLP because they capture the relationship and context of words in text. Take a look at the following sentences:

Andy Jassy is the current CEO of Amazon. He was previously the CEO of Amazon Web Services.

Using transformers, we can understand that “He” in the second sentence refers to Andy Jassy. Without the context provided by the subject of the first sentence, it would be difficult to understand the relationships between the rest of the words in the text.
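Self-attention itself can be sketched as a scaled dot-product computation: each token's query is compared against every token's key, and the resulting weights mix the value vectors to produce contextualized representations. This is a minimal NumPy illustration of the mechanism, not an optimized implementation:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    # Project the same sequence into queries, keys, and values.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = k.shape[-1]
    # Score every token against every other token, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of the value vectors.
    return weights @ v

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 16))                       # 6 tokens, embedding dim 16
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # one contextualized vector per token
```

In the Andy Jassy example, this is the machinery that lets the representation of “He” draw on the representation of “Andy Jassy” elsewhere in the sequence.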

Now that we’ve reviewed transformers and explained their importance in NLP, how does this relate to CV? The vision transformer was introduced in a 2021 paper, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (https://arxiv.org/pdf/2010.11929v2.pdf