Apply neural network architectures to build state-of-the-art computer vision applications using the Python programming language
Book Description
Computer vision allows machines to gain a human-level ability to visualize, process, and analyze images and videos. This book focuses on using TensorFlow to help you learn advanced computer vision tasks such as image acquisition, processing, and analysis. You'll start with the key principles of computer vision and deep learning to build a solid foundation, before covering neural network architectures and understanding how they work rather than using them as a black box. Next, you'll explore architectures such as VGG, ResNet, Inception, R-CNN, SSD, YOLO, and MobileNet. As you advance, you'll learn to use visual search methods based on transfer learning. You'll also cover advanced computer vision concepts such as semantic segmentation, image inpainting with GANs, object tracking, video segmentation, and action recognition. Later, the book focuses on how machine learning and deep learning concepts can be used to perform tasks such as edge detection and face recognition. You'll then discover how to develop powerful neural network models on your PC and on various cloud platforms. Finally, you'll learn to perform model optimization methods to deploy models on edge devices for real-time inference. By the end of this book, you'll have a solid understanding of computer vision and be able to confidently develop models to automate tasks.
Who this book is for
This book is for computer vision professionals, image processing professionals, machine learning engineers, and AI developers who have some knowledge of machine learning and deep learning and want to build expert-level computer vision applications. In addition to familiarity with TensorFlow, Python knowledge will be required to get started with this book.
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Mrinmayee Kawalkar
Acquisition Editor: Nelson Morris
Content Development Editor: Nazia Shaikh
Senior Editor: Ayaan Hoda
Technical Editor: Utkarsha S. Kadam
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Nilesh Mohite
First published: May 2020
Production reference: 1140520
Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.
ISBN 978-1-83882-706-9
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Krishnendu Kar is passionate about research on computer vision and solving AI problems to make our lives simpler. His core expertise is in deep learning for computer vision, IoT, and agile software development. Krish is also a passionate app developer and has a dashcam-based object and lane detection, turn-by-turn navigation, and fitness app in the iOS App Store: Nity Map AI Camera & Run timer.
Meng-Chieh Ling has a Ph.D. degree in theoretical condensed matter physics from Karlsruhe Institute of Technology in Germany. He switched from physics to data science to pursue a successful career. After working for AGT International in Darmstadt for 2 years, he joined CHECK24 Fashion as a data scientist in Düsseldorf. His responsibilities include applying machine learning to improve the efficiency of data cleansing, automatic attribute tagging with deep learning, and developing image-based recommendation systems.
Amin Ahmadi Tazehkandi is an Iranian author, software engineer, and computer vision expert. He has worked at numerous software companies across the globe and has a long list of awards and achievements, including a countrywide hackathon win and an award-winning paper. Amin is an avid blogger and long-time contributor to the open source, cross-platform, and computer vision developer communities. He is the proud author of Computer Vision with OpenCV 3 and Qt5, and Hands-On Algorithms for Computer Vision.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Mastering Computer Vision with TensorFlow 2.x
About Packt
Why subscribe?
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Introduction to Computer Vision and Neural Networks
Computer Vision and TensorFlow Fundamentals
Technical requirements
Detecting edges using image hashing and filtering
Using a Bayer filter for color pattern formation
Creating an image vector
Transforming an image
Linear filtering—convolution with kernels
Image smoothing
The mean filter
The median filter
The Gaussian filter
Image filtering with OpenCV
Image gradient
Image sharpening
Mixing the Gaussian and Laplacian operations
Detecting edges in an image
The Sobel edge detector
The Canny edge detector
Extracting features from an image
Image matching using OpenCV
Object detection using Contours and the HOG detector
Contour detection
Detecting a bounding box
The HOG detector
Limitations of the contour detection method
An overview of TensorFlow, its ecosystem, and installation
TensorFlow versus PyTorch
TensorFlow Installation
Summary
Content Recognition Using Local Binary Patterns
Processing images using LBP
Generating an LBP pattern
Understanding the LBP histogram
Histogram comparison methods
The computational cost of LBP
Applying LBP to texture recognition
Matching face color with foundation color – LBP and its limitations
Matching face color with foundation color – color matching technique
Summary
Facial Detection Using OpenCV and CNN
Applying Viola-Jones AdaBoost learning and the Haar cascade classifier for face recognition
Selecting Haar-like features 
Creating an integral image
Running AdaBoost training
Attentional cascade classifiers
Training the cascade detector
Predicting facial key points using a deep neural network
Preparing the dataset for key-point detection
Processing key-point data
Preprocessing before being input into the Keras–Python code 
Preprocessing within the Keras–Python code 
Defining the model architecture
Training the model to make key point predictions
Predicting facial expressions using a CNN
Overview of 3D face detection
Overview of hardware design for 3D reconstruction
Overview of 3D reconstruction and tracking
Overview of parametric tracking
Summary
Deep Learning on Images
Understanding CNNs and their parameters
Convolution
Convolution over volume – 3 x 3 filter
Convolution over volume – 1 x 1 filter
Pooling
Padding 
Stride
Activation
Fully connected layers
Regularization
Dropout
Internal covariance shift and batch normalization 
Softmax
Optimizing CNN parameters
Baseline case
Iteration 1 – CNN parameter adjustment
Iteration 2 – CNN parameter adjustment
Iteration 3 – CNN parameter adjustment
Iteration 4 – CNN parameter adjustment
Visualizing the layers of a neural network
Building a custom image classifier model and visualizing its layers
Neural network input and parameters
Input image 
Defining the train and validation generators
Developing the model 
Compiling and training the model
Inputting a test image and converting it into a tensor
Visualizing the first layer of activation
Visualizing multiple layers of activation
Training an existing advanced image classifier model and visualizing its layers
Summary
Section 2: Advanced Concepts of Computer Vision with TensorFlow
Neural Network Architecture and Models
Overview of AlexNet
Overview of VGG16
Overview of Inception
GoogLeNet detection
Overview of ResNet
Overview of R-CNN
Image segmentation 
Clustering-based segmentation
Graph-based segmentation
Selective search
Region proposal
Feature extraction
Classification of the image
Bounding box regression
Overview of Fast R-CNN
Overview of Faster R-CNN
Overview of GANs
Overview of GNNs
Spectral GNN
Overview of Reinforcement Learning
Overview of Transfer Learning
Summary
Visual Search Using Transfer Learning
Coding deep learning models using TensorFlow
Downloading weights
Decoding predictions
Importing other common features
Constructing a model
Inputting images from a directory
Loop function for importing multiple images and processing using TensorFlow Keras
Developing a transfer learning model using TensorFlow
Analyzing and storing data
Importing TensorFlow libraries
Setting up model parameters
Building an input data pipeline
Training data generator
Validation data generator
Constructing the final model using transfer learning
Saving a model with checkpoints
Plotting training history
Understanding the architecture and applications of visual search
The architecture of visual search
Visual search code and explanation
Predicting the class of an uploaded image
Predicting the class of all images
Working with a visual search input pipeline using tf.data
Summary
Object Detection Using YOLO
An overview of YOLO
The concept of IOU
How does YOLO detect objects so fast?
The YOLO v3 neural network architecture
A comparison of YOLO and Faster R-CNN
An introduction to Darknet for object detection
Detecting objects using Darknet
Detecting objects using Tiny Darknet
Real-time prediction using Darknet
YOLO versus YOLO v2 versus YOLO v3 
When to train a model?
Training your own image set with YOLO v3 to develop a custom model
Preparing images
Generating annotation files
Converting .xml files to .txt files
Creating a combined train.txt and test.txt file
Creating a list of class name files
Creating a YOLO .data file
Adjusting the YOLO configuration file
Enabling the GPU for training
Start training
An overview of the Feature Pyramid Network and RetinaNet
Summary
Semantic Segmentation and Neural Style Transfer
Overview of TensorFlow DeepLab for semantic segmentation
Spatial Pyramid Pooling
Atrous convolution
Encoder-decoder network
Encoder module
Decoder module
Semantic segmentation in DeepLab – example
Google Colab, Google Cloud TPU, and TensorFlow
Artificial image generation using DCGANs
Generator
Discriminator
Training
Image inpainting using DCGAN
TensorFlow DCGAN – example
Image inpainting using OpenCV
Understanding neural style transfer
Summary
Section 3: Advanced Implementation of Computer Vision with TensorFlow
Action Recognition Using Multitask Deep Learning
Human pose estimation – OpenPose
Theory behind OpenPose 
Understanding the OpenPose code
Human pose estimation – stacked hourglass model
Understanding the hourglass model
Coding an hourglass model
argparse block
Training an hourglass network
Creating the hourglass network
Front module
Left half-block
Connect left to right
Right half-block
Head block
Hourglass training
Human pose estimation – PoseNet
Top-down approach
Bottom-up approach
PoseNet implementation
Applying human poses for gesture recognition
Action recognition using various methods
Recognizing actions based on an accelerometer
Combining video-based actions with pose estimation
Action recognition using the 4D method
Summary
Object Detection Using R-CNN, SSD, and R-FCN
An overview of SSD
An overview of R-FCN
An overview of the TensorFlow object detection API
Detecting objects using TensorFlow on Google Cloud
Detecting objects using TensorFlow Hub
Training a custom object detector using TensorFlow and Google Colab
Collecting and formatting images as .jpg files
Annotating images to create a .xml file
Separating the file by train and test folders
Configuring parameters and installing the required packages
Creating TensorFlow records
Preparing the model and configuring the training pipeline
Monitoring training progress using TensorBoard
TensorBoard running on a local machine
TensorBoard running on Google Colab
Training the model
Running an inference test
Caution when using the neural network model
An overview of Mask R-CNN and a Google Colab demonstration
Developing an object tracker model to complement the object detector
Centroid-based tracking
SORT tracking
DeepSORT tracking
The OpenCV tracking method
Siamese network-based tracking
SiamMask-based tracking
Summary
Section 4: TensorFlow Implementation at the Edge and on the Cloud
Deep Learning on Edge Devices with CPU/GPU Optimization
Overview of deep learning on edge devices
Techniques used for GPU/CPU optimization
Overview of MobileNet
Image processing with a Raspberry Pi
Raspberry Pi hardware setup
Raspberry Pi camera software setup
OpenCV installation in Raspberry Pi
OpenVINO installation in Raspberry Pi
Installing the OpenVINO toolkit components
Setting up the environmental variable
Adding a USB rule
Running inference using Python code
Advanced inference
Face detection, pedestrian detection, and vehicle detection
Landmark models
Models for action recognition
License plate, gaze, and person detection
Model conversion and inference using OpenVINO
Running inference in a Terminal using ncappzoo
Converting the pre-trained model for inference
Converting from a TensorFlow model developed using Keras
Converting a TensorFlow model developed using the TensorFlow Object Detection API
Summary of the OpenVINO Model inference process
Application of TensorFlow Lite
Converting a TensorFlow model into tflite format
Python API
TensorFlow Object Detection API – tflite_convert
TensorFlow Object Detection API – toco
Model optimization
Object detection on Android phones using TensorFlow Lite
Object detection on Raspberry Pi using TensorFlow Lite
Image classification
Object detection
Object detection on iPhone using TensorFlow Lite and Create ML
TensorFlow Lite conversion model for iPhone
Core ML
Converting a TensorFlow model into Core ML format
A summary of various annotation methods
Outsource labeling work to a third party
Automated or semi-automated labeling
Summary
Cloud Computing Platform for Computer Vision
Training an object detector in GCP
Creating a project in GCP
The GCP setup
The Google Cloud Storage bucket setup
Setting up a bucket using the GCP API
Setting up a bucket using Ubuntu Terminal
Setting up the Google Cloud SDK
Linking your terminal to the Google Cloud project and bucket
Installing the TensorFlow object detection API
Preparing the dataset
TFRecord and labeling map data
Data preparation
Data upload
The model.ckpt files
The model config file
Training in the cloud
Viewing the model output in TensorBoard
The model output and conversion into a frozen graph
Executing export tflite graph.py from Google Colab
Training an object detector in the AWS SageMaker cloud platform
Setting up an AWS account, billing, and limits
Converting a .xml file to JSON format
Uploading data to the S3 bucket
Creating a notebook instance and beginning training
Fixing some common failures during training
Training an object detector in the Microsoft Azure cloud platform
Creating an Azure account and setting up Custom Vision
Uploading training images and tagging them
Training at scale and packaging
Application packaging
The general idea behind cloud-based visual search
Analyzing images and search mechanisms in various cloud platforms
Visual search using GCP
Visual search using AWS
Visual search using Azure
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Computer vision is a technique by which machines gain a human-level ability to visualize, process, and analyze images or videos. This book will focus on using TensorFlow to develop and train deep neural networks to solve advanced computer vision problems and deploy solutions on mobile and edge devices.
You will start with the key principles of computer vision and deep learning and learn about various models and architectures, along with their pros and cons. You will cover various architectures, such as VGG, ResNet, Inception, R-CNN, YOLO, and many more. You will apply various visual search methods using transfer learning. The book will help you to learn about various advanced concepts of computer vision, including semantic segmentation, image inpainting, object tracking, video segmentation, and action recognition. You will explore how various machine learning and deep learning concepts can be applied to computer vision tasks such as edge detection and face recognition. Later in the book, you will focus on performance tuning, deploying dynamic models to improve processing power, and scaling to handle various computer vision challenges.
By the end of the book, you will have an in-depth understanding of computer vision and will know how to develop models to automate tasks.
This book is for computer vision professionals, image processing professionals, machine learning engineers, and AI developers who have some knowledge of machine learning and deep learning and want to build expert-level computer vision applications. Familiarity with Python programming and TensorFlow will be required for this book.
Chapter 1, Computer Vision and TensorFlow Fundamentals, discusses the foundational concepts of computer vision and TensorFlow to prepare you for the later, more advanced chapters of this book. We will look at how to perform image hashing and filtering. Then, we will learn about various methods of feature extraction and image retrieval. Moving on, we will learn about contour-based object detection, the histogram of oriented gradients, and various feature matching methods. Then, we will look at an overview of the high-level TensorFlow software and its different components and subsystems. The chapter provides many hands-on coding exercises for object detection, image filtering, and feature matching.
Chapter 2, Content Recognition Using Local Binary Patterns, discusses the local binary pattern feature descriptor and histogram for classifying textured and non-textured images. You will learn to tune local binary pattern (LBP) parameters and calculate the histogram difference between LBPs to match identical patterns between images. The chapter provides two coding exercises: one for matching flooring patterns and the other for matching face color with foundation color.
Chapter 3, Facial Detection Using OpenCV and CNNs, starts with Viola-Jones face and key-feature detection and moves on to the advanced concepts of neural-network-based facial key-point detection and facial expression recognition. The chapter ends by looking at the advanced concept of 3D face detection. It provides two coding exercises: one for OpenCV-based face detection in a webcam and the other a CNN-based end-to-end pipeline for facial key-point detection. The end-to-end neural network pipeline consists of collecting facial images by cropping face images from a webcam, annotating the key points in the face images, ingesting the data into a CNN, building and training a CNN model, and finally evaluating the trained key-point model against face images.
Chapter 4, Deep Learning on Images, delves into how edge detection is used to create convolution operations over a volume and how different convolution parameters, such as filter size, dimensions, and operation type, affect the convolution volume. This chapter will give you a very detailed overview of how a neural network sees an image and how it uses that visualization to classify images. The chapter provides a TensorFlow Keras-based coding exercise to construct a neural network and visualize an image as it goes through its different layers. You will then compare the network model's accuracy and visualization to an advanced network such as VGG16 or Inception.
Chapter 5, Neural Network Architecture and Models, explores different neural network architectures and models. This will give you an understanding of how the concepts learned in the first and fourth chapters are applied in various scenarios by changing the parameters for the convolution, pooling, activation, fully connected, and softmax layers. Hopefully, with these exercises, you will develop an understanding of a range of neural network models, which will give you a solid foundation as a computer vision engineer.
Chapter 6, Visual Search Using Transfer Learning, is where you are going to use TensorFlow to input data into models and develop visual search methods for real-life situations. You will learn how to input images and their categories into the TensorFlow model using the Keras data generator and the TensorFlow tf.data API, and then take a portion of a pretrained model and add your own model content at the end to develop your own classifier. The idea behind these exercises is to learn how to code in TensorFlow for the neural network models you learned about in the fourth and fifth chapters.
Chapter 7, Object Detection Using YOLO, introduces two single-stage, fast object detection methods: You Only Look Once (YOLO) and RetinaNet. In this chapter, you will learn about different YOLO models, finding out how to change their configuration parameters and make inferences with them. You will also learn how to process your own images to train a custom YOLO v3 model using Darknet.
Chapter 8, Semantic Segmentation and Neural Style Transfer, discusses how deep neural networks are used to segment images into spatial regions, produce artificial images, and transfer styles from one image to another. We will perform hands-on exercises for semantic segmentation using TensorFlow DeepLab and write TensorFlow code for neural style transfer in Google Colab. We will also generate artificial images using DCGANs and perform image inpainting using OpenCV.
Chapter 9, Action Recognition Using Multitask Deep Learning, explains how to develop multitask neural network models for the recognition of actions, such as the movement of a hand, mouth, head, or leg, to detect the type of action using a vision-based system. This will then be supplemented with a deep neural network model using cell phone accelerometer data to validate the action.
Chapter 10, Object Detection Using R-CNN, SSD, and R-FCN, marks the beginning of an end-to-end (E2E) object detection framework by developing a solid foundation of data ingestion and training pipeline followed by model development. Here, you will gain a deep insight into the various object detection models, such as R-CNN, single-shot detector (SSD), region-based fully convolutional networks (R-FCNs), and Mask R-CNN, and perform hands-on exercises using Google Cloud and Google Colab notebooks. We will also carry out a detailed exercise on how to train your own custom image to develop an object detection model using a TensorFlow object detection API. We will end the chapter with a deep overview of various object tracking methods and a hands-on exercise using Google Colab notebooks.
Chapter 11, Deep Learning on Edge Devices with CPU/GPU Optimization, discusses how to take the generated model and deploy it on edge devices and production systems. This will result in a complete end-to-end TensorFlow object detection model implementation. In particular, TensorFlow models are developed, converted, and optimized using TensorFlow Lite and the Intel Open Visual Inference and Neural Network Optimization (OpenVINO) architectures and deployed to Raspberry Pi, Android, and iPhone. Although this chapter focuses mainly on object detection on Raspberry Pi, Android, and iPhone, the approach discussed can be extended to image classification, style transfer, and action recognition for any edge device under consideration.
Chapter 12, Cloud Computing Platform for Computer Vision, discusses how to package your application for training and deployment in Google Cloud Platform (GCP), Amazon Web Services (AWS), and the Microsoft Azure cloud platform. You will learn how to prepare your data, upload it to cloud data storage, and begin monitoring the training. You will also learn how to send an image or an image vector to the cloud platform for analysis and get a JSON response back. This chapter discusses a single application as well as running distributed TensorFlow on the Compute Engine. After training is complete, this chapter will discuss how to evaluate your model and integrate it into your application to operate at scale.
If you are a beginner in computer vision and TensorFlow and you're trying to master the subject, it is better to go through the book's chapters in sequence rather than jumping around. The book slowly builds on the concepts of computer vision and neural networks and then ends with a code sample. Be sure to get a good grasp of the concepts and architecture presented and then apply the code sample.
We could not upload our image data to GitHub due to size limitations. You can either use images from your own camera or download image datasets from Kaggle:
Food images (for the burger-and-fries sample): Take photos using your cell phone camera.
Kaggle furniture detector:
https://www.kaggle.com/akkithetechie/furniture-detector
If you do not understand a concept at first, revisit it and also read any cited papers.
Most of the code is written in Jupyter Notebook environments, so make sure that you have downloaded Anaconda. You also need to download TensorFlow 2.0 – follow the instructions in Chapter 1, Computer Vision and TensorFlow Fundamentals, for that.
Much of the object detection training is done using Google Colab – Chapter 10, Object Detection Using R-CNN, SSD, and R-FCN, and Chapter 11, Deep Learning on Edge Devices with CPU/GPU Optimization, provide explanations of how to use Google Colab.
If you want to deploy your computer vision code to edge devices and you're thinking about what to purchase, visit Chapter 11, Deep Learning on Edge Devices with CPU/GPU Optimization, for a detailed analysis of various devices.
The book relies heavily on terminal usage – make sure you have developed a basic understanding of that before reading anything from Chapter 7, Object Detection Using YOLO, onward.
Chapter 12, Cloud Computing Platform for Computer Vision, deals with cloud computing, so you must have an Amazon Web Services, Azure, or Google Cloud Platform account for this. Cloud computing can get expensive if you are not keeping track of your hours. Many providers give you free access to services for some time, but after that, charges can go up if your project is still open, even if you are not training. Remember to shut down your project before you end your account to stop accruing charges. If you have technical questions on cloud computing and are stuck, then you can read the documentation of the relevant cloud computing platform. Also, you can open a technical work ticket for a fee; typically, they are addressed within 1-2 business days.
The best way to get the most out of this book is to read the theory, get an understanding of why a model is developed the way it is, try the sample exercises, and then update the code to suit your needs.
If you have any questions about any section of the book and get stuck, you can always contact me on LinkedIn (https://www.linkedin.com/in/krish-kar-554739b2/ext).
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at www.packt.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Computer-Vision-with-TensorFlow-2.0. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838827069_ColorImages.pdf.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
In this section, you will develop your understanding of the theory as well as learn hands-on techniques for applying convolutional neural networks to image processing. You will learn key concepts such as image filtering, feature maps, edge detection, the convolution operation, activation functions, and the use of fully connected and softmax layers in relation to image classification and object detection. The chapters provide many hands-on examples of an end-to-end computer vision pipeline using TensorFlow, Keras, and OpenCV. The most important learning you will take from these chapters is an understanding of, and intuition for, the different convolution operations and how images are transformed through the different layers of a convolutional neural network.
By the end of this section, you will be able to do the following:
Understand how image filters transform an image (Chapter 1)
Apply various types of image filters for edge detection (Chapter 1)
Detect simple objects using OpenCV contour detection and the Histogram of Oriented Gradients (HOG) (Chapter 1)
Find the similarity between objects using Scale-Invariant Feature Transform (SIFT), Local Binary Patterns (LBP) pattern matching, and color matching (Chapters 1 and 2)
Detect faces using the OpenCV cascade detector (Chapter 3)
Input big data into a neural network from a CSV file list and parse the data to recognize columns, which can then be fed to the neural network as x and y values (Chapter 3)
Recognize facial key points and facial expressions (Chapter 3)
Develop an annotation file for facial key points (Chapter 3)
Input big data into a neural network from files using the Keras data generator method (Chapter 4)
Construct your own neural network and optimize its parameters to improve accuracy (Chapter 4)
Write code to transform an image through the different layers of a convolutional neural network (Chapter 4)
This section comprises the following chapters:
Chapter 1, Computer Vision and TensorFlow Fundamentals
Chapter 2, Content Recognition Using Local Binary Patterns
Chapter 3, Facial Detection Using OpenCV and CNN
Chapter 4, Deep Learning on Images
Computer vision is rapidly expanding into many different applications as traditional techniques, such as image thresholding, filtering, and edge detection, have been augmented by deep learning methods. TensorFlow is a widely used, powerful machine learning tool created by Google. It has user-configurable APIs available to train and build complex neural network models on your local PC or in the cloud, and to optimize and deploy them at scale on edge devices.
In this chapter, you will gain an understanding of advanced computer vision concepts using TensorFlow. This chapter discusses the foundational concepts of computer vision and TensorFlow to prepare you for the later, more advanced chapters of this book. We will look at how to perform image hashing and filtering. Then, we will learn about various methods of feature extraction and image retrieval. Moving on, we will learn about visual search in applications, its methods, and the challenges we might face. Then, we will look at an overview of the high-level TensorFlow software and its different components and subsystems.
The topics we will be covering in this chapter are as follows:
Detecting edges using image hashing and filtering
Extracting features from an image
Object detection using Contours and the HOG detector
An overview of TensorFlow, its ecosystem, and installation
If you have not done so already, install Anaconda from https://www.anaconda.com. Anaconda is a Python distribution that includes the conda package manager. You also need to install OpenCV for all of the computer vision work you will be carrying out, using pip install opencv-python. OpenCV is a library of built-in programming functions for computer vision work.
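To verify the setup, you can run a minimal check such as the following (a sketch, assuming TensorFlow has also been installed, for example, with pip install tensorflow):

import cv2
import tensorflow as tf

print(cv2.__version__)  # prints the OpenCV version, for example 4.x
print(tf.__version__)   # prints the TensorFlow version, for example 2.x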
A Bayer filter transforms a raw image into a natural, color-processed image by applying a demosaicing algorithm. The image sensor consists of photodiodes, which produce an electrical charge proportional to the brightness of the incoming light. The photodiodes are grayscale in nature, so Bayer filters are used to convert the grayscale image to color. The color image from the Bayer filter goes through Image Signal Processing (ISP), which involves several weeks of manual adjustment of various parameters to produce the desired image quality for human vision. Several research efforts are currently ongoing to replace the manual ISP with CNN-based processing to produce an image, and then merge that CNN with an image classification or object detection model to produce one coherent neural network pipeline that takes the Bayer color image and detects objects with bounding boxes. Details of such work can be found in the 2019 paper by Sivalogeswaran Ratnasingam titled Deep Camera: A Fully Convolutional Neural Network for Image Signal Processing. The link for the paper is shown here: http://openaccess.thecvf.com/content_ICCVW_2019/papers/LCI/Ratnasingam_Deep_Camera_A_Fully_Convolutional_Neural_Network_for_Image_Signal_ICCVW_2019_paper.pdf.
Here is an example of a Bayer filter:
In the preceding diagram, we can observe the following:
The Bayer filter consists of Red (R), Green (G), and Blue (B) channels in a predefined pattern, such that there are twice as many G channels as B and R channels.
The G, R, and B channels are alternately distributed.
Most channel combinations are RGGB, GRGB, or RGBG.
Each channel only lets a specific color pass through; the combination of colors from different channels produces a pattern, as shown in the preceding image.
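As a minimal sketch of the demosaicing step (not from the book's code bundle), OpenCV can reconstruct a three-channel color image from a single-channel Bayer mosaic; the randomly generated mosaic here is purely illustrative:

import cv2
import numpy as np

# Hypothetical single-channel raw sensor readout (a real one comes from the camera)
raw = np.random.randint(0, 256, (16, 16), dtype=np.uint8)

# Interpret the readout as a BG Bayer pattern and demosaic it into a BGR image
bgr = cv2.cvtColor(raw, cv2.COLOR_BayerBG2BGR)
print(bgr.shape)  # (16, 16, 3)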
In image smoothing, the high-frequency noise from an image is removed by applying low-pass filters, such as the following:
A mean filter
A median filter
A Gaussian filter
This blurs the image and is performed by applying a kernel whose end values do not change sign and do not differ appreciably in value.
Image filtering is typically done by sliding a box filter over an image. A box filter is represented by an n x m kernel of ones divided by (n*m), where n is the number of rows and m is the number of columns. For a 3 x 3 kernel, this looks as follows:
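For example, the 3 x 3 box (mean) kernel is a matrix of ones divided by 9:

$$\frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$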
Let's say this kernel is applied to the RGB image described previously. For reference, the 3 x 3 image value is shown here:
The mean filter filters the image with the average value after the convolution operation of the box kernel is carried out with the image. The resulting array after element-wise multiplication will be as follows:
The mean value is 42 and replaces the center intensity value of 166 in the image, as you can see in the following array. The remaining values of the image will be converted in a similar manner:
The median filter filters the image with the median value after the convolution operation of the box kernel is carried out on the image. The resulting array after element-wise multiplication will be as follows:
The median value is 48 and replaces the center intensity value of 166 in the image, as shown in the following array. The remaining values of the image will be converted in a similar manner:
The Gaussian kernel is represented by the following equation:

$$G(i, j) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{\left(i-\frac{k+1}{2}\right)^2+\left(j-\frac{k+1}{2}\right)^2}{2\sigma^2}\right),\quad 1 \le i, j \le k$$

Here, σ is the standard deviation of the distribution and k is the kernel size.

For a standard deviation (σ) of 1 and a 3 x 3 kernel (k = 3), the Gaussian kernel looks as follows:
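Evaluating the preceding formula with σ = 1 and k = 3, and normalizing the coefficients so that they sum to 1, gives approximately the following kernel:

$$\begin{bmatrix} 0.075 & 0.124 & 0.075 \\ 0.124 & 0.204 & 0.124 \\ 0.075 & 0.124 & 0.075 \end{bmatrix}$$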
In this example, when the Gaussian kernel is applied, the image is transformed as follows:
So, in this case, the center intensity value is 54. Compare this value with the median and mean filter values.
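As a minimal sketch (the book's original pixel values are not reproduced here, so the 5 x 5 input array below is hypothetical), the three smoothing filters can be compared directly with OpenCV:

import cv2
import numpy as np

img = np.array([[30, 40, 50, 40, 30],
                [40, 50, 60, 50, 40],
                [50, 60, 166, 60, 50],
                [40, 50, 60, 50, 40],
                [30, 40, 50, 40, 30]], dtype=np.uint8)

mean_f = cv2.blur(img, (3, 3))              # box (mean) filter
median_f = cv2.medianBlur(img, 3)           # median filter
gauss_f = cv2.GaussianBlur(img, (3, 3), 1)  # Gaussian filter with sigma = 1

# The bright center outlier (166) is pulled toward its neighbors by each filter
print(mean_f[2, 2], median_f[2, 2], gauss_f[2, 2])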
The image gradient calculates the change in pixel intensity in a given direction. The change in pixel intensity is obtained by performing a convolution operation on an image with a kernel, as shown here:
The kernel is chosen such that the two extreme rows or columns have opposite signs (positive and negative), so it produces a difference operator when multiplying and summing across the image pixels. Let's take a look at the following example:
The horizontal kernel:
The vertical kernel:
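As an illustration of this sign structure, the widely used Sobel kernels for the horizontal (x) and vertical (y) gradients are:

$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$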
The image gradient described here is a fundamental concept for computer vision:
The image gradient can be calculated in both the x and y directions.
By using the image gradient, edges and corners are determined.
The edges and corners pack a lot of information about the shape or features of an image.
So, the image gradient is a mechanism that converts lower-order pixel information into higher-order image features, which is used by the convolution operation for image classification.
In image sharpening, the low-frequency content of an image is removed by applying a high-pass filter (a difference operator), which results in the line structures and edges becoming more visible. Image sharpening is also known as a Laplacian operation, which is represented by the second derivative, shown here:
Because of the difference operator, the four adjacent cells relative to the midpoint of the kernel always have opposite signs. So, if the midpoint of the kernel is positive, the four adjacent cells are negative, and vice versa. Let's take a look at the following example:
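For example, a common 3 x 3 Laplacian kernel with exactly this sign structure (a positive midpoint and negative adjacent cells) is:

$$\begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$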
Note that the advantage of the second-order derivative over the first-order derivative is that the second-order derivative will always go through zero crossings. So, the edges can be determined by looking at the zero-crossing point (0 value) rather than the magnitude of the gradients (which can change from image to image and within a given image) for the first-order gradient.
So far, you have learned that the Gaussian operation blurs the image and the Laplacian operation sharpens the image. But why do we need each operation, and in what situation is each operation used?
An image consists of features and other non-feature objects. Image recognition is all about extracting the features from an image and eliminating the non-feature objects. We recognize an image as a particular object, such as a car, because its features are more prominent compared to its non-features. Gaussian filtering is the method of suppressing the non-features relative to the features, which blurs the image.
Applying it multiple times blurs the image more and suppresses both the features and the non-features. But since the features are stronger, they can be extracted by applying Laplacian gradients. This is why we convolve two or more times with a Gaussian kernel of a given sigma and then apply the Laplacian operation to distinctly show the features. This is a common technique used in many convolution operations for object detection.
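A minimal sketch of this Gaussian-then-Laplacian technique with OpenCV follows; the input filename is a placeholder:

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Convolve twice with a Gaussian kernel to suppress the non-features
blurred = cv2.GaussianBlur(img, (3, 3), 1)
blurred = cv2.GaussianBlur(blurred, (3, 3), 1)

# Apply the Laplacian operator to bring out the remaining (strong) features
edges = cv2.Laplacian(blurred, cv2.CV_64F, ksize=3)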
The following figure shows the input 3 x 3 image section, the kernel value, the output value after the convolution operation, and the resulting image:
The preceding figure shows various Gaussian and oblique kernels and how a 3 x 3 section of the image is transformed by applying the kernel. The following figure is a continuation of the preceding one:
The preceding representation clearly shows how the image becomes more blurred or sharp based on the type of convolution operation. This comprehension of the convolution operation is fundamental as we learn more about using the CNN to optimize kernel selection in various stages of the CNN.
Edge detection is the most fundamental processing method in computer vision for finding features in an image based on changes in brightness and image intensity. A change in brightness results from a discontinuity in depth, orientation, illumination, or corners. Edge detection methods can be based on the first-order or second-order derivative:
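In standard notation, the first-order method looks for extrema of the gradient magnitude, while the second-order method looks for zero crossings of the Laplacian:

$$|\nabla f| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}, \qquad \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$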
The following graph illustrates the edge detection mechanism graphically:
Here, you can see that the intensity of the image changes from dark to bright around the midway point, so the edge of the image is at the middle point. The first derivative (the intensity gradient) rises and then falls at the midway point, so the edge can be detected by looking for the maximum value of the first derivative. However, the problem with the first derivative method is that, depending on the input function, the maximum value can change, so a threshold for the maximum value cannot be predetermined. The second derivative, however, as shown, always goes through zero at the edges.
Sobel and Canny are first-order edge detection methods, while the Laplacian edge detector is a second-order method.
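As a minimal sketch (the input filename is a placeholder), both first-order detectors are available in OpenCV:

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

sobel_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)  # gradient in the x direction
sobel_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)  # gradient in the y direction

edges = cv2.Canny(img, 100, 200)  # lower and upper hysteresis thresholds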
