Apply neural network architectures to build state-of-the-art computer vision applications using the Python programming language
Book Description
Computer vision allows machines to gain a human-level ability to visualize, process, and analyze images and videos. This book focuses on using TensorFlow to help you learn advanced computer vision tasks such as image acquisition, processing, and analysis. You'll start with the key principles of computer vision and deep learning to build a solid foundation, before covering neural network architectures and understanding how they work rather than using them as a black box. Next, you'll explore architectures such as VGG, ResNet, Inception, R-CNN, SSD, YOLO, and MobileNet. As you advance, you'll learn to use visual search methods based on transfer learning. You'll also cover advanced computer vision concepts such as semantic segmentation, image inpainting with GANs, object tracking, video segmentation, and action recognition. Later, the book focuses on how machine learning and deep learning concepts can be used to perform tasks such as edge detection and face recognition. You'll then discover how to develop powerful neural network models on your PC and on various cloud platforms. Finally, you'll learn to perform model optimization methods to deploy models on edge devices for real-time inference. By the end of this book, you'll have a solid understanding of computer vision and be able to confidently develop models to automate tasks.
Who this book is for
This book is for computer vision professionals, image processing professionals, machine learning engineers, and AI developers who have some knowledge of machine learning and deep learning and want to build expert-level computer vision applications. In addition to familiarity with TensorFlow, Python knowledge will be required to get started with this book.
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Mrinmayee Kawalkar
Acquisition Editor: Nelson Morris
Content Development Editor: Nazia Shaikh
Senior Editor: Ayaan Hoda
Technical Editor: Utkarsha S. Kadam
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Nilesh Mohite
First published: May 2020
Production reference: 1140520
Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.
ISBN 978-1-83882-706-9
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Krishnendu Kar is passionate about research on computer vision and solving AI problems to make our lives simpler. His core expertise is in deep learning for computer vision, IoT, and agile software development. Krish is also a passionate app developer and has a dashcam-based object and lane detection, turn-by-turn navigation, and fitness app in the iOS App Store: Nity Map AI Camera & Run timer.
Meng-Chieh Ling has a Ph.D. degree in theoretical condensed matter physics from Karlsruhe Institute of Technology in Germany. He switched from physics to data science to pursue a successful career. After working for AGT International in Darmstadt for 2 years, he joined CHECK24 Fashion as a data scientist in Düsseldorf. His responsibilities include applying machine learning to improve the efficiency of data cleansing, automatic attribute tagging with deep learning, and developing image-based recommendation systems.
Amin Ahmadi Tazehkandi is an Iranian author, software engineer, and computer vision expert. He has worked at numerous software companies across the globe and has a long list of awards and achievements, including a countrywide hackathon win and an award-winning paper. Amin is an avid blogger and long-time contributor to the open source, cross-platform, and computer vision developer communities. He is the proud author of Computer Vision with OpenCV 3 and Qt5, and Hands-On Algorithms for Computer Vision.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Mastering Computer Vision with TensorFlow 2.x
About Packt
Why subscribe?
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Introduction to Computer Vision and Neural Networks
Computer Vision and TensorFlow Fundamentals
Technical requirements
Detecting edges using image hashing and filtering
Using a Bayer filter for color pattern formation
Creating an image vector
Transforming an image
Linear filtering—convolution with kernels
Image smoothing
The mean filter
The median filter
The Gaussian filter
Image filtering with OpenCV
Image gradient
Image sharpening
Mixing the Gaussian and Laplacian operations
Detecting edges in an image
The Sobel edge detector
The Canny edge detector
Extracting features from an image
Image matching using OpenCV
Object detection using Contours and the HOG detector
Contour detection
Detecting a bounding box
The HOG detector
Limitations of the contour detection method
An overview of TensorFlow, its ecosystem, and installation
TensorFlow versus PyTorch
TensorFlow Installation
Summary
Content Recognition Using Local Binary Patterns
Processing images using LBP
Generating an LBP pattern
Understanding the LBP histogram
Histogram comparison methods
The computational cost of LBP
Applying LBP to texture recognition
Matching face color with foundation color – LBP and its limitations
Matching face color with foundation color – color matching technique
Summary
Facial Detection Using OpenCV and CNN
Applying Viola-Jones AdaBoost learning and the Haar cascade classifier for face recognition
Selecting Haar-like features 
Creating an integral image
Running AdaBoost training
Attentional cascade classifiers
Training the cascade detector
Predicting facial key points using a deep neural network
Preparing the dataset for key-point detection
Processing key-point data
Preprocessing before being input into the Keras–Python code 
Preprocessing within the Keras–Python code 
Defining the model architecture
Training the model to make key point predictions
Predicting facial expressions using a CNN
Overview of 3D face detection
Overview of hardware design for 3D reconstruction
Overview of 3D reconstruction and tracking
Overview of parametric tracking
Summary
Deep Learning on Images
Understanding CNNs and their parameters
Convolution
Convolution over volume – 3 x 3 filter
Convolution over volume – 1 x 1 filter
Pooling
Padding 
Stride
Activation
Fully connected layers
Regularization
Dropout
Internal covariance shift and batch normalization 
Softmax
Optimizing CNN parameters
Baseline case
Iteration 1 – CNN parameter adjustment
Iteration 2 – CNN parameter adjustment
Iteration 3 – CNN parameter adjustment
Iteration 4 – CNN parameter adjustment
Visualizing the layers of a neural network
Building a custom image classifier model and visualizing its layers
Neural network input and parameters
Input image 
Defining the train and validation generators
Developing the model 
Compiling and training the model
Inputting a test image and converting it into a tensor
Visualizing the first layer of activation
Visualizing multiple layers of activation
Training an existing advanced image classifier model and visualizing its layers
Summary
Section 2: Advanced Concepts of Computer Vision with TensorFlow
Neural Network Architecture and Models
Overview of AlexNet
Overview of VGG16
Overview of Inception
GoogLeNet detection
Overview of ResNet
Overview of R-CNN
Image segmentation 
Clustering-based segmentation
Graph-based segmentation
Selective search
Region proposal
Feature extraction
Classification of the image
Bounding box regression
Overview of Fast R-CNN
Overview of Faster R-CNN
Overview of GANs
Overview of GNNs
Spectral GNN
Overview of Reinforcement Learning
Overview of Transfer Learning
Summary
Visual Search Using Transfer Learning
Coding deep learning models using TensorFlow
Downloading weights
Decoding predictions
Importing other common features
Constructing a model
Inputting images from a directory
Loop function for importing multiple images and processing using TensorFlow Keras
Developing a transfer learning model using TensorFlow
Analyzing and storing data
Importing TensorFlow libraries
Setting up model parameters
Building an input data pipeline
Training data generator
Validation data generator
Constructing the final model using transfer learning
Saving a model with checkpoints
Plotting training history
Understanding the architecture and applications of visual search
The architecture of visual search
Visual search code and explanation
Predicting the class of an uploaded image
Predicting the class of all images
Working with a visual search input pipeline using tf.data
Summary
Object Detection Using YOLO
An overview of YOLO
The concept of IOU
How does YOLO detect objects so fast?
The YOLO v3 neural network architecture
A comparison of YOLO and Faster R-CNN
An introduction to Darknet for object detection
Detecting objects using Darknet
Detecting objects using Tiny Darknet
Real-time prediction using Darknet
YOLO versus YOLO v2 versus YOLO v3 
When to train a model?
Training your own image set with YOLO v3 to develop a custom model
Preparing images
Generating annotation files
Converting .xml files to .txt files
Creating a combined train.txt and test.txt file
Creating a list of class name files
Creating a YOLO .data file
Adjusting the YOLO configuration file
Enabling the GPU for training
Start training
An overview of the Feature Pyramid Network and RetinaNet
Summary
Semantic Segmentation and Neural Style Transfer
Overview of TensorFlow DeepLab for semantic segmentation
Spatial Pyramid Pooling
Atrous convolution
Encoder-decoder network
Encoder module
Decoder module
Semantic segmentation in DeepLab – example
Google Colab, Google Cloud TPU, and TensorFlow
Artificial image generation using DCGANs
Generator
Discriminator
Training
Image inpainting using DCGAN
TensorFlow DCGAN – example
Image inpainting using OpenCV
Understanding neural style transfer
Summary
Section 3: Advanced Implementation of Computer Vision with TensorFlow
Action Recognition Using Multitask Deep Learning
Human pose estimation – OpenPose
Theory behind OpenPose 
Understanding the OpenPose code
Human pose estimation – stacked hourglass model
Understanding the hourglass model
Coding an hourglass model
argparse block
Training an hourglass network
Creating the hourglass network
Front module
Left half-block
Connect left to right
Right half-block
Head block
Hourglass training
Human pose estimation – PoseNet
Top-down approach
Bottom-up approach
PoseNet implementation
Applying human poses for gesture recognition
Action recognition using various methods
Recognizing actions based on an accelerometer
Combining video-based actions with pose estimation
Action recognition using the 4D method
Summary
Object Detection Using R-CNN, SSD, and R-FCN
An overview of SSD
An overview of R-FCN
An overview of the TensorFlow object detection API
Detecting objects using TensorFlow on Google Cloud
Detecting objects using TensorFlow Hub
Training a custom object detector using TensorFlow and Google Colab
Collecting and formatting images as .jpg files
Annotating images to create a .xml file
Separating the file by train and test folders
Configuring parameters and installing the required packages
Creating TensorFlow records
Preparing the model and configuring the training pipeline
Monitoring training progress using TensorBoard
TensorBoard running on a local machine
TensorBoard running on Google Colab
Training the model
Running an inference test
Caution when using the neural network model
An overview of Mask R-CNN and a Google Colab demonstration
Developing an object tracker model to complement the object detector
Centroid-based tracking
SORT tracking
DeepSORT tracking
The OpenCV tracking method
Siamese network-based tracking
SiamMask-based tracking
Summary
Section 4: TensorFlow Implementation at the Edge and on the Cloud
Deep Learning on Edge Devices with CPU/GPU Optimization
Overview of deep learning on edge devices
Techniques used for GPU/CPU optimization
Overview of MobileNet
Image processing with a Raspberry Pi
Raspberry Pi hardware setup
Raspberry Pi camera software setup
OpenCV installation in Raspberry Pi
OpenVINO installation in Raspberry Pi
Installing the OpenVINO toolkit components
Setting up the environmental variable
Adding a USB rule
Running inference using Python code
Advanced inference
Face detection, pedestrian detection, and vehicle detection
Landmark models
Models for action recognition
License plate, gaze, and person detection
Model conversion and inference using OpenVINO
Running inference in a Terminal using ncappzoo
Converting the pre-trained model for inference
Converting from a TensorFlow model developed using Keras
Converting a TensorFlow model developed using the TensorFlow Object Detection API
Summary of the OpenVINO Model inference process
Application of TensorFlow Lite
Converting a TensorFlow model into tflite format
Python API
TensorFlow Object Detection API – tflite_convert
TensorFlow Object Detection API – toco
Model optimization
Object detection on Android phones using TensorFlow Lite
Object detection on Raspberry Pi using TensorFlow Lite
Image classification
Object detection
Object detection on iPhone using TensorFlow Lite and Create ML
TensorFlow Lite conversion model for iPhone
Core ML
Converting a TensorFlow model into Core ML format
A summary of various annotation methods
Outsource labeling work to a third party
Automated or semi-automated labeling
Summary
Cloud Computing Platform for Computer Vision
Training an object detector in GCP
Creating a project in GCP
The GCP setup
The Google Cloud Storage bucket setup
Setting up a bucket using the GCP API
Setting up a bucket using Ubuntu Terminal
Setting up the Google Cloud SDK
Linking your terminal to the Google Cloud project and bucket
Installing the TensorFlow object detection API
Preparing the dataset
TFRecord and labeling map data
Data preparation
Data upload
The model.ckpt files
The model config file
Training in the cloud
Viewing the model output in TensorBoard
The model output and conversion into a frozen graph
Executing export tflite graph.py from Google Colab
Training an object detector in the AWS SageMaker cloud platform
Setting up an AWS account, billing, and limits
Converting a .xml file to JSON format
Uploading data to the S3 bucket
Creating a notebook instance and beginning training
Fixing some common failures during training
Training an object detector in the Microsoft Azure cloud platform
Creating an Azure account and setting up Custom Vision
Uploading training images and tagging them
Training at scale and packaging
Application packaging
The general idea behind cloud-based visual search
Analyzing images and search mechanisms in various cloud platforms
Visual search using GCP
Visual search using AWS
Visual search using Azure
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Computer vision is a technique by which machines gain a human-level ability to visualize, process, and analyze images or videos. This book will focus on using TensorFlow to develop and train deep neural networks to solve advanced computer vision problems and deploy solutions on mobile and edge devices.
You will start with the key principles of computer vision and deep learning and learn about various models and architectures, along with their pros and cons. You will cover various architectures, such as VGG, ResNet, Inception, R-CNN, YOLO, and many more. You will apply various visual search methods using transfer learning. The book will help you to learn about various advanced concepts of computer vision, including semantic segmentation, image inpainting, object tracking, video segmentation, and action recognition. You will explore how various machine learning and deep learning concepts can be applied to computer vision tasks such as edge detection and face recognition. Later in the book, you will focus on performance tuning, deploying dynamic models to improve processing power, and scaling to handle various computer vision challenges.
By the end of the book, you will have an in-depth understanding of computer vision and will know how to develop models to automate tasks.
This book is for computer vision professionals, image processing professionals, machine learning engineers, and AI developers who have some knowledge of machine learning and deep learning and want to build expert-level computer vision applications. Familiarity with Python programming and TensorFlow will be required for this book.
Chapter 1, Computer Vision and TensorFlow Fundamentals, discusses the foundational concepts of computer vision and TensorFlow to prepare you for the later, more advanced chapters of this book. We will look at how to perform image hashing and filtering. Then, we will learn about various methods of feature extraction and image retrieval. Moving on, we will learn about contour-based object detection, the histogram of oriented gradients, and various feature matching methods. Then, we will look at an overview of the high-level TensorFlow software and its different components and subsystems. The chapter provides many hands-on coding exercises for object detection, image filtering, and feature matching.
Chapter 2, Content Recognition Using Local Binary Patterns, discusses the local binary pattern feature descriptor and histogram for classifying textured and non-textured images. You will learn to tune local binary pattern (LBP) parameters and calculate the histogram difference between LBPs to match identical patterns between images. The chapter provides two coding exercises: one for matching flooring patterns and the other for matching face color with foundation color.
Chapter 3, Facial Detection Using OpenCV and CNNs, starts with Viola-Jones face and key-feature detection and moves on to the advanced concepts of neural-network-based facial key-point detection and facial expression recognition. The chapter ends by looking at the advanced concept of 3D face detection. It provides two coding exercises: one for OpenCV-based face detection in a webcam and the other a CNN-based end-to-end pipeline for facial key-point detection. The end-to-end neural network pipeline consists of collecting facial images by cropping face images from a webcam, annotating the key points in the face images, ingesting the data into a CNN, building and training a CNN model, and finally evaluating the trained key-point model against face images.
Chapter 4, Deep Learning on Images, delves into how edge detection is used to create convolution operations over a volume and how different convolution parameters, such as filter size, dimensions, and operation type, affect the convolution volume. This chapter will give you a very detailed overview of how a neural network sees an image and how it uses that visualization to classify images. The chapter provides a TensorFlow Keras-based coding exercise to construct a neural network and visualize an image as it goes through its different layers. You will then compare the network model's accuracy and visualization to an advanced network such as VGG16 or Inception.
Chapter 5, Neural Network Architecture and Models, explores different neural network architectures and models. This will give you an understanding of how the concepts learned in the first and fourth chapters are applied in various scenarios by changing the parameters for the convolution, pooling, activation, fully connected, and softmax layers. Hopefully, with these exercises, you will develop an understanding of a range of neural network models, which will give you a solid foundation as a computer vision engineer.
Chapter 6, Visual Search Using Transfer Learning, is where you are going to use TensorFlow to input data into models and develop visual search methods for real-life situations. You will learn how to input images and their categories into the TensorFlow model using the Keras data generator and the TensorFlow tf.data API, and then take a portion of a pretrained model and add your own model content at the end to develop your own classifier. The idea behind these exercises is to learn how to code in TensorFlow for the neural network models you learned about in the fourth and fifth chapters.
Chapter 7, Object Detection Using YOLO, introduces two single-stage, fast object detection methods: You Only Look Once (YOLO) and RetinaNet. In this chapter, you will learn about different YOLO models, finding out how to change their configuration parameters and make inferences with them. You will also learn how to process your own images to train a custom YOLO v3 model using Darknet.
Chapter 8, Semantic Segmentation and Neural Style Transfer, discusses how deep neural networks are used to segment images into spatial regions, produce artificial images, and transfer styles from one image to another. We will perform hands-on exercises for semantic segmentation using TensorFlow DeepLab and write TensorFlow code for neural style transfer in Google Colab. We will also generate artificial images using DCGANs and perform image inpainting using OpenCV.
Chapter 9, Action Recognition Using Multitask Deep Learning, explains how to develop multitask neural network models for the recognition of actions, such as the movement of a hand, mouth, head, or leg, to detect the type of action using a vision-based system. This will then be supplemented with a deep neural network model using cell phone accelerometer data to validate the action.
Chapter 10, Object Detection Using R-CNN, SSD, and R-FCN, marks the beginning of an end-to-end (E2E) object detection framework by developing a solid foundation of data ingestion and training pipeline followed by model development. Here, you will gain a deep insight into the various object detection models, such as R-CNN, single-shot detector (SSD), region-based fully convolutional networks (R-FCNs), and Mask R-CNN, and perform hands-on exercises using Google Cloud and Google Colab notebooks. We will also carry out a detailed exercise on how to train your own custom image to develop an object detection model using a TensorFlow object detection API. We will end the chapter with a deep overview of various object tracking methods and a hands-on exercise using Google Colab notebooks.
Chapter 11, Deep Learning on Edge Devices with CPU/GPU Optimization, discusses how to take the generated model and deploy it on edge devices and production systems. This will result in a complete end-to-end TensorFlow object detection model implementation. In particular, TensorFlow models are developed, converted, and optimized using TensorFlow Lite and the Intel Open Visual Inference and Neural Network Optimization (OpenVINO) architectures and deployed to Raspberry Pi, Android, and iPhone. Although this chapter focuses mainly on object detection on Raspberry Pi, Android, and iPhone, the approach discussed can be extended to image classification, style transfer, and action recognition for any edge device under consideration.
Chapter 12, Cloud Computing Platform for Computer Vision, discusses how to package your application for training and deployment in Google Cloud Platform (GCP), Amazon Web Services (AWS), and the Microsoft Azure cloud platform. You will learn how to prepare your data, upload it to cloud data storage, and begin monitoring the training. You will also learn how to send an image or an image vector to the cloud platform for analysis and get a JSON response back. This chapter discusses a single application as well as running distributed TensorFlow on the Compute Engine. After training is complete, this chapter will discuss how to evaluate your model and integrate it into your application to operate at scale.
If you are a beginner in computer vision and TensorFlow and you're trying to master the subject, it is better to go through the book's chapters in sequence rather than jumping around. The book slowly builds on the concepts of computer vision and neural networks and then ends with a code sample. Be sure to get a good grasp of the concepts and architecture presented and then apply the code sample.
We could not upload our image data to GitHub due to size limitations. You can either use images from your own camera or download image datasets from Kaggle:
Food images (for the burger-and-fries sample): Take photos using your cell phone camera.
Kaggle furniture detector:
https://www.kaggle.com/akkithetechie/furniture-detector
If you do not understand a concept at first, revisit it and also read any cited papers.
Most of the code is written in Jupyter Notebook environments, so make sure that you have downloaded Anaconda. You also need to download TensorFlow 2.0 – follow the instructions in Chapter 1, Computer Vision and TensorFlow Fundamentals, for that.
Much of the object detection training is done using Google Colab – Chapter 10, Object Detection Using R-CNN, SSD, and R-FCN, and Chapter 11, Deep Learning on Edge Devices with CPU/GPU Optimization, provide explanations of how to use Google Colab.
If you want to deploy your computer vision code to edge devices and you're thinking about what to purchase, visit Chapter 11, Deep Learning on Edge Devices with CPU/GPU Optimization, for a detailed analysis of various devices.
The book relies heavily on terminal usage – make sure you have developed a basic understanding of that before reading anything from Chapter 7, Object Detection Using YOLO, onward.
Chapter 12, Cloud Computing Platform for Computer Vision, deals with cloud computing, so you must have an Amazon Web Services, Azure, or Google Cloud Platform account for this. Cloud computing can get expensive if you are not keeping track of your hours. Many providers give you free access to services for some time, but after that, charges can go up if your project is still open, even if you are not training. Remember to shut down your project before you end your account to stop accruing charges. If you have technical questions on cloud computing and are stuck, then you can read the documentation of the relevant cloud computing platform. Also, you can open a technical work ticket for a fee; typically, they are addressed within 1-2 business days.
The best way to get the most out of this book is to read the theory, get an understanding of why a model is developed the way it is, try the sample exercises, and then update the code to suit your needs.
If you have any questions about any section of the book and get stuck, you can always contact me on LinkedIn (https://www.linkedin.com/in/krish-kar-554739b2/ext).
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at www.packt.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Computer-Vision-with-TensorFlow-2.0. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838827069_ColorImages.pdf.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
In this section, you will develop your understanding of the theory as well as learn hands-on techniques for applying convolutional neural networks to image processing. You will learn key concepts such as image filtering, feature maps, edge detection, the convolution operation, activation functions, and the use of fully connected and softmax layers in relation to image classification and object detection. The chapters provide many hands-on examples of an end-to-end computer vision pipeline using TensorFlow, Keras, and OpenCV. The most important learning you will take from these chapters is an understanding of, and intuition for, the different convolution operations and how images are transformed through the different layers of a convolutional neural network.
By the end of this section, you will be able to do the following:
Understand how image filters transform an image (Chapter 1)
Apply various types of image filters for edge detection (Chapter 1)
Detect simple objects using OpenCV contour detection and the Histogram of Oriented Gradients (HOG) (Chapter 1)
Find the similarity between objects using Scale-Invariant Feature Transform (SIFT), Local Binary Patterns (LBP) pattern matching, and color matching (Chapters 1 and 2)
Detect faces using the OpenCV cascade detector (Chapter 3)
Input big data into a neural network from a CSV file list and parse the data to recognize columns, which can then be fed to the neural network as x and y values (Chapter 3)
Recognize facial key points and facial expressions (Chapter 3)
Develop an annotation file for facial key points (Chapter 3)
Input big data into a neural network from files using the Keras data generator method (Chapter 4)
Construct your own neural network and optimize its parameters to improve accuracy (Chapter 4)
Write code to transform an image through the different layers of a convolutional neural network (Chapter 4)
This section comprises the following chapters:
Chapter 1, Computer Vision and TensorFlow Fundamentals
Chapter 2, Content Recognition Using Local Binary Patterns
Chapter 3, Facial Detection Using OpenCV and CNN
Chapter 4, Deep Learning on Images
Computer vision is rapidly expanding into many different applications as traditional techniques, such as image thresholding, filtering, and edge detection, have been augmented by deep learning methods. TensorFlow is a widely used, powerful machine learning tool created by Google. It has user-configurable APIs available to train and build complex neural network models on your local PC or in the cloud, and to optimize and deploy them at scale on edge devices.
In this chapter, you will gain an understanding of advanced computer vision concepts using TensorFlow. This chapter discusses the foundational concepts of computer vision and TensorFlow to prepare you for the later, more advanced chapters of this book. We will look at how to perform image hashing and filtering. Then, we will learn about various methods of feature extraction and image retrieval. Moving on, we will learn about visual search in applications, its methods, and the challenges we might face. Then, we will look at an overview of the high-level TensorFlow software and its different components and subsystems.
The topics we will be covering in this chapter are as follows:
Detecting edges using image hashing and filtering
Extracting features from an image
Object detection using Contours and the HOG detector
An overview of TensorFlow, its ecosystem, and installation
If you have not done so already, install Anaconda from https://www.anaconda.com. Anaconda is a Python distribution that includes the conda package manager. You also need to install OpenCV for all of the computer vision work you will be carrying out, using pip install opencv-python. OpenCV is a library of built-in programming functions for computer vision work.
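To verify the setup, you can run a minimal check such as the following (a sketch, assuming TensorFlow has also been installed, for example, with pip install tensorflow):

import cv2
import tensorflow as tf

print(cv2.__version__)  # prints the OpenCV version, for example 4.x
print(tf.__version__)   # prints the TensorFlow version, for example 2.x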
A Bayer filter transforms a raw image into a natural, color-processed image by applying a demosaicing algorithm. The image sensor consists of photodiodes, which produce an electrical charge proportional to the brightness of the incoming light. The photodiodes are grayscale in nature, so Bayer filters are used to convert the grayscale image to color. The color image from the Bayer filter goes through Image Signal Processing (ISP), which involves several weeks of manual adjustment of various parameters to produce the desired image quality for human vision. Several research efforts are currently ongoing to replace the manual ISP with CNN-based processing to produce an image, and then merge that CNN with an image classification or object detection model to produce one coherent neural network pipeline that takes the Bayer color image and detects objects with bounding boxes. Details of such work can be found in the 2019 paper by Sivalogeswaran Ratnasingam titled Deep Camera: A Fully Convolutional Neural Network for Image Signal Processing. The link for the paper is shown here: http://openaccess.thecvf.com/content_ICCVW_2019/papers/LCI/Ratnasingam_Deep_Camera_A_Fully_Convolutional_Neural_Network_for_Image_Signal_ICCVW_2019_paper.pdf.
Here is an example of a Bayer filter:
In the preceding diagram, we can observe the following:
The Bayer filter consists of Red (R), Green (G), and Blue (B) channels in a predefined pattern, such that there are twice as many G channels as B and R channels.
The G, R, and B channels are alternately distributed.
Most channel combinations are RGGB, GRGB, or RGBG.
Each channel only lets a specific color pass through; the combination of colors from different channels produces a pattern, as shown in the preceding image.
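As a minimal sketch of the demosaicing step (not from the book's code bundle), OpenCV can reconstruct a three-channel color image from a single-channel Bayer mosaic; the randomly generated mosaic here is purely illustrative:

import cv2
import numpy as np

# Hypothetical single-channel raw sensor readout (a real one comes from the camera)
raw = np.random.randint(0, 256, (16, 16), dtype=np.uint8)

# Interpret the readout as a BG Bayer pattern and demosaic it into a BGR image
bgr = cv2.cvtColor(raw, cv2.COLOR_BayerBG2BGR)
print(bgr.shape)  # (16, 16, 3)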
In image smoothing, the high-frequency noise from an image is removed by applying low-pass filters, such as the following:
A mean filter
A median filter
A Gaussian filter
This blurs the image and is performed by applying a kernel whose end values do not change sign and do not differ appreciably in value.
Image filtering is typically done by sliding a box filter over an image. A box filter is represented by an n x m kernel of ones divided by (n*m), where n is the number of rows and m is the number of columns. For a 3 x 3 kernel, this looks as follows:
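For example, the 3 x 3 box (mean) kernel is a matrix of ones divided by 9:

$$\frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$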
Let's say this kernel is applied to the RGB image described previously. For reference, the 3 x 3 image value is shown here:
The mean filter filters the image with the average value after the convolution operation of the box kernel is carried out with the image. The resulting array after element-wise multiplication will be as follows:
The mean value is 42 and replaces the center intensity value of 166 in the image, as you can see in the following array. The remaining values of the image will be converted in a similar manner:
The median filter filters the image with the median value after the convolution operation of the box kernel is carried out on the image. The resulting array after element-wise multiplication will be as follows:
The median value is 48 and replaces the center intensity value of 166 in the image, as shown in the following array. The remaining values of the image will be converted in a similar manner:
The Gaussian kernel is represented by the following equation:

$$G(i, j) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{\left(i-\frac{k+1}{2}\right)^2+\left(j-\frac{k+1}{2}\right)^2}{2\sigma^2}\right),\quad 1 \le i, j \le k$$

Here, σ is the standard deviation of the distribution and k is the kernel size.

For a standard deviation (σ) of 1 and a 3 x 3 kernel (k = 3), the Gaussian kernel looks as follows:
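Evaluating the preceding formula with σ = 1 and k = 3, and normalizing the coefficients so that they sum to 1, gives approximately the following kernel:

$$\begin{bmatrix} 0.075 & 0.124 & 0.075 \\ 0.124 & 0.204 & 0.124 \\ 0.075 & 0.124 & 0.075 \end{bmatrix}$$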
In this example, when the Gaussian kernel is applied, the image is transformed as follows:
So, in this case, the center intensity value is 54. Compare this value with the median and mean filter values.
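As a minimal sketch (the book's original pixel values are not reproduced here, so the 5 x 5 input array below is hypothetical), the three smoothing filters can be compared directly with OpenCV:

import cv2
import numpy as np

img = np.array([[30, 40, 50, 40, 30],
                [40, 50, 60, 50, 40],
                [50, 60, 166, 60, 50],
                [40, 50, 60, 50, 40],
                [30, 40, 50, 40, 30]], dtype=np.uint8)

mean_f = cv2.blur(img, (3, 3))              # box (mean) filter
median_f = cv2.medianBlur(img, 3)           # median filter
gauss_f = cv2.GaussianBlur(img, (3, 3), 1)  # Gaussian filter with sigma = 1

# The bright center outlier (166) is pulled toward its neighbors by each filter
print(mean_f[2, 2], median_f[2, 2], gauss_f[2, 2])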
The image gradient calculates the change in pixel intensity in a given direction. The change in pixel intensity is obtained by performing a convolution operation on an image with a kernel, as shown here:
The kernel is chosen such that the two extreme rows or columns have opposite signs (positive and negative), so it produces a difference operator when multiplying and summing across the image pixels. Let's take a look at the following example:
The horizontal kernel:
The vertical kernel:
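As an illustration of this sign structure, the widely used Sobel kernels for the horizontal (x) and vertical (y) gradients are:

$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$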
The image gradient described here is a fundamental concept for computer vision:
The image gradient can be calculated in both the x and y directions.
By using the image gradient, edges and corners are determined.
The edges and corners pack a lot of information about the shape or features of an image.
So, the image gradient is a mechanism that converts lower-order pixel information into higher-order image features, which is used by the convolution operation for image classification.
In image sharpening, the low-frequency content of an image is removed by applying a high-pass filter (a difference operator), which results in the line structures and edges becoming more visible. Image sharpening is also known as a Laplacian operation, which is represented by the second derivative, shown here:
Because of the difference operator, the four adjacent cells relative to the midpoint of the kernel always have opposite signs. So, if the midpoint of the kernel is positive, the four adjacent cells are negative, and vice versa. Let's take a look at the following example:
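For example, a common 3 x 3 Laplacian kernel with exactly this sign structure (a positive midpoint and negative adjacent cells) is:

$$\begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$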
Note that the advantage of the second-order derivative over the first-order derivative is that the second-order derivative will always go through zero crossings. So, the edges can be determined by looking at the zero-crossing point (0 value) rather than the magnitude of the gradients (which can change from image to image and within a given image) for the first-order gradient.
So far, you have learned that the Gaussian operation blurs the image and the Laplacian operation sharpens the image. But why do we need each operation, and in what situation is each operation used?
An image consists of features and other non-feature objects. Image recognition is all about extracting the features from an image and eliminating the non-feature objects. We recognize an image as a particular object, such as a car, because its features are more prominent compared to its non-features. Gaussian filtering is the method of suppressing the non-features relative to the features, which blurs the image.
Applying it multiple times blurs the image more and suppresses both the features and the non-features. But since the features are stronger, they can be extracted by applying Laplacian gradients. This is why we convolve two or more times with a Gaussian kernel of a given sigma and then apply the Laplacian operation to distinctly show the features. This is a common technique used in many convolution operations for object detection.
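A minimal sketch of this Gaussian-then-Laplacian technique with OpenCV follows; the input filename is a placeholder:

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Convolve twice with a Gaussian kernel to suppress the non-features
blurred = cv2.GaussianBlur(img, (3, 3), 1)
blurred = cv2.GaussianBlur(blurred, (3, 3), 1)

# Apply the Laplacian operator to bring out the remaining (strong) features
edges = cv2.Laplacian(blurred, cv2.CV_64F, ksize=3)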
The following figure shows the input 3 x 3 image section, the kernel value, the output value after the convolution operation, and the resulting image:
The preceding figure shows various Gaussian and oblique kernels and how a 3 x 3 section of the image is transformed by applying the kernel. The following figure is a continuation of the preceding one:
The preceding representation clearly shows how the image becomes more blurred or sharp based on the type of convolution operation. This comprehension of the convolution operation is fundamental as we learn more about using the CNN to optimize kernel selection in various stages of the CNN.
Edge detection is the most fundamental processing method in computer vision for finding features in an image based on changes in brightness and image intensity. A change in brightness results from a discontinuity in depth, orientation, illumination, or corners. Edge detection methods can be based on the first-order or second-order derivative:
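In standard notation, the first-order method looks for extrema of the gradient magnitude, while the second-order method looks for zero crossings of the Laplacian:

$$|\nabla f| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}, \qquad \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$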
The following graph illustrates the edge detection mechanism graphically:
Here, you can see that the intensity of the image changes from dark to bright around the midway point, so the edge of the image is at the middle point. The first derivative (the intensity gradient) rises and then falls at the midway point, so the edge can be detected by looking for the maximum value of the first derivative. However, the problem with the first derivative method is that, depending on the input function, the maximum value can change, so a threshold for the maximum value cannot be predetermined. The second derivative, however, as shown, always goes through zero at the edges.
Sobel and Canny are first-order edge detection methods, while the Laplacian edge detector is a second-order method.
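As a minimal sketch (the input filename is a placeholder), both first-order detectors are available in OpenCV:

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

sobel_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)  # gradient in the x direction
sobel_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)  # gradient in the y direction

edges = cv2.Canny(img, 100, 200)  # lower and upper hysteresis thresholds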
