A practical guide to understanding the core machine learning and deep learning algorithms, and implementing them to create intelligent image processing systems using OpenCV 4
Book Description
OpenCV is an open source library for building computer vision apps. The latest release, OpenCV 4, offers a plethora of features and platform improvements that are covered comprehensively in this up-to-date second edition.
You'll start by understanding the new features and setting up OpenCV 4 to build your computer vision applications. You will explore the fundamentals of machine learning and even learn to design different algorithms that can be used for image processing. Gradually, the book will take you through supervised and unsupervised machine learning. You will gain hands-on experience using scikit-learn in Python for a variety of machine learning applications. Later chapters will focus on different machine learning algorithms, such as a decision tree, support vector machines (SVM), and Bayesian learning, and how they can be used for object detection computer vision operations. You will then delve into deep learning and ensemble learning, and discover their real-world applications, such as handwritten digit classification and gesture recognition. Finally, you'll get to grips with the latest Intel OpenVINO for building an image processing system.
By the end of this book, you will have developed the skills you need to use machine learning for building intelligent computer vision applications with OpenCV 4.
Who this book is for
This book is for computer vision professionals, machine learning developers, and anyone who wants to learn machine learning algorithms and implement them using OpenCV 4. If you want to build real-world computer vision and image processing applications powered by machine learning, then this book is for you. Working knowledge of Python programming is required to get the most out of this book.
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Pawan Ramchandani
Acquisition Editor: Aniruddha Patil
Content Development Editor: Pratik Andrade
Senior Editor: Ayaan Hoda
Technical Editor: Dinesh Pawar
Copy Editor: Safis Editing
Project Coordinator: Vaidehi Sawant
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Alishon Mendonsa

First published: July 2017
Second edition: September 2019
Production reference: 1060919
Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78953-630-0
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Aditya Sharma is a senior engineer at Robert Bosch working on solving real-world autonomous computer vision problems. At Robert Bosch, he also secured first place at an AI hackathon in 2019. He has been associated with some of the premier institutes of India, including IIT Mandi and IIIT Hyderabad. At IIT Mandi, he published papers on medical imaging using deep learning at ICIP 2019 and MICCAI 2019. At IIIT Hyderabad, his work revolved around document image super-resolution.
He is a motivated writer and has written many articles on machine learning and deep learning for DataCamp and LearnOpenCV. Aditya runs his own YouTube channel and has contributed as a speaker at the NCVPRIPG conference (2017) and Aligarh Muslim University for a workshop on deep learning.
Vishwesh Ravi Shrimali graduated from BITS Pilani, where he studied mechanical engineering, in 2018. Since then, he has been working with BigVision LLC on deep learning and computer vision and is also involved in creating official OpenCV courses. He has a keen interest in programming and AI and has applied that interest in mechanical engineering projects. He has also written multiple blogs on OpenCV and deep learning on LearnOpenCV, a leading blog on computer vision. When he is not writing blogs or working on projects, he likes to go on long walks or play his acoustic guitar.
Michael Beyeler is a postdoctoral fellow in neuroengineering and data science at the University of Washington, where he is working on computational models of bionic vision in order to improve the perceptual experience of blind patients implanted with a retinal prosthesis (bionic eye). His work lies at the intersection of neuroscience, computer engineering, computer vision, and machine learning. He is also an active contributor to several open source software projects, and has professional programming experience in Python, C/C++, CUDA, MATLAB, and Android. Michael received a PhD in computer science from the University of California, Irvine, and an MSc in biomedical engineering and a BSc in electrical engineering from ETH Zurich, Switzerland.
Wilson Choo is a deep learning engineer working on deep learning modeling research. He has a deep interest in creating applications that implement deep learning, computer vision, and machine learning.
His past work includes the validation and benchmarking of Intel OpenVINO Toolkit algorithms, as well as custom Android OS validation. He has experience in integrating deep learning applications in different hardware and OSes. His native programming languages are Java, Python, and C++.
Robert B. Fisher has a PhD from the University of Edinburgh, where he also served as a college dean of research. He is currently the industrial liaison committee chair for the International Association for Pattern Recognition. His research covers topics mainly in high-level computer vision and 3D video analysis, which has led to 5 books and 300 peer-reviewed scientific articles or book chapters (Google H-index: 46). Most recently, he has been the coordinator of an EC-funded project that's developing a gardening robot. He has developed several online computer vision resources with over 1 million hits. He is a fellow of the International Association for Pattern Recognition and the British Machine Vision Association.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Machine Learning for OpenCV 4 Second Edition
About Packt
Why subscribe?
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Fundamentals of Machine Learning and OpenCV
A Taste of Machine Learning
Technical requirements
Getting started with machine learning
Problems that machine learning can solve
Getting started with Python
Getting started with OpenCV
Installation
Getting the latest code for this book
Getting to grips with Python's Anaconda distribution
Installing OpenCV in a conda environment
Verifying the installation
Getting a glimpse of OpenCV's ml module
Applications of machine learning
What's new in OpenCV 4.0?
Summary
Working with Data in OpenCV
Technical requirements
Understanding the machine learning workflow
Dealing with data using OpenCV and Python
Starting a new IPython or Jupyter session
Dealing with data using Python's NumPy package
Importing NumPy
Understanding NumPy arrays
Accessing single array elements by indexing
Creating multidimensional arrays
Loading external datasets in Python
Visualizing the data using Matplotlib
Importing Matplotlib
Producing a simple plot
Visualizing data from an external dataset
Dealing with data using OpenCV's TrainData container in C++
Summary
First Steps in Supervised Learning
Technical requirements
Understanding supervised learning
Having a look at supervised learning in OpenCV
Measuring model performance with scoring functions
Scoring classifiers using accuracy, precision, and recall
Scoring regressors using mean squared error, explained variance, and R squared
Using classification models to predict class labels
Understanding the k-NN algorithm
Implementing k-NN in OpenCV
Generating the training data
Training the classifier
Predicting the label of a new data point
Using regression models to predict continuous outcomes
Understanding linear regression
Linear regression in OpenCV
Using linear regression to predict Boston housing prices
Loading the dataset
Training the model
Testing the model
Applying Lasso and ridge regression
Classifying iris species using logistic regression
Understanding logistic regression
Loading the training data
Making it a binary classification problem
Inspecting the data
Splitting data into training and test sets
Training the classifier
Testing the classifier
Summary
Representing Data and Engineering Features
Technical requirements
Understanding feature engineering
Preprocessing data
Standardizing features
Normalizing features
Scaling features to a range
Binarizing features
Handling the missing data
Understanding dimensionality reduction
Implementing Principal Component Analysis (PCA) in OpenCV
Implementing independent component analysis (ICA)
Implementing non-negative matrix factorization (NMF)
Visualizing the dimensionality reduction using t-Distributed Stochastic Neighbor Embedding (t-SNE)
Representing categorical variables
Representing text features
Representing images
Using color spaces
Encoding images in the RGB space
Encoding images in the HSV and HLS space
Detecting corners in images
Using the star detector and BRIEF descriptor
Using Oriented FAST and Rotated BRIEF (ORB)
Summary
Section 2: Operations with OpenCV
Using Decision Trees to Make a Medical Diagnosis
Technical requirements
Understanding decision trees
Building our first decision tree
Generating new data
Understanding the task by understanding the data
Preprocessing the data
Constructing the tree
Visualizing a trained decision tree
Investigating the inner workings of a decision tree
Rating the importance of features
Understanding the decision rules
Controlling the complexity of decision trees
Using decision trees to diagnose breast cancer
Loading the dataset
Building the decision tree
Using decision trees for regression
Summary
Detecting Pedestrians with Support Vector Machines
Technical requirements
Understanding linear SVMs
Learning optimal decision boundaries
Implementing our first SVM
Generating the dataset
Visualizing the dataset
Preprocessing the dataset
Building the support vector machine
Visualizing the decision boundary
Dealing with nonlinear decision boundaries
Understanding the kernel trick
Knowing our kernels
Implementing nonlinear SVMs
Detecting pedestrians in the wild
Obtaining the dataset
Taking a glimpse at the histogram of oriented gradients (HOG)
Generating negatives
Implementing the SVM
Bootstrapping the model
Detecting pedestrians in a larger image
Further improving the model
Multiclass classification using SVMs
About the data
Attribute information
Summary
Implementing a Spam Filter with Bayesian Learning
Technical requirements
Understanding Bayesian inference
Taking a short detour through probability theory
Understanding Bayes' theorem
Understanding the Naive Bayes classifier
Implementing your first Bayesian classifier
Creating a toy dataset
Classifying the data with a normal Bayes classifier
Classifying the data with a Naive Bayes classifier
Visualizing conditional probabilities
Classifying emails using the Naive Bayes classifier
Loading the dataset
Building a data matrix using pandas
Preprocessing the data
Training a normal Bayes classifier
Training on the full dataset
Using n-grams to improve the result
Using TF-IDF to improve the result
Summary
Discovering Hidden Structures with Unsupervised Learning
Technical requirements
Understanding unsupervised learning
Understanding k-means clustering
Implementing our first k-means example
Understanding expectation-maximization
Implementing our expectation-maximization solution
Knowing the limitations of expectation-maximization
The first caveat – no guarantee of finding the global optimum
The second caveat – we must select the number of clusters beforehand
The third caveat – cluster boundaries are linear
The fourth caveat – k-means is slow for a large number of samples
Compressing color spaces using k-means
Visualizing the true-color palette
Reducing the color palette using k-means
Classifying handwritten digits using k-means
Loading the dataset
Running k-means
Organizing clusters as a hierarchical tree
Understanding hierarchical clustering
Implementing agglomerative hierarchical clustering
Comparing clustering algorithms
Summary
Section 3: Advanced Machine Learning with OpenCV
Using Deep Learning to Classify Handwritten Digits
Technical requirements
Understanding the McCulloch-Pitts neuron
Understanding the perceptron
Implementing your first perceptron
Generating a toy dataset
Fitting the perceptron to data
Evaluating the perceptron classifier
Applying the perceptron to data that is not linearly separable
Understanding multilayer perceptrons
Understanding gradient descent
Training MLPs with backpropagation
Implementing an MLP in OpenCV
Preprocessing the data
Creating an MLP classifier in OpenCV
Customizing the MLP classifier
Training and testing the MLP classifier
Getting acquainted with deep learning
Getting acquainted with Keras
Classifying handwritten digits
Loading the MNIST dataset
Preprocessing the MNIST dataset
Training an MLP using OpenCV
Training a deep neural network using Keras
Preprocessing the MNIST dataset
Creating a convolutional neural network
Model summary
Fitting the model
Summary
Ensemble Methods for Classification
Technical requirements
Understanding ensemble methods
Understanding averaging ensembles
Implementing a bagging classifier
Implementing a bagging regressor
Understanding boosting ensembles
Weak learners
Implementing a boosting classifier
Implementing a boosting regressor
Understanding stacking ensembles
Combining decision trees into a random forest
Understanding the shortcomings of decision trees
Implementing our first random forest
Implementing a random forest with scikit-learn
Implementing extremely randomized trees
Using random forests for face recognition
Loading the dataset
Preprocessing the dataset
Training and testing the random forest
Implementing AdaBoost
Implementing AdaBoost in OpenCV
Implementing AdaBoost in scikit-learn
Combining different models into a voting classifier
Understanding different voting schemes
Implementing a voting classifier
Plurality
Summary
Selecting the Right Model with Hyperparameter Tuning
Technical requirements
Evaluating a model
Evaluating a model the wrong way
Evaluating a model in the right way
Selecting the best model
Understanding cross-validation
Manually implementing cross-validation in OpenCV
Using scikit-learn for k-fold cross-validation
Implementing leave-one-out cross-validation
Estimating robustness using bootstrapping
Manually implementing bootstrapping in OpenCV
Assessing the significance of our results
Implementing Student's t-test
Implementing McNemar's test
Tuning hyperparameters with grid search
Implementing a simple grid search
Understanding the value of a validation set
Combining grid search with cross-validation
Combining grid search with nested cross-validation
Scoring models using different evaluation metrics
Choosing the right classification metric
Choosing the right regression metric
Chaining algorithms together to form a pipeline
Implementing pipelines in scikit-learn
Using pipelines in grid searches
Summary
Using OpenVINO with OpenCV
Technical requirements
Introduction to OpenVINO
OpenVINO toolkit installation
OpenVINO components
Interactive face detection demo
Using OpenVINO Inference Engine with OpenCV
Using OpenVINO Model Zoo with OpenCV
Image classification using OpenCV with OpenVINO Inference Engine
Image classification using OpenVINO
Image classification using OpenCV with OpenVINO
Summary
Conclusion
Technical requirements
Approaching a machine learning problem
Building your own estimator
Writing your own OpenCV-based classifier in C++
Writing your own scikit-learn-based classifier in Python
Where to go from here
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
As the world changes and humans build smarter and better machines, the demand for machine learning and computer vision experts increases. Machine learning, as the name suggests, is the process by which a machine learns to make predictions given a certain set of parameters as input. Computer vision, on the other hand, gives a machine vision; that is, it makes the machine aware of visual information. When you combine these technologies, you get a machine that can use visual data to make predictions, which brings machines one step closer to having human capabilities. When you add deep learning to the mix, the machine can even surpass human capabilities in terms of making predictions. This might seem far-fetched, but with AI systems taking over decision-based systems, it has actually become a reality: there are AI cameras, AI monitors, AI sound systems, AI-powered processors, and more.

We cannot promise you that you will be able to build an AI camera after reading this book, but we do intend to provide you with the tools necessary for you to do so. The most powerful tool that we are going to introduce is the OpenCV library, the world's largest computer vision library. Even though its use in machine learning is not very common, we have provided some examples and concepts showing how it can be used for machine learning.

We have gone with a hands-on approach in this book, and we recommend that you try out every single piece of code in this book to build an application that showcases your knowledge. The world is changing, and this book is our way of helping young minds change it for the better.
We have tried to explain all the concepts from scratch to make the book suitable for beginners as well as advanced readers. We recommend that readers have some basic knowledge of Python programming, but it's not mandatory. Whenever you encounter some Python syntax that you are not able to understand, make sure you look it up on the internet. Help is always provided to those who look for it.
Chapter 1, A Taste of Machine Learning, starts us off with installing the required software and Python modules for this book.
Chapter 2, Working with Data in OpenCV, takes a look at some basic OpenCV functions.
Chapter 3, First Steps in Supervised Learning, will cover the basics of supervised learning methods in machine learning. We will have a look at some examples of supervised learning methods using OpenCV and the scikit-learn library in Python.
Chapter 4, Representing Data and Engineering Features, will cover concepts such as feature detection and feature recognition using ORB in OpenCV. We will also try to understand important concepts such as the curse of dimensionality.
Chapter 5, Using Decision Trees to Make a Medical Diagnosis, will introduce decision trees and important concepts related to them, including the depth of trees and techniques such as pruning. We will also cover a practical application of predicting breast cancer diagnoses using decision trees.
Chapter 6, Detecting Pedestrians with Support Vector Machines, will start off with an introduction to support vector machines and how they can be implemented in OpenCV. We will also cover an application of pedestrian detection using OpenCV.
Chapter 7, Implementing a Spam Filter with Bayesian Learning, will discuss techniques such as the Naive Bayes algorithm, multinomial Naive Bayes, and more, as well as how they can be implemented. Finally, we will build a machine learning application to classify data into spam and ham.
Chapter 8, Discovering Hidden Structures with Unsupervised Learning, will be our first introduction to the second class of machine learning algorithms: unsupervised learning. We will discuss techniques such as k-means clustering, expectation-maximization, and hierarchical clustering.
Chapter 9, Using Deep Learning to Classify Handwritten Digits, will introduce deep learning techniques and we will see how we can use deep neural networks to classify images from the MNIST dataset.
Chapter 10, Ensemble Methods for Classification, will cover topics such as random forest, bagging, and boosting for classification purposes.
Chapter 11, Selecting the Right Model with Hyperparameter Tuning, will go over the process of selecting the optimum set of parameters in various machine learning methods in order to improve the performance of a model.
Chapter 12, Using OpenVINO with OpenCV, will cover the OpenVINO Toolkit, support for which was added in OpenCV 4.0. We will also go over how to use it with OpenCV, taking image classification as an example.
Chapter 13, Conclusion, will provide a summary of the major topics that we have covered in the book and talk about what you can do next.
If you are a beginner in Python, we recommend that you go through a good Python programming book or some online tutorials or videos. You can also have a look at DataCamp (http://www.datacamp.com) to learn Python using interactive lessons.
We also recommend that you learn some basic concepts about the Matplotlib library in Python. You can try out this tutorial for that: https://www.datacamp.com/community/tutorials/matplotlib-tutorial-python.
You don't need to have anything pre-installed on your system before starting this book. We will cover all the installation steps in the first chapter.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-OpenCV-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789536300_ColorImages.pdf.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
In the very first section of this book, we will go over the basics of machine learning and OpenCV, starting with installing the required libraries, and then moving on to basic OpenCV functions, the basics of supervised learning and their applications, and finally, feature detection and recognition using OpenCV.
This section includes the following chapters:
Chapter 1, A Taste of Machine Learning
Chapter 2, Working with Data in OpenCV
Chapter 3, First Steps in Supervised Learning
Chapter 4, Representing Data and Engineering Features
So, you have decided to enter the field of machine learning. That's great!
Nowadays, machine learning is all around us: from protecting our email, to automatically tagging our friends in pictures, to predicting what movies we like. As a form of artificial intelligence, machine learning enables computers to learn through experience: to make predictions about the future using data collected from the past. On top of that, computer vision is one of today's most exciting application fields of machine learning, with deep learning and convolutional neural networks driving innovative systems such as self-driving cars and Google DeepMind's AlphaGo.
However, fret not; your application does not need to be as large-scale or world-changing as the previous examples in order to benefit from machine learning. In this chapter, we will talk about why machine learning has become so popular and discuss the kinds of problems that it can solve. We will then introduce the tools that we need in order to solve machine learning problems using OpenCV. Throughout the book, I will assume that you already have a basic knowledge of OpenCV and Python, but that there is always room to learn more. We will also go over how you can install OpenCV on your local system so that you can try out the code on your own.
Are you ready then? In this chapter, we will go over the following concepts:
What is machine learning and what are its categories?
Important Python concepts
Getting started with OpenCV
Installing Python and the required modules on the local system
Applications of machine learning
What's new in OpenCV 4.0?
You can refer to the code for this chapter at the following link: https://github.com/PacktPublishing/Machine-Learning-for-OpenCV-Second-Edition/tree/master/Chapter01.
Here is a short summary of the software and hardware requirements:
OpenCV version 4.1.x (4.1.0 or 4.1.1 will both work just fine).
Python version 3.6 (any Python version 3.x will be fine).
Anaconda Python 3 for installing Python and the required modules.
You can use any OS (macOS, Windows, or a Linux-based OS) with this book. We recommend you have at least 4 GB of RAM in your system.
You don't need to have a GPU to run the code provided with this book.
Machine learning has been around for at least 60 years. Growing out of the quest for artificial intelligence, early machine learning systems relied on hand-coded rules of if...else statements to process data and make decisions. Think of a spam filter whose job is to parse incoming emails and move unwanted messages to a spam folder, as shown in the following diagram:
We could come up with a blacklist of words that, whenever they show up in a message, would mark an email as spam. This is a simple example of a hand-coded expert system. (We will build a smarter one in Chapter 7, Implementing a Spam Filter with Bayesian Learning.)
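To make this concrete, here is a minimal sketch of such a hand-coded rule in plain Python; the blacklisted words and the example messages are made up for illustration:

# A hand-coded "expert system": flag an email as spam if it
# contains any blacklisted word (illustrative word list only)
blacklist = {'prince', 'lottery', 'winnings'}

def is_spam(email_text):
    words = set(email_text.lower().split())
    return len(words & blacklist) > 0

print(is_spam('Claim your lottery winnings now'))   # True
print(is_spam('Meeting moved to 3 pm'))             # False

Note how brittle this is: the rule knows nothing about misspellings, word order, or context; it simply matches tokens.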
These expert decision rules can become arbitrarily complicated if we are allowed to combine and nest them in what is known as a decision tree (Chapter 5, Using Decision Trees to Make a Medical Diagnosis). Then, it becomes possible to make more informed decisions that involve a series of decision steps. You should note that even though decision trees may look like a set of if...else conditions, they are much more than that; a decision tree is a full-fledged machine learning algorithm that we will explore in Chapter 5, Using Decision Trees to Make a Medical Diagnosis.
Hand-coding these decision rules is sometimes feasible but it has two major disadvantages:
The logic required to make a decision applies only to a specific task in a single domain. For example, there is no way that we could use this spam filter to tag our friends in a picture. Even if we wanted to change the spam filter to do something slightly different, such as filtering out phishing emails (intended to steal your personal data) in general, we would have to redesign all the decision rules.
Designing rules by hand requires a deep understanding of the problem. We would have to know exactly what type of emails constitute spam, including all possible exceptions. This is not as easy as it seems; otherwise, we wouldn't often be double-checking our spam folder for important messages that might have been accidentally filtered out. For other domain problems, it is simply not possible to design the rules by hand.
This is where machine learning comes in. Sometimes, tasks cannot be defined well—except maybe by example—and we would like machines to make sense of and solve the tasks by themselves. Other times, it is possible that important relationships and correlations are hidden among large piles of data that we as humans might have missed (see Chapter 8, Discovering Hidden Structures with Unsupervised Learning). When dealing with large amounts of data, machine learning can often be used to extract these hidden relationships (also known as data mining).
A good example of where man-made expert systems have failed is in detecting faces in images. Silly, isn't it? Today, every smartphone can detect a face in an image. However, 20 years ago, this problem was largely unsolved. The reason for this is that the way humans think about what constitutes a face is not very helpful to machines. As humans, we tend not to think in pixels. If we were asked to detect a face, we would probably just look for the defining features of a face, such as eyes, nose, mouth, and so on. But how would we tell a machine what to look for, when all the machine knows is that images have pixels and pixels have a certain shade of gray? For the longest time, this difference in image representation basically made it impossible for a human to come up with a good set of decision rules that would allow a machine to detect a face in an image. We will talk about different approaches to this problem in Chapter 4, Representing Data and Engineering Features.
However, with the advent of convolutional neural networks and deep learning (Chapter 9, Using Deep Learning to Classify Handwritten Digits), machines have become as successful as us when it comes to recognizing faces. All we had to do was simply present a large collection of images of faces to the machine. Most approaches also require some form of annotation about where the faces are in the training data. From there on, the machine was able to discover the set of characteristics that would allow it to identify a face, without having to approach the problem in the same way as we would. This is the true power of machine learning.
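To see how accessible this has become, the following sketch runs the pretrained face detector that ships with OpenCV (a classic machine-learned Haar cascade rather than a deep network); the my_photo.jpg filename is a placeholder, and the cascade file is provided by the opencv-contrib-python package we install later in this chapter:

import cv2

# Load the pretrained frontal-face Haar cascade bundled with OpenCV
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread('my_photo.jpg')  # placeholder image file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Returns one (x, y, w, h) rectangle per detected face
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print('Found %d face(s)' % len(faces))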
Most machine learning problems belong to one of the following three main categories:
In supervised learning, we have what is referred to as a label for each data point. This can be the class of an object captured in the image, a bounding box around a face, the digit present in the image, or anything else. Think of it as a teacher who not only teaches but also tells you the correct answer to each problem. The student can then try to devise a model or an equation that takes into account all the problems and their correct answers, and use it to find the answer to a new problem that does (or does not) have a known correct answer. The data that goes into learning the model is called the training data, and the data on which the model is tested is called the test data. These predictions come in two flavors: identifying new photos with the correct animal (called a classification problem) or assigning accurate sale prices to other used cars (called a regression problem). Don't worry if this seems a little over your head for now; we will have the entirety of the book to nail down the details.
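As a first, hedged taste of the two flavors, the following sketch uses scikit-learn (which we install later in this chapter) on tiny made-up datasets; the numbers are invented purely for illustration:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression

# Classification: predict a discrete label (0 or 1) from one feature
X = np.array([[1.0], [2.0], [8.0], [9.0]])
y = np.array([0, 0, 1, 1])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[1.5]]))    # -> [0], same class as its nearest neighbor

# Regression: predict a continuous value from one feature
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])
reg = LinearRegression().fit(X, y)
print(reg.predict([[5.0]]))    # -> roughly [10.1]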
In unsupervised learning, data points have no labels associated with them (Chapter 8, Discovering Hidden Structures with Unsupervised Learning). Think of it like a class where the instructor gives you a jumbled puzzle and leaves it up to you to figure out what to do. Here, the most common result is clusters, which contain objects with similar characteristics. Unsupervised learning can also result in different ways of looking at higher-dimensional (complex) data so that it appears simpler.
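The following sketch shows the flavor of this using k-means clustering from scikit-learn on six made-up, unlabeled 2D points (we cover clustering properly in Chapter 8, Discovering Hidden Structures with Unsupervised Learning):

import numpy as np
from sklearn.cluster import KMeans

# Six unlabeled 2D points that visibly form two groups
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

# Ask k-means to discover two clusters; no labels are provided
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)   # one (x, y) center per cluster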
Reinforcement learning is about maximizing a reward in a problem. So, if a teacher gives you a candy for every correct answer and punishes you for every incorrect one, they are reinforcing the desired behavior: you learn to answer so as to maximize the number of candies you receive rather than the number of punishments you suffer.
These three main categories are illustrated in the following diagram:
Now that we have covered the main machine learning categories, let's go over some concepts in Python that will prove very useful along the journey of this book.
Python has become the common language for many data science and machine learning applications, thanks to its great number of open source libraries for processes such as data loading, data visualization, statistics, image processing, and natural language processing. One of the main advantages of using Python is the ability to interact directly with the code, using a Terminal or other tools such as the Jupyter Notebook, which we'll look at shortly.
If you have mostly been using OpenCV in combination with C++, I would strongly suggest that you switch to Python, at least for the purpose of studying this book. This decision has not been made out of spite! Quite the contrary: I have done my fair share of C/C++ programming—especially in combination with GPU computing via NVIDIA's Compute Unified Device Architecture (CUDA)—and I like it a lot. However, I consider Python to be a better choice if you want to pick up a new topical skill because you can do more by typing less. This will help reduce the cognitive load. Rather than getting annoyed by the syntactic subtleties of C++ or wasting hours trying to convert data from one format to another, Python will help you concentrate on the topic at hand: becoming an expert in machine learning.
Being the avid user of OpenCV that I believe you are, I probably don't have to convince you about the power of OpenCV.
Built to provide a common infrastructure for computer vision applications, OpenCV has become a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. According to its own documentation, OpenCV has a user community of more than 47,000 people and has been downloaded over seven million times. That's pretty impressive! As an open source project, it is very easy for researchers, businesses, and government bodies to utilize and modify already available code.
That being said, a number of open source machine learning libraries have popped up as part of the recent machine learning boom that provide far more functionality than OpenCV. A prominent example is scikit-learn, which provides a number of state-of-the-art machine learning algorithms as well as a wealth of online tutorials and code snippets. As OpenCV was developed mainly to provide computer vision algorithms, its machine learning functionality is restricted to a single module, called ml. As we will see in this book, OpenCV still provides a number of state-of-the-art algorithms, but sometimes lacks a bit in functionality. In these rare cases, instead of reinventing the wheel, we will simply use scikit-learn for our purposes.
Last but not least, installing OpenCV using the Python Anaconda distribution is essentially a one-liner as we'll see in the following sections.
Before we get started, let's make sure that we have all the tools and libraries installed that are necessary to create a fully functioning data science environment. After downloading the latest code for this book from GitHub, we are going to install the following software:
Python's Anaconda distribution, based on Python 3.6 or higher
OpenCV 4.1
Some supporting packages
You can get the latest code for this book from GitHub: https://github.com/PacktPublishing/Machine-Learning-for-OpenCV-Second-Edition. You can either download a .zip package (beginners) or clone the repository using Git (intermediate users).
If you choose to go with Git, the first step is to make sure it is installed (https://git-scm.com/downloads).
Then, open a Terminal (or Command Prompt, as it is called in Windows):
On Windows 10, right-click on the Start Menu button and select Command Prompt.
On macOS X, press Cmd + Space to open Spotlight search, then type terminal and hit Enter.
On Ubuntu, Linux/Unix, and friends, press Ctrl + Alt + T. On Red Hat, right-click on the desktop and choose Open Terminal from the menu.
Navigate to a directory where you want the code downloaded:
cd Desktop
Then you can grab a local copy of the latest code by typing the following:
git clone https://github.com/PacktPublishing/Machine-Learning-for-OpenCV-Second-Edition.git OpenCV-ML
This will download the latest code in a folder called OpenCV-ML.
After a while, the code might change online. In that case, you can update your local copy by running the following command from within the OpenCV-ML directory:
git pull origin master
Anaconda is a free Python distribution developed by Continuum Analytics that is made for scientific computing. It works across Windows, Linux, and macOS X platforms and is free, even for commercial use. However, the best thing about it is that it comes with a number of preinstalled packages that are essential for data science, math, and engineering. These packages include the following:
NumPy: A fundamental package for scientific computing in Python that provides functionality for multidimensional arrays, high-level mathematical functions, and pseudo-random number generators
SciPy: A collection of functions for scientific computing in Python that provides advanced linear algebra routines, mathematical function optimization, signal processing, and so on
scikit-learn: An open source machine learning library in Python that provides useful helper functions and infrastructure that OpenCV lacks
Matplotlib: The primary scientific plotting library in Python, which provides functionality for producing line charts, histograms, scatter plots, and so on
Jupyter Notebook: An interactive environment for running code in a web browser, with Markdown support that helps in maintaining well-commented and detailed project notebooks
An installer for our platform of choice (Windows, macOS X, or Linux) can be found on the Continuum website, https://www.anaconda.com/download. I recommend using the Python 3.6-based distribution, as Python 2 is no longer under active development.
To run the installer, do one of the following:
On Windows, double-click on the .exe file and follow the instructions on the screen
On macOS X, double-click on the .pkg file and follow the instructions on the screen
On Linux, open a Terminal and run the .sh script using bash, as shown here:
$ bash Anaconda3-2018.12-Linux-x86_64.sh # Python 3.6 based
In addition, Python Anaconda comes with conda—a simple package manager similar to apt-get on Linux. After successful installation, we can install new packages by typing the following command in the Terminal:
$ conda install package_name
Here, package_name is the actual name of the package that we want to install.
Existing packages can be updated using the following command:
$ conda update package_name
We can also search for packages using the following command:
$ anaconda search -t conda package_name
This will bring up a whole list of packages made available by developers. For example, searching for a package named opencv brings up a long list of users who have OpenCV packages available, allowing us to locate one that matches our version of the software and our platform. A package called package_name from a user called user_name can then be installed as follows:
$ conda install -c user_name package_name
Finally, conda provides something called an environment, which allows us to manage different versions of Python and/or packages installed in them. This means we could have a separate environment where we have all packages necessary to run OpenCV 4.1 with Python 3.6. In the following section, we will create an environment that contains all the packages needed to run the code in this book.
We will carry out the following steps to install OpenCV:
In a Terminal, navigate to the directory where you downloaded the code:
$ cd Desktop/OpenCV-ML
Then, run the following command to create a conda environment based on Python 3.6, which will also install all the necessary packages listed in the environment.yml file (available in the GitHub repository) in one fell swoop:
$ conda env create -f environment.yml
You can also have a look at the environment.yml file:
name: OpenCV-ML
channels:
  - conda-forge
dependencies:
  - python==3.6
  - numpy==1.15.4
  - scipy==1.1.0
  - scikit-learn==0.20.1
  - matplotlib
  - jupyter==1.0
  - notebook==5.7.4
  - pandas==0.23.4
  - theano
  - keras==2.2.4
  - mkl-service==1.1.2
  - pip
  - pip:
    - opencv-contrib-python==4.1.0.25
To activate the environment, type one of the following, depending on your platform:
$ source activate OpenCV-ML # on Linux / Mac OS X
$ activate OpenCV-ML # on Windows
When we close the Terminal, the session will be deactivated—so we will have to run this last command again the next time we open a new Terminal. We can also deactivate the environment by hand:
$ source deactivate # on Linux / Mac OS X
$ deactivate # on Windows
And done! Let's verify whether all this installation was successful or not.
It's a good idea to double-check our installation. While our terminal is still open, we start IPython, which is an interactive shell to run Python commands:
$ ipython
Next, make sure that you are running (at least) Python 3.6 and not Python 2.7. You might see the version number displayed in IPython's welcome message. If not, you can run the following commands:
In [1]: import sys
   ...: print(sys.version)
3.6.0 | packaged by conda-forge | (default, Feb 9 2017, 14:36:55)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
Now try to import OpenCV as follows:
In [2]: import cv2
You should get no error messages. Then, try to find out the version number like so:
In [3]: cv2.__version__
Out[3]: '4.1.0'
Make sure that OpenCV's version number reads 4.1.0 (or at least 4.x); otherwise, you will not be able to use some OpenCV functionality later on.
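While the IPython session is still open, you can also run an optional sanity check on the rest of the scientific stack from the environment file; each of these packages exposes a __version__ attribute:

import numpy, scipy, sklearn, matplotlib
# The printed versions should match (or exceed) the pins in environment.yml
print(numpy.__version__, scipy.__version__,
      sklearn.__version__, matplotlib.__version__)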
You can then exit the IPython shell by typing exit, or hitting Ctrl + D and confirming that you want to quit.
Alternatively, you can run the code in a web browser thanks to the Jupyter Notebook. If you have never heard of Jupyter Notebooks or played with them before, trust me—you will love them! If you followed the directions as mentioned earlier and installed the Python Anaconda stack, Jupyter is already installed and ready to go. In a Terminal, type this:
$ jupyter notebook
This will automatically open a browser window, showing a list of files in the current directory. Click on the OpenCV-ML folder, then on the notebooks folder, and voila! Here you will find all the code for this book, ready to be explored:
The notebooks are arranged by chapter and section. For the most part, they contain only the relevant code, but no additional information or explanations. These are reserved for those who support our effort by buying this book—so thank you!
Simply click on a notebook of your choice, such as 01.00-A-Taste-of-Machine-Learning.ipynb, and you will be able to run the code yourself by selecting Kernel | Restart & Run All:
There are a few handy keyboard shortcuts for navigating Jupyter Notebooks. However, the only ones that you need to know about right now are the following:
Click in a cell (note the highlighted region in the preceding screenshot—that's referred to as a cell) in order to edit it
While the cell is selected, hit Ctrl + Enter to execute the code in it
Alternatively, hit Shift + Enter to execute a cell and select the cell below it
Hit Esc to exit write mode, then hit A to insert a cell above the currently selected one and B to insert a cell below
However, I strongly encourage you to follow along with the book by actually typing out the commands yourself, preferably in an IPython shell or an empty Jupyter Notebook. There is no better way to learn how to code than by getting your hands dirty. Even better if you make mistakes—we have all been there. At the end of the day, it's all about learning by doing!
Starting with OpenCV 3.1, all machine learning related functions in OpenCV have been grouped into the ml module. This has been the case for the C++ API for quite some time. You can get a glimpse of what's to come by displaying all functions in the ml module:
In [4]: dir(cv2.ml)
Out[4]: ['ANN_MLP_ANNEAL',
         'ANN_MLP_BACKPROP',
         'ANN_MLP_GAUSSIAN',
         'ANN_MLP_IDENTITY',
         'ANN_MLP_LEAKYRELU',
         'ANN_MLP_NO_INPUT_SCALE',
         'ANN_MLP_NO_OUTPUT_SCALE',
         ...
         '__spec__']
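All of these names refer to constants and classes of the ml module. As a quick, hedged taste of how the module is used (we will do this properly in Chapter 3, First Steps in Supervised Learning), the following sketch trains OpenCV's k-nearest neighbor implementation on a tiny made-up dataset:

import numpy as np
import cv2

# OpenCV's ml module expects 32-bit float features (one sample per
# row) and 32-bit integer labels
train_data = np.array([[1, 1], [1, 2], [9, 9], [8, 9]], dtype=np.float32)
labels = np.array([0, 0, 1, 1], dtype=np.int32)

knn = cv2.ml.KNearest_create()
knn.train(train_data, cv2.ml.ROW_SAMPLE, labels)

# Classify a new point by looking at its 2 nearest neighbors
new_point = np.array([[8, 8]], dtype=np.float32)
ret, results, neighbors, dist = knn.findNearest(new_point, 2)
print(results)   # -> [[1.]], the new point falls in the second group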
This is all good, but you may be wondering by now why you should even learn machine learning at all, and what its applications are. Let's answer this question in the next section.
Machine learning, artificial intelligence, deep learning, and data science are four terms that I believe are going to change the way we have always looked at things. Let's see if I can convince you why I believe so.
From teaching a computer to play Go and defeat the world champion at that very game, to detecting whether a person has a tumor just by looking at a CT scan of their brain, machine learning has left its mark on every single domain. One of the projects that I worked on was using machine learning to determine the residual life cycle of boiler water wall tubes in thermal power plants. The proposed solution was successful in saving a huge amount of money by using the tubes more efficiently. If you thought that machine learning applications are limited to engineering and medical science, then you are wrong. Researchers have applied machine learning concepts to process newspapers and predict the effect of news on the chances of a particular candidate winning the US presidential election.
Deep learning and computer vision concepts have been applied to colorize black-and-white movies (have a look at this blog post: https://www.learnopencv.com/convolutional-neural-network-based-image-colorization-using-opencv/), to create super-slow-motion movies, to restore torn-out portions of famous artworks, and more.
I hope I have managed to convince you about the importance and the power of machine learning. You have made the right decision to explore this field. But, if you are not a computer science engineer and are worried that you might end up working in a domain that is not your favorite, do not worry. Machine learning is an extra skillset that you can always apply to a problem of your choice.
So, we come to the last section of the very first chapter. I will keep it short and to the point since you as a reader can safely skip it. The topic of our discussion is OpenCV 4.0.
OpenCV 4.0 is the result of three and a half years of hard work and bug fixes from the OpenCV team, and it was finally released in November 2018. In this section, we will look at some of the major changes and new features in OpenCV 4.0:
With the OpenCV 4.0 release, OpenCV has officially become a C++11 library. This means that you have to make sure that a C++11 compliant compiler is present in your system when you are trying to compile OpenCV 4.0.
In continuation of the previous point, a lot of C APIs have been removed. Some of the affected modules include the Video I/O module (videoio) and the Object Detection module (objdetect), among others. File I/O for XML, YAML, and JSON has also dropped its C API.
OpenCV 4.0 also has a lot of improvements in the DNN module (the deep learning module). ONNX support has been added.
Intel OpenVINO also marks its presence in the new OpenCV version. We will be looking into this in some more detail in later chapters.
OpenCL acceleration has been fixed on AMD and NVIDIA GPUs.
OpenCV Graph API has also been added, which is a highly efficient engine for image processing and other operations.
As in every OpenCV release, there have been a lot of changes with the purpose of improving the performance. Some new features such as QR Code Detection and Decoding have also been added.
In short, there have been a lot of changes in OpenCV 4.0, and they each have their own uses. For example, ONNX support helps in the portability of models across various languages and frameworks, OpenCL reduces the runtime of computer vision applications, the Graph API helps in increasing the efficiency of applications, and the OpenVINO toolkit uses Intel processors and a model zoo to provide highly efficient deep learning models. We will be focusing primarily on the OpenVINO toolkit and DLDT, as well as accelerating computer vision applications, in later chapters. However, I should also point out that both OpenCV 3.4.4 and OpenCV 4.0.0 are receiving bug-fix updates at a rapid pace, so if you are going to use either of them in any application, be prepared to modify your code and installation to incorporate the changes made. On a similar note, OpenCV 4.0.1 and OpenCV 3.4.5 came out within a few months of their predecessors.
In this chapter, we talked about machine learning at a high abstraction level: what it is, why it is important, and what kinds of problems it can solve. We learned that machine learning problems come in three flavors: supervised learning, unsupervised learning, and reinforcement learning. We talked about the prominence of supervised learning, and that this field can be further divided into two subfields: classification and regression. Classification models allow us to categorize objects into known classes (such as animals into cats and dogs), whereas regression analysis can be used to predict continuous outcomes of target variables (such as the sales price of used cars).
We also learned how to set up a data science environment using the Python Anaconda distribution, how to get the latest code of this book from GitHub, and how to run code in a Jupyter Notebook.
With these tools in hand, we are now ready to start talking about machine learning in more detail. In the next chapter, we will look at the inner workings of machine learning systems and learn how to work with data in OpenCV with the help of common Pythonic tools such as NumPy and Matplotlib.
Now that we have whetted our appetite for machine learning, it is time to delve a little deeper into the different parts that make up a typical machine learning system.
Far too often, you hear someone throw around the phrase, Just apply machine learning to your data!, as if that will instantly solve all of your problems. You can imagine that the reality of this is much more intricate, although I will admit that nowadays it is incredibly easy to build your own machine learning system simply by cutting and pasting a few lines of code from the internet. However, to build a system that is truly powerful and effective, it is essential to have a firm grasp of the underlying concepts and an intimate knowledge of the strengths and weaknesses of each method. So, don't worry if you don't consider yourself a machine learning expert just yet. Good things take time.
Earlier, I described machine learning as a subfield of artificial intelligence. This might be true—mainly for historical reasons—but most often, machine learning is simply about making sense of data. Therefore, it might be more suitable to think of machine learning as a subfield of data science, where we build mathematical models to help us to understand data.
Hence, this chapter is all about data. We want to learn how data fits in with machine learning and how to work with data using the tools of our choice: OpenCV and Python.
In this chapter, we will cover the following topics:
Understanding the machine learning workflow
Understanding training data and test data
Learning how to load, store, edit, and visualize data with OpenCV and Python
You can refer to the code for this chapter from the following link: https://github.com/PacktPublishing/Machine-Learning-for-OpenCV-Second-Edition/tree/master/Chapter02.
Here is a summary of the software and hardware requirements:
You will need OpenCV version 4.1.x (4.1.0 or 4.1.1 will both work just fine).
You will need Python version 3.6 (any Python version 3.x will be fine).
You will need Anaconda Python 3 for installing Python and the required modules.
You can use any OS (macOS, Windows, or a Linux-based OS) along with this book. We recommend you have at least 4 GB of RAM in your system.
You don't need to have a GPU to run the code provided along with this book.
As mentioned earlier, machine learning is all about building mathematical models to understand data. The learning aspect enters this process when we give a machine learning model the capability to adjust its internal parameters; we can tweak these parameters so that the model explains the data better. In a sense, this can be understood as the model learning from the data. Once the model has learned enough—whatever that means—we can ask it to explain newly observed data.
A typical classification process is illustrated in the following diagram:
Let's break it down step by step.
The first thing to notice is that machine learning problems are always split into (at least) two distinct phases:
A training phase, during which we aim to train a machine learning model on a set of data that we call the training dataset
A test phase, during which we evaluate the learned (or finalized) machine learning model on a new set of never-before-seen data that we call the test dataset
The importance of splitting our data into a training set and a test set cannot be overstated. We always evaluate our models on an independent test set because we are interested in knowing how well our models generalize to new data. In the end, isn't this what learning is all about, be it machine learning or human learning? Think back to school, when you were a learner yourself: the problems you had to solve as part of your homework would never show up in exactly the same form in the final exam. The same scrutiny should be applied to a machine learning model; we are not so much interested in how well our models can memorize a set of data points (such as a homework problem), but we want to know how our models will use what they have learned to solve new problems (such as the ones that show up in a final exam) and explain new data points.
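As a minimal sketch of this split, here is scikit-learn's train_test_split helper applied to made-up data; the 80/20 ratio is a common convention, not a rule:

import numpy as np
from sklearn.model_selection import train_test_split

# 100 made-up samples with 2 features each, plus binary labels
X = np.random.rand(100, 2)
y = np.random.randint(0, 2, size=100)

# Hold out 20% of the data as a test set that the model never sees
# during training; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)   # (80, 2) (20, 2)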
