E-Book
34,79 €

Learn OpenCV 4 by Building Projects, E-Book

David Millán Escrivá


Publisher: Packt Publishing
Category: Specialist literature
Language: English

Description

OpenCV is one of the best open source libraries available, and can help you focus on constructing complete projects on image processing, motion detection, and image segmentation. Whether you’re completely new to computer vision, or have a basic understanding of its concepts, Learn OpenCV 4 by Building Projects – Second edition will be your guide to understanding OpenCV concepts and algorithms through real-world examples and projects.
You’ll begin with the installation of OpenCV and the basics of image processing. Then, you’ll cover user interfaces and get deeper into image processing. As you progress through the book, you'll learn complex computer vision algorithms and explore machine learning and face detection. The book then guides you in creating optical flow video analysis and background subtraction in complex scenes. In the concluding chapters, you'll also learn about text segmentation and recognition and understand the basics of the new and improved deep learning module.
By the end of this book, you'll be familiar with the basics of OpenCV, such as matrix operations, filters, and histograms, and you'll have mastered commonly used computer vision techniques to build OpenCV projects from scratch.

Details

You can read the e-book in Legimi apps or in any app that supports the following formats:

EPUB

MOBI

Pages: 309

Year of publication: 2018


Sample

Learn OpenCV 4 by Building Projects Second Edition

Build real-world computer vision and image processing applications with OpenCV and C++

David Millán Escrivá, Vinícius G. Mendonça, Prateek Joshi

BIRMINGHAM - MUMBAI

Learn OpenCV 4 by Building Projects Second Edition

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Aaron Lazar
Acquisition Editor: Sandeep Mishra
Content Development Editor: Pooja Parvatkar
Technical Editor: Abin Sebastian
Copy Editor: Safis Editing
Project Coordinator: Ulhas Kambali
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Tom Scaria
Production Coordinator: Nilesh Mohite

First published: January 2016
Second edition: November 2018

Production reference: 1301118

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78934-122-5

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

Packt.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the authors

David Millán Escrivá was eight years old when he wrote his first program on an 8086 PC using the BASIC language. He completed his studies in IT from the Universitat Politécnica de Valencia with honors in human-computer interaction supported by computer vision with OpenCV (v0.96). He has a master's degree in artificial intelligence, computer graphics, and pattern recognition, focusing on pattern recognition and computer vision. He also has more than nine years' experience in computer vision, computer graphics, and pattern recognition. He is the author of the Damiles Blog, where he publishes articles and tutorials on OpenCV, computer vision in general, and optical character recognition algorithms.

I would like to thank my wife, Izaskun, my daughter, Eider, and my son, Pau, for their unlimited patience and support at all times. They have changed my life and made it awesome every day. I love you all. I would like to thank the OpenCV team and community that gives us this wonderful library. I would also like to thank my co-authors and Packt Publishing for supporting me and helping me to complete this book.

Vinícius G. Mendonça is a computer graphics university professor at Pontifical Catholic University of Paraná (PUCPR). He started programming with C++ back in 1998, and ventured into the field of computer gaming and computer graphics back in 2006. He is currently a mentor at the Apple Developer Academy in Brazil, working with, and teaching, Metal, machine learning, and computer vision for mobile devices. He has served as a reviewer on other Packt books, including OpenNI Cookbook, and Mastering OpenCV and Computer Vision with OpenCV 3 and Qt5. In his research, he has used Kinect, OpenNI, and OpenCV to recognize Brazilian sign language gestures. His areas of interest include mobile, OpenGL, image processing, computer vision, and project management.

I would like to thank my wife, Thais A. L. Mendonça, for the support she gave me while writing this book. I also dedicate this work to my four girls, Laura, Helena, Alice, and Mariana, and to my stepson, Bruno. My life and work would have no meaning without this great family. I would also like to thank Fabio Binder – my teacher, boss, and mentor – who introduced me to computer graphics and gaming fields, and who has helped me greatly throughout my career.

Prateek Joshi is an artificial intelligence researcher, an author of eight published books, and a TEDx speaker. He has been featured in Forbes 30 Under 30, CNBC, TechCrunch, Silicon Valley Business Journal, and many more publications. He is the founder of Pluto AI, a venture-funded Silicon Valley start-up building an intelligence platform for water facilities. He graduated from the University of Southern California with a Master's degree specializing in Artificial Intelligence. He has previously worked at NVIDIA and Microsoft Research.

About the reviewers

Marc Amberg is an experienced machine learning and computer vision engineer with a proven history of working in the IT and service industries. He is skilled in Python, C/C++, OpenGL, 3D Reconstruction, and Java. He is a strong engineering professional with a master's degree focused on computer science (Image, Vision, and Interactions) from Université des Sciences et Technologies de Lille (Lille I).

Vincent Kok currently works as a software platform application engineer with Intel in the transportation industry sector. He graduated from Universiti Sains Malaysia (USM) with a degree in electronic engineering. Currently, he is pursuing his master's degree in embedded system engineering at USM. Vincent is actively involved in the developer community and regularly participates in the Maker Faire, which is held in different parts of the world. He likes to design electronic hardware kits and gives soldering/Arduino classes for beginners in his spare time.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Title Page

Learn OpenCV 4 by Building Projects Second Edition

About Packt

Why subscribe?

Packt.com

Contributors

About the authors

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Code in Action

Conventions used

Get in touch

Reviews

Getting Started with OpenCV

Understanding the human visual system

How do humans understand image content?

Why is it difficult for machines to understand image content?

What can you do with OpenCV?

Inbuilt data structures and input/output

Image processing operations

GUI

Video analysis

3D reconstruction

Feature extraction

Object detection

Machine learning

Computational photography

Shape analysis

Optical flow algorithms

Face and object recognition

Surface matching

Text detection and recognition

Deep learning

Installing OpenCV

Windows

Mac OS X

Linux

Summary

An Introduction to the Basics of OpenCV

Technical requirements

Basic CMake configuration file

Creating a library

Managing dependencies

Making the script more complex

Images and matrices

Reading/writing images

Reading videos and cameras

Other basic object types

Vec object type

Scalar object type

Point object type

Size object type

Rect object type

RotatedRect object type

Basic matrix operations

Basic data persistence and storage

Writing to FileStorage

Summary

Learning Graphical User Interfaces

Technical requirements

Introducing the OpenCV user interface

Basic graphical user interface with OpenCV

Adding slider and mouse events to our interfaces

Graphic user interface with Qt

Adding buttons to the user interface

OpenGL support

Summary

Delving into Histogram and Filters

Technical requirements

Generating a CMake script file

Creating the graphical user interface

Drawing a histogram

Image color equalization

Lomography effect

Cartoonize effect

Summary

Automated Optical Inspection, Object Segmentation, and Detection

Technical requirements

Isolating objects in a scene

Creating an application for AOI

Preprocessing the input image

Noise removal

Removing the background using the light pattern for segmentation

Thresholding

Segmenting our input image

The connected components algorithm

The findContours algorithm

Summary

Learning Object Classification

Technical requirements

Introducing machine learning concepts

OpenCV machine learning algorithms

Computer vision and the machine learning workflow

Automatic object inspection classification example

Feature extraction

Training an SVM model

Input image prediction

Summary

Detecting Face Parts and Overlaying Masks

Technical requirements

Understanding Haar cascades

What are integral images?

Overlaying a face mask in a live video

What happened in the code?

Get your sunglasses on

Looking inside the code

Tracking the nose, mouth, and ears

Summary

Video Surveillance, Background Modeling, and Morphological Operations

Technical requirements

Understanding background subtraction

Naive background subtraction

Does it work well?

Frame differencing

How well does it work?

The Mixture of Gaussians approach

What happened in the code?

Morphological image processing

What's the underlying principle?

Slimming the shapes

Thickening the shapes

Other morphological operators

Morphological opening

Morphological closing

Drawing the boundary

Top Hat transform

Black Hat transform

Summary

Learning Object Tracking

Technical requirements

Tracking objects of a specific color

Building an interactive object tracker

Detecting points using the Harris corner detector

Good features to track

Feature-based tracking

Lucas-Kanade method

Farneback algorithm

Summary

Developing Segmentation Algorithms for Text Recognition

Technical requirements

Introducing optical character recognition

Preprocessing stage

Thresholding the image

Text segmentation

Creating connected areas

Identifying paragraph blocks

Text extraction and skewing adjustment

Installing Tesseract OCR on your operating system

Installing Tesseract on Windows

Building the latest library

Setting up Tesseract in Visual Studio

Static linking

Installing Tesseract on Mac

Using the Tesseract OCR library

Creating an OCR function

Sending the output to a file

Summary

Text Recognition with Tesseract

Technical requirements

How the text API works

The scene detection problem

Extremal regions

Extremal region filtering

Using the text API

Text detection

Text extraction

Text recognition

Summary

Deep Learning with OpenCV

Technical requirements

Introduction to deep learning

What is a neural network and how can we learn from data?

Convolutional neural networks

Deep learning in OpenCV

YOLO – real-time object detection

YOLO v3 deep learning model architecture

The YOLO dataset, vocabulary, and model

Importing YOLO into OpenCV

Face detection with SSD

SSD model architecture

Importing SSD face detection into OpenCV

Summary

Preface

OpenCV is one of the most popular libraries used to develop computer vision applications. It enables us to run many different computer vision algorithms in real time. It has been around for many years, and it has become the standard library in this field. One of the main advantages of OpenCV is that it is highly optimized and available on almost all platforms.

This book starts off by giving a brief introduction to the various fields in computer vision and the associated OpenCV functionalities in C++. Each chapter contains real-world examples and code samples to demonstrate the use cases. This helps you to easily grasp the topics and understand how they can be applied in real life. To sum up, this is a practical guide on how to use OpenCV in C++ and build various applications using this library.

Who this book is for

This book is for developers who are new to OpenCV and want to develop computer vision applications with OpenCV in C++. A basic knowledge of C++ would be helpful in understanding this book. This book is also useful for people who want to get started with computer vision and understand the underlying concepts. They should be aware of basic mathematical concepts, such as vectors, matrices, and matrix multiplication, in order to get the most out of this book. During the course of this book, you will learn how to build various computer vision applications from scratch using OpenCV.

What this book covers

Chapter 1, Getting Started with OpenCV, covers installation steps on various operating systems and provides an introduction to the human visual system, as well as various topics in computer vision.

Chapter 2, Introduction to OpenCV Basics, discusses how to read/write images and videos in OpenCV, and also explains how to build a project using CMake.

Chapter 3, Learning Graphical User Interface and Basic Filtering, covers how to build a graphical user interface and mouse event detector to build interactive applications.

Chapter 4, Delving into Histograms and Filters, explores histograms and filters and also shows how we can cartoonize an image.

Chapter 5, Automated Optical Inspection, Object Segmentation, and Detection, describes various image pre-processing techniques, such as noise removal, thresholding, and contour analysis.

Chapter 6, Learning Object Classification, deals with object recognition and machine learning, and how to use support vector machines to build an object classification system.

Chapter 7, Detecting Face Parts and Overlaying Masks, discusses face detection and Haar Cascades, and then explains how these methods can be used to detect various parts of the human face.

Chapter 8, Video Surveillance, Background Modeling, and Morphological Operations, explores background subtraction, video surveillance, and morphological image processing, and describes how they are connected to one another.

Chapter 9, Learning Object Tracking, covers how to track objects in a live video using different techniques, such as color-based and feature-based tracking.

Chapter 10, Developing Segmentation Algorithms for Text Recognition, covers optical character recognition, text segmentation, and provides an introduction to the Tesseract OCR engine.

Chapter 11, Text Recognition with Tesseract, delves deeper into the Tesseract OCR engine to explain how it can be used for text detection, extraction, and recognition.

Chapter 12, Deep Learning with OpenCV, explores how to apply deep learning in OpenCV with two commonly used deep learning architectures: YOLO v3 for object detection, and Single Shot Detector for face detection.

To get the most out of this book

A basic knowledge of C++ would be helpful in understanding this book. The examples are built using the following technologies: OpenCV 4.0; CMake 3.3.x or newer; Tesseract; Leptonica (a dependency of Tesseract); Qt (optional); and OpenGL (optional).

Detailed installation instructions are provided in the relevant chapters.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Go to www.packt.com.

Select the SUPPORT tab.

Click on Code Downloads & Errata.

Enter the name of the book in the search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learn-OpenCV-4-By-Building-Projects-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781789341225_ColorImages.pdf.

Code in Action

Visit the following link to check out videos of the code being run: http://bit.ly/2Sfrxgu

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Getting Started with OpenCV

Computer vision applications are interesting and useful, but the underlying algorithms are computationally intensive. With the advent of cloud computing, we are getting more processing power to work with.

The OpenCV library enables us to run computer vision algorithms efficiently in real time. It has been around for many years, and has become the standard library in this field. One of the main advantages of OpenCV is that it is highly optimized, and available on almost all platforms.

This book will cover the various algorithms we will be using, why we are using them, and how to implement them in OpenCV.

In this chapter, we are going to learn how to install OpenCV on various operating systems. We will discuss what OpenCV offers out of the box, and the various things that we can do using the inbuilt functions.

By the end of this chapter, you will be able to answer the following questions:

How do humans process visual data, and how do they understand image content?

What can we do with OpenCV, and what are the various modules available in OpenCV that can be used to achieve those things?

How do we install OpenCV on Windows, Linux, and Mac OS X?

Understanding the human visual system

Before we jump into OpenCV functionalities, we need to understand why those functions were built in the first place. It's important to understand how the human visual system works, so that you can develop the right algorithms.

The goal of computer vision algorithms is to understand the content of images and videos. Humans seem to do it effortlessly! So, how do we get machines to do it with the same accuracy?

Let's consider the following diagram:

The human eye captures the information that reaches it, such as color, shape, and brightness. In the preceding image, the human eye captures all the information about the two main objects and stores it in a certain way. Once we understand how our visual system works, we can take advantage of it to achieve what we want.

For example, here are a few things we need to know:

Our visual system is more sensitive to low-frequency content than high-frequency content. Low-frequency content refers to planar regions where pixel values don't change rapidly, and high-frequency content refers to regions with corners and edges where pixel values fluctuate a lot. We can easily see if there are blotches on a planar surface, but it's difficult to spot something like that on a highly-textured surface.

The human eye is more sensitive to changes in brightness than to changes in color.

Our visual system is sensitive to motion. We can quickly recognize if something is moving in our field of vision, even though we are not directly looking at it.

We tend to make a mental note of salient points in our field of vision. Let's say you look at a white table with four black legs, and a red dot at one of the corners of the table surface. When you look at this table, you'll immediately make a mental note that the surface and legs have opposing colors, and that there is a red dot on one of the corners. Our brain is really smart that way! We do this automatically so that we can immediately recognize an object if we encounter it again.

To get an idea of our field of view, let's look at the top view of a human, and the angles at which we see various things:

Our visual system is actually capable of a lot more, but this should be good enough to get us started. You can explore further by reading up on Human Visual System (HVS) models on the web.

How do humans understand image content?

If you look around, you will see a lot of objects. You encounter many different objects every day, and you recognize them almost instantaneously without any effort. When you see a chair, you don't wait for a few minutes before realizing that it is in fact a chair. You just know that it's a chair right away.

Computers, on the other hand, find it very difficult to do this task. Researchers have been working for many years to find out why computers are not as good as we are at this.

To get an answer to that question, we need to understand how humans do it. The visual data processing happens in the ventral visual stream. This ventral visual stream refers to the pathway in our visual system that is associated with object recognition. It is basically a hierarchy of areas in our brain that helps us recognize objects.

Humans can recognize different objects effortlessly, and can cluster similar objects together. We can do this because we have developed some sort of invariance toward objects of the same class. When we look at an object, our brain extracts the salient points in such a way that factors such as orientation, size, perspective, and illumination don't matter.

A chair that is double the normal size and rotated by 45 degrees is still a chair. We can recognize it easily because of the way we process it. Machines cannot do that so easily. Humans tend to remember an object based on its shape and important features. Regardless of how the object is placed, we can still recognize it.

In our visual system, we build up these hierarchical invariances with respect to position, scale, and viewpoint that help us to be very robust. If you look deeper into our system, you will see that humans have cells in their visual cortex that can respond to shapes such as curves and lines.

As we move further along our ventral stream, we will see more complex cells that are trained to respond to more complex objects such as trees, gates, and so on. The neurons along our ventral stream tend to show an increase in the size of the receptive field. This is coupled with the fact that the complexity of their preferred stimuli increases as well.

Why is it difficult for machines to understand image content?

We now understand how visual data enters the human visual system, and how our system processes it. The issue is that we still don't fully understand how our brain recognizes and organizes this visual data. In machine learning, we just extract some features from images, and ask the computers to learn them using algorithms. Those features still have to cope with variations in shape, size, perspective, angle, illumination, occlusion, and so on.

For example, the same chair looks very different to a machine when you look at it from the profile view. Humans can easily recognize that it's a chair, regardless of how it's presented to us. So, how do we explain this to our machines?

One way to do this would be to store all the different variations of an object, including sizes, angles, perspectives, and so on. But this process is cumbersome and time-consuming. Also, it's actually not possible to gather data that can encompass every single variation. The machines would consume a huge amount of memory and a lot of time to build a model that can recognize these objects.

Even with all this, if an object is partially occluded, computers still won't recognize it. This is because they think this is a new object. So when we build a computer vision library, we need to build the underlying functional blocks that can be combined in many different ways to formulate complex algorithms.

OpenCV provides a lot of these functions, and they are highly optimized. So once we understand what OpenCV is capable of, we can use it effectively to build interesting applications.

Let's go ahead and explore that in the next section.

What can you do with OpenCV?

Using OpenCV, you can tackle almost any computer vision task you can think of. Real-life problems require you to use many computer vision algorithms and modules together to achieve the desired result. So, you just need to understand which OpenCV modules and functions to use in order to get what you want.

Let's look at what OpenCV can do out of the box.

Inbuilt data structures and input/output

One of the best things about OpenCV is that it provides a lot of inbuilt primitives to handle operations related to image processing and computer vision. If you had to write everything from scratch, you would have to define types such as Image, Point, and Rectangle yourself. These are fundamental to almost any computer vision algorithm.

OpenCV comes with all these basic structures out of the box, contained in the core module. Another advantage is that these structures have already been optimized for speed and memory, and so you don't have to worry about the implementation details.

The imgcodecs module handles reading and writing of image files. When you operate on an input image and create an output image, you can save it as a .jpg or a .png file with a simple command.

You will be dealing with a lot of video files when you work with cameras. The videoio module handles everything related to the input and output of video files. You can easily capture a video from the webcam or read a video file in many different formats. You can even save a bunch of frames as a video file by setting properties such as frames per second, frame size, and so on.

Image processing operations

When you write a computer vision algorithm, there are a lot of basic image processing operations that you will use over and over again. Most of these functions are present in the imgproc module. You can do things such as image filtering, morphological operations, geometric transformations, color conversions, drawing on images, histograms, shape analysis, motion analysis, feature detection, and more.

Let's consider the following photo:

The right image is a rotated version of the one on the left. We can carry out this transformation with a single line in OpenCV.

There is another module, called ximgproc, which contains advanced image processing algorithms such as structured forests for edge detection, domain transform filter, adaptive manifold filter, and so on.

GUI

OpenCV provides a module called highgui that handles all the high-level user interface operations. Let's say you are working on a problem, and you want to check what the image looks like before you proceed to the next step. This module has functions that can be used to create windows to display images and/or videos.

There is a waiting function that will wait until you hit a key on your keyboard before it goes on to the next step. There is also a function that can detect mouse events. This is very useful in developing interactive applications.

Using this functionality, you can draw rectangles on those input windows, and then proceed based on the selected region. Consider the following screenshot:

As you can see, we drew a green rectangle on top of the window. Once we have the coordinates of that rectangle, we can operate only on that region.
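Operating only on a selected rectangle can be sketched without any GUI code. The following pure-Python example assumes the rectangle is already known as (x, y, width, height) and applies an operation to the pixels inside it, leaving the rest of the image untouched. The names apply_to_region and invert are hypothetical helpers for this illustration, not highgui functions.

```python
def apply_to_region(image, x, y, width, height, operation):
    """Return a copy of `image` with `operation` applied inside the rectangle.

    The rectangle is clipped to the image bounds, so a selection that
    extends past the border does not raise an error.
    """
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]
    for row in range(max(0, y), min(rows, y + height)):
        for col in range(max(0, x), min(cols, x + width)):
            result[row][col] = operation(image[row][col])
    return result


def invert(pixel):
    """Invert an 8-bit grayscale value."""
    return 255 - pixel


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(3)]
    for row in apply_to_region(image, 1, 1, 2, 1, invert):
        print(row)
```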

Video analysis

Video analysis includes tasks such as analyzing the motion between successive frames in a video, tracking different objects in a video, creating models for video surveillance, and so on. OpenCV provides a module called video that can handle all of this.

There is also a module called videostab that deals with video stabilization. Stabilization matters because video captured with a handheld camera usually contains a lot of shake that needs correcting. Many modern devices apply video stabilization before the video is presented to the end user.

3D reconstruction

3D reconstruction is an important topic in computer vision. Given a set of 2D images, we can reconstruct the 3D scene using relevant algorithms. OpenCV's calib3d module provides algorithms that find the relationship between various objects in those 2D images in order to compute their 3D positions.

This module can also handle camera calibration, which is essential for estimating the parameters of the camera. These parameters define how the camera sees the scene in front of it. We need to know these parameters to design algorithms, or else we might get unexpected results.

Let's consider the following diagram:

As we can see here, the same object is captured from multiple positions. Our job is to reconstruct the original object using these 2D images.
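The role of the camera parameters can be illustrated with the ideal pinhole model. The sketch below assumes no lens distortion and a rectified stereo pair, that is, two identical cameras separated only by a horizontal baseline. The functions project_point and depth_from_disparity are hypothetical helpers written for this example, not part of calib3d.

```python
def project_point(point_3d, fx, fy, cx, cy):
    """Project a 3D point (camera coordinates) to pixel coordinates.

    fx, fy are the focal lengths in pixels; (cx, cy) is the principal point.
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def depth_from_disparity(focal_length_px, baseline, x_left, x_right):
    """Recover depth from the horizontal shift between two rectified views."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline / disparity


if __name__ == "__main__":
    fx = fy = 800.0
    cx, cy = 320.0, 240.0
    baseline = 0.1  # distance between the two cameras, in scene units

    point = (0.5, 0.2, 2.0)  # as seen from the left camera
    left = project_point(point, fx, fy, cx, cy)
    # The right camera sits `baseline` to the right, so the point appears
    # shifted to the left in its coordinate frame.
    right = project_point((point[0] - baseline, point[1], point[2]),
                          fx, fy, cx, cy)
    print(left, right, depth_from_disparity(fx, baseline, left[0], right[0]))
```

With wrong values for the focal length or the principal point, the recovered depth would be wrong too, which is why calibration comes first.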

Feature extraction

As we discussed earlier, the human visual system tends to extract the salient features from a given scene to remember it for retrieval later. To mimic this, people started designing various feature extractors that can extract these salient points from a given image. Popular algorithms include Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Features from Accelerated Segment Test (FAST).

An OpenCV module called features2d provides functions to detect and extract all these features. Another module called xfeatures2d provides a few more feature extractors, some of which are still in the experimental phase. You can play around with these if you get the chance.
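To give a feel for what a feature extractor does, here is a simplified pure-Python sketch of the segment test behind FAST. A pixel counts as a corner when enough contiguous pixels on a 16-pixel circle of radius 3 around it are all clearly brighter or all clearly darker than the center. The threshold of 20 and the run length of 9 are assumptions for this example, and the sketch omits the speed-ups and non-maximum suppression used in practice.

```python
# Offsets (dx, dy) of the 16 pixels on a circle of radius 3, in circular order.
CIRCLE_OFFSETS = [
    (0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
    (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3),
]


def longest_circular_run(flags):
    """Length of the longest run of True values, allowing wrap-around."""
    longest = current = 0
    for flag in flags + flags:  # doubling the list handles wrap-around
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return min(longest, len(flags))


def is_fast_corner(image, x, y, threshold=20, min_contiguous=9):
    """Segment test at (x, y); the pixel must be at least 3 pixels from the border."""
    center = image[y][x]
    ring = [image[y + dy][x + dx] for dx, dy in CIRCLE_OFFSETS]
    brighter = [value > center + threshold for value in ring]
    darker = [value < center - threshold for value in ring]
    best_run = max(longest_circular_run(brighter), longest_circular_run(darker))
    return best_run >= min_contiguous


if __name__ == "__main__":
    # Bright square occupying the bottom-right quadrant of a dark image.
    corner = [[255 if x >= 3 and y >= 3 else 0 for x in range(7)]
              for y in range(7)]
    # Vertical edge: dark on the left, bright on the right.
    edge = [[255 if x >= 3 else 0 for x in range(7)] for _ in range(7)]
    print(is_fast_corner(corner, 3, 3), is_fast_corner(edge, 3, 3))
```

The straight edge is rejected because only 7 contiguous circle pixels are darker than the center, whereas the corner of the square has 11.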

There is also a module called bioinspired that provides algorithms for biologically inspired computer vision models.

Object detection

Object detection refers to detecting the location of an object in a given image. This process is not concerned with the type of object. If you design a chair detector, it will not tell you whether the chair in a given image is red with a high back, or blue with a low back; it will just tell you the location of the chair.

Detecting the location of objects is a critical step in many computer vision systems. Consider the following photo:

If you run a chair detector on this image, it will put a green box around each chair, but it won't tell you what kind of chair each one is.

Object detection used to be a computationally intensive task because of the number of calculations required to perform the detection at various scales. To solve this, Paul Viola and Michael Jones came up with a great algorithm in their seminal 2001 paper, which you can read at the following link: https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/viola-cvpr-01.pdf. They provided a fast way to design an object detector for any object.

OpenCV has modules called objdetect and xobjdetect that provide the framework to design an object detector. You can use them to develop detectors for arbitrary items such as sunglasses, boots, and so on.
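One of the ideas that makes the Viola-Jones detector fast is the integral image, a table that lets us compute the sum of the pixels in any rectangle with four lookups, regardless of the rectangle's size. The following pure-Python sketch shows only that building block. The function names are assumptions for this example, and the rest of the detector (the features and the classifier cascade) is not shown.

```python
def integral_image(image):
    """Build a summed-area table with one extra row and column of zeros.

    table[y][x] holds the sum of all pixels above and to the left of (x, y).
    """
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, x, y, width, height):
    """Sum of the pixels in the rectangle, using four table lookups."""
    return (table[y + height][x + width]
            - table[y][x + width]
            - table[y + height][x]
            + table[y][x])


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    table = integral_image(image)
    print(rect_sum(table, 1, 1, 2, 2))  # sum of 5, 6, 8, and 9
```

Because the cost of rect_sum does not depend on the rectangle's size, rectangular features can be evaluated at many scales without rescanning the pixels.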

Machine learning

Machine learning algorithms are used extensively to build computer vision systems for object recognition, image classification, face detection, visual search, and so on.

OpenCV provides a module called ml, which has many machine learning algorithms bundled into it, including a Bayes classifier, k-nearest neighbors (KNN), support vector machines (SVM), decision trees, neural networks, and more.

OpenCV also has a module called flann, a wrapper around the Fast Library for Approximate Nearest Neighbors (FLANN), which contains algorithms for fast nearest neighbor searches in large datasets.
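As an illustration of one of the algorithms mentioned in this section, here is a minimal pure-Python sketch of a k-nearest neighbors classifier. It uses an exact brute-force search over all training points, which is precisely the cost that approximate nearest neighbor libraries try to avoid on large datasets. The function knn_classify and the toy data are assumptions for this example and do not reflect the API of the ml module.

```python
import math
from collections import Counter


def knn_classify(training_points, training_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(training_points, training_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_classify(points, labels, (0.5, 0.5)))
    print(knn_classify(points, labels, (5.5, 5.5)))
```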

Computational photography

Computational photography refers to using advanced image processing techniques to improve the images captured by cameras. Instead of focusing on optical processes and image capture methods, computational photography uses software to manipulate visual data. Applications include high dynamic range imaging, panoramic images, image relighting, and light field cameras.

Let's look at the following image:

Look at those vivid colors! This is an example of a high dynamic range image, and it wouldn't be possible to get this using conventional image capture techniques. To do this, we have to capture the same scene at multiple exposures, register those images with each other, and then blend them nicely to create this image.

The photo and xphoto modules contain various algorithms pertaining to computational photography. There is also a module called stitching that provides algorithms to create panoramic images.
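The blending step described above can be sketched in a few lines. This pure-Python example assumes the exposures are already registered (aligned) grayscale images and combines them with a per-pixel weighted average that favors well-exposed values near mid-gray. It is a deliberately simplified weighting in the spirit of exposure fusion, not the algorithm used by the photo module, and the function names are hypothetical.

```python
def well_exposedness(pixel):
    """Weight that is highest for mid-gray and close to zero at black or white."""
    return 1.0 - abs(pixel - 127.5) / 127.5 + 1e-6  # epsilon avoids zero weights


def blend_exposures(exposures):
    """Blend aligned 8-bit grayscale exposures into one image of floats."""
    height, width = len(exposures[0]), len(exposures[0][0])
    blended = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            pixels = [image[y][x] for image in exposures]
            weights = [well_exposedness(p) for p in pixels]
            weighted_sum = sum(w * p for w, p in zip(weights, pixels))
            blended[y][x] = weighted_sum / sum(weights)
    return blended


if __name__ == "__main__":
    dark_exposure = [[0, 100]]      # first pixel is crushed to black
    bright_exposure = [[128, 255]]  # second pixel is blown out to white
    print(blend_exposures([dark_exposure, bright_exposure]))
```

Each output pixel is taken almost entirely from whichever exposure captured it without clipping.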

The image shown can be found here: https://pixabay.com/en/hdr-high-dynamic-range-landscape-806260/.

Shape analysis

The notion of shape is crucial in computer vision. We analyze visual data by recognizing different shapes in the image, which is an important step in many algorithms.

Let's say you are trying to identify a particular logo in an image. You know that it can appear in various shapes, orientations, and sizes. One good way to get started is to quantify the characteristics of the shape of the object.

The shape module provides all the algorithms required to extract different shapes, measure similarity between them, transform the shapes of objects, and more.
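As a small example of quantifying a shape, the following pure-Python sketch computes the circularity of a polygon, defined as 4πA/P² for area A and perimeter P. The value does not change when the shape is moved, rotated, or scaled, which is the kind of property we need when a logo can appear in different sizes and orientations. This is just one simple descriptor chosen for illustration, and it is not a function of the shape module.

```python
import math


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    """Sum of the edge lengths, including the closing edge."""
    return sum(
        math.dist(a, b) for a, b in zip(vertices, vertices[1:] + vertices[:1])
    )


def circularity(vertices):
    """4 * pi * area / perimeter^2: 1.0 for a circle, smaller for elongated shapes."""
    return 4.0 * math.pi * polygon_area(vertices) / polygon_perimeter(vertices) ** 2


if __name__ == "__main__":
    small_square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    large_square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    thin_rectangle = [(0, 0), (10, 0), (10, 1), (0, 1)]
    print(circularity(small_square), circularity(large_square),
          circularity(thin_rectangle))
```

The two squares produce the same value despite their different sizes, while the thin rectangle scores much lower.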

Optical flow algorithms

Optical flow algorithms are used in videos to track features across successive frames. Let's say you want to track a particular object in a video. Running a feature extractor on every frame would be computationally expensive, and therefore slow. Instead, you extract the features from the current frame and then track those features in successive frames.
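The idea of tracking a feature instead of detecting it again can be illustrated with simple block matching: we take a small patch around the feature in one frame and search a small neighborhood of the next frame for the best match. Note that this swaps in block matching for a true optical flow method, purely to keep the example short. The function names, the patch size, and the search radius are assumptions for this sketch.

```python
def make_frame(width, height, patch, x, y):
    """Build a synthetic black frame with `patch` pasted at (x, y)."""
    frame = [[0] * width for _ in range(height)]
    for dy, row in enumerate(patch):
        for dx, value in enumerate(row):
            frame[y + dy][x + dx] = value
    return frame


def patch_difference(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of squared differences between two size x size patches."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            diff = frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx]
            total += diff * diff
    return total


def track_patch(previous_frame, next_frame, x, y, size=3, search_radius=2):
    """Find where the patch at (x, y) in previous_frame moved to in next_frame."""
    height, width = len(next_frame), len(next_frame[0])
    best_position, best_cost = (x, y), None
    for cand_y in range(max(0, y - search_radius),
                        min(height - size, y + search_radius) + 1):
        for cand_x in range(max(0, x - search_radius),
                            min(width - size, x + search_radius) + 1):
            cost = patch_difference(previous_frame, next_frame,
                                    x, y, cand_x, cand_y, size)
            if best_cost is None or cost < best_cost:
                best_position, best_cost = (cand_x, cand_y), cost
    return best_position


if __name__ == "__main__":
    patch = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    previous_frame = make_frame(8, 8, patch, 2, 2)
    next_frame = make_frame(8, 8, patch, 3, 4)
    print(track_patch(previous_frame, next_frame, 2, 2))
```

Only a small window around the previous position is searched, which is what makes tracking cheaper than running a detector over the whole frame.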

Optical flow algorithms are heavily used in video-based applications in computer vision. The optflow