OpenCV By Example

Prateek Joshi
Description

Enhance your understanding of Computer Vision and image processing by developing real-world projects in OpenCV 3

About This Book

  • Get to grips with the basics of Computer Vision and image processing
  • This is a step-by-step guide to developing several real-world Computer Vision projects using OpenCV 3
  • This book places a special focus on working with Tesseract OCR, a free, open source library for recognizing text in images

Who This Book Is For

If you are a software developer with a basic understanding of Computer Vision and image processing and want to develop interesting Computer Vision applications with OpenCV, this is the book for you. Knowledge of C++ is required.

What You Will Learn

  • Install OpenCV 3 on your operating system
  • Create the required CMake scripts to compile the C++ application and manage its dependencies
  • Get to grips with the Computer Vision workflows and understand the basic image matrix format and filters
  • Understand the segmentation and feature extraction techniques
  • Remove backgrounds from a static scene to identify moving objects for video surveillance
  • Track different objects in a live video using various techniques
  • Use the new OpenCV functions for text detection and recognition with Tesseract

In Detail

OpenCV is a cross-platform, free-to-use library that is primarily used for real-time Computer Vision and image processing. It is considered to be one of the best open source libraries that helps developers focus on constructing complete projects on image processing, motion detection, and image segmentation.

Whether you are completely new to the concept of Computer Vision or have a basic understanding of it, this book will be your guide to understanding the basic OpenCV concepts and algorithms through amazing real-world examples and projects.

Starting with the installation of OpenCV on your system and the basics of image processing, we swiftly move on to topics such as optical flow video analysis and text recognition in complex scenes, taking you through the commonly used Computer Vision techniques so that you can build your own OpenCV projects from scratch.

By the end of this book, you will be familiar with the basics of OpenCV, such as matrix operations, filters, and histograms, as well as more advanced concepts, such as segmentation, machine learning, complex video analysis, and text recognition.

Style and approach

This book is a practical guide with lots of tips, and is closely focused on developing Computer Vision applications with OpenCV. Beginning with the fundamentals, the complexity increases with each chapter. Sample applications are developed throughout the book that you can execute and use in your own projects.


Table of Contents

OpenCV By Example
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Getting Started with OpenCV
Understanding the human visual system
How do humans understand image content?
Why is it difficult for machines to understand image content?
What can you do with OpenCV?
In-built data structures and input/output
Image processing operations
Building GUI
Video analysis
3D reconstruction
Feature extraction
Object detection
Machine learning
Computational photography
Shape analysis
Optical flow algorithms
Face and object recognition
Surface matching
Text detection and recognition
Installing OpenCV
Windows
Mac OS X
Linux
Summary
2. An Introduction to the Basics of OpenCV
Basic CMake configuration files
Creating a library
Managing dependencies
Making the script more complex
Images and matrices
Reading/writing images
Reading videos and cameras
Other basic object types
The vec object type
The Scalar object type
The Point object type
The Size object type
The Rect object type
RotatedRect object type
Basic matrix operations
Basic data persistence and storage
Writing to a file storage
Summary
3. Learning the Graphical User Interface and Basic Filtering
Introducing the OpenCV user interface
A basic graphical user interface with OpenCV
The graphical user interface with QT
Adding slider and mouse events to our interfaces
Adding buttons to a user interface
OpenGL support
Summary
4. Delving into Histograms and Filters
Generating a CMake script file
Creating the Graphical User Interface
Drawing a histogram
Image color equalization
Lomography effect
The cartoonize effect
Summary
5. Automated Optical Inspection, Object Segmentation, and Detection
Isolating objects in a scene
Creating an application for AOI
Preprocessing the input image
Noise removal
Removing the background using the light pattern for segmentation
The thresholding operation
Segmenting our input image
The connected component algorithm
The findContours algorithm
Summary
6. Learning Object Classification
Introducing machine learning concepts
Computer Vision and the machine learning workflow
Automatic object inspection classification example
Feature extraction
Training an SVM model
Input image prediction
Summary
7. Detecting Face Parts and Overlaying Masks
Understanding Haar cascades
What are integral images?
Overlaying a facemask in a live video
What happened in the code?
Get your sunglasses on
Looking inside the code
Tracking your nose, mouth, and ears
Summary
8. Video Surveillance, Background Modeling, and Morphological Operations
Understanding background subtraction
Naive background subtraction
Does it work well?
Frame differencing
How well does it work?
The Mixture of Gaussians approach
What happened in the code?
Morphological image processing
What's the underlying principle?
Slimming the shapes
Thickening the shapes
Other morphological operators
Morphological opening
Morphological closing
Drawing the boundary
White Top-Hat transform
Black Top-Hat transform
Summary
9. Learning Object Tracking
Tracking objects of a specific color
Building an interactive object tracker
Detecting points using the Harris corner detector
Shi-Tomasi Corner Detector
Feature-based tracking
The Lucas-Kanade method
The Farneback algorithm
Summary
10. Developing Segmentation Algorithms for Text Recognition
Introducing optical character recognition
The preprocessing step
Thresholding the image
Text segmentation
Creating connected areas
Identifying paragraph blocks
Text extraction and skew adjustment
Installing Tesseract OCR on your operating system
Installing Tesseract on Windows
Setting up Tesseract in Visual Studio
Setting the import and library paths
Configuring the linker
Adding the libraries to the windows path
Installing Tesseract on Mac
Using Tesseract OCR library
Creating a OCR function
Sending the output to a file
Summary
11. Text Recognition with Tesseract
How the text API works
The scene detection problem
Extremal regions
Extremal region filtering
Using the text API
Text detection
Text extraction
Text recognition
Summary
Index

OpenCV By Example

OpenCV By Example

Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: January 2016

Production reference: 1150116

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78528-094-8

www.packtpub.com

Credits

Authors

Prateek Joshi

David Millán Escrivá

Vinícius Godoy

Reviewers

Emmanuel d'Angelo

Dr. Bryan Wai-ching CHUNG

Nikolaus Gradwohl

Luis Díaz Más

Commissioning Editor

Ashwin Nair

Acquisition Editor

Tushar Gupta

Content Development Editor

Amey Varangaonkar

Technical Editor

Naveenkumar Jain

Copy Editor

Rashmi Sawant

Project Coordinator

Suzanne Coutinho

Proofreader

Safis Editing

Indexer

Hemangini Bari

Graphics

Kirk D'Penha

Production Coordinator

Shantanu N. Zagade

Cover Work

Shantanu N. Zagade

About the Authors

Prateek Joshi is a Computer Vision researcher and published author. He has over eight years of experience in this field with a primary focus on content-based analysis and deep learning. His work in this field has resulted in multiple patents, tech demos, and research papers at major IEEE conferences. He is the author of OpenCV with Python By Example, Packt Publishing.

He has won many hackathons using a wide variety of technologies related to image recognition. His blog has been visited by users in more than 200 countries, and he has been featured as a guest author in prominent tech magazines. He enjoys blogging on topics such as artificial intelligence, abstract mathematics, and cryptography. You can visit his blog at www.prateekvjoshi.com.

He is an avid coder who is passionate about building game-changing products. He is particularly interested in intelligent algorithms that can automatically understand the content to produce scene descriptions in terms of constituent objects. He graduated from the University of Southern California and has worked for such companies as Nvidia, Microsoft Research, Qualcomm, and a couple of early stage start-ups in Silicon Valley. You can learn more about him on his personal website at www.prateekj.com.

I would like to thank the reviewers for helping me refine this book. I would also like to thank Packt Publishing for publishing this book. Finally, I would like to thank my family for supporting me through everything.

David Millán Escrivá was eight years old when he wrote his first program on an 8086 PC in BASIC, which enabled the 2D plotting of basic equations. That was the start of his relationship with computer development, and he has since created many applications and games.

In 2005, he completed his studies in IT at the Universitat Politécnica de Valencia with honors in human-computer interaction supported by Computer Vision with OpenCV (v0.96). His final project was based on this subject, and he presented it at the Spanish HCI Congress.

In 2014, he completed his Master's degree in artificial intelligence, computer graphics, and pattern recognition, focusing on pattern recognition and Computer Vision.

He contributed to the source code of Blender, an open source 3D software project, and worked on his first commercial movie, Plumiferos—Aventuras voladoras, as a computer graphics software developer.

David now has more than 13 years of experience in IT, including more than nine years in Computer Vision, computer graphics, and pattern recognition. He has worked on different projects and start-ups, applying his knowledge of Computer Vision, optical character recognition, and augmented reality.

He is the author of the DamilesBlog (http://blog.damiles.com), where he publishes research articles and tutorials on OpenCV, Computer Vision in general, and optical character recognition algorithms. He is the co-author of Mastering OpenCV with Practical Computer Vision Projects and was a reviewer of GnuPlot Cookbook by Lee Phillips, OpenCV Computer Vision with Python by Joseph Howse, and Instant OpenCV Starter by Jayneil Dalal and Sohil Patel, all published by Packt Publishing.

I would like to thank my wife, Izaskun, my daughter, Eider, and my son, Pau, for their unlimited patience and support at every moment. They have changed my life and made it awesome. Love you all.

I would like to thank the OpenCV team and community for giving us this wonderful library. I would also like to thank my co-authors and Packt Publishing for supporting me and helping me complete this book.

Vinícius Godoy is a computer graphics university professor at PUCPR. He started programming in C++ 18 years ago and ventured into the fields of computer gaming and computer graphics 10 years ago. His former experience includes working as an IT manager on document processing applications at Sinax, a company that focuses on BPM and ECM activities, building games and applications for Positivo Informática, including an augmented reality educational game exhibited at CeBIT, and developing network libraries for Siemens Enterprise Communications (Unify).

As part of his Master's degree research, he used Kinect, OpenNI, and OpenCV to recognize Brazilian sign language gestures. He is currently working with medical imaging systems for his PhD thesis. He was also a reviewer of the OpenNI Cookbook, Packt Publishing.

He is also a game development fan and runs a popular site entirely dedicated to the field, Ponto V (http://www.pontov.com.br). He is the cofounder of a start-up company called Blackmuppet. His fields of interest include image processing, Computer Vision, design patterns, and multithreaded applications.

I would like to thank my wife, who supported me while I was writing this book. Her encouragement and cooperation were decisive.

I would also like to thank Fabio Binder, the teacher who introduced me to the computer graphics and gaming fields, which greatly helped my programming career and brought me to PUCPR, where I had access to several computer graphics-related software packages.

About the Reviewers

Emmanuel d'Angelo is a photography enthusiast who managed to make his way into the image processing field. After several years of working as a consultant on various image-related high-tech projects, he is now working as a developer in a photogrammetry start-up. You can find image-related thoughts and code on his technical blog at http://www.computersdontsee.net.

Dr. Bryan Wai-ching CHUNG is an interactive media artist and design consultant who lives in Hong Kong. His artworks have been exhibited at the World Wide Video Festival, Multimedia Art Asia Pacific, Stuttgart Film Winter Festival, Microwave International New Media Arts Festival, and the China Media Art Festival. During Shanghai Expo 2010, he provided interactive design consultancy to various industry leaders in Hong Kong and China. He studied computer science in Hong Kong, interactive multimedia in London, and fine art in Melbourne. He also develops software libraries for the popular open source programming language, Processing. He is the author of the book, Multimedia Programming with Pure Data. Currently, he is working as an assistant professor in the Academy of Visual Arts, Hong Kong Baptist University, where he teaches subjects on interactive arts, computer graphics, and multimedia. His website is http://www.magicandlove.com.

Nikolaus Gradwohl was born in 1976 in Vienna, Austria, and always wanted to become an inventor like Gyro Gearloose. When he got his first Atari, he figured out that being a computer programmer was the closest he could get to that dream. For a living, he has written programs for nearly anything that can be programmed, ranging from 8-bit microcontrollers to mainframes. In his free time, he likes to learn new programming languages and explore operating systems.

He is the author of Processing 2: Creative Coding Hotshot, Packt Publishing.

You can see some of his work on his blog at http://www.local-guru.net/.

Luis Díaz Más is a C++ software engineer currently working at Pix4D, where he works as a software architect and develops image processing algorithms oriented toward photogrammetry and terrain mapping. He received his PhD in computer science from the University of Cordoba (Spain), with a focus on 3D reconstruction and action recognition. Earlier, he worked for CATEC, a research center for advanced aerospace technologies, where he developed the sensorial systems for UAS (Unmanned Aerial Systems). He has reviewed other OpenCV books published by Packt, and he continually looks to deepen his knowledge of topics such as modern C++11/14, Python, CUDA, OpenCL, and so on.

I would like to thank my parents for always supporting me and giving me the freedom to do what I like the most in this life. I would also like to thank my thesis directors, Rafa and Paco, who helped me in my scientific career and from whom I have learned a lot. Finally, a special mention to Celia, the woman who chose to share her life with this software freak and the one who continuously reminds me that there are more things in life apart from programming.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

OpenCV is one of the most popular libraries used to develop Computer Vision applications. It enables us to run many different Computer Vision algorithms in real time. It has been around for many years, and it has become the standard library in this field. One of the main advantages of OpenCV is that it is highly optimized and available on almost all the platforms.

This book starts off by giving a brief introduction of various fields in Computer Vision and the associated OpenCV functionalities in C++. Each chapter contains real-world examples and code samples to demonstrate the use cases. This helps you to easily grasp the topics and understand how they can be applied in real life. To sum it up, this is a practical guide on how to use OpenCV in C++ and build various applications using this library.

What this book covers

Chapter 1, Getting Started with OpenCV, covers installation steps on various operating systems and provides an introduction to the human visual system as well as various topics in Computer Vision.

Chapter 2, An Introduction to the Basics of OpenCV, discusses how to read/write images and videos in OpenCV, and also explains how to build a project using CMake.

Chapter 3, Learning the Graphical User Interface and Basic Filtering, covers how to build a graphical user interface and mouse event detector to build interactive applications.

Chapter 4, Delving into Histograms and Filters, explores histograms and filters and also shows how we can cartoonize an image.

Chapter 5, Automated Optical Inspection, Object Segmentation, and Detection, describes various image preprocessing techniques, such as noise removal, thresholding, and contour analysis.

Chapter 6, Learning Object Classification, deals with object recognition and machine learning, and how to use Support Vector Machines to build an object classification system.

Chapter 7, Detecting Face Parts and Overlaying Masks, discusses face detection and Haar Cascades, and then explains how these methods can be used to detect various parts of the human face.

Chapter 8, Video Surveillance, Background Modeling, and Morphological Operations, explores background subtraction, video surveillance, and morphological image processing and describes how they are connected to each other.

Chapter 9, Learning Object Tracking, covers how to track objects in a live video using different techniques, such as color-based and feature-based tracking.

Chapter 10, Developing Segmentation Algorithms for Text Recognition, covers optical character recognition, text segmentation, and provides an introduction to the Tesseract OCR engine.

Chapter 11, Text Recognition with Tesseract, delves deeper into the Tesseract OCR Engine to explain how it can be used for text detection, extraction, and recognition.

What you need for this book

The examples are built using the following technologies:

  • OpenCV 3.0 or newer
  • CMake 3.3.x or newer
  • Tesseract
  • Leptonica (dependency of Tesseract)
  • QT (optional)
  • OpenGL (optional)

Detailed installation instructions are provided in the relevant chapters.

Who this book is for

This book is for developers who are new to OpenCV and want to develop Computer Vision applications with OpenCV in C++. A basic knowledge of C++ would be helpful to understand this book. This book is also useful for people who want to get started with Computer Vision and understand the underlying concepts. They should be aware of basic mathematical concepts, such as vectors, matrices, matrix multiplication, and so on, to make the most out of this book. During the course of this book, you will learn how to build various Computer Vision applications from scratch using OpenCV.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "For a basic project based on an executable built from one source code file, a two-line CMakeLists.txt file is all that is needed."

A block of code is set as follows:

#include "opencv2/opencv.hpp" using namespace cv; int main(int, char** argv) { FileStorage fs2("test.yml", FileStorage::READ); Mat r; fs2["Result"] >> r; std::cout << r << std::endl; fs2.release(); return 0; }

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

@Path("departments") @Produces(MediaType.APPLICATION_JSON) public class DepartmentResource{ //Class implementation goes here... }

Any command-line input or output is written as follows:

C:\> setx -m OPENCV_DIR D:\OpenCV\Build\x64\vc11

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "To show the control panel, we can push the last toolbar button, right-click on any part of the QT window, and select Display properties window."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Instructions for running examples are available in the README.md file present in the root folder of each project.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/OpenCV_By_Example_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Getting Started with OpenCV

Computer Vision applications are interesting and useful, but the underlying algorithms are computationally intensive. With the advent of cloud computing, we are getting more processing power to work with. The OpenCV library enables you to run Computer Vision algorithms efficiently in real time. It has been around for many years, and it has become the standard library in this field. One of the main advantages of OpenCV is that it is highly optimized and available on almost all platforms. The discussions in this book will cover everything, including the algorithms we use, why we use them, and how to implement them in OpenCV.

In this chapter, we are going to learn how to install OpenCV on various operating systems. We will discuss what OpenCV offers out of the box and the various things that we can do using the in-built functions.

By the end of this chapter, you will be able to answer the following questions:

  • How do humans process visual data and how do they understand image content?
  • What can we do with OpenCV and what are the various modules available in OpenCV that can be used to achieve those things?
  • How to install OpenCV on Windows, Linux, and Mac OS X?

Understanding the human visual system

Before we jump into OpenCV functionalities, we need to understand why those functions were built in the first place. It's important to understand how the human visual system works so that you can develop the right algorithms. The goal of Computer Vision algorithms is to understand the content of images and videos. Humans seem to do it effortlessly! So, how do we get machines to do it with the same accuracy?

Let's consider the following figure:

The human eye captures all the information that comes along, such as color, shape, brightness, and so on. In the preceding image, the human eye captures all the information about the two main objects and stores it in a certain way. Once we understand how our system works, we can take advantage of this to achieve what we want. For example, here are a few things we need to know:

  • Our visual system is more sensitive to low frequency content than high frequency content. Low frequency content refers to planar regions where pixel values don't change rapidly, and high frequency content refers to regions with corners and edges, where pixel values fluctuate a lot. You will have noticed that we can easily see if there are blotches on a planar surface, but it's difficult to spot something like that on a highly textured surface.
  • The human eye is more sensitive to changes in brightness as compared to changes in color.
  • Our visual system is sensitive to motion. We can quickly recognize if something is moving in our field of vision even though we are not directly looking at it.
  • We tend to make a mental note of salient points in our field of vision. Let's consider a white table with four black legs and a red dot at one of the corners of the table surface. When you look at this table, you'll immediately make a mental note that the surface and legs have opposing colors and there is a red dot on one of the corners. Our brain is really smart that way! We do this automatically so that we can immediately recognize it if we encounter it again.

To get an idea of our field of view, let's take a look at the top view of a human and the angles at which we see various things:

Our visual system is actually capable of a lot more things, but this should be good enough to get us started. You can explore further by reading up on Human Visual System Models on the internet.

How do humans understand image content?

If you look around, you will see a lot of objects. You may encounter many different objects every day, and you recognize them almost instantaneously without any effort. When you see a chair, you don't wait for a few minutes before realizing that it is, in fact, a chair. You just know that it's a chair right away! Now, on the other hand, computers find it very difficult to do this task. Researchers have been working for many years to find out why computers are not as good as we are at this.

To get an answer to this question, we need to understand how humans do it. The visual data processing happens in the ventral visual stream. This ventral visual stream refers to the pathway in our visual system that is associated with object recognition. It is basically a hierarchy of areas in our brain that helps us recognize objects. Humans can recognize different objects effortlessly, and we can cluster similar objects together. We can do this because we have developed some sort of invariance toward objects of the same class. When we look at an object, our brain extracts the salient points in such a way that factors such as orientation, size, perspective, and illumination don't matter.

A chair that is double the normal size and rotated by 45 degrees is still a chair. We can easily recognize it because of the way we process it. Machines cannot do this so easily. Humans tend to remember an object based on its shape and important features. Regardless of how the object is placed, we can still recognize it. In our visual system, we build these hierarchical invariances with respect to position, scale, and viewpoint that help us to be very robust.

If you look deeper in our system, you will see that humans have cells in their visual cortex that can respond to shapes, such as curves and lines. As we move further along our ventral stream, we will see more complex cells that are trained to respond to more complex objects, such as trees, gates, and so on. The neurons along our ventral stream tend to show an increase in the size of the receptive field. This is coupled with the fact that the complexity of their preferred stimuli increases as well.

Why is it difficult for machines to understand image content?

We now understand how visual data enters the human visual system and how our system processes it. The issue is that we still don't completely understand how our brain recognizes and organizes this visual data. We just extract some features from images and ask the computer to learn from them using machine learning algorithms. We still have those variations such as shape, size, perspective, angle, illumination, occlusion, and so on. For example, the same chair looks very different to a machine when you look at it from the side view. Humans can easily recognize that it's a chair regardless of how it's presented to us. So, how do we explain this to our machines?

One way to do this would be to store all the different variations of an object, including sizes, angles, perspectives, and so on. But this process is cumbersome and time-consuming! Also, it's actually not possible to gather data that can encompass every single variation. The machines will consume a huge amount of memory and a lot of time to build a model that can recognize these objects. Even with all this, if an object is partially occluded, computers still won't be able to recognize it. This is because they think that this is a new object. So, when we build a Computer Vision library, we need to build the underlying functional blocks that can be combined in many different ways to formulate complex algorithms. OpenCV provides a lot of these functions and they are highly optimized. So, once we understand what OpenCV provides out of the box, we can use it effectively to build interesting applications. Let's go ahead and explore this in the next section.

What can you do with OpenCV?

Using OpenCV, you can pretty much do every Computer Vision task that you can think of. Real-life problems require you to use many blocks together to achieve the desired result. So, you just need to understand what modules and functions to use to get what you want. Let's understand what OpenCV can do out of the box.

In-built data structures and input/output

One of the best things about OpenCV is that it provides a lot of in-built primitives to handle operations related to image processing and Computer Vision. If you have to write something from scratch, you will have to define things, such as an image, point, rectangle, and so on. These are fundamental to almost any Computer Vision algorithm. OpenCV comes with all these basic structures out of the box, and they are contained in the core module. Another advantage is that these structures have already been optimized for speed and memory, so you don't have to worry about the implementation details.

The imgcodecs module handles reading and writing image files. When you operate on an input image and create an output image, you can save it as a jpg or a png file with a simple command. You will be dealing with a lot of video files when you are working with cameras. The videoio module handles everything related to the input/output of video files. You can easily capture a video from a webcam or read a video file in many different formats. You can even save a bunch of frames as a video file by setting properties such as frames per second, frame size, and so on.
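
As a quick, minimal sketch of these modules in action (the file names and the camera index 0 are placeholders, not values from the book), the following program reads an image, writes it back in another format, and grabs a single frame from a webcam:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // imgcodecs: read an image from disk and check that it loaded
    cv::Mat image = cv::imread("input.jpg");
    if (image.empty())
    {
        std::cout << "Could not load input.jpg" << std::endl;
        return -1;
    }

    // imgcodecs: the file extension selects the output format
    cv::imwrite("output.png", image);

    // videoio: open the default webcam and capture one frame
    cv::VideoCapture cap(0);
    if (cap.isOpened())
    {
        cv::Mat frame;
        cap >> frame;
        std::cout << "Captured a frame of " << frame.cols << "x"
                  << frame.rows << " pixels" << std::endl;
    }
    return 0;
}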

Image processing operations

When you write a Computer Vision algorithm, there are a lot of basic image processing operations that you will use over and over again. Most of these functions are present in the imgproc module. You can do things such as image filtering, morphological operations, geometric transformations, color conversions, drawing on images, histograms, shape analysis, motion analysis, feature detection, and so on. Let's consider the following figure:

The right-hand side image is a rotated version of the left-hand side image. We can do this transformation with a single line in OpenCV. There is another module called ximgproc that contains advanced image processing algorithms such as structured forests for edge detection, domain transform filters, adaptive manifold filters, and so on.
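
As a rough sketch of such a transformation (the input file name and the 45 degree angle are just examples), an arbitrary rotation can be done with getRotationMatrix2D and warpAffine from the imgproc module:

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src = cv::imread("input.jpg");            // example input file
    if (src.empty())
        return -1;

    // Build a 2x3 affine matrix that rotates 45 degrees around the image center
    cv::Point2f center(src.cols / 2.0f, src.rows / 2.0f);
    cv::Mat rotation = cv::getRotationMatrix2D(center, 45.0, 1.0);

    // Apply the transformation and save the result
    cv::Mat dst;
    cv::warpAffine(src, dst, rotation, src.size());
    cv::imwrite("rotated.png", dst);
    return 0;
}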

Building GUI

OpenCV provides a module called highgui that handles all the high-level user interface operations. Let's say that you are working on a problem and you want to check what the image looks like before you proceed to the next step. This module has functions that can be used to create windows to display images and/or video. There is also a waiting function that will wait until you hit a key on your keyboard before it goes to the next step. There is a function that can detect mouse events as well. This is very useful to develop interactive applications. Using this functionality, you can draw rectangles on these input windows and then proceed based on the selected region.

Consider the following image:

As you can see, we have drawn a green rectangle on the image and applied a negative film effect to that region. Once we have the coordinates of this rectangle, we can operate only on that region.
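
The following minimal sketch shows the highgui building blocks described above: a window, a mouse callback, and the waiting function. The file name and the rectangle coordinates are illustrative and not taken from the book's project:

#include <opencv2/opencv.hpp>
#include <iostream>

// Print the pixel coordinates of every left click inside the window
static void onMouse(int event, int x, int y, int, void*)
{
    if (event == cv::EVENT_LBUTTONDOWN)
        std::cout << "Clicked at (" << x << ", " << y << ")" << std::endl;
}

int main()
{
    cv::Mat image = cv::imread("input.jpg");          // example input file
    if (image.empty())
        return -1;

    // Draw a green rectangle around a region of interest
    cv::rectangle(image, cv::Rect(50, 50, 200, 150), cv::Scalar(0, 255, 0), 2);

    cv::namedWindow("Preview");
    cv::setMouseCallback("Preview", onMouse);
    cv::imshow("Preview", image);

    // Wait until a key is pressed before closing the window
    cv::waitKey(0);
    return 0;
}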

Video analysis

Video analysis includes tasks such as analyzing the motion between successive frames in a video, tracking different objects in a video, creating models for video surveillance, and so on. OpenCV provides a module called video that can handle all of this. There is a module called videostab that deals with video stabilization. Video stabilization is an important part of video cameras. When you capture videos by holding the camera in your hands, it's hard to keep your hands perfectly steady. If you look at that video as it is, it will look bad and jittery. All modern devices use video stabilization techniques to process the videos before they are presented to the end user.
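
As a small sketch of how the video module is typically used (the camera index and the Esc key handling are assumptions, not the book's code), the following loop feeds webcam frames to a Mixture of Gaussians background model and displays the resulting foreground mask:

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap(0);                          // default webcam
    if (!cap.isOpened())
        return -1;

    // Mixture of Gaussians background model from the video module
    cv::Ptr<cv::BackgroundSubtractorMOG2> subtractor =
        cv::createBackgroundSubtractorMOG2();

    cv::Mat frame, foregroundMask;
    while (cap.read(frame))
    {
        // Update the model and get a mask of the pixels that are moving
        subtractor->apply(frame, foregroundMask);

        cv::imshow("Foreground mask", foregroundMask);
        if (cv::waitKey(30) == 27)                    // press Esc to quit
            break;
    }
    return 0;
}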

3D reconstruction

3D reconstruction is an important topic in Computer Vision. Given a set of 2D images, we can reconstruct the 3D scene using the relevant algorithms. OpenCV provides algorithms that can find the relationship between various objects in these 2D images to compute their 3D positions. We have a module called calib3d that can handle all this. This module can also handle camera calibration, which is essential to estimate the parameters of the camera. These are basically the internal parameters of a given camera, which it uses to transform the captured scene into an image. We need to know these parameters to design algorithms, or else we might get unexpected results. Let's consider the following figure:

As shown in the preceding image, the same object is captured from multiple poses. Our job is to reconstruct the original object using these 2D images.
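
The following sketch outlines how calib3d is commonly used for camera calibration. The chessboard size and the image file names are placeholder assumptions, not values from the book:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    const cv::Size boardSize(9, 6);                   // inner corners of a chessboard target
    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    cv::Size imageSize;

    // 3D reference points for the planar board, lying in the Z = 0 plane
    std::vector<cv::Point3f> board;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            board.push_back(cv::Point3f((float)x, (float)y, 0.0f));

    // Detect the board corners in several views of the same target
    for (int i = 1; i <= 10; ++i)                     // e.g. view1.jpg ... view10.jpg
    {
        cv::Mat view = cv::imread("view" + std::to_string(i) + ".jpg", cv::IMREAD_GRAYSCALE);
        if (view.empty())
            continue;
        imageSize = view.size();

        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(view, boardSize, corners))
        {
            imagePoints.push_back(corners);
            objectPoints.push_back(board);
        }
    }
    if (imagePoints.empty())
        return -1;

    // Estimate the intrinsic matrix and distortion coefficients
    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);

    std::cout << "Re-projection error: " << rms << std::endl;
    std::cout << "Camera matrix:\n" << cameraMatrix << std::endl;
    return 0;
}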

Feature extraction

As discussed earlier, the human visual system tends to extract the salient features from a given scene so that it can be retrieved later. To mimic this, people started designing various feature extractors that can extract these salient points from a given image. Some of the popular algorithms include