43,19 €
Enhance your understanding of Computer Vision and image processing by developing real-world projects in OpenCV 3
If you are a software developer with a basic understanding of Computer Vision and image processing and want to develop interesting Computer Vision applications with Open CV, this is the book for you. Knowledge of C++ is required.
Open CV is a cross-platform, free-for-use library that is primarily used for real-time Computer Vision and image processing. It is considered to be one of the best open source libraries that helps developers focus on constructing complete projects on image processing, motion detection, and image segmentation.
Whether you are completely new to the concept of Computer Vision or have a basic understanding of it, this book will be your guide to understanding the basic OpenCV concepts and algorithms through amazing real-world examples and projects.
Starting from the installation of OpenCV on your system and understanding the basics of image processing, we swiftly move on to creating optical flow video analysis or text recognition in complex scenes, and will take you through the commonly used Computer Vision techniques to build your own Open CV projects from scratch.
By the end of this book, you will be familiar with the basics of Open CV such as matrix operations, filters, and histograms, as well as more advanced concepts such as segmentation, machine learning, complex video analysis, and text recognition.
This book is a practical guide with lots of tips, and is closely focused on developing Computer vision applications with OpenCV. Beginning with the fundamentals, the complexity increases with each chapter. Sample applications are developed throughout the book that you can execute and use in your own projects.
Sie lesen das E-Book in den Legimi-Apps auf:
Seitenzahl: 293
Veröffentlichungsjahr: 2016
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: January 2016
Production reference: 1150116
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78528-094-8
www.packtpub.com
Authors
Prateek Joshi
David Millán Escrivá
Vinícius Godoy
Reviewers
Emmanuel d'Angelo
Dr. Bryan Wai-ching CHUNG
Nikolaus Gradwohl
Luis Díaz Más
Commissioning Editor
Ashwin Nair
Acquisition Editor
Tushar Gupta
Content Development Editor
Amey Varangaonkar
Technical Editor
Naveenkumar Jain
Copy Editor
Rashmi Sawant
Project Coordinator
Suzanne Coutinho
Proofreader
Safis Editing
Indexer
Hemangini Bari
Graphics
Kirk D'Penha
Production Coordinator
Shantanu N. Zagade
Cover Work
Shantanu N. Zagade
Prateek Joshi is a Computer Vision researcher and published author. He has over eight years of experience in this field with a primary focus on content-based analysis and deep learning. His work in this field has resulted in multiple patents, tech demos, and research papers at major IEEE conferences. He is the author of OpenCV with Python By Example, Packt Publishing.
He has won many hackathons using a wide variety of technologies related to image recognition. His blog has been visited by users in more than 200 countries, and he has been featured as a guest author in prominent tech magazines. He enjoys blogging on topics, such as artificial intelligence, abstract mathematics, and cryptography. You can visit his blog at www.prateekvjoshi.com.
He is an avid coder who is passionate about building game-changing products. He is particularly interested in intelligent algorithms that can automatically understand the content to produce scene descriptions in terms of constituent objects. He graduated from the University of Southern California and has worked for such companies as Nvidia, Microsoft Research, Qualcomm, and a couple of early stage start-ups in Silicon Valley. You can learn more about him on his personal website at www.prateekj.com.
I would like to thank the reviewers for helping me refine this book. I would also like to thank Packt Publishing for publishing this book. Finally, I would like to thank my family for supporting me through everything.
David Millán Escrivá was eight years old when he wrote his first program on an 8086 PC with Basic language, which enabled the 2D plotting of basic equations. He started with his computer development relationship and created many applications and games.
In 2005, he completed his studies in IT from the Universitat Politécnica de Valencia with honors in human-computer interaction supported by Computer Vision with OpenCV (v0.96). He had a final project based on this subject and published it on HCI Spanish Congress.
In 2014, he completed his Master's degree in artificial intelligence, computer graphics, and pattern recognition, focusing on pattern recognition and Computer Vision.
He participated in Blender source code, an open source and 3D-software project, and worked in his first commercial movie, Plumiferos—Aventuras voladoras, as a computer graphics software developer.
David now has more than 13 years of experience in IT, with more than nine years of experience in Computer Vision, computer graphics, and pattern recognition, working on different projects and start-ups, applying his knowledge of Computer Vision, optical character recognition, and augmented reality.
He is the author of the DamilesBlog (http://blog.damiles.com), where he publishes research articles and tutorials on OpenCV, Computer Vision in general, and optical character recognition algorithms. He is the co-author of Mastering OpenCV with Practical Computer Vision Projects Book and also the reviewer of GnuPlot Cookbook by Lee Phillips, OpenCV Computer Vision with Python by Joseph Howse, Instant Opencv Starter by Jayneil Dalal and Sohil Patel, all published by Packt Publishing.
I would like thank to my wife, Izaskun, my daughter, Eider, and my son, Pau, for their unlimited patience and support in all moments. They have changed my life and made it awesome. Love you all.
I would like to thank the OpenCV team and community that gives us this wonderful library. I would also like to thank my co-authors and Packt Publishing for supporting me and helping me complete this book.
Vinícius Godoy is a computer graphics university professor at PUCPR. He started programming with C++ 18 years ago and ventured into the field of computer gaming and computer graphics 10 years ago. His former experience also includes working as an IT manager in document processing applications in Sinax, a company that focuses in BPM and ECM activities, building games and applications for Positivo Informática, including building an augmented reality educational game exposed at CEBIT and network libraries for Siemens Enterprise Communications (Unify).
As part of his Master's degree research, he used Kinect, OpenNI, and OpenCV to recognize Brazilian sign language gestures. He is currently working with medical imaging systems for his PhD thesis. He was also a reviewer of the OpenNI Cookbook, Packt Publishing.
He is also a game development fan, having a popular site entirely dedicated to the field called Ponto V (http://www.pontov.com.br). He is the cofounder of a start-up company called Blackmuppet. His fields of interest includes image processing, Computer Vision, design patterns, and multithreaded applications.
I would like to thank my wife, who supported me while writing this book. Her incentive and cooperation was decisive.
I would also like to thank Fabio Binder, a teacher who introduced me to computer graphics and gaming fields, which greatly helped me in my computer programming career and brought me to PUCPR, where I had access to several computer graphics-related software.
Emmanuel d'Angelo is a photography enthusiast, who managed to make his way in the image processing field. After several years of working as a consultant on various image-related high-tech projects, he is now working as a developer in a photogrammetry start-up. You can find image-related thoughts and code on his technical blog at http://www.computersdontsee.net.
Dr. Bryan, Wai-ching CHUNG is an interactive media artist and design consultant who lives in Hong Kong. His artworks have been exhibited at the World Wide Video Festival, Multimedia Art Asia Pacific, Stuttgart Film Winter Festival, Microwave International New Media Arts Festival, and the China Media Art Festival. In the former Shanghai Expo 2010, he provided interactive design consultancy to various industry leaders in Hong Kong and China. He studied computer science in Hong Kong, interactive multimedia in London, and fine art in Melbourne. He also develops software libraries for the popular open source programming language, Processing. He is the author of the book, Multimedia Programming with Pure Data. Currently, he is working as an assistant professor in the Academy of Visual Arts, Hong Kong Baptist University, where he teaches subjects on interactive arts, computer graphics, and multimedia. His website is http://www.magicandlove.com.
Nikolaus Gradwohl was born in 1976 in Vienna, Austria, and always wanted to become an inventor like Gyro Gearloose. When he got his first Atari, he figured out that being a computer programmer was the closest he could get to that dream. He wrote programs for nearly anything that can be programmed, ranging from an 8-bit microcontroller to mainframes for a living. In his free time, he likes to gain knowledge of programming languages and operating systems.
He is the author of Processing 2: Creative Coding Hotshot, Packt Publishing.
You can see some of his work on his blog at http://www.local-guru.net/.
Luis Díaz Más is a C++ software engineer currently working at Pix4D, where he plays the role of a software architect and develops image processing algorithms that are oriented toward photogrammetry and terrain mapping. He received his PhD in computer science from the University of Cordoba (Spain) that focuses on 3D reconstructions and action recognition. Earlier, he worked for CATEC, a research center for advanced aerospace technologies, where he developed the sensorial systems for UAS (Unmanned Aerial Systems). He has reviewed other OpenCV books published by Packt, and he is continuously looking forward to gaining more knowledge of different topics, such as modern C++ 11/14, Python, CUDA, OpenCL, and so on.
I would like to thank my parents for always supporting me and giving me the freedom to do what I like the most in this life. I would also like to thank my thesis directors, Rafa and Paco, who helped me in my scientific career and from whom I have learned a lot. Finally, a special mention to Celia, the woman who chose to share her life with this software freak and the one who continuously reminds me that there are more things in life apart from programming.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
OpenCV is one of the most popular libraries used to develop Computer Vision applications. It enables us to run many different Computer Vision algorithms in real time. It has been around for many years, and it has become the standard library in this field. One of the main advantages of OpenCV is that it is highly optimized and available on almost all the platforms.
This book starts off by giving a brief introduction of various fields in Computer Vision and the associated OpenCV functionalities in C++. Each chapter contains real-world examples and code samples to demonstrate the use cases. This helps you to easily grasp the topics and understand how they can be applied in real life. To sum it up, this is a practical guide on how to use OpenCV in C++ and build various applications using this library.
Chapter 1, Getting Started with OpenCV, covers installation steps on various operating systems and provides an introduction to the human visual system as well as various topics in Computer Vision.
Chapter 2, An Introduction to the Basics of OpenCV, discusses how to read/write images and videos in OpenCV, and also explains how to build a project using CMake.
Chapter 3, Learning the Graphical User Interface and Basic Filtering, covers how to build a graphical user interface and mouse event detector to build interactive applications.
Chapter 4, Delving into Histograms and Filters, explores histograms and filters and also shows how we can cartoonize an image.
Chapter 5, Automated Optical Inspection, Object Segmentation, and Detection, describes various image preprocessing techniques, such as noise removal, thresholding, and contour analysis.
Chapter 6, Learning Object Classification, deals with object recognition and machine learning, and how to use Support Vector Machines to build an object classification system.
Chapter 7, Detecting Face Parts and Overlaying Masks, discusses face detection and Haar Cascades, and then explains how these methods can be used to detect various parts of the human face.
Chapter 8, Video Surveillance, Background Modeling, and Morphological Operations, explores background subtraction, video surveillance, and morphological image processing and describes how they are connected to each other.
Chapter 9, Learning Object Tracking, covers how to track objects in a live video using different techniques, such as color-based and feature-based tracking.
Chapter 10, Developing Segmentation Algorithms for Text Recognition, covers optical character recognition, text segmentation, and provides an introduction to the Tesseract OCR engine.
Chapter 11, Text Recognition with Tesseract, delves deeper into the Tesseract OCR Engine to explain how it can be used for text detection, extraction, and recognition.
The examples are built using the following technologies:
Detailed installation instructions are provided in the relevant chapters.
This book is for developers who are new to OpenCV and want to develop Computer Vision applications with OpenCV in C++. A basic knowledge of C++ would be helpful to understand this book. This book is also useful for people who want to get started with Computer Vision and understand the underlying concepts. They should be aware of basic mathematical concepts, such as vectors, matrices, matrix multiplication, and so on, to make the most out of this book. During the course of this book, you will learn how to build various Computer Vision applications from scratch using OpenCV.
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.
Code words in text are shown as follows: "For a basic project based on an executable build from one source code file, a two line CMakeLists.txt file is all that is needed ."
A block of code is set as follows:
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
Any command-line input or output is written as follows:
New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "To show the control panel we can push the last tool bar button, right click in any part of QT Window and select Display properties window."
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Instructions for running examples are available in the README.md file present in the root folder of each project.
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/OpenCV_By_Example_ColorImages.pdf.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.
Computer Vision applications are interesting and useful, but the underlying algorithms are computationally intensive. With the advent of cloud computing, we are getting more processing power to work with. The OpenCV library enables you to run Computer Vision algorithms efficiently in real time. It has been around for many years and it has become the standard library in this field. One of the main advantages of OpenCV is that it is highly optimized and available on almost all platforms. The discussions in this book will cover everything, including the algorithm we are using, why we are using it, and how to implement it in OpenCV.
In this chapter, we are going to learn how to install OpenCV on various operating systems. We will discuss what OpenCV offers out of the box and the various things that we can do using the in-built functions.
By the end of this chapter, you will be able to answer the following questions:
Before we jump into OpenCV functionalities, we need to understand why those functions were built in the first place. It's important to understand how the human visual system works so that you can develop the right algorithms. The goal of the Computer Vision algorithms is to understand the content of images and videos. Humans seem to do it effortlessly! So, how do we get machines to do it with the same accuracy?
Let's consider the following figure:
The human eye captures all the information that comes along such as color, shapes, brightness, and so on. In the preceding image, the human eye captures all the information about the two main objects and stores it in a certain way. Once we understand how our system works, we can take advantage of this to achieve what we want. For example, here are a few things we need to know:
To get an idea of our field of view, let's take a look at the top view of a human and the angles at which we see various things:
Our visual system is actually capable of a lot more things, but this should be good enough to get us started. You can explore further by reading up on Human Visual System Models on the internet.
If you look around, you will see a lot of objects. You may encounter many different objects every day, and you recognize them almost instantaneously without any effort. When you see a chair, you don't wait for a few minutes before realizing that it is, in fact, a chair. You just know that it's a chair right away! Now, on the other hand, computers find it very difficult to do this task. Researchers have been working for many years to find out why computers are not as good as we are at this.
To get an answer to this question, we need to understand how humans do it. The visual data processing happens in the ventral visual stream. This ventral visual stream refers to the pathway in our visual system that is associated with object recognition. It is basically a hierarchy of areas in our brain that helps us recognize objects. Humans can recognize different objects effortlessly, and we can cluster similar objects together. We can do this because we have developed some sort of invariance toward objects of the same class. When we look at an object, our brain extracts the salient points in such a way that factors such as orientation, size, perspective, and illumination don't matter.
A chair that is double the normal size and rotated by 45 degrees is still a chair. We can easily recognize it because of the way we process it. Machines cannot do this so easily. Humans tend to remember an object based on its shape and important features. Regardless of how the object is placed, we can still recognize it. In our visual system, we build these hierarchical invariances with respect to position, scale, and viewpoint that help us to be very robust.
If you look deeper in our system, you will see that humans have cells in their visual cortex that can respond to shapes, such as curves and lines. As we move further along our ventral stream, we will see more complex cells that are trained to respond to more complex objects, such as trees, gates, and so on. The neurons along our ventral stream tend to show an increase in the size of the receptive field. This is coupled with the fact that the complexity of their preferred stimuli increases as well.
We now understand how visual data enters the human visual system and how our system processes it. The issue is that we still don't completely understand how our brain recognizes and organizes this visual data. We just extract some features from images and ask the computer to learn from them using machine learning algorithms. We still have those variations such as shape, size, perspective, angle, illumination, occlusion, and so on. For example, the same chair looks very different to a machine when you look at it from the side view. Humans can easily recognize that it's a chair regardless of how it's presented to us. So, how do we explain this to our machines?
One way to do this would be to store all the different variations of an object, including sizes, angles, perspectives, and so on. But this process is cumbersome and time-consuming! Also, it's actually not possible to gather data that can encompass every single variation. The machines will consume a huge amount of memory and a lot of time to build a model that can recognize these objects. Even with all this, if an object is partially occluded, computers still won't be able to recognize it. This is because they think that this is a new object. So, when we build a Computer Vision library, we need to build the underlying functional blocks that can be combined in many different ways to formulate complex algorithms. OpenCV provides a lot of these functions and they are highly optimized. So, once we understand what OpenCV provides out of the box, we can use it effectively to build interesting applications. Let's go ahead and explore this in the next section.
Using OpenCV, you can pretty much do every Computer Vision task that you can think of. Real-life problems require you to use many blocks together to achieve the desired result. So, you just need to understand what modules and functions to use to get what you want. Let's understand what OpenCV can do out of the box.
One of the best things about OpenCV is that it provides a lot of in-built primitives to handle operations related to image processing and Computer Vision. If you have to write something from scratch, you will have to define things, such as an image, point, rectangle, and so on. These are fundamental to almost any Computer Vision algorithm. OpenCV comes with all these basic structures out of the box, and they are contained in the core module. Another advantage is that these structures have already been optimized for speed and memory, so you don't have to worry about the implementation details.
The imgcodecs module handles reading and writing image files. When you operate on an input image and create an output image, you can save it as a jpg or a png file with a simple command. You will be dealing with a lot of video files when you are working with cameras. The videoio module handles everything related to the input/output of video files. You can easily capture a video from a webcam or read a video file in many different formats. You can even save a bunch of frames as a video file by setting properties such as frames per second, frame size, and so on.
When you write a Computer Vision algorithm, there are a lot of basic image processing operations that you will use over and over again. Most of these functions are present in the imgproc module. You can do things such as image filtering, morphological operations, geometric transformations, color conversions, drawing on images, histograms, shape analysis, motion analysis, feature detection, and so on. Let's consider the following figure:
The right-hand side image is a rotated version of the left-hand side image. We can do this transformation with a single line in OpenCV. There is another module called ximgproc that contains advanced image processing algorithms such as structured forests for edge detection, domain transform filters, adaptive manifold filters, and so on.
OpenCV provides a module called highgui that handles all the high-level user interface operations. Let's say that you are working on a problem and you want to check what the image looks like before you proceed to the next step. This module has functions that can be used to create windows to display images and/or video. There is also a waiting function that will wait until you hit a key on your keyboard before it goes to the next step. There is a function that can detect mouse events as well. This is very useful to develop interactive applications. Using this functionality, you can draw rectangles on these input windows and then proceed based on the selected region.
Consider the following image:
As you can see, we have drawn a green rectangle on the image and applied a negative film effect to that region. Once we have the coordinates of this rectangle, we can operate only on that region.
Video analysis includes tasks such as analyzing the motion between successive frames in a video, tracking different objects in a video, creating models for video surveillance, and so on. OpenCV provides a module called video that can handle all of this. There is a module called videostab that deals with video stabilization. Video stabilization is an important part of video cameras. When you capture videos by holding the camera in your hands, it's hard to keep your hands perfectly steady. If you look at that video as it is, it will look bad and jittery. All modern devices use video stabilization techniques to process the videos before they are presented to the end user.
3D reconstruction is an important topic in Computer Vision. Given a set of 2D images, we can reconstruct the 3D scene using the relevant algorithms. OpenCV provides algorithms that can find the relationship between various objects in these 2D images to compute their 3D positions. We have a module called calib3d that can handle all this. This module can also handle camera calibration, which is essential to estimate the parameters of the camera. These parameters are basically the internal parameters of any given camera that uses them to transform the captured scene into an image. We need to know these parameters to design algorithms, or else we might get unexpected results. Let's consider the following figure:
As shown in the preceding image, the same object is captured from multiple poses. Our job is to reconstruct the original object using these 2D images.
As discussed earlier, the human visual system tends to extract the salient features from a given scene so that it can be retrieved later. To mimic this, people started designing various feature extractors that can extract these salient points from a given image. Some of the popular algorithms include
