Mastering OpenCV, now in its third edition, targets computer vision engineers taking their first steps toward mastering OpenCV. Keeping the mathematical formulations to a solid but bare minimum, the book delivers complete projects from ideation to running code, targeting current hot topics in computer vision such as face recognition, landmark detection and pose estimation, and number recognition with deep convolutional networks.
You’ll learn from experienced OpenCV experts how computer vision products and projects are implemented in both academia and industry, all in one comfortable package. You’ll get acquainted with API functionality and gain insights into design choices in a complete computer vision project. You’ll also go beyond the basics of computer vision to implement solutions for complex image processing projects.
By the end of the book, you will have created various working prototypes with the help of projects in the book and be well versed in the new features of OpenCV 4.
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Aaron Lazar
Acquisition Editor: Shahnish Khan
Content Development Editor: Zeeyan Pinheiro
Technical Editor: Ketan Kamble
Copy Editor: Safis Editing
Project Coordinator: Vaidehi Sawant
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Alishon Mendonsa
Production Coordinator: Jisha Chirayil
First published: December 2012
Second edition: April 2017
Third edition: December 2018
Production reference: 1221218
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78953-357-6
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Roy Shilkrot is an assistant professor of computer science at Stony Brook University, where he leads the Human Interaction group. Dr. Shilkrot's research is in computer vision, human-computer interfaces, and the cross-over between these two domains, funded by US federal, New York State, and industry grants. Dr. Shilkrot graduated from the Massachusetts Institute of Technology (MIT) with a PhD, and has authored more than 25 peer-reviewed papers published at premier computer science conferences, such as CHI and SIGGRAPH, as well as in leading academic journals such as ACM Transactions on Graphics (TOG) and ACM Transactions on Computer-Human Interaction (ToCHI). Dr. Shilkrot is also a co-inventor of several patented technologies, a co-author of a number of books, serves on the scientific advisory board of numerous start-up companies, and has over 10 years of experience as an engineer and an entrepreneur.
David Millán Escrivá was eight years old when he wrote his first program on an 8086 PC in BASIC, which enabled the 2D plotting of basic equations. In 2005, he finished his studies in IT at the Universitat Politècnica de València, with honors in human-computer interaction supported by computer vision with OpenCV (v0.96). His final project was based on this subject, and he presented it at the Spanish HCI congress. He has worked with Blender, an open source 3D software project, and worked on his first commercial movie, Plumiferos - Aventuras voladoras, as a computer graphics software developer. David now has more than 10 years of experience in IT, with experience in computer vision, computer graphics, and pattern recognition, working on different projects and start-ups, applying his knowledge of computer vision, optical character recognition, and augmented reality. He is the author of the DamilesBlog blog, where he publishes research articles and tutorials about OpenCV, computer vision in general, and optical character recognition algorithms.
Arun Ponnusamy works as a senior computer vision engineer at a start-up (OIC Apps) in India. He is a lifelong learner, passionate about image processing, computer vision and machine learning. He is an engineering graduate from PSG College of Technology, Coimbatore. He started his career at MulticoreWare Inc., where he spent most of his time on image processing, OpenCV, software optimization, and GPU computing.
Arun loves to understand computer vision concepts clearly and explain them in an intuitive way on his blog. He has created an open source Python library for computer vision named cvlib, which is aimed at simplicity and user-friendliness. He is currently researching object detection, generative networks, and reinforcement learning.
Marc Amberg is an experienced machine learning and computer vision engineer with a proven history of working in the IT and service industries. He is skilled in Python, C/C++, OpenGL, 3D reconstruction, and Java. He is a strong engineering professional with a master's degree in computer science (image, vision, and interactions) from Université des Sciences et Technologies de Lille (Lille I).
Vikas Gupta is a computer vision researcher with a master's degree in this domain from India's premier institute: Indian Institute of Science. His research interests are in the field of machine perception, scene understanding, deep learning and robotics.
He has been working in this field in various roles including lecturer, software engineer and data scientist. He is passionate about teaching and sharing knowledge. He has spent 3 years teaching computer vision, embedded systems, and robotics to undergraduate students, and over 3 years working on various projects involving deep learning and computer vision. He has also co-authored a computer vision course at LearnOpenCV.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Mastering OpenCV 4 Third Edition
Dedication
About Packt
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Cartoonifier and Skin Color Analysis on the Raspberry Pi
Accessing the webcam
Main camera processing loop for a desktop app
Generating a black and white sketch
Generating a color painting and a cartoon
Generating an evil mode using edge filters
Generating an alien mode using skin detection
Skin detection algorithm
Showing the user where to put their face
Implementation of the skin color changer
Reducing the random pepper noise from the sketch image
Porting from desktop to an embedded device
Equipment setup to develop code for an embedded device
Configuring a new Raspberry Pi
Installing OpenCV on an embedded device
Using the Raspberry Pi Camera Module
Installing the Raspberry Pi Camera Module driver
Making Cartoonifier run in fullscreen
Hiding the mouse cursor
Running Cartoonifier automatically after bootup
Speed comparison of Cartoonifier on desktop versus embedded
Changing the camera and camera resolution
Power draw of Cartoonifier running on desktop versus embedded system
Streaming video from Raspberry Pi to a powerful computer
Customizing your embedded system!
Summary
Explore Structure from Motion with the SfM Module
Technical requirements
Core concepts of SfM
Calibrated cameras and epipolar geometry
Stereo reconstruction and SfM
Implementing SfM in OpenCV
Image feature matching
Finding feature tracks
3D reconstruction and visualization
MVS for dense reconstruction
Summary
Face Landmark and Pose with the Face Module
Technical requirements
Theory and context
Active appearance models and constrained local models
Regression methods
Facial landmark detection in OpenCV
Measuring error
Estimating face direction from landmarks
Estimated pose calculation
Projecting the pose on the image
Summary
Number Plate Recognition with Deep Convolutional Networks
Introduction to ANPR
ANPR algorithm
Plate detection
Segmentation
Classification
Plate recognition
OCR segmentation
Character classification using a convolutional neural network
Creating and training a convolutional neural network with TensorFlow
Preparing the data
Creating a TensorFlow model
Preparing a model for OpenCV
Import and use model in OpenCV C++ code
Summary
Face Detection and Recognition with the DNN Module
Introduction to face detection and face recognition
Face detection
Implementing face detection using OpenCV cascade classifiers
Loading a Haar or LBP detector for object or face detection
Accessing the webcam
Detecting an object using the Haar or LBP classifier
Detecting the face
Implementing face detection using the OpenCV deep learning module
Face preprocessing
Eye detection
Eye search regions
Geometrical transformation
Separate histogram equalization for left and right sides
Smoothing
Elliptical mask
Collecting faces and learning from them
Collecting preprocessed faces for training
Training the face recognition system from collected faces
Viewing the learned knowledge
Average face
Eigenvalues, Eigenfaces, and Fisherfaces
Face recognition
Face identification – recognizing people from their faces
Face verification—validating that it is the claimed person
Finishing touches—saving and loading files
Finishing touches—making a nice and interactive GUI
Drawing the GUI elements
Startup mode
Detection mode
Collection mode
Training mode
Recognition mode
Checking and handling mouse clicks
Summary
References
Introduction to Web Computer Vision with OpenCV.js
What is OpenCV.js?
Compile OpenCV.js
Basic introduction to OpenCV.js development
Accessing webcam streams
Image processing and basic user interface
Threshold filter
Gaussian filter
Canny filter
Optical flow in your browser
Face detection using a Haar cascade classifier in your browser
Summary
Android Camera Calibration and AR Using the ArUco Module
Technical requirements
Augmented reality and pose estimation
Camera calibration
Augmented reality markers for planar reconstruction
Camera access in Android OS
Finding and opening the camera
Camera calibration with ArUco
Augmented reality with jMonkeyEngine
Summary
iOS Panoramas with the Stitching Module
Technical requirements
Panoramic image stitching methods
Feature extraction and robust matching for panoramas
Affine constraint
Random sample consensus (RANSAC)
Homography constraint
Bundle Adjustment
Warping images for panorama creation
Project overview
Setting up an iOS OpenCV project with CocoaPods
iOS UI for panorama capture
OpenCV stitching in an Objective-C++ wrapper
Summary
Further reading
Finding the Best OpenCV Algorithm for the Job
Technical requirements
Is it covered in OpenCV?
Algorithm options in OpenCV
Which algorithm is best?
Example comparative performance test of algorithms
Summary
Avoiding Common Pitfalls in OpenCV
History of OpenCV from v1 to v4
OpenCV and the data revolution in computer vision
Historic algorithms in OpenCV
How to check when an algorithm was added to OpenCV
Common pitfalls and suggested solutions
Summary
Further reading
Other Books You May Enjoy
Leave a review - let other readers know what you think
Mastering OpenCV, now in its third edition, is a book series targeting computer vision engineers taking their first steps in using OpenCV as a tool. Keeping the mathematical formulations to a solid but bare minimum, the book delivers complete projects from ideation to running code, targeting current hot topics in computer vision including face recognition, landmark detection and pose estimation, number recognition with deep convolutional networks, structure from motion and scene reconstruction for augmented reality, and mobile phone computer vision in native and web environments. This book brings together the vast knowledge of the authors in implementing computer vision products and projects, both in academia and in industry, in a comfortable package. It takes readers through an explanation of the API functionality, provides insights into the design choices of a complete computer vision project, and goes beyond the basics of computer vision to implement solutions for complex image recognition projects.
This book is targeted at novice computer vision engineers looking to get started with OpenCV, mostly in a C++ environment, with a hands-on approach as opposed to traditional ground-up knowledge construction. It provides concrete use case examples of the OpenCV API for common contemporary computer vision tasks, encouraging a copy-paste-and-run approach while keeping the mathematical fundamentals to a bare minimum.
Computer vision engineers nowadays have a wide range of tools and packages to choose from, including OpenCV, dlib, Matlab packages, SimpleCV, XPCV, and scikit-image. None provide better coverage and cross-platform functionality than OpenCV. However, getting started with OpenCV may seem daunting, with many thousands of functions in the official modules' API alone, excluding contributed modules. While many documenting projects exist, beyond OpenCV's own extensive tutorial offerings, most do not cater for an engineer looking to go from start to finish on a project.
This book covers much of the functionality in OpenCV, including many contributed modules, either directly by means of a dedicated chapter, or indirectly through the code and text of a chapter. It also provides an opportunity to use OpenCV on the web, on iOS and Android devices, as well as in a Python Jupyter Notebook. Each chapter approaches a different problem and provides a complete, buildable, and runnable code example of how to achieve it, alongside a walkthrough of the solution and its theoretical context.
The book is structured to offer readers the following:
Working OpenCV code samples for contemporary, non-trivial computer vision problems
Best practices in engineering and maintaining OpenCV projects
Pragmatic, algorithmic design approaches for complex computer vision tasks
Familiarity with OpenCV's most up-to-date API (v4.0.0) hands-on by example
The following chapters are covered in this book:
Chapter 1, Cartoonifier and Skin Color Analysis on the Raspberry Pi, demonstrates how to write some image processing filters for desktops and for small embedded systems such as Raspberry Pi.
Chapter 2, Explore Structure from Motion with the SfM Module, demonstrates how to use the SfM module to reconstruct a scene to a sparse point cloud, including camera poses, and also obtain a dense point cloud using multi-view stereo.
Chapter 3, Face Landmark and Pose with the Face Module, explains the process of face landmark (also known as facemark) detection using the face module.
Chapter 4, Number Plate Recognition with Deep Convolutional Networks, introduces image segmentation and feature extraction, pattern recognition basics, and two important pattern recognition algorithms, the Support Vector Machine (SVM) and deep neural network (DNN).
Chapter 5, Face Detection and Recognition with the DNN Module, demonstrates different techniques for detecting faces in images, ranging from classic algorithms using cascade classifiers with Haar features through to newer techniques employing deep learning.
Chapter 6, Introduction to Web Computer Vision with OpenCV.js, demonstrates a new way to develop computer vision algorithms for the web using OpenCV.js, a compiled version of OpenCV for JavaScript.
Chapter 7, Android Camera Calibration and AR Using the ArUco Module, shows how to implement an augmented reality (AR) application in the Android ecosystem, using OpenCV's ArUco module, Android's Camera2 APIs, and the JMonkeyEngine 3D game engine.
Chapter 8, iOS Panoramas with the Stitching Module, shows how to build a panoramic image stitching application on the iPhone using OpenCV's precompiled library for iOS.
Chapter 9, Finding the Best OpenCV Algorithm for the Job, discusses a number of methods to follow when considering options within OpenCV.
Chapter 10, Avoiding Common Pitfalls in OpenCV, reviews the historic development of OpenCV, and the gradual increase in the framework and algorithmic offering, alongside the development of computer vision at large.
The book assumes that readers have a firm grasp of programming concepts and software engineering skills, building and running software from scratch in C++. The book also features code in JavaScript, Python, Java, and Swift. Engineers looking to dive deeper into those sections will benefit from programming language knowledge beyond C++.
Readers of this book should be able to obtain an installation of OpenCV in its various flavors. Some chapters will require a Python installation, others an Android development environment. Obtaining and installing these is discussed thoroughly in the accompanying code and in the text.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-OpenCV-4-Third-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789533576_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "To see the remaining space on your SD card, run df -h | head -2."
A block of code is set as follows:
Mat bigImg;
resize(smallImg, bigImg, size, 0,0, INTER_LINEAR);
dst.setTo(0);
bigImg.copyTo(dst, mask);
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
Mat bigImg;
resize(smallImg, bigImg, size, 0,0, INTER_LINEAR);
dst.setTo(0);
bigImg.copyTo(dst, mask);
Any command-line input or output is written as follows:
sudo apt-get purge -y wolfram-engine
Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Navigate to Media | Open Network Stream."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
This chapter will show how to write some image processing filters for desktops and for small embedded systems such as Raspberry Pi. First, we develop for the desktop (in C/C++) and then port the project to Raspberry Pi, since this is the recommended scenario when developing for embedded devices. This chapter will cover the following topics:
How to convert a real-life image to a sketch drawing
How to convert to a painting and overlay the sketch to produce a cartoon
A scary evil mode to create bad characters instead of good characters
A basic skin detector and skin color changer, to give someone green alien skin
Finally, how to create an embedded system based on our desktop application
Note that an embedded system is basically a computer motherboard placed inside a product or device, designed to perform specific tasks, and Raspberry Pi is a very low-cost and popular motherboard for building an embedded system:
The preceding picture shows what you could make after this chapter: a battery-powered Raspberry Pi plus screen you could wear to Comic Con, turning everyone into a cartoon!
We want to make the real-world camera frames automatically look like they are from a cartoon. The basic idea is to fill the flat parts with some color and then draw thick lines on the strong edges. In other words, the flat areas should become much more flat and the edges should become much more distinct. We will detect edges, smooth the flat areas, and draw enhanced edges back on top, to produce a cartoon or comic book effect.
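To make the idea concrete before we build it up step by step, here is a minimal sketch of that pipeline in OpenCV; the function name, filter choices, and threshold values are illustrative assumptions, not this chapter's final implementation:

#include <opencv2/opencv.hpp>

// A minimal cartoon-effect sketch: smooth the flat areas, then paint
// thick black lines where the strong edges are (illustrative only).
cv::Mat cartoonSketch(const cv::Mat& src) {
    // Edge-preserving smoothing makes the flat areas even flatter.
    cv::Mat smoothed;
    cv::bilateralFilter(src, smoothed, 9, 150, 150);

    // Find strong edges on a blurred grayscale copy.
    cv::Mat gray, edges;
    cv::cvtColor(src, gray, cv::COLOR_BGR2GRAY);
    cv::medianBlur(gray, gray, 7);
    cv::Laplacian(gray, edges, CV_8U, 5);
    cv::threshold(edges, edges, 80, 255, cv::THRESH_BINARY_INV);

    // Copy the smoothed "painting" only where there is no edge,
    // leaving distinct black lines on the strong edges.
    cv::Mat cartoon = cv::Mat::zeros(src.size(), src.type());
    smoothed.copyTo(cartoon, edges);
    return cartoon;
}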
When developing an embedded computer vision system, it is a good idea to build a fully working desktop version first before porting it to an embedded system, since it is much easier to develop and debug a desktop program than an embedded system! So, this chapter will begin with a complete Cartoonifier desktop program that you can create using your favorite IDE (for example, Visual Studio, XCode, Eclipse, or QtCreator). After it is working properly on your desktop, the last section shows how to create an embedded system based on the desktop version. Many embedded projects require some custom code for the embedded system, such as to use different inputs and outputs, or use some platform-specific code optimizations. However, for this chapter, we will actually be running identical code on the embedded system and the desktop, so we only need to create one project.
The application uses an OpenCV GUI window, initializes the camera, and with each camera frame it calls the cartoonifyImage() function, containing most of the code in this chapter. It then displays the processed image in the GUI window. This chapter will explain how to create the desktop application from scratch using a USB webcam and the embedded system based on the desktop application, using the Raspberry Pi Camera Module. So, first you will create a desktop project in your favorite IDE, with a main.cpp file to hold the GUI code given in the following sections, such as the main loop, webcam functionality, and keyboard input, and you will create a cartoon.cpp file with the image processing operations with most of this chapter's code in a function called cartoonifyImage().
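As a rough illustration of that structure, the main loop in main.cpp could look like the following; the cartoonifyImage() signature is an assumption here, and the real version is developed over the coming sections:

#include <opencv2/opencv.hpp>

// Implemented in cartoon.cpp (signature assumed for this sketch).
void cartoonifyImage(const cv::Mat& src, cv::Mat& dst);

int main() {
    cv::VideoCapture camera(0);           // open the default webcam
    if (!camera.isOpened())
        return -1;
    cv::Mat frame, cartoon;
    while (true) {
        camera >> frame;                  // grab the next camera frame
        if (frame.empty())
            break;
        cartoonifyImage(frame, cartoon);  // apply this chapter's filters
        cv::imshow("Cartoonifier", cartoon);
        if (cv::waitKey(20) == 27)        // quit when Esc is pressed
            break;
    }
    return 0;
}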
Now that we have a sketch mode, a cartoon mode (painting + sketch mask), and an evil mode (painting + evil mask), for fun, let's try something more complex: an alien mode, by detecting the skin regions of the face and then changing the skin color to green.
There are many different techniques used for detecting skin regions, from simple color thresholds using RGB (short for Red-Green-Blue) or HSV (short for Hue-Saturation-Value) values, or color histogram calculation and re-projection, to complex machine learning algorithms of mixture models that need camera calibration in the CIELab color space, offline training with many sample faces, and so on. But even the complex methods don't necessarily work robustly across various cameras, lighting conditions, and skin types. Since we want our skin detection to run on an embedded device without any calibration or training, and we are just using skin detection for a fun image filter, a simple skin detection method is sufficient for us. However, the color responses from the tiny camera sensor in the Raspberry Pi Camera Module tend to vary significantly, and we want to support skin detection for people of any skin color without any calibration, so we need something more robust than simple color thresholds.
For example, a simple HSV skin detector can treat any pixel as skin if its hue color is fairly red, saturation is fairly high but not extremely high, and its brightness is not too dark or extremely bright. But cameras in mobile phones or Raspberry Pi Camera Modules often have bad white balancing; therefore, a person's skin might look slightly blue instead of red, for instance, and this would be a major problem for simple HSV thresholding.
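In code, such a naive HSV rule is only a few lines; the threshold values below are illustrative guesses, and, as just noted, they break down under poor white balancing:

// Naive HSV skin thresholding (threshold values are illustrative).
cv::Mat hsv, skinMask;
cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
// Fairly red hue, moderately high saturation, mid-range brightness.
cv::inRange(hsv, cv::Scalar(0, 40, 60), cv::Scalar(25, 180, 230), skinMask);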
A more robust solution is to perform face detection with a Haar or LBP cascade classifier (shown in Chapter 5, Face Detection and Recognition with the DNN Module), then look at the range of colors for the pixels in the middle of the detected face, since you know that those pixels should be skin pixels of the actual person. You could then scan the whole image or nearby region for pixels of a similar color as the center of the face. This has the advantage that it is very likely to find at least some of the true skin region of any detected person, no matter what their skin color is or even if their skin appears somewhat blueish or reddish in the camera image.
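A rough sketch of that idea follows; the cascade filename, patch size, and color tolerance are assumptions for illustration, not the method used later in this chapter:

// Sample the color at the center of a detected face, then keep pixels
// of a similar color (all constants here are illustrative).
cv::CascadeClassifier faceDetector("haarcascade_frontalface_default.xml");
std::vector<cv::Rect> faces;
cv::Mat gray;
cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
faceDetector.detectMultiScale(gray, faces);
if (!faces.empty()) {
    // Average color of a small patch in the middle of the first face.
    cv::Rect patch(faces[0].x + faces[0].width / 2 - 5,
                   faces[0].y + faces[0].height / 2 - 5, 10, 10);
    cv::Scalar skinColor = cv::mean(frame(patch));
    // Accept pixels within a fixed distance of the sampled color.
    cv::Scalar tol(40, 40, 40);
    cv::Mat skinMask;
    cv::inRange(frame, skinColor - tol, skinColor + tol, skinMask);
}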
Unfortunately, face detection using cascade classifiers is quite slow on current embedded devices, so that method might be less ideal for some real-time embedded applications. On the other hand, we can take advantage of the fact that for mobile apps and some embedded systems, it can be expected that the user will be facing the camera directly from a very close distance, so it can be reasonable to ask the user to place their face at a specific location and distance, rather than try to detect the location and size of their face. This is the basis of many mobile phone apps, where the app asks the user to place their face at a certain position or perhaps to manually drag points on the screen to show where the corners of their face are in a photo. So, let's simply draw the outline of a face in the center of the screen, and ask the user to move their face to the position and size shown.
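Drawing such a guide is straightforward; for example, the following draws a face-sized ellipse outline in the center of the frame (the proportions are a guess at a face-like shape, not this chapter's exact figure):

// Draw a placement guide: an ellipse outline where the user should
// put their face (proportions are illustrative).
cv::Size size = frame.size();
cv::Point center(size.width / 2, size.height / 2);
cv::Size axes(size.width / 5, size.height / 3);  // roughly face-shaped
cv::ellipse(frame, center, axes, 0, 0, 360, cv::Scalar(255, 255, 0), 2);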
