Get to grips with traditional computer vision algorithms and deep learning approaches, and build real-world applications with OpenCV and other machine learning frameworks
Key Features
Book Description
OpenCV is a native cross-platform C++ library for computer vision, machine learning, and image processing, and it is increasingly being adopted for development in Python. This book will get you hands-on with a wide range of intermediate to advanced projects using the latest versions of the framework and language, OpenCV 4 and Python 3.8, instead of only covering the core concepts of OpenCV in theoretical lessons. This updated second edition will guide you through working on independent hands-on projects that focus on essential OpenCV concepts such as image processing, object detection, image manipulation, object tracking, and 3D scene reconstruction, in addition to statistical learning and neural networks.
You'll begin with concepts such as image filters, Kinect depth sensor, and feature matching. As you advance, you'll not only get hands-on with reconstructing and visualizing a scene in 3D but also learn to track visually salient objects. The book will help you further build on your skills by demonstrating how to recognize traffic signs and emotions on faces. Later, you'll understand how to align images, and detect and track objects using neural networks.
By the end of this OpenCV Python book, you'll have gained hands-on experience and become proficient at developing advanced computer vision apps according to specific business needs.
What you will learn
Who this book is for
This book is for intermediate-level OpenCV users who are looking to enhance their skills by developing advanced applications. Familiarity with OpenCV concepts and Python libraries, and basic knowledge of the Python programming language are assumed.
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Richa Tripathi
Acquisition Editor: Denim Pinto
Content Development Editor: Rosal Colaco
Senior Editor: Afshaan Khan
Technical Editor: Ketan Kamble
Copy Editor: Safis Editing
Project Coordinator: Francy Puthiry
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Production Designer: Aparna Bhagat
First published: October 2015
Second edition: March 2020
Production reference: 1190320
Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78980-181-1
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Dr. Menua Gevorgyan is an experienced researcher with a demonstrated history of working in the information technology and services industry. He is skilled in computer vision, deep learning, machine learning, and data science, and has extensive experience with OpenCV and Python programming. He is interested in machine perception and machine understanding problems, and wonders whether it is possible to make a machine perceive the world as a human does.
Arsen Mamikonyan is an experienced machine learning specialist with demonstrated work experience in Silicon Valley and London, and teaching experience at the American University of Armenia. He is skilled in applied machine learning and data science and has built real-life applications using Python and OpenCV, among others. He holds a master's degree in engineering (MEng) with a concentration on artificial intelligence from the Massachusetts Institute of Technology.
Michael Beyeler is a postdoctoral fellow in neuroengineering and data science at the University of Washington, where he is working on computational models of bionic vision in order to improve the perceptual experience of blind patients implanted with a retinal prosthesis (bionic eye).
His work lies at the intersection of neuroscience, computer engineering, computer vision, and machine learning. He is also an active contributor to several open source software projects, and has professional programming experience in Python, C/C++, CUDA, MATLAB, and Android. Michael received a PhD in computer science from the University of California, Irvine, and an MSc in biomedical engineering and a BSc in electrical engineering from ETH Zurich, Switzerland.
Sri Manikanta Palakollu is an undergraduate student pursuing his bachelor's degree in computer science and engineering at SICET under JNTUH. He is a founder of the OpenStack Developer Community at his college.
He started his journey as a competitive programmer. He loves to solve problems related to the data science field. His interests include data science, app development, web development, cybersecurity, and technical writing. He has published many articles on data science, machine learning, programming, and cybersecurity with publications such as Hacker Noon, freeCodeCamp, Towards Data Science, and DDI.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
OpenCV 4 with Python Blueprints Second Edition
About Packt
Why subscribe?
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Code in Action
Download the color images
Conventions used
Get in touch
Reviews
Fun with Filters
Getting started
Planning the app
Creating a black-and-white pencil sketch
Understanding approaches for using dodging and burning techniques
Implementing a Gaussian blur with two-dimensional convolution
Applying pencil sketch transformation
Using an optimized version of a Gaussian blur
Generating a warming and cooling filter
Using color manipulation via curve shifting
Implementing a curve filter using lookup tables
Designing the warming and cooling effect
Cartoonizing an image
Using a bilateral filter for edge-aware smoothing
Detecting and emphasizing prominent edges
Combining colors and outlines to produce a cartoon
Putting it all together
Running the app
Mapping the GUI base class
Understanding the GUI constructor
Learning about a basic GUI layout
Handling video streams
Drafting a custom filter layout
Summary
Attributions
Hand Gesture Recognition Using a Kinect Depth Sensor
Getting started
Planning the app
Setting up the app
Accessing the Kinect 3D sensor
Utilizing OpenNI-compatible sensors
Running the app and main function routine
Tracking hand gestures in real time
Understanding hand region segmentation
Finding the most prominent depth of the image center region
Applying morphological closing for smoothening
Finding connected components in a segmentation mask
Performing hand shape analysis
Determining the contour of the segmented hand region
Finding the convex hull of a contour area
Finding the convexity defects of a convex hull
Performing hand gesture recognition
Distinguishing between different causes of convexity defects
Classifying hand gestures based on the number of extended fingers
Summary
Finding Objects via Feature Matching and Perspective Transforms
Getting started
Listing the tasks performed by the app
Planning the app
Setting up the app
Running the app – the main() function routine
Displaying results
Understanding the process flow
Learning feature extraction
Looking at feature detection
Detecting features in an image with SURF
Obtaining feature descriptors with SURF
Understanding feature matching
Matching features across images with FLANN
Testing the ratio for outlier removal
Visualizing feature matches
Mapping homography estimation
Warping the image
Learning feature tracking
Understanding early outlier detection and rejection
Seeing the algorithm in action
Summary
Attributions
3D Scene Reconstruction Using Structure from Motion
Getting started
Planning the app
Learning about camera calibration
Understanding the pinhole camera model
Estimating the intrinsic camera parameters
Defining the camera calibration GUI
Initializing the algorithm
Collecting image and object points
Finding the camera matrix
Setting up the app
Understanding the main routine function
Implementing the SceneReconstruction3D class
Estimating the camera motion from a pair of images
Applying point matching with rich feature descriptors
Using point matching with optic flow
Finding the camera matrices
Applying image rectification
Reconstructing the scene
Understanding 3D point cloud visualization
Learning about structure from motion
Summary
Using Computational Photography with OpenCV
Getting started
Planning the app
Understanding the 8-bit problem
Learning about RAW images
Using gamma correction
Understanding high-dynamic-range imaging
Exploring ways to vary exposure
Shutter speed
Aperture
ISO speed
Generating HDR images using multiple exposure images
Extracting exposure strength from images
Estimating the camera response function
Writing an HDR script using OpenCV
Displaying HDR images
Understanding panorama stitching
Writing script arguments and filtering images
Figuring out relative positions and the final picture size
Finding camera parameters
Creating the canvas for the panorama
Blending the images together
Improving panorama stitching
Summary
Further reading
Attributions
Tracking Visually Salient Objects
Getting started
Understanding visual saliency
Planning the app
Setting up the app
Implementing the main function 
Understanding the MultiObjectTracker class
Mapping visual saliency
Learning about Fourier analysis
Understanding the natural scene statistics
Generating a saliency map with the spectral residual approach
Detecting proto-objects in a scene
Understanding mean-shift tracking
Automatically tracking all players on a soccer field
Learning about the OpenCV Tracking API
Putting it all together
Summary
Dataset attribution
Learning to Recognize Traffic Signs
Getting started
Planning the app
Briefing on supervised learning concepts
The training procedure
The testing procedure
Understanding the GTSRB dataset
Parsing the dataset
Learning about dataset feature extraction
Understanding common preprocessing
Learning about grayscale features
Understanding color spaces
Using SURF descriptor
Mapping HOG descriptor
Learning about SVMs
Using SVMs for multiclass classification
Training the SVM
Testing the SVM
Accuracy
Confusion matrix
Precision
Recall
Putting it all together
Improving results with neural networks
Summary
Dataset attribution
Learning to Recognize Facial Emotions
Getting started
Planning the app
Learning about face detection
Learning about Haar-based cascade classifiers
Understanding pre-trained cascade classifiers
Using a pre-trained cascade classifier
Understanding the FaceDetector class
Detecting faces in grayscale images
Preprocessing detected faces
Detecting the eyes
Transforming the face
Collecting data
Assembling a training dataset
Running the application
Implementing the data collector GUI
Augmenting the basic layout
Processing the current frame
Storing the data
Understanding facial emotion recognition
Processing the dataset
Learning about PCA
Understanding MLPs
Understanding a perceptron
Knowing about deep architectures
Crafting an MLP for facial expression recognition
Training the MLP
Testing the MLP
Running the script
Putting it all together
Summary
Further reading
Attributions
Learning to Classify and Localize Objects
Getting started
Planning the app
Preparing an inference script
Preparing the dataset
Downloading and parsing the dataset
Creating a TensorFlow dataset 
Classifying with CNNs
Understanding CNNs
Learning about transfer learning
Preparing the pet type and breed classifier
Training and evaluating the classifier
Localizing with CNNs
Preparing the model
Understanding backpropagation
Training the model
Seeing inference in action
Summary
Dataset attribution
Learning to Detect and Track Objects
Getting started
Planning the app
Preparing the main script 
Detecting objects with SSD
Using other detectors
Understanding object detectors
The single-object detector
The sliding-window approach
Single-pass detectors
Learning about Intersection over Union
Training SSD- and YOLO-like networks 
Tracking detected objects
Implementing a Sort tracker
Understanding the Kalman filter
Using a box tracker with the Kalman filter
Converting boundary boxes to observations
Implementing a Kalman filter
Associating detections with trackers
Defining the main class of the tracker
Seeing the app in action
Summary
Profiling and Accelerating Your Apps
Accelerating with Numba
Accelerating with the CPU
Understanding Numba, CUDA, and GPU acceleration
Setting Up a Docker Container
Defining a Dockerfile
Working with a GPU
Other Books You May Enjoy
Leave a review - let other readers know what you think
The goal of this book is to get you hands-on with a wide range of intermediate to advanced projects using the latest version of the OpenCV 4 framework and the Python 3.8 language instead of only covering the core concepts of computer vision in theoretical lessons.
This updated second edition has increased the depth of the concepts we tackle with OpenCV. It will guide you through working on independent hands-on projects that focus on essential computer vision concepts such as image processing, 3D scene reconstruction, object detection, and object tracking. It will also cover, with real-life examples, statistical learning and deep neural networks.
You will begin by understanding concepts such as image filters and feature matching, as well as using custom sensors such as the Kinect depth sensor. You will also learn how to reconstruct and visualize a scene in 3D, how to align images, and how to combine multiple images into a single one. As you advance through the book, you will learn how to recognize traffic signs and emotions on faces and detect and track objects in video streams using neural networks, even if they disappear for short periods of time.
By the end of this OpenCV and Python book, you will have hands-on experience and be proficient at developing your own advanced computer vision applications according to specific business needs. Throughout the book, you will explore multiple machine learning and computer vision models, such as Support Vector Machines (SVMs) and convolutional neural networks.
This book is aimed at computer vision enthusiasts who want to master their skills by developing advanced practical applications using OpenCV and other machine learning libraries.
Basic programming skills and Python programming knowledge are assumed.
Chapter 1, Fun with Filters, explores a number of interesting image filters (such as a black-and-white pencil sketch, warming/cooling filters, and a cartoonizer effect), and we'll apply them to the video stream of a webcam in real time.
Chapter 2, Hand Gesture Recognition Using a Kinect Depth Sensor, helps you develop an app to detect and track simple hand gestures in real time using the output of a depth sensor, such as Microsoft Kinect 3D Sensor or Asus Xtion.
Chapter 3, Finding Objects via Feature Matching and Perspective Transforms, helps you develop an app to detect an arbitrary object of interest in the video stream of a webcam, even if the object is viewed from different angles or distances, or under partial occlusion.
Chapter 4, 3D Scene Reconstruction Using Structure from Motion, shows you how to reconstruct and visualize a scene in 3D by inferring its geometrical features from camera motion.
Chapter 5, Using Computational Photography with OpenCV, helps you develop command-line scripts that take images as input and produce panoramas or High Dynamic Range (HDR) images. The scripts will either align the images so that there is a pixel-to-pixel correspondence, or stitch them into a panorama, which is an interesting application of image alignment. In a panorama, the images show a 3D scene rather than a plane, and aligning images of a 3D scene generally requires depth information. However, when the images are taken by rotating the camera about its optical axis, as is the case for panoramas, they can be aligned without it.
Chapter 6, Tracking Visually Salient Objects, helps you develop an app to track multiple visually salient objects in a video sequence (such as all the players on the field during a soccer match) at once.
Chapter 7, Learning to Recognize Traffic Signs, shows you how to train a support vector machine to recognize traffic signs from the German Traffic Sign Recognition Benchmark (GTSRB) dataset.
Chapter 8, Learning to Recognize Facial Emotions, helps you develop an app that is able to both detect faces and recognize their emotional expressions in the video stream of a webcam in real time.
Chapter 9, Learning to Classify and Localize Objects, walks you through developing an app for real-time object classification with deep convolutional neural networks. You will modify a classifier network to train on a custom dataset with custom classes, and learn how to train a Keras model on a dataset, serialize it, and save it to disk. You will then see how to classify new input images using your loaded Keras model, and train a convolutional neural network on your own image data to obtain a classifier with high accuracy.
Chapter 10, Learning to Detect and Track Objects, guides you as you develop an app for real-time object detection with deep neural networks, connecting it to a tracker. You will learn how object detectors work and how they are trained. You will implement a Kalman filter-based tracker, which will use object position and velocity to predict where it is likely to be. After completing this chapter, you will be able to build your own real-time object detection and tracking applications.
Appendix A, Profiling and Accelerating Your Apps, covers how to find bottlenecks in an app and achieve CPU- and CUDA-based GPU acceleration of existing code with Numba.
Appendix B, Setting Up a Docker Container, walks you through replicating the environment that we have used to run the code in this book.
All of our code uses Python 3.8, which is available for a variety of operating systems, such as Windows, GNU/Linux, and macOS. We have made an effort to use only libraries that are available on these three operating systems. We will go over the exact versions of each of the dependencies we have used, which can be installed using pip (Python's dependency management system). If you have trouble getting any of these to work, we have Dockerfiles available with which we have tested all the code in this book; we cover these in Appendix B, Setting Up a Docker Container.
Here is a list of dependencies that we have used, with the chapters they were used in:
Software required | Version | Chapter number | Download links to the software
Python | 3.8 | All | https://www.python.org/downloads/
OpenCV | 4.2 | All | https://opencv.org/releases/
NumPy | 1.18.1 | All | http://www.scipy.org/scipylib/download.html
wxPython | 4.0 | 1, 4, 8 | http://www.wxpython.org/download.php
matplotlib | 3.1 | 4, 5, 6, 7 | http://matplotlib.org/downloads.html
SciPy | 1.4 | 1, 10 | http://www.scipy.org/scipylib/download.html
rawpy | 0.14 | 5 | https://pypi.org/project/rawpy/
ExifRead | 2.1.2 | 5 | https://pypi.org/project/ExifRead/
TensorFlow | 2.0 | 7, 9 | https://www.tensorflow.org/install
To run the code, you will need a regular laptop or personal computer (PC). Some chapters require a webcam, which can be either an embedded laptop camera or an external one. Chapter 2, Hand Gesture Recognition Using a Kinect Depth Sensor, also requires a depth sensor: either a Microsoft Kinect 3D sensor or any other sensor supported by the libfreenect library or OpenCV, such as the Asus Xtion.
We have tested the code using Python 3.8 and Python 3.7, on Ubuntu 18.04.
If you already have Python on your computer, you can just get going with running the following on your terminal:
$ pip install -r requirements.txt
Here, requirements.txt is provided in the GitHub repository of the project and lists the dependencies from the preceding table:
wxPython==4.0.5
numpy==1.18.1
scipy==1.4.1
matplotlib==3.1.2
requests==2.22.0
opencv-contrib-python==4.2.0.32
opencv-python==4.2.0.32
rawpy==0.14.0
ExifRead==2.1.2
tensorflow==2.0.1
Alternatively, you can follow the instructions in Appendix B, Setting Up a Docker Container, to get everything working with a Docker container.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/OpenCV-4-with-Python-Blueprints-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Code in Action videos for this book can be viewed at http://bit.ly/2xcjKdS.
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://static.packt-cdn.com/downloads/9781789801811_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "We will use argparse as we want our script to accept arguments."
A block of code is set as follows:
import argparse
import cv2
import numpy as np
from classes import CLASSES_90
from sort import Sort
Any command-line input or output is written as follows:
$ python chapter8.py collect
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
The goal of this chapter is to develop a number of image processing filters and then apply them to the video stream of a webcam in real time. These filters will rely on various OpenCV functions to manipulate matrices through splitting, merging, arithmetic operations, and applying lookup tables for complex functions.
We will cover the following three effects, which will help familiarize you with OpenCV, and we will build on these effects in future chapters of this book:
Warming and cooling filters: We will implement our own curve filters using a lookup table.
Black-and-white pencil sketch: We will make use of two image-blending techniques, known as dodging and burning.
Cartoonizer: We will combine a bilateral filter, a median filter, and adaptive thresholding.
OpenCV is an advanced toolchain, so the question is often not how to implement something from scratch, but which precanned implementation to choose for your needs. Generating complex effects is not hard if you have a lot of computing resources to spare. The challenge usually lies in finding an approach that not only gets the job done, but also gets it done in time.
Instead of teaching the basic concepts of image manipulation through theoretical lessons, we will take a practical approach and develop a single end-to-end app that integrates a number of image filtering techniques. We will apply our theoretical knowledge to arrive at a solution that not only works but also speeds up seemingly complex effects so that a laptop can produce them in real time.
In this chapter, you will learn how to do the following using OpenCV:
Creating a black-and-white pencil sketch
Applying pencil sketch transformation
Generating a warming and cooling filter
Cartoonizing an image
Putting it all together
Learning this will familiarize you with loading images into OpenCV and applying different transformations to them. This chapter will help you learn the basics of how OpenCV operates, so that we can focus on the internals of the algorithms in the following chapters.
Now, let's take a look at how to get everything up and running.
All of the code in this book is targeted for OpenCV 4.2 and has been tested on Ubuntu 18.04. Throughout this book, we will make extensive use of the NumPy package (http://www.numpy.org).
Additionally, this chapter requires the UnivariateSpline module of the SciPy package (http://www.scipy.org) and wxPython 4.0 (http://www.wxpython.org/download.php), a cross-platform Graphical User Interface (GUI) toolkit. We will try to avoid further dependencies where possible.
For the remaining book-level dependencies, see Appendix A, Profiling and Accelerating Your Apps, and Appendix B, Setting Up a Docker Container.
You can find the code that we present in this chapter at our GitHub repository here: https://github.com/PacktPublishing/OpenCV-4-with-Python-Blueprints-Second-Edition/tree/master/chapter1.
Let's begin by planning the application we are going to create in this chapter.
The final app must consist of the following modules and scripts:
wx_gui.py: This module is our implementation of a basic GUI using wxPython. We will make extensive use of this file throughout the book. This module includes the following layouts:
wx_gui.BaseLayout: This is a generic layout class from which more complicated layouts can be built.
chapter1.py: This is the main script for this chapter. It contains the following functions and classes:
chapter1.FilterLayout: This is a custom layout based on wx_gui.BaseLayout, which displays the camera feed and a row of radio buttons that allows the user to select from the available image filters to be applied to each frame of the camera feed.
chapter1.main: This is the main routine function for starting the GUI application and accessing the webcam.
tools.py: This is a Python module and has a lot of helper functions that we use in this chapter, which you can reuse for your projects.
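To give a feel for how these pieces fit together, here is a plausible shape for chapter1.main. This is only a sketch: the wxPython startup pattern and the OpenCV webcam access are standard, but in chapter1.py the frame would be the FilterLayout class described above, whose constructor we have not seen yet; a plain wx.Frame stands in for it here so the snippet runs on its own.

import cv2
import wx

def main():
    # Access the webcam via OpenCV (device 0 is the default camera).
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise RuntimeError('Could not open the webcam')

    # Standard wxPython startup: create the app object, show the main
    # frame, and hand control over to the event loop. In the real
    # script, FilterLayout (built on wx_gui.BaseLayout) replaces the
    # plain wx.Frame used here as a stand-in.
    app = wx.App()
    frame = wx.Frame(None, title='Fun with Filters')
    frame.Show(True)
    app.MainLoop()
    capture.release()

if __name__ == '__main__':
    main()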
The next section demonstrates how to create a black-and-white pencil sketch.
In order to obtain a pencil sketch (that is, a black-and-white drawing) of the camera frame, we will make use of two image-blending techniques, known as dodging and burning. These terms refer to techniques employed during the printing process in traditional photography; here, photographers would manipulate the exposure time of a certain area of a darkroom print in order to lighten or darken it. Dodging lightens an image, whereas burning darkens it. Areas that were not supposed to undergo changes were protected with a mask.
Today, modern image editing programs, such as Photoshop and Gimp, offer ways to mimic these effects in digital images. For example, masks are still used to mimic the effect of changing the exposure time of an image, wherein areas of a mask with relatively intense values will expose the image more, thus lightening the image. OpenCV does not offer a native function to implement these techniques; however, with a little insight and a few tricks, we will arrive at our own efficient implementation that can be used to produce a beautiful pencil sketch effect.
If you search on the internet, you might stumble upon the following common procedure to achieve a pencil sketch from an RGB (red, green, and blue) color image:
1. First, convert the color image to grayscale.
2. Then, invert the grayscale image to get a negative.
3. Apply a Gaussian blur to the negative from step 2.
4. Blend the grayscale image (from step 1) with the blurred negative (from step 3) by using color dodge.
Whereas steps 1 to 3 are straightforward, step 4 can be a little tricky. Let's get that one out of the way first.
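For orientation, here is a minimal sketch of the whole recipe, assuming an RGB input image as in the steps above. The (21, 21) kernel size is an illustrative choice, and using cv2.divide to approximate the color dodge blend is one common trick, not necessarily the final form we will settle on:

import cv2

def pencil_sketch(rgb_image):
    # Step 1: convert the color image to grayscale.
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY)
    # Step 2: invert the grayscale image to get a negative.
    negative = 255 - gray
    # Step 3: apply a Gaussian blur to the negative.
    blurred = cv2.GaussianBlur(negative, (21, 21), 0)
    # Step 4: color dodge blends the grayscale image with the blurred
    # negative: result = gray * 255 / (255 - blurred). cv2.divide with
    # scale=256 approximates this and saturates at 255.
    return cv2.divide(gray, 255 - blurred, scale=256)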
The next section shows you how to implement dodging and burning in OpenCV.
A Gaussian blur is implemented by convolving the image with a kernel of Gaussian values. Two-dimensional convolution is used very widely in image processing. Usually, we have a big picture (let's look at a 5 x 5 subsection of that particular image), and we have a kernel (or filter), which is another matrix of a smaller size (in our example, 3 x 3).
In order to get the convolution values, let's suppose that we want to get the value at location (2, 3). We place the kernel centered at location (2, 3), calculate the pointwise product of the overlaid matrix (the highlighted area, shown in red in the following image) with the kernel, and take the overall sum. The resulting value (that is, 158.4) is the value we write into the other matrix at location (2, 3).
We repeat this process for all elements, and the resulting matrix (the matrix on the right) is the convolution of the kernel with the image. In the following diagram, on the left, you can see the original image with the pixel values in the boxes (values higher than 100). We also see an orange filter with values in the bottom right of each cell (a collection of 0.1 or 0.2 that sum to 1). In the matrix on the right, you see the values when the filter is applied to the image on the left:
Note that, for points on the boundaries, the kernel is not aligned with the matrix, so we have to figure out a strategy to give values for those points. There is no single good strategy that works for everything; some of the approaches are to either extend the border with zeros or extend with border values.
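To make the multiply-and-sum concrete, here is a small sketch using a 3 x 3 kernel whose entries sum to 1 (a mix of 0.1 and 0.2 values, like the filter described above). cv2.filter2D performs the same operation over the whole image, and its borderType argument selects one of the boundary strategies just mentioned. The filename is a placeholder:

import cv2
import numpy as np

# A 3 x 3 averaging-style kernel; the nine entries sum to 1,
# so the overall brightness of the image is preserved.
kernel = np.array([[0.1, 0.1, 0.1],
                   [0.1, 0.2, 0.1],
                   [0.1, 0.1, 0.1]], dtype=np.float32)

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Manual convolution value at a single location (row, col): the
# pointwise product of the 3 x 3 patch with the kernel, then the sum.
row, col = 2, 3
patch = img[row - 1:row + 2, col - 1:col + 2]
value = np.sum(patch * kernel)

# cv2.filter2D repeats this for every pixel; borderType chooses the
# strategy for boundary pixels (here, replicating the border values).
blurred = cv2.filter2D(img, -1, kernel, borderType=cv2.BORDER_REPLICATE)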
Let's take a look at how to transform a normal picture into a pencil sketch.
When we perceive images, our brain picks up on a number of subtle clues to infer important details about the scene. For example, in broad daylight, highlights may have a slightly yellowish tint because they are in direct sunlight, whereas shadows may appear slightly bluish because of the ambient light of the blue sky. When we view an image with such color properties, we might immediately think of a sunny day.
This effect is not a mystery to photographers, who sometimes purposely manipulate the white balance of an image to convey a certain mood. Warm colors are generally perceived as more pleasant, whereas cool colors are associated with night and drabness.
To manipulate the perceived color temperature of an image, we will implement a curve filter. These filters control how color transitions appear between different regions of an image, allowing us to subtly shift the color spectrum without adding an unnatural-looking overall tint to the image.
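As a preview of what follows, one common way to realize such a curve filter is to fit a smooth spline through a handful of hand-picked control points and bake the curve into a 256-entry lookup table, which cv2.LUT can then apply to a channel in a single call. The control points and the filename below are illustrative, not the exact values we will settle on:

import cv2
import numpy as np
from scipy.interpolate import UnivariateSpline

def create_lut(x, y):
    # Fit a smooth curve through the control points (x, y) and
    # sample it at all 256 possible 8-bit intensity values.
    spline = UnivariateSpline(x, y)
    return np.clip(spline(np.arange(256)), 0, 255).astype(np.uint8)

# Illustrative control points: lift the mid-tones, pin the endpoints.
incr_lut = create_lut([0, 64, 128, 192, 256], [0, 70, 140, 210, 256])

img = cv2.imread('input.jpg')          # placeholder filename
b, g, r = cv2.split(img)
r = cv2.LUT(r, incr_lut)               # warming: boost the red channel
warmer = cv2.merge((b, g, r))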
In the next section, we'll look at how to manipulate color using curve shifting.
Over the past few years, professional cartoonizer software has popped up all over the place. In order to achieve a basic cartoon effect, all we need is a bilateral filter and some edge detection.
The bilateral filter will reduce the color palette, that is, the number of colors that are used in the image. This mimics a cartoon drawing, wherein a cartoonist typically has few colors to work with. Then, we can apply edge detection to the resulting image to generate bold silhouettes. The real challenge, however, lies in the computational cost of bilateral filters. We will, therefore, use some tricks to produce an acceptable cartoon effect in real time.
We will adhere to the following procedure to transform an RGB color image into a cartoon:
1. First, apply a bilateral filter to reduce the color palette of the image.
2. Then, convert the original color image into grayscale.
3. After that, apply a median blur to reduce image noise.
4. Use adaptive thresholding to detect and emphasize the edges in an edge mask.
5. Finally, combine the color image from step 1 with the edge mask from step 4.
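Putting the five steps together, a minimal sketch of the pipeline might look as follows. The parameter values are illustrative starting points, not the tuned values (or the speed tricks) we will develop in the coming sections:

import cv2

def cartoonize(bgr_frame, num_bilateral=5):
    # Step 1: repeatedly apply a small bilateral filter (cheaper than
    # one large one) to flatten the color palette while keeping edges.
    color = bgr_frame.copy()
    for _ in range(num_bilateral):
        color = cv2.bilateralFilter(color, d=9, sigmaColor=9, sigmaSpace=7)
    # Step 2: convert the original color image into grayscale.
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    # Step 3: apply a median blur to reduce image noise.
    blurred = cv2.medianBlur(gray, 7)
    # Step 4: adaptive thresholding yields a bold edge mask.
    edges = cv2.adaptiveThreshold(blurred, 255,
                                  cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY,
                                  blockSize=9, C=2)
    # Step 5: combine the reduced-palette image with the edge mask.
    return cv2.bitwise_and(color, color, mask=edges)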
In the upcoming sections, we will learn about the previously mentioned steps in detail. First, we'll learn how to use a bilateral filter for edge-aware smoothing.
