Object detection, tracking and recognition in images are key problems in computer vision. This book provides the reader with a balanced treatment between the theory and practice of selected methods in these areas to make the book accessible to a range of researchers, engineers, developers and postgraduate students working in computer vision and related fields.
Number of pages: 834
Year of publication: 2013
Contents
Cover
Title Page
Copyright Page
Dedication
Preface
Acknowledgements
Notations and Abbreviations
Chapter 1: Introduction
1.1 A Sample of Computer Vision
1.2 Overview of Book Contents
References
Chapter 2: Tensor Methods in Computer Vision
2.1 Abstract
2.2 Tensor – A Mathematical Object
2.3 Tensor – A Data Object
2.4 Basic Properties of Tensors
2.5 Tensor Distance Measures
2.6 Filtering of Tensor Fields
2.7 Looking into Images with the Structural Tensor
2.8 Object Representation with Tensor of Inertia and Moments
2.9 Eigendecomposition and Representation of Tensors
2.10 Tensor Invariants
2.11 Geometry of Multiple Views: The Multifocal Tensor
2.12 Multilinear Tensor Methods
2.13 Closure
References
Chapter 3: Classification Methods and Algorithms
3.1 Abstract
3.2 Classification Framework
3.3 Subspace Methods for Object Recognition
3.4 Statistical Formulation of the Object Recognition
3.5 Parametric Methods – Mixture of Gaussians
3.6 The Kalman Filter
3.7 Nonparametric Methods
3.8 The Mean Shift Method
3.9 Neural Networks
3.10 Kernels in Vision Pattern Recognition
3.11 Data Clustering
3.12 Support Vector Domain Description
3.13 Appendix – MATLAB® and other Packages for Pattern Classification
3.14 Closure
Problems and Exercises
References
Chapter 4: Object Detection and Tracking
4.1 Introduction
4.2 Direct Pixel Classification
4.3 Detection of Basic Shapes
4.4 Figure Detection
4.5 CASE STUDY – Road Signs Tracking and Recognition
4.6 CASE STUDY – Framework for Object Tracking
4.7 Pedestrian Detection
4.8 Closure
Problems and Exercises
References
Chapter 5: Object Recognition
5.1 Abstract
5.2 Recognition from Tensor Phase Histograms and Morphological Scale Space
5.3 Invariant Based Recognition
5.4 Template Based Recognition
5.5 Recognition from Deformable Models
5.6 Ensembles of Classifiers
5.7 CASE STUDY – Ensemble of Classifiers for Road Sign Recognition from Deformed Prototypes
5.8 Recognition Based on Tensor Decompositions
5.9 Eye Recognition for Driver's State Monitoring
5.10 Object Category Recognition
5.11 Closure
Problems and Exercises
References
A Appendix
A.1 Abstract
A.2 Morphological Scale-Space
A.3 Morphological Tensor Operators
A.4 Geometry of Quadratic Forms
A.5 Testing Classifiers
A.6 Code Acceleration with OpenMP
A.7 Useful MATLAB® Functions for Matrix and Tensor Processing
A.8 Short Guide to the Attached Software
A.9 Closure
Problems and Exercises
References
Index
This edition first published 2013 © 2013 John Wiley & Sons, Ltd
Registered office John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
Library of Congress Cataloging-in-Publication Data
Cyganek, Boguslaw. Object detection and recognition in digital images : theory and practice / Boguslaw Cyganek. pages cm Includes bibliographical references and index. ISBN 978-0-470-97637-1 (cloth) 1. Pattern recognition systems. 2. Image processing-Digital techniques. 3. Computer vision. I. Title. TK7882.P3C94 2013 621.39′94–dc23
2012050754
A catalogue record for this book is available from the British Library
ISBN: 978-0-470-97637-1
To my family with love
Preface
We live in an era of technological revolution in which developments in one domain frequently entail breakthroughs in another. Similar to the nineteenth century industrial revolution, the last decades can be termed an epoch of computer revolution. For years we have been witnessing the rapid development of microchip technologies, which has resulted in a continuous growth of computational power at ever decreasing costs. This has been underpinned by the recent development of parallel computational systems based on graphics processing units and field programmable gate arrays. All these hardware achievements also open up new application areas and possibilities in the quest to make a computer see and understand what it sees, which is a primary goal of computer vision. However, although fast computers are of great help in this respect, what really makes a difference are new and better processing methods and their implementations.
The book presents selected methods of object detection and recognition with special stress on statistical and – relatively new to this domain – tensor based approaches. However, the number of interesting and important methods is growing rapidly, making it difficult to offer complete coverage of these methods in one book. Therefore the goal of this book is slightly different: the methods chosen here have been used by myself and my colleagues in many projects and have proved to be useful in practice. Our main areas concern automotive applications in which we try to develop vision systems for road sign recognition or driver monitoring. When starting this book my main purpose was not only to give an overview of these methods, but also to provide the necessary, though concise, mathematical background. However, just as important are implementations of the discussed methods. I'm convinced that the connection of detailed theory and its implementation is a prerequisite for an in-depth understanding of the subject. In this respect the choice of the implementation platform is also not a surprise. The C++ programming language used throughout this book and in the attached software library is a worldwide industry standard. This does not mean that implementations cannot be done using different programming platforms, for which the provided code examples can be used as a guide or for direct porting. The book is accompanied by a companion website at www.wiley.com/go/cyganekobject which contains the code, color figures, as well as slides, errata and other useful links.
This book grew as a result of my fascination with modern computer vision methods and also after writing my previous book, co-authored with J. Paul Siebert and devoted mostly to the processing of 3D images. Thus, in some sense it can be seen as a continuation of our previous work, although both can be read as standalone texts.
Thus, the book can be used by all scientists and industry practitioners related to computer vision and machine pattern recognition, but can also be used as a tutorial for students interested in this rapidly developing area.
Bogusław Cyganek
Poland
Acknowledgements
Writing a book is a tremendous task. It would not be possible if not for the help of friends, colleagues, cooperators, and many other people, whose names I sometimes don't even know, but I know they did wonderful work to make this book happen.
I would particularly like to thank numerous colleagues from the AGH University of Science and Technology, as well as the Academic Computer Centre Cyfronet, in Kraków, Poland. Special thanks go to Professor Ryszard Tadeusiewicz and Professor Kazimierz Wiatr for their continuous encouragement and support.
I would also like to express my thanks to Professor Ralf Reulke from the Humboldt-Universität zu Berlin, and Deutsches Zentrum für Luft- und Raumfahrt, as well as to all the colleagues from his team, for our fruitful cooperation in interesting scientific endeavours.
I'm very grateful to the Wiley team who have helped to make this book possible. I'd like to express my special thanks to Richard Davies, Alex King, Nicky Skinner, Simone Taylor, Liz Wingett, as well as to Nur Wahidah Binte Abdul Wahid, Shubham Dixit, Caroline McPherson, and all the others whose names I don't know but I know they did a brilliant job to make this book happen. Once again – many thanks!
I'm also very grateful to many colleagues around the world, and especially readers of my previous book on 3D computer vision, for their e-mails, questions, suggestions, bug reports, and all the discussions we've had. All these helped me to develop better text and software. I also ask for your support now and in the future!
I would like to kindly express my gratitude to the National Science Centre NCN, Republic of Poland, for their financial support in scientific research projects conducted over the years 2007–2009, as well as 2011–2013 under the contract no. DEC-2011/01/B/ST6/01994, which greatly contributed to this book. I would also like to express my gratitude to the AGH University of Science and Technology Press for granting the rights to use parts of my previous publication.
Finally, I would like to thank my family: my wife Magda, my children Nadia and Kamil, as well as my mother, for their patience, support, and encouragement during all the days I worked on this book.
Notations and Abbreviations
B               Base matrix
C               Number of data classes
C               Coefficient matrix
Cx              Correlation matrix of a data set {xi}
D               Data matrix
D               Distance function
E               Statistical expectation
i, j, k, m, n   Free coordinates, matrix indices
1n              Matrix of dimensions n×n with all elements set to 1
In              Identity matrix of dimensions n×n
I               Image; Intensity signal of an image
Ix, Iy          Spatial derivatives of an image I in the directions x, y
J               Number of components in a series
K               Kernel matrix
L               Number of components in a vector; Dimensionality of a space
M               Number of clusters; Number of image channels
N               Number of (data) points
P               Probability mass function
p               Probability density function
P, Q, C         Numbers of indices in tensors (tensor dimensions)
p, q            Covariant and contravariant degrees of a tensor
R               Number of principal components
                Set of real numbers
                Tensor
T(k)            k-th flattening mode of a tensor
TC              Compact structural tensor
TE              Extended structural tensor
t               Time coordinate
W               Vector space
W*              Dual vector space
X               Matrix
XT              Transposed matrix X
Xi              i-th matrix (from a series of matrices)
x, y            Spatial coordinates
x               Column vector
xi              i-th vector (from a series of vectors)
{xi}            Set of vectors xi for a given range of indices i
                k-th column vector from a matrix Xi
                Normalized column vector
                Mean vector
                Orthogonal residual vector
xi              i-th component of the vector x
Σx              Covariance matrix of a data set {xi}
ρ               Number of bins in the histogram
Δ               Width of a bin in the histogram
Ω               Set of class labels
                Khatri–Rao product
                Kronecker product
                Elementwise multiplication (Hadamard product)
                Elementwise division
                Outer product of vectors
                Max product
                Min product
×               Morphological outer product
∀               For all

AD              Anisotropic Diffusion
ALS             Alternating Least-Squares
AMI             Affine Moment Invariants
AWG             Adaptive Window Growing
CANDECOMP       CANonical DECOMPosition (of tensors)
CID             Color Image Discriminant
CNMF            Constrained NMF
CP              CANDECOMP / PARAFAC
CST             Compact Structural Tensor
CST             Convolution Standardized Transform
CV              Computer Vision
CVS             Computer Vision System
DAS             Driver Assisting System
DFFS            Distance From Feature Space
DIFS            Distance In Feature Space
DSP             Digital Signal Processing
DT              Distance Transform
EMML            Expectation Maximization Maximum Likelihood
EMD             Earth Mover's Distance
EST             Extended Structural Tensor
FCM             Fuzzy c-Means
FIR             Far Infra-Red
FN              False Negative
FP              False Positive
GHT             Generalized Hough Transform
GLOH            Gradient Location and Orientation Histogram (image descriptor)
GP              Gaussian Processes / Genetic Programming
GPU             Graphics Processing Unit (graphics card)
HDR             High Dynamic Range (image)
HI              Hyperspectral Image
HNN             Hamming Neural Network
HOG             Histogram of Gradients
HOOI            Higher-Order Orthogonal Iteration
HOSVD           Higher-Order Singular Value Decomposition
ICA             Independent Component Analysis
IMED            Image Euclidean Distance
IP              Image Processing
ISM             Implicit Shape Model
ISRA            Image Space Reconstruction Algorithms
KFCM            Kernel Fuzzy c-Means
k-NN            k-Nearest-Neighbor
KPCA            Kernel Principal Component Analysis
K–R             Khatri–Rao product
LP              Log-Polar
LN              Local Neighborhood of pixels
LSH             Locality-Sensitive Hashing
LSQE            Least-Squares Problem with a Quadratic Equality Constraint
LQE             Linear Quadratic Estimator
MAP             Maximum A Posteriori classification
MICA            Multilinear ICA
ML              Maximum Likelihood
MNN             Morphological Neural Network
MoG             Mixture of Gaussians
MPI             Message Passing Interface
MRI             Magnetic Resonance Imaging
MSE             Mean Square Error
NIR             Near Infra-Red
NMF             Nonnegative Matrix Factorization
NTF             Nonnegative Tensor Factorization
NUMA            Non-Uniform Memory Access
OC-SVM          One-Class Support Vector Machine
PARAFAC         PARAllel FACtors (of tensors)
PCA             Principal Component Analysis
PDE             Partial Differential Equation
PDF             Probability Density Function
PERCLOSE        Percentage of Eye Closure
PR              Pattern Recognition
PSNR            Peak Signal to Noise Ratio
R1NTF           Rank-1 Nonnegative Tensor Factorization
RANSAC          RANdom SAmple Consensus
RBF             Radial Basis Function
RLA             Richardson–Lucy algorithm
RMSE            Root Mean Square Error
ROC             Receiver Operating Characteristics
ROI             Region of Interest
RRE             Relative Reconstruction Error
SAD             Sum of Absolute Differences
SIMCA           Soft Independent Modeling of Class Analogies
SIFT            Scale-Invariant Feature Transform (image descriptor)
SLAM            Simultaneous Localization And Mapping
SMO             Sequential Minimal Optimization
SNR             Signal to Noise Ratio
SOM             Self-Organizing Maps
SPD             Salient Point Detector
SSD             Sum of Squared Differences
ST              Structural Tensor
SURF            Speeded Up Robust Feature (image descriptor)
SVM             Support Vector Machine
TDCS            Tensor Discriminant Color Space
TIR             Thermal Infra-Red
TN              True Negative
TP              True Positive
WOC-SVM         Weighted One-Class Support Vector Machine
WTA             Winner-Takes-All

1 Introduction
Look in, let not either the proper quality, or the true worth of anything pass thee, before thou hast fully apprehended it.
—MARCUS AURELIUS Meditations, 170–180 AD (Translated by Meric Casaubon, 1634)
This book presents selected object detection and recognition methods in computer vision, joining theory, implementation, and applications. The majority of the selected methods were used in real automotive vision systems. Two groups of methods are distinguished. The first group contains methods based on tensors, which in the last decade have opened new frontiers in image processing and pattern analysis. The second group builds on mathematical statistics. In many cases, object detection and recognition methods draw from both groups. As indicated in the title, the explanation of the main concepts of the methods and the presentation of their mathematical derivations are as important as their implementations and usage in real applications. Object detection and recognition are closely connected: to some extent both can be seen as pattern classification, and detection frequently precedes recognition. Nevertheless, we make a distinction between the two. Object detection in our definition mostly concerns answering the question of whether a given type of object is present in images. Sometimes the current appearance and position of the object are also important. The goal of object recognition, on the other hand, is to tell the particular type of an object. For instance, we can detect a face and then identify a particular person. Similarly, in a road sign recognition system, the detection of some signs unambiguously reveals their category, such as “Yield.” However, for the majority of signs, we first detect their characteristic shapes and then identify their particular type, such as “40 km/h speed limit,” and so forth.
Detection and recognition of objects in observed scenes is a natural biological ability. People and animals perform this effortlessly in daily life to move without collisions, to find food, to avoid threats, and so on. However, comparable computer methods and algorithms for scene analysis are not so straightforward, despite their unprecedented development. Nevertheless, close observation and analysis of biological systems provide some hints for their machine realizations. A good example is artificial neural networks, which in their diversity resemble biological systems of neurons and which, in their software realization, are frequently used by computers to recognize objects. This is how the branch of computer science called computer vision (CV) developed. Its main objective is to make computers see as humans do, or even better. Sometimes this becomes possible.
Due to technological breakthroughs, the domains of object detection and recognition have changed so dynamically that preparation of even a multivolume publication on the majority of important subjects in this area seems impossible. Each month hundreds of new papers are published with new ideas, theorems, algorithms, etc. On the other hand, the fastest and most ample source of information is the Internet. One can easily look up almost all subjects on a myriad of webpages, such as Wikipedia. So, nowadays the purpose of writing a book on computer vision has to be stated somewhat differently than even a few years ago. The difference between an ample set of information and knowledge and experience becomes especially important when we face a new technological problem and our task is to solve it or to design a system which will do this for us. In this case we need a way of thinking which helps us to understand the state of nature, as well as a methodology which takes us closer to a potential solution. This book grew up in just this way, alongside my work on different projects related to object recognition in images. To be able to apply a given method we first need to understand it. At this stage not just a final formula summarizing a method, but also its detailed mathematical background, is of great use. On the other hand, bare formulas don't yet solve the problem. We need their implementations. This is the second stage, sometimes requiring more time and work than the former. One of the main goals of this book is to join the two domains on a selected set of useful methods of object detection and recognition. In this respect I hope this book will be of practical use, both for self study and as a reference when working on a concrete problem. We are not able to go through all stages of all the methods, but I hope the book will provide at least a solid start for further study and development in this fascinating and dynamically changing area.
As indicated in the title, one of my goals was to join theory and practice. My experience is that such composition leads to an in-depth understanding of the subject. This is further underpinned by case studies of mostly automotive applications of object detection and recognition. Thus, sections of this book can be grouped as follows:
Presentations of methods, their main concepts, and mathematical background.
Method implementations which contain C++ code listings (sections of this type are indicated with the word IMPLEMENTATION).
Analysis of special applications (their names start with CASE STUDY).
Apart from this we have some special entries which contain brief explanations of some mathematical concepts, with examples whose aim is to help in understanding the mathematical derivations in the surrounding sections.
A comment on code examples. I have always been convinced that in a book like this we should not fill pages with an introduction to C, C++ or other basic principles of computer science, as is sometimes the case. The reasons are at least twofold. The first is that there are a lot of good books available on computer science, for which I provide the references. The second is to avoid diverting the Reader from the main purpose of this book, which is an in-depth presentation of modern computer vision methods and algorithms. On the other hand, Readers who are not familiar with C++ can skip detailed code explanations and focus on implementation on other platforms. However, there is no better way of learning a method than through practical testing and usage in applications.
This book is based on my experience gathered while working on many scientific projects. Results of these were published in a number of conference and journal articles. In this respect, two previous books are special. The first, An Introduction to 3D Computer Vision Techniques and Applications, written together with J. Paul Siebert, was published by Wiley in 2009 [1]. The second is my habilitation thesis [2], also issued in 2009 by the AGH University of Science and Technology Press in Kraków, Poland. Extended parts of the latter are contained in different sections of this book, permission for which was granted by the AGH University Press.
Most of all, I have always found being involved in scientific and industry projects real fun and an adventure leading to self-development. I wish the same to you.
1.1 A Sample of Computer Vision
In this section let us briefly take a look at some applications of computer vision in the systems of driver monitoring, as well as scene analysis. Both belong to the on-car Driver Assisting System aimed at facilitating driving, for example by notifying drivers of incoming road signs, and most of all by preventing car accidents, for example due to the driver falling asleep.
Figure 1.1 depicts a system of cameras mounted in a test car. The cameras can observe the driver and allow the system to monitor his or her state. Cameras can also observe the front of the car for pedestrian detection or road sign recognition, in which case they can send an image like the one presented in Figure 1.2.
Figure 1.1 System of cameras mounted in a car. The cameras can observe a driver to monitor his/her state. Cameras can also observe the front of a car for pedestrian detection or road sign recognition. Such vision modules will probably soon become standard equipment, being a part of the on-board Driver Assisting System of a car.
Figure 1.2 A traffic scene. A car-mounted computer with cameras can provide information on the road scene to help safe driving. But computer vision can also help you identify where the picture was taken.
What type of information can we draw from such an image? This depends on our goal, certainly. In the real traffic situation depicted we are mainly interested in driving the car safely, avoiding pedestrians and other vehicles in motion or parked, as well as spotting and reacting to traffic signals and signs. However, in a situation where someone sent us this image we might be interested in finding out the name of that street, for instance. What can computer vision do for us? To some extent all of the above, and soon driving a car, at least in special conditions. Let us look at some stages of processing by computer vision methods, details of which are discussed in the next chapters.
Let us first observe that even a single color image has three dimensions, as shown in Figure 1.3(a). In the case of multiple images or a video stream, the number of dimensions grows. Thus, we need tools to analyze such structures. As we will see, tensors offer new possibilities in this respect. Also, their recently developed decompositions allow insight into the information contained in such multidimensional structures, as well as their compression or the extraction of features for further classification. Much research in computer vision and pattern recognition concerns the detection of features and their properties. In this respect, transformations are investigated which change the original intensity or color pixels into some new representation that provides some knowledge about image contents or is more appropriate for finding specific objects. An example of an application of the structural tensor to the image in Figure 1.2 for detection of areas with strong local structures is shown in Figure 1.3(b). Found structures are encoded with color: their orientation is represented by different colors, whereas their strength is represented by color saturation. Let us observe that areas with no prominent structures show no response of this filter; in Figure 1.3(b) they are simply black. As will be shown, such a representation proves very useful in finding specific figures in images, such as pedestrians, cars, road signs, and so forth.
Figure 1.3 A color image can be seen as a 3D structure (a). Internal properties of such multidimensional signals can be analyzed with tensors. Local structures can be detected with the structural tensor (b). Here different colors encode orientations of areas with strong signal variations, such as edges. Areas with weak texture are in black. These features can be used to detect pedestrians, cars, road signs and other objects.
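To make the idea of a local structure measure more concrete, the following is a minimal sketch of a 2D structural tensor computed at one pixel of a gray image. It is not the implementation from the attached library: the names GrayImage, StructuralTensor, structuralTensorAt, dominantOrientation, and coherence are hypothetical, derivatives are approximated with central differences, and a simple box average over a square window stands in for whatever smoothing a real system would use. The orientation and the coherence value correspond, respectively, to what Figure 1.3(b) encodes with color and saturation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical gray image stored row by row.
struct GrayImage {
    std::size_t cols, rows;
    std::vector<double> data;
    double at(std::size_t x, std::size_t y) const { return data[y * cols + x]; }
};

// Averaged products of the spatial derivatives Ix and Iy.
struct StructuralTensor {
    double txx, txy, tyy;
};

// Structural tensor at (cx, cy), averaged over a (2*radius+1)^2 window.
// Assumption: box averaging and central differences, chosen for brevity.
StructuralTensor structuralTensorAt(const GrayImage& img,
                                    std::size_t cx, std::size_t cy,
                                    std::size_t radius) {
    // The window plus a one-pixel border for the differences must fit.
    assert(cx >= radius + 1 && cy >= radius + 1);
    assert(cx + radius + 1 < img.cols && cy + radius + 1 < img.rows);

    StructuralTensor t{0.0, 0.0, 0.0};
    std::size_t count = 0;
    for (std::size_t y = cy - radius; y <= cy + radius; ++y) {
        for (std::size_t x = cx - radius; x <= cx + radius; ++x) {
            const double ix = 0.5 * (img.at(x + 1, y) - img.at(x - 1, y));
            const double iy = 0.5 * (img.at(x, y + 1) - img.at(x, y - 1));
            t.txx += ix * ix;
            t.txy += ix * iy;
            t.tyy += iy * iy;
            ++count;
        }
    }
    const double n = static_cast<double>(count);
    t.txx /= n;
    t.txy /= n;
    t.tyy /= n;
    return t;
}

// Angle (radians) of the direction of strongest intensity change.
double dominantOrientation(const StructuralTensor& t) {
    return 0.5 * std::atan2(2.0 * t.txy, t.txx - t.tyy);
}

// 0 for flat or isotropic areas, 1 for a single dominant orientation.
double coherence(const StructuralTensor& t) {
    const double trace = t.txx + t.tyy;
    if (trace <= 0.0) return 0.0;  // no structure: the "black" areas
    return std::hypot(t.txx - t.tyy, 2.0 * t.txy) / trace;
}
```

For an image whose intensity grows only along x, this sketch yields an orientation of 0 and a coherence of 1, while a constant image yields a coherence of 0.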
Let us now briefly show the possible steps that lead to detection of road signs in the image in Figure 1.2. In this method signs are first detected with fast segmentation by specific colors characteristic of different groups of expected signs. For instance, red color segmentation is used to spot all red objects, among which could also be the red rims of the prohibitive signs, and so on for all colors of interest.
Figure 1.4 shows binary maps obtained from the image in Figure 1.2 after red and blue segmentations, respectively. Many segmentation methods are discussed in this book. In this case we used manually gathered color samples to train the support vector classifiers.
Figure 1.4 Segmentation of image in Figure 1.2. Red (a), blue color segmentation (b).
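The binary maps of Figure 1.4 were obtained with trained support vector classifiers. As a much simpler stand-in, the sketch below shows what direct pixel classification produces: one binary label per pixel. The fixed RGB rule, its thresholds minRed and margin, and the names Rgb, looksRed, and segmentRed are all assumptions made for illustration, not the method used for the figure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

// Hypothetical rule: a pixel is "red" when its red channel is bright and
// dominates the green and blue channels by at least `margin`.
bool looksRed(const Rgb& p, int minRed = 120, int margin = 50) {
    assert(minRed >= 0 && margin >= 0);
    return p.r >= minRed && p.r - p.g >= margin && p.r - p.b >= margin;
}

// Binary map: 1 for pixels classified as red, 0 otherwise.
std::vector<std::uint8_t> segmentRed(const std::vector<Rgb>& pixels) {
    std::vector<std::uint8_t> mask(pixels.size());
    std::transform(pixels.begin(), pixels.end(), mask.begin(),
                   [](const Rgb& p) -> std::uint8_t {
                       return looksRed(p) ? 1 : 0;
                   });
    return mask;
}
```

A fixed rule like this is fast but sensitive to lighting, which is one reason to train a classifier on gathered color samples instead.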
From the maps in Figure 1.4 we need to find a way of selecting objects whose shape and size potentially correspond to the road signs we are looking for. This is done by specific methods which rely on detection of salient points, as well as on fuzzy logic rules which define the potential shape and size of the candidate objects.
Figure 1.5 shows the detected areas of the signs. These now need to be fed to the next classifier, which will provide a final response: first, whether we are really observing a sign and not, for instance, a traffic light, and then what particular type of sign it is. However, observed signs can be of any size and can also be rotated. Classifiers which can cope with such patterns are, for instance, cooperating groups of neural networks or the decomposition of tensors of deformed prototypes. Both of the aforementioned classifiers respond with the correct type of signs visible in Figure 1.2. These, as well as many other methods of object detection and recognition, are discussed in this book.
Figure 1.5 Circular signs are found by outlining all red objects detected in the scene. Then only those which fulfill the definition and relative size expected for a sign are left (a). Triangular and rectangular shapes are found based on their corner points. The points are checked for all possible rectangles and again only those which comply with fuzzy rules defining sought figures are left (b).
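The selection of candidates by fuzzy rules on shape and size can be illustrated with a small sketch. Everything specific here is an assumption: the trapezoidal membership function, the breakpoints for aspect ratio and relative height, the min operator as the fuzzy AND, and the names trapezoid, Candidate, and signLikeness. It only shows the general mechanism of grading a candidate rather than accepting or rejecting it with hard thresholds.

```cpp
#include <algorithm>
#include <cassert>

// Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between.
double trapezoid(double x, double a, double b, double c, double d) {
    assert(a < b && b <= c && c < d);
    if (x <= a || x >= d) return 0.0;
    if (x < b) return (x - a) / (b - a);
    if (x <= c) return 1.0;
    return (d - x) / (d - c);
}

// Bounding box of a detected object, in pixels.
struct Candidate {
    double width, height;
};

// Degree (0..1) to which a candidate has a sign-like shape and size.
// The breakpoints below are illustrative values, not tuned parameters.
double signLikeness(const Candidate& c, double imageHeight) {
    assert(c.height > 0.0 && imageHeight > 0.0);
    const double aspect = c.width / c.height;
    const double relativeHeight = c.height / imageHeight;
    const double squareish = trapezoid(aspect, 0.6, 0.85, 1.15, 1.4);
    const double plausibleSize = trapezoid(relativeHeight, 0.01, 0.03, 0.25, 0.4);
    return std::min(squareish, plausibleSize);  // fuzzy AND
}
```

With these values a 40×40 box in a 480-pixel-high image is fully accepted, a 100×20 box is rejected, and a 28×40 box receives an intermediate grade.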
Figure 1.6 Organization of the book.
1.2 Overview of Book Contents
Organizing a book is not straightforward due to the many interrelations between the topics discussed. Such relations are not linear, and in this respect electronic texts with inner links show many benefits. The printed version has its own features. On the one hand, the book can be read linearly, from beginning to end. On the other, selected topics can be read independently, especially when looking for a specific method or its implementation. The book is organized into six chapters, starting with the Introduction.
Chapter 2 is entirely devoted to different aspects of tensor methods applied to numerous tasks of computer vision and pattern recognition. We start with basic explanations of what tensors are, as well as their different definitions. Then basic properties of tensors, and especially their distances, are discussed. The next section provides some information on filtering of tensor data. Then the structural tensor is discussed, which proves very useful in many different tasks and for different types of images. A further important topic is the tensor of inertia, as well as statistical moments, which can be used at different stages of object detection and recognition. Eigendecomposition of tensors, as well as their invariants, are discussed next. A separate topic is multifocal tensors, which are used to represent relations among corresponding points in multiple views of the same scene.
The second part of Chapter 2 is devoted to multilinear methods. First the most important concepts are discussed, such as the k-mode product, tensor flattening, as well as different ranks of tensors. These are followed by the three most important tensor decompositions, namely the Higher-Order Singular Value Decomposition, best rank-1, as well as best rank-(R1, …, RP), where R1 to RP represent the desired ranks of each of the P dimensions of the tensor. The chapter ends with a discussion of subspace data representation, as well as nonnegative decompositions of tensors.
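Tensor flattening, mentioned above, rearranges the elements of a tensor into a matrix whose rows are indexed by one chosen mode. The sketch below does this for a third-order tensor. The storage layout, the ordering of columns (remaining modes in increasing order, the first one varying fastest), and the names Tensor3, Matrix, and flatten are assumptions made for illustration; several flattening conventions exist, and the one adopted in Chapter 2 may order the columns differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Third-order tensor; element (i0, i1, i2) is stored at
// i0 + dims[0] * (i1 + dims[1] * i2).
struct Tensor3 {
    std::array<std::size_t, 3> dims;
    std::vector<double> data;
    double at(std::size_t i0, std::size_t i1, std::size_t i2) const {
        return data[i0 + dims[0] * (i1 + dims[1] * i2)];
    }
};

// Row-major matrix.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Flattening along `mode` (0, 1 or 2): rows follow the chosen mode,
// columns enumerate the two remaining modes.
Matrix flatten(const Tensor3& t, std::size_t mode) {
    assert(mode < 3);
    const std::size_t a = (mode == 0) ? 1 : 0;  // first remaining mode
    const std::size_t b = (mode == 2) ? 1 : 2;  // second remaining mode
    Matrix m{t.dims[mode], t.dims[a] * t.dims[b],
             std::vector<double>(t.data.size())};
    std::array<std::size_t, 3> idx{};
    for (idx[2] = 0; idx[2] < t.dims[2]; ++idx[2])
        for (idx[1] = 0; idx[1] < t.dims[1]; ++idx[1])
            for (idx[0] = 0; idx[0] < t.dims[0]; ++idx[0]) {
                const std::size_t row = idx[mode];
                const std::size_t col = idx[a] + t.dims[a] * idx[b];
                m.data[row * m.cols + col] = t.at(idx[0], idx[1], idx[2]);
            }
    return m;
}
```

For a 2×3×2 tensor, flattening along mode 1 gives a 3×4 matrix, and flattening along modes 0 or 2 gives 2×6 matrices. Decompositions such as the HOSVD operate on matrices of this kind, one per mode.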
Chapter 3 presents an overview of classification methods. We start with a presentation of subspace methods with one of the most important data representation methods, Principal Component Analysis. The majority of the methods have their roots in mathematical statistics, so the next sections present a concise introduction to the statistical framework of object recognition. Not surprisingly, the key concept here is the Bayes theorem. Then we discuss the parametric methods as well as the Kalman filter, frequently used in tracking systems but with applications that reach far beyond this. A discussion of the nonparametric methods follows, starting with simple, but surprisingly useful, histogram methods. Then the Parzen approach is discussed with its connections to nearest-neighbor methods. Mean shift methods are discussed in the subsequent parts of Chapter 3. Then the probabilistic, Hamming, as well as morphological neural networks are presented.
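As a small taste of the Kalman filter mentioned above, here is a sketch of its scalar form for a quantity assumed to stay constant between noisy measurements. The struct name ScalarKalman, its field names, and the static state model are assumptions for illustration; tracking applications use vector states and motion models.

```cpp
#include <cassert>

// Scalar Kalman filter for a (nearly) constant quantity.
struct ScalarKalman {
    double x;  // state estimate
    double p;  // variance of the estimate
    double q;  // process noise variance
    double r;  // measurement noise variance

    // Incorporates one measurement z.
    void update(double z) {
        assert(r > 0.0 && p >= 0.0 && q >= 0.0);
        p += q;                        // predict: state unchanged, uncertainty grows
        const double k = p / (p + r);  // Kalman gain
        x += k * (z - x);              // correct with the innovation z - x
        p *= (1.0 - k);                // uncertainty shrinks after the correction
    }
};
```

Starting from x = 0, p = 1 with q = 0 and r = 1, two measurements equal to 2 move the estimate to 1 and then to 4/3, while the variance drops to 1/2 and then to 1/3. Each new measurement is trusted a little less as the estimate becomes more certain.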
A separate topic within Chapter 3 concerns kernel processing. These are important novel classification methods which rely on smart data transformation into a higher dimensional space in which linear classification is possible. From this group come Support Vector Machines, one of the most important types of data classifier.
The last part of Chapter 3 is devoted to the family of k-means data clustering methods, which find broad application in many areas of data processing. They are used in many of the discussed applications. Special attention is also given to ensembles of classifiers, such as the one discussed at the end of Chapter 3.
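The basic k-means procedure alternates two steps: assign each point to its nearest center, then move each center to the mean of its points. A minimal sketch for one-dimensional data follows. The function name kMeans1D, the fixed number of iterations, and the caller-supplied initial centers are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain k-means on scalar data, run for a fixed number of iterations.
std::vector<double> kMeans1D(const std::vector<double>& data,
                             std::vector<double> centers,
                             int iterations) {
    assert(!centers.empty());
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> sum(centers.size(), 0.0);
        std::vector<std::size_t> count(centers.size(), 0);

        // Assignment step: each point goes to its nearest center.
        for (double v : data) {
            std::size_t best = 0;
            for (std::size_t c = 1; c < centers.size(); ++c)
                if (std::fabs(v - centers[c]) < std::fabs(v - centers[best]))
                    best = c;
            sum[best] += v;
            ++count[best];
        }

        // Update step: each center moves to the mean of its points.
        // A center with no points keeps its previous position.
        for (std::size_t c = 0; c < centers.size(); ++c)
            if (count[c] > 0)
                centers[c] = sum[c] / static_cast<double>(count[c]);
    }
    return centers;
}
```

For the data {1, 2, 3, 10, 11, 12} with initial centers {0, 5}, the centers settle at 2 and 11 after two iterations.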
Chapter 4 deals with object detection and tracking. It starts with a discussion on the various methods of direct pixel classification, used mostly for fast image segmentation, as shown with the help of two applications. Methods of detection of basic shapes and figures follow. These are discussed mostly in the context of automotive applications. Chapter 4 ends with a brief overview of the recent methods of pedestrian detection.
Object recognition is discussed in Chapter 5. We start with recognition methods based on the analysis of phase histograms of objects, which are computed from the structural tensor. A discussion of scale-space template matching in the log-polar domain follows. This technique has found many applications in CV, of which two are discussed. Two very important topics are discussed next. The first is the idea of object recognition in the domain of deformable prototypes. The second concerns ensembles of classifiers. It has been shown that these give superior results even compared to very sophisticated but single classifiers.
Chapter 5 continues with a presentation of road sign classification systems based on ensembles of classifiers and deformable patterns, but realized in two different ways. The first employs Hamming neural networks. The second is based on decomposition of a tensor of deformable prototype patterns. The latter is also shown in the context of handwritten digit recognition.
A very specific topic discussed at the end of Chapter 5 is eye recognition, used for monitoring the driver's state to prevent dangerous situations arising from the driver falling asleep. Chapter 5 concludes with a discussion on the recent methods of object category recognition.
Appendix A discusses a number of auxiliary topics. It starts with a presentation of the morphological scale-space. Then the domain of morphological tensor operators is briefly discussed. Next, the geometry of quadratic forms is provided. Then the problem of testing classifiers is discussed. This section gathers different approaches to classifier testing and contains a list of frequently used parameters and measures for assessing classifiers. The rest of Appendix A briefly presents the OpenMP library used to convert serial codes into functionally corresponding but concurrent versions. In the last section some useful MATLAB® functions for matrix and tensor processing are presented.
As already mentioned, the majority of the presented topics are accompanied by their full C++ implementations. Their main parts are also discussed in the book. The full implementation in the form of a software library can be downloaded from the book webpage [3]. This webpage also contains some additional materials, such as the manual to the software platform, color images, and other useful links.
Last but not least, I will be very grateful to hear your opinion of the book.
References
[1] Cyganek B., Siebert J.P.: An Introduction to 3D Computer Vision Techniques and Algorithms, Wiley, 2009.
[2] Cyganek B.: Methods and Algorithms of Object Recognition in Digital Images. Habilitation Thesis. AGH University of Science and Technology Press, 2009.
[3] http://www.wiley.com/go/cyganekobject
Chapter 2: Tensor Methods in Computer Vision
2.1 Abstract
This chapter gathers different computer vision techniques which make use of tensors, as well as their decomposition and analysis. As will be shown, these techniques are used in many methods for object detection and recognition in images. Although tensors have been known in mathematics for over a hundred years, their application in computer vision (CV) and pattern recognition (PR) has been a matter of the last two decades. The real power of tensor processing in these areas comes from their natural ability to represent the multidimensional nature of processed data well.
According to the fundamental sampling theorem, continuous signals sampled with sufficient frequency can be unambiguously represented by their discrete samples [1, 2]. This fundamental property links physical measurements with the world of computer processing, since digital signals are just data in computer memory. As will be shown, tensors are the right tools for processing a variety of digital signals, such as sound, vision, seismic, medical electroencephalogram (EEG), as well as magnetic resonance imaging (MRI), which opens vast possibilities in medical diagnosis. In MRI, for instance, it is assumed that the motion of water molecules in tissues can be approximated by a Brownian motion in the voxels of the image. This Brownian motion, in turn, is entirely described by a symmetric and positive definite matrix, called the diffusion tensor. Processing and visualization of diffusion tensors is one of the most rapidly growing domains, joining mathematics, physics, medicine, and computer vision.
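The two properties required of a diffusion tensor, symmetry and positive definiteness, can be checked with a short sketch. It is an illustration under stated assumptions, not part of the book's library: the function names and the sample matrices are hypothetical, and positive definiteness is tested with Sylvester's criterion (all leading principal minors positive), which is one of several possible tests for a symmetric 3 × 3 matrix.

```python
def det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def is_symmetric(a, tol=1e-12):
    """True if the 3x3 matrix equals its transpose up to a tolerance."""
    return all(abs(a[i][j] - a[j][i]) <= tol
               for i in range(3) for j in range(3))


def is_spd(a):
    """Symmetric positive definite test via Sylvester's criterion."""
    if not is_symmetric(a):
        return False
    minor1 = a[0][0]
    minor2 = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    minor3 = det3(a)
    return minor1 > 0 and minor2 > 0 and minor3 > 0


# Hypothetical sample values, chosen only to exercise the checks.
elongated = [[1.7e-3, 0.0, 0.0],
             [0.0, 0.3e-3, 0.0],
             [0.0, 0.0, 0.2e-3]]
indefinite = [[1.0, 2.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]]

print(is_spd(elongated))
print(is_spd(indefinite))
```

The first matrix passes both checks, whereas the second is symmetric but has a negative second leading minor and is therefore rejected.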
The goal of this chapter is to present different areas of CV and PR which can be well represented and analyzed with tensors. We start with definitions of tensors and their basic properties. Tensors have two most pronounced characteristics. The first is their transformation rules with respect to changes of the coordinate system. The second is their multidimensionality, which makes them the right tool to process data which depend on many factors, as will be discussed. We present the structural tensor and its variants, as well as the tensor of inertia. The former is based on signal differentiation, whereas the latter is related to the statistical moments computed from the signal. Both are useful to represent local areas, as well as whole objects, in the images. We also discuss methods of filtering of tensor data, as well as their eigendecomposition and invariants. Tensors are also the right tool to represent mutual relations between features of real objects imaged in multiple views. The next part of this chapter is devoted to the second aspect of tensors: their ability to represent and analyze multidimensional data. Presented are the most important tensor decompositions: the Higher-Order Singular Value Decomposition (HOSVD), best rank-1, and best rank-$(R_1, \ldots, R_P)$, where $R_1$ to $R_P$ are the desired ranks of each of the dimensions of the tensor. Finally, the nonnegative matrix and tensor factorizations, as well as the subspace data representation, are discussed.
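The contrast between the two tensors can be made concrete with a minimal sketch. It assumes a grayscale image stored as a list of rows, central differences for the derivatives, and plain averaging over the whole patch. These choices and all function names are illustrative assumptions rather than the book's implementation, which may use other derivative and smoothing filters. The second function returns the second-order central moments from which an inertia tensor can be assembled.

```python
import math


def structural_tensor(image):
    """Averaged components (Txx, Txy, Tyy) of the gradient outer product.

    Derivatives use central differences, so the image must be at least 3x3;
    border pixels are skipped.
    """
    rows, cols = len(image), len(image[0])
    txx = txy = tyy = 0.0
    count = 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            txx += gx * gx
            txy += gx * gy
            tyy += gy * gy
            count += 1
    return txx / count, txy / count, tyy / count


def dominant_orientation(txx, txy, tyy):
    """Angle (radians) of the principal eigenvector of the 2x2 tensor."""
    return 0.5 * math.atan2(2.0 * txy, txx - tyy)


def central_moments(image):
    """Second-order central moments (mu20, mu11, mu02) of the intensities."""
    mass = sum(sum(row) for row in image)
    cx = sum(c * v for row in image for c, v in enumerate(row)) / mass
    cy = sum(r * v for r, row in enumerate(image) for v in row) / mass
    mu20 = mu11 = mu02 = 0.0
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            dx, dy = c - cx, r - cy
            mu20 += v * dx * dx
            mu11 += v * dx * dy
            mu02 += v * dy * dy
    return mu20, mu11, mu02


# Intensity ramp along x: the gradient points along the x axis everywhere.
ramp = [[float(c) for c in range(5)] for _ in range(5)]
# Horizontal bar: all mass is spread along x around the centroid.
bar = [[0] * 5, [1] * 5, [0] * 5]

print(structural_tensor(ramp))
print(central_moments(bar))
```

For the ramp only the Txx component is nonzero, which reflects a single dominant gradient direction. For the bar only mu20 is nonzero, which reflects the elongation of the object along the x axis.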
