While the field of computer vision drives many of today's digital technologies and communication networks, the topic of color has emerged only recently in most computer vision applications. One of the most extensive works to date on color in computer vision, this book provides a complete set of tools for working with color in the field of image understanding. Based on the authors' intense collaboration for more than a decade and drawing on the latest thinking in the field of computer science, the book integrates topics from color science and computer vision, clearly linking theories, techniques, machine learning, and applications. The fundamental basics, sample applications, and downloadable versions of the software and data sets are also included. Clear, thorough, and practical, Color in Computer Vision explains:
* Computer vision, including color-driven algorithms and quantitative results of various state-of-the-art methods
* Color science topics such as color systems, color reflection mechanisms, color invariance, and color constancy
* Digital image processing, including edge detection, feature extraction, image segmentation, and image transformations
* Signal processing techniques for the development of both image processing and machine learning
* Robotics and artificial intelligence, including such topics as supervised learning and classifiers for object and scene categorization
Researchers and professionals in computer science, computer vision, color science, electrical engineering, and signal processing will learn how to implement color in computer vision applications and gain insight into future developments in this dynamic and expanding field.
Table of Contents
Copyright
Series
Title Page
Preface
Chapter 1: Introduction
1.1 From Fundamental to Applied
1.2 Part I: Color Fundamentals
1.3 Part II: Photometric Invariance
1.4 Part III: Color Constancy
1.5 Part IV: Color Feature Extraction
1.6 Part V: Applications
1.7 Summary
Part I: Color Fundamentals
Chapter 2: Color Vision
2.1 Introduction
2.2 Stages of Color Information Processing
2.3 Chromatic Properties of the Visual System
2.4 Summary
Chapter 3: Color Image Formation
3.1 Lambertian Reflection Model
3.2 Dichromatic Reflection Model
3.3 Kubelka–Munk Model
3.4 The Diagonal Model
3.5 Color Spaces
3.6 Summary
Part II: Photometric Invariance
Chapter 4: Pixel-Based Photometric Invariance
4.1 Normalized Color Spaces
4.2 Opponent Color Spaces
4.3 The HSV Color Space
4.4 Composed Color Spaces
4.5 Noise Stability and Histogram Construction
4.6 Application: Color-Based Object Recognition
4.7 Summary
Chapter 5: Photometric Invariance from Color Ratios
5.1 Illuminant Invariant Color Ratios
5.2 Illuminant Invariant Edge Detection
5.3 Blur-Robust and Color Constant Image Description
5.4 Application: Image Retrieval Based on Color Ratios
5.5 Summary
Chapter 6: Derivative-Based Photometric Invariance
6.1 Full Photometric Invariants
6.2 Quasi-Invariants
6.3 Summary
Chapter 7: Photometric Invariance by Machine Learning
7.1 Learning from Diversified Ensembles
7.2 Temporal Ensemble Learning
7.3 Learning Color Invariants for Region Detection
7.4 Experiments
7.5 Summary
Part III: Color Constancy
Chapter 8: Illuminant Estimation and Chromatic Adaptation
8.1 Illuminant Estimation
8.2 Chromatic Adaptation
Chapter 9: Color Constancy Using Low-level Features
9.1 General Gray-World
9.2 Gray-Edge
9.3 Physics-Based Methods
9.4 Summary
Chapter 10: Color Constancy Using Gamut-Based Methods
10.1 Gamut Mapping Using Derivative Structures
10.2 Combination of Gamut Mapping Algorithms
10.3 Summary
Chapter 11: Color Constancy Using Machine Learning
11.1 Probabilistic Approaches
11.2 Combination Using Output Statistics
11.3 Combination Using Natural Image Statistics
11.4 Methods Using Semantic Information
11.5 Summary
Chapter 12: Evaluation of Color Constancy Methods
12.1 Data Sets
12.2 Performance Measures
12.3 Experiments
12.4 Summary
Part IV: Color Feature Extraction
Chapter 13: Color Feature Detection
13.1 The Color Tensor
13.2 Color Saliency
13.3 Conclusions
Chapter 14: Color Feature Description
14.1 Gaussian Derivative-Based Descriptors
14.2 Discriminative Power
14.3 Level of Invariance
14.4 Information Content
14.5 Summary
Chapter 15: Color Image Segmentation
15.1 Color Gabor Filtering
15.2 Invariant Gabor Filters Under Lambertian Reflection
15.3 Color-Based Texture Segmentation
15.4 Material Recognition Using Invariant Anisotropic Filtering
15.5 Color Invariant Codebooks and Material-Specific Adaptation
15.6 Experiments
15.7 Image Segmentation by Delaunay Triangulation
15.8 Summary
Part V: Applications
Chapter 16: Object and Scene Recognition
16.1 Diagonal Model
16.2 Color SIFT Descriptors
16.3 Object and Scene Recognition
16.4 Results
16.5 Summary
Chapter 17: Color Naming
17.1 Basic Color Terms
17.2 Color Names from Calibrated Data
17.3 Color Names from Uncalibrated Data
17.4 Experimental Results
17.5 Conclusions
Chapter 18: Segmentation of Multispectral Images
18.1 Reflection and Camera Models
18.2 Photometric Invariant Distance Measures
18.3 Error Propagation
18.4 Photometric Invariant Region Detection by Clustering
18.5 Experiments
18.6 Summary
Citation Guidelines
References
Index
Cover image: Jasper van Turnhout
Cover design: Michael Rutkowski
Copyright © 2012 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Color in computer vision : fundamentals and applications / Theo Gevers … [et al.].
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-89084-4 (pbk.)
1. Computer vision. 2. Color vision. 3. Color photography. I. Gevers,
Theo.
TA1634.C637 2012
006.3′7—dc23
2012000650
Preface
Visual information is our most natural source of information and communication. Apart from its role in human vision, visual information plays a vital and indispensable role in society and is the nucleus of current communication frameworks such as the World Wide Web and mobile phones. With the ever-growing production, use, and exploitation of digital visual information (e.g., documents, websites, images, videos, and movies), a visual overflow is imminent, and hence there is an urgent demand for the (automatic) understanding of visual information. Moreover, as digital visual information is nowadays available in color format, there is a pressing necessity for the understanding of visual color information. Computer vision deals with the understanding of visual information. Although color became a central topic in various disciplines (ranging from mathematics and physics to the humanities and art) quite early on, in the field of computer vision it has emerged only recently. We take on the challenge of providing a substantial set of tools for image understanding from a color perspective. The central aim of this book is to present color theories, representation models, and computational methods that are essential for image understanding in the field of computer vision.
The idea for this book was born when the authors were sitting on a terrace overlooking the Amstel River. The rich artistic history of Amsterdam, the river, and that sunny day gave us the inspiration for discussing the role of color in art, in life, and eventually in computer vision. There, we decided to do something about the lack of textbooks on color in computer vision. We agreed that the most productive and pleasant way to reflect our findings on this topic was to write this book together: a book in which color is taken as a valuable collaborative source of synergy between two research fields, color science and computer vision. The book is the result of more than 10 years of research experience of all four authors, who worked closely together (as PhD students, postdocs, professors, colleagues, and eventually friends) on the topic of color computer vision at the University of Amsterdam. Because of this long-term collaboration, our research on color computer vision tightly connects color theories, color image processing methods, machine learning, and applications in the field of computer vision, such as image segmentation, understanding, and search. Even though many of the chapters in the book have their origin in journal articles, we made sure that the work was rewritten and trimmed down. This process, the long-term collaboration, and many discussions resulted in a book in which a uniform style has emerged and in which the material represents the best of us.
The book is a valuable textbook for graduate students, researchers, and professionals in the fields of computer vision, computer science, color science, and engineering. It is suitable for upper-level undergraduate and graduate courses and can also be used in more advanced settings such as postgraduate tutorials. It is a good reference for anyone, including those in industry, interested in the topic of color and computer vision. A prerequisite is a basic knowledge of image processing and computer vision. Further, a general background in mathematics is required, such as linear algebra, calculus, and probability theory. Some of the material in this book has been presented as part of graduate and postgraduate courses at the University of Amsterdam. Also, part of the material has been presented at conference tutorials and short courses at image processing conferences (International Conference on Image Processing (ICIP) and International Conference on Pattern Recognition (ICPR)), computer vision conferences (Computer Vision and Pattern Recognition (CVPR) and the International Conference on Computer Vision (ICCV)), and color conferences (Colour in Graphics, Imaging, and Vision (CGIV) and conferences organized by the International Society for Optics and Photonics (SPIE)). Computer vision contains more topics than those presented in this book; the emphasis here is on image understanding. Image understanding has been taken as the path along which we were able to present our work. Although the material represents our view on color in computer vision, our sincere intention was to include all relevant research. We therefore believe this book is one of the first extensive works on color in computer vision, citing over 360 references.
This book consists of five parts. The topics range from (low-level) color image formation to (intermediate-level) color invariant feature extraction and color image processing to (high-level) semantic descriptors for object and scene recognition. The topics are treated from low-level to high-level processing and from fundamental to more applied research. Part I contains the (color) fundamentals of the book. This part presents the concept of trichromatic color processing and the similarity between human and computer vision systems. Furthermore, the basics of color image formation are provided. Reflection models are presented that describe the imaging process, the interplay between light and matter, and how photometric conditions influence the RGB values in an image. In Part II, we consider the research area of extracting color invariant information. We build detailed models of the color image formation process and design mathematical methods to infer the quantities of interest. Pixel-based and derivative-based photometric invariance are discussed. An overview is given of the computation of both photometric invariance and differential information. Part III contains an overview of color constancy. Computational methods are presented to estimate the illumination. An evaluation of color constancy methods is given on large-scale datasets. The problem of how to select and combine different methods is addressed. A statistical approach is taken to quantify the priors of unknowns in noisy data to infer the best possible estimate of the illumination from the visual scene. Feature detection and color descriptors are discussed in Part IV. Color image processing tools are provided. An algebraic (vector-based) approach is taken to extend scalar-signal to vector-signal processing. Computational methods are introduced to extract a variety of local image features, such as circle detectors, curvature estimation, and optical flow. Finally, in Part V, different applications are presented, such as image segmentation, object recognition, color naming, and image retrieval.
This book comes with a large amount of supplementary material, which can be found at
http://www.colorincomputervision.com
Here you can find
Software implementations of many of the methods presented in the book.
Datasets and pointers to public image datasets.
Slides corresponding to the material covered in the book.
Slides of new material presented at tutorials at conferences.
Pointers to workshops and conferences.
Discussions on current developments, including latest publications.
Our policy is to make our software and datasets available as a contribution to the research community. Also, in case you want to share your software or dataset, please drop us a line so we can add a pointer to it on our website. If you have any suggestions for improving the book, please send us an e-mail. We want to keep the book as accurate as possible.
Finally, we thank all the people who have worked with us over the years and shared their passion for research and color with us.
Arnold Smeulders at the University of Amsterdam is one of the best researchers we had the opportunity to work with. He was heading the group during the time we paved the way for this book. His insatiable passion for research and lively debates have been a source of inspiration to all of us. We enjoyed working with him.
We are very grateful to Marcel Lucassen, who contributed Chapter 2 to this book. Furthermore, his thorough proofreading and enthusiasm were indispensable for the quality of the book. It is our good fortune to have him as a human (color) vision scientist in our midst. It was certainly a pleasure to work with him. We are indebted to Jan van Gemert for his proofreading and to Frank Aldershoff for LaTeX and Mathematica issues.
We are also grateful to NWO (Dutch Organisation for Scientific Research), which awarded Theo Gevers a VICI grant (#639.023.705) with the same title as this book, "Color in Computer Vision," and Jan-Mark Geusebroek a VENI grant. These grants were valuable for this book.
While working at the University of Amsterdam, we had the opportunity to collaborate with many wonderful colleagues. We want to thank Arnold Smeulders for his work on Chapters 6 and 13, Rein van de Boomgaard for Chapter 6, Gertjan Burghouts for Chapters 14 and 15, Koen van de Sande and Cees Snoek for their help on Chapter 16, and Harro Stokman for Chapter 18. Furthermore, we thank the following persons: Virginie Mes, Roberto Valenti, Marcel Worring, Dennis Koelma, and all other members of the ISIS group.
At the Computer Vision Center (Universitat Autònoma de Barcelona), we thank José Álvarez and Antonio López for their contribution to Chapter 7. Further, we are indebted to Robert Benavente, Maria Vanrell, and Ramon Baldrich for their contribution to Chapter 17. At the LEAR team at INRIA Rhône-Alpes, France, we thank Cordelia Schmid, Jakob Verbeek, and Diane Larlus for their help with Chapters 5 and 17. We also appreciate the contribution of Andrew Bagdanov at the Media Integration and Communication Center in Florence, Italy. Furthermore, Joost van de Weijer acknowledges the support of the Spanish Ministry of Science and Innovation in Madrid, Spain, in particular for funding the Consolider MIPRCV project and for providing him with the Ramon y Cajal Fellowship.
Lastly, we will always remember that this book would not have been possible without our families and loved ones whose energy and love inspired us to make our work colorful and worthwhile.
October 2011
Amsterdam, The Netherlands
Theo Gevers, Arjan Gijsenij, Joost van de Weijer and Jan-Mark Geusebroek
Chapter 1: Introduction
Color is one of the most important and fascinating aspects of the world surrounding us. To comprehend the broad characteristics of color, a range of research fields has been actively involved, including physics (light and reflectance modeling), biology (visual system), physiology (perception), linguistics (cultural meaning of color), and art.
From a historical perspective, covering more than 400 years, prominent researchers contributed to our present understanding of light and color. Snell and Descartes (1620–1630) formulated the law of light refraction. Newton (1666) developed theories of the light spectrum, colors, and optics. The perception of color and its influence on humans was studied by Goethe in his famous book "Farbenlehre" (1810). Young and Helmholtz (1850) proposed the trichromatic theory of color vision. Work on light and color resulted in quantum mechanics as elaborated by Max Planck, Albert Einstein, and Niels Bohr. In art and industrial design, Albert Munsell (1905) introduced his system of color ordering in "A Color Notation." Further, the biological and therapeutic effects of light and color have been analyzed, and views on color from folklore, philosophy, and language have been articulated by Schopenhauer, Hegel, and Wittgenstein.
Over the last decades, with the technological advances of printers, displays, and digital cameras, an explosive growth in the diversity of needs in the field of color computer vision has been witnessed. More and more, traditional gray value imagery is being replaced by color systems. Moreover, today, with the growth and popularity of the World Wide Web, a tremendous amount of visual information, such as images and videos, has become available, and nearly all of this visual data is available in color. Furthermore, (automatic) image understanding is becoming indispensable for handling large amounts of visual data. Computer vision deals with image understanding and search technology for the management of large-scale pictorial datasets. However, in computer vision, the use of color has been only partly explored so far.
This book examines the use of color in computer vision. We take on the challenge of providing a substantial set of color theories, computational methods, and representations, as well as data structures for image understanding in the field of computer vision. Invariant and color constant feature sets are presented. Computational methods are given for image analysis, segmentation, and object recognition. The feature sets are analyzed with respect to their robustness to noise (e.g., camera noise, occlusion, fragmentation, and color trustworthiness), expressiveness, discriminative power, and compactness (efficiency) to allow for fast visual understanding. The focus is on deriving semantically rich color indices for image understanding. Theoretical models are presented to express semantics from both a physical and a perceptual point of view.
The aim of this book is to present color theories and techniques for image understanding from (low level) basic color image formation to (intermediate level) color invariant feature extraction and color image processing to (high level) learning of object and scene recognition by semantic detectors. The topics, and corresponding chapters, are organized from low level to high level processing and from fundamental to more applied research. Moreover, each topic is driven by a different research area using color as an important stand-alone research topic and as a valuable collaborative source of information bridging the gap between different research fields (Fig. 1.1).
Figure 1.1 The different topics are organized from low level to high level processing and from fundamental to more applied research. Each topic is driven by a different research area from human perception, physics, and mathematics to machine learning.
The book starts with the explanation of the mechanisms of human color perception. Understanding the human visual pathway is crucial for computer vision systems, which aim to describe color information in such a way that it is relevant to humans.
Then, physical aspects of color are studied, resulting in reflection models from which photometric invariance is derived. Photometric invariance is important for computer vision, as it results in color measurements that are independent of accidental imaging conditions such as a change in camera viewpoint or a variation in the illumination.
A mathematical perspective is taken to cope with the difference between gray value (scalar) and color (vector) information processing, that is, the extension of single-channel signal to multichannel signal processing. This mathematical approach will result in a sound way to perform color processing to obtain (low level) computational methods for (local) feature computation (e.g., color derivatives), descriptors (e.g., SIFT), and image segmentation. Furthermore, based on both mathematical and physical fundamentals, color image feature extraction is presented by integrating differential operators and color invariance.
Finally, color is studied in the context of machine learning. Important topics are color constancy, photometric invariance by learning, and color naming in the context of object recognition and video retrieval. On the basis of the multichannel approach and color invariants, computational methods are presented to extract salient image patches. From these salient image patches, color descriptors are computed. These descriptors are used as input for various machine learning methods for object recognition and image classification.
The book consists of five parts, which are discussed next.
The observed color of an object depends on a complex set of imaging conditions. Because of the similarity in trichromatic color processing between humans and computer vision systems, in Chapter 2, an outline of human color vision is provided. The different stages of color information processing along the human visual pathway are presented. Further, important chromatic properties of the visual system are discussed, such as chromatic adaptation and color constancy. Then, to provide insight into the imaging process, in Chapter 3, the basics of color image formation are presented. Reflection models are introduced describing the imaging process and how photometric changes, such as shadows and specularities, influence the RGB values in an image. Additionally, a set of relevant color spaces is enumerated.
In computer vision, invariant descriptions for image understanding are relatively new but quickly gaining ground. The aim of photometric invariant features is to compute image properties of objects irrespective of their recording conditions. This comes, in general, at the loss of some discriminative power. To arrive at invariant features, the imaging process should be taken into account.
In Chapters 4–6, the aim is to extract color invariant information derived from the physical nature of objects in color images using reflection models. Reflection models are presented to model dull and gloss materials, as well as shadows, shading, and specularities. In this way, object characteristics can be derived (based on color/texture statistics) for the purpose of image understanding. Physical aspects are investigated to model and analyze object characteristics (color and texture) under different viewing and illumination conditions. The degree of invariance should be tailored to the recording circumstances. In general, a color model with a very wide class of invariance loses the power to discriminate among object differences. Therefore, in Chapter 6, the aim is to select the tightest set of invariants suited for the expected set of nonconstant conditions.
As discussed in Chapter 4, most of the methods to derive photometric invariance use zeroth-order photometric information, that is, pixel values. The effect of the reflection models on higher-order, differential-based algorithms remained unexplored for a long time. The drawbacks of the photometric invariant theory (i.e., the loss of discriminative power and deterioration of noise characteristics) are inherited by the differential operations. To improve the performance of differential-based algorithms, the stability of photometric invariants can be increased through noise propagation analysis of the invariants. In Chapters 5 and 6, an overview is given of how to advance the computation of both photometric invariance and differential information in a principled way.
While physics-based reflection models are valid for many different materials, it is often difficult to model the reflection of complex materials (e.g., with nonperfect Lambertian or dielectric surfaces) such as human skin, cars, and road decks. Therefore, in Chapter 7, we also present techniques to estimate photometric invariance by machine learning models. On the basis of these models, computational methods are studied to derive the (in)sensitivity of transformed color channels to photometric effects obtained from a set of training samples.
Differences in illumination cause measurements of object colors to be biased toward the color of the light source. Humans have the ability of color constancy; they tend to perceive stable object colors despite large differences in illumination. A similar color constancy capability is necessary for various computer vision applications such as image segmentation, object recognition, and scene classification.
In Chapters 8–10, an overview is given of computational color constancy. Many state-of-the-art methods are tested on different (freely) available datasets. As color constancy is an underconstrained problem, color constancy algorithms are based on specific imaging assumptions. These assumptions include the set of possible light sources, the spatial and spectral characteristics of scenes, or other assumptions (e.g., the presence of a white patch in the image or that the average color is gray). As a consequence, no algorithm can be considered universal. With the large variety of available methods, the inevitable question arises of how to select the method best suited to a certain imaging setting. The subsequent question is how to combine the different algorithms in a proper way. In Chapter 11, the problem of how to select and combine different methods is addressed. An evaluation of color constancy methods is given in Chapter 12.
We present how to extend luminance-based algorithms to the color domain. One requirement is that image processing methods do not introduce new chromaticities. A second implication is that for differential-based algorithms, the derivatives of the separate channels should be combined without loss of derivative information. Therefore, the implications on the multichannel theory are investigated, and algorithmic extensions for luminance-based feature detectors such as edge, curvature, and circular detectors are given. Finally, the photometric invariance theory described in earlier parts of the book is applied to feature extraction.
The aim is to take an algebraic (vector based) approach to extend scalar-signal to vector-signal processing. However, a vector-based approach is accompanied by several mathematical obstacles. Simply applying existing luminance-based operators on the separate color channels, and subsequently combining them, will fail because of undesired artifacts.
As a solution to the opposing vector problem, for the computation of the color gradient, the color tensor (structure tensor) is presented. In Chapter 13, we review color-tensor-based techniques for combining derivatives to compute local structures in color images in a principled way. Adaptations of the tensor lead to a variety of local image features, such as circle detectors, curvature estimation, and optical flow.
Although color is important to express saliency, the explicit incorporation of color distinctiveness into the design of image feature detectors has been largely ignored. To this end, we give an overview of how color distinctiveness can be explicitly incorporated in the design of color (invariant) representations and feature detectors. The approach is based on the analysis of the statistics of color derivatives. Furthermore, we present color descriptors for the purpose of object recognition. Object recognition aims to detect high-level semantic information present in images and videos. The approach is based on salient visual features and uses machine learning to build concept detectors from annotated examples. The choice of features and machine learning algorithms has a great influence on the accuracy of the concept detector. Features based on interest regions, also known as local features, consist of an interest region detector and a region descriptor. In contrast to the use of intensity information only, we present both interest point detection (Chapter 13) and region description (Chapter 14); see Figure 1.2.
Figure 1.2 Visual exploration is based on the paradigm to divide the images into meaningful parts from which features are computed. Salient point detection is applied first from which color descriptors are computed. Then, machine learning is applied to provide classifiers for object recognition.
In computer vision, texture is considered to be all that is left after color and local shape have been accounted for, or it is described in terms of structure and randomness. Many common textures are composed of small textons, usually too large in number to be perceived as isolated objects. In Chapter 15, we give an overview of powerful features based on natural image statistics or general principles from surface physics in order to classify a large number of materials by their texture. On the basis of their textural nature, different materials and concepts containing certain types of material can be identified (Fig. 1.3). For features at the level of (entire) objects, the aim is to aggregate pieces of local visual information into characteristic geometric arrangements of (possibly missing) parts. The objective is to find computational models to combine individual observations of an object's appearance under the large number of variations in that appearance.
Figure 1.3 On the basis of their textural nature, different materials and concepts containing certain types of material can be identified.
In the final part of the book, we emphasize the importance of color in several computer vision applications.
In Chapter 16, we follow the state-of-the-art object recognition paradigm consisting of a learning phase and a (runtime) classification phase (Fig. 1.4). The learning module consists of color feature extraction and supervised learning strategies. Color descriptors are computed at salient points in the image by different point detectors (Fig. 1.2). The learning part is executed offline. The runtime classification part takes an image or video as input, from which features are extracted. Then, the classification scheme provides a probability for the class of concepts to which the query image/video belongs (people, mountains, or cars). A concept is defined as a material (e.g., grass, brick, or sand, as illustrated in Fig. 1.3a), an object (e.g., car, bike, or person, as illustrated in Fig. 1.3b), an event (explosion, crash, etc.), or a scene (e.g., mountain, beach, or city); see Figure 1.5.
Figure 1.4 First, during training, features are extracted and objects/scenes are learned offline by giving examples of different concepts (e.g., people, buildings, mountains) as the input to a learning system (in this case pictures containing people). Then, during online recognition, features are extracted from the incoming image/video and provided to the classification system to result in a probability of being one of the concepts.
Figure 1.5 TRECVID concepts and corresponding key frames.
Color names are linguistic labels that humans attach to colors. We use them routinely and seemingly without effort to describe the world around us. They have been primarily studied in the fields of visual psychology, anthropology, and linguistics. One of the most influential works in color naming is the linguistic study of Berlin and Kay on basic color terms. In Chapter 17, color names are presented in the context of image retrieval. This allows for searching objects in images by a certain color name.
Finally, in Chapter 18, we give an overview of multispectral imagery and its applications to segmentation and detection. In particular, techniques are presented to detect regions in multispectral images. To obtain robustness against noise, noise propagation analysis is adopted.
Visual information (images and video) is one of the most valuable sources of information. In fact, it is at the core of current technologies such as the Internet and mobile phones. The immense stimulus of the use and exploitation of digital visual information creates a demand for advanced knowledge representations, learning systems, and image understanding techniques. As nearly all digital information is nowadays available in color (documents, images, videos, and movies), there is an increasing demand for the use and understanding of color information.
Although color has proved to be a central topic in various disciplines, it has so far been only partly explored in computer vision, a gap this book aims to fill. The central aim of this book is to present color theories, color representation models, and computational methods that are essential for visual understanding in the field of computer vision. Color is taken as the unifying topic between different research areas such as mathematics, physics, machine learning, and human perception. Theoretical models are studied to express color semantics from both a physical and a perceptual point of view. These models are the foundations for visual exploration and are tested in practice.
Part I
Color Fundamentals
Chapter 2: Color Vision
By Marcel P. Lucassen
For any vision system, color vision is possible only when two or more light sensors sample the spectral energy distribution of the incoming light in different ways. In animal life, several instantiations of this principle are found, some of them even using parts of the electromagnetic spectrum not visible to the human eye. Human color vision is basically trichromatic, involving three types of cone photoreceptors in the retinae of our eyes. According to a number of reports, however, some women may possess tetrachromatic vision involving four photoreceptor types. Fewer than three functional sensor types (color deficiency) is a well-known phenomenon in humans, often erroneously termed color blindness. But apart from these two anomalies, "normal" color vision starts with the absorption of light in three cone types. Responses arising from these cones are combined in retinal ganglion cells to form three opponent channels: one achromatic (black–white) and two chromatic channels (red–green and yellow–blue). Retinal ganglion cells send off pulselike signals through the optic nerve to the visual cortex, where the perception of color eventually takes place. With the advances in neural imaging techniques, vision researchers have learned much about the specific locations of information processing in the visual cortex. How this eventually results in the perception of color and associated color phenomena in the context of other perceptual attributes such as shape and motion is largely unknown. This chapter describes the basic building blocks of the visual pathway and provides some grip on the factors that affect the fascinating process of color vision.
Color vision starts with light that enters our eyes. At the cornea, a very sensitive part of our eyes, the incoming light is refracted. The diameter of the pupil, the hole in the iris through which light enters the eye, depends on the light intensity. Iris muscles cause the dilation and contraction of the pupil, which thereby regulates the amount of light entering the eyeball by a factor of about 10–30, depending on the exact minimum and maximum pupil diameters. Adjustment of the lens curvature by the lens muscles is the process known as accommodation; it ensures the projection of a sharply focused image on the retina at the back of the eyeball. Unfortunately, because of the chromatic aberration of the lens it is not possible to have a focused image for all wavelengths simultaneously. This explains why red text on a blue background, or vice versa, can appear blurry and difficult to read. Blue and red are associated with the lower and upper ends of the visible wavelength spectrum, implying that when we focus on one, the other will be out of focus.
The retina contains two kinds of light-sensitive cells, rods and cones, named after their basic shapes. Each retina holds about 100 million photoreceptors, roughly 95 million rods and 5 million cones. At low light levels (<0.01 cd/m²), our vision is scotopic and served by rod activity only. In pure scotopic vision we sense differences in the light–dark dimension, but color vision is not possible. Also, visual acuity is poor. At intermediate light levels (0.01–1 cd/m²) our vision is mesopic, in which both rods and cones are active. In mesopic light conditions color discrimination is poor. At light levels above 1 cd/m² our vision becomes photopic, where cone activity is best and allows for good color discrimination.
The spatial distribution of rods and cones along the retina is not uniform. Where cone density is high, rod density is low, and vice versa. Usually the visual field is divided into a central area (having high cone density) and a peripheral area (high rod density). Cone density is at its maximum (around 150,000–200,000 cones/mm²) in a tiny spot at the center of the retina, the fovea, which allows us to perform high-acuity tasks such as reading and provides the best color discrimination. A yellow macular pigment covers the fovea and may serve to maintain high visual acuity because it filters out the blurry short-wavelength light that is scattered in the ocular media. At the very heart of the fovea, an area known as the foveola, no S-cones are present at all, which causes small blue objects to be invisible to the S-cone system (Fig. 2.1c). This phenomenon is known as small-field tritanopia, a color vision deficiency for objects subtending visual angles smaller than 0.35°. The three cone types (L, M, S) occur in different numbers, in L:M:S ratios of about 60:30:5, although these numbers may vary considerably from person to person.
Figure 2.1 Cone mosaic at the central fovea, showing (a) L-cones, (b) M-cones, and (c) S-cones. The area shown is approximately 0.3 × 0.3 mm and is rod-free. The labeling in red, green, and blue refers to the spectral region where the cones have their maximum sensitivity. Note the different number of cones and the absence of S-cones in the center. Source: Figures adapted from Reference 1.
The three cone types have peak sensitivities at different wavelengths and are sensitive to the long-wave (L), middle-wave (M), and short-wave (S) portions of the wavelength spectrum. In Figure 2.2, the spectral sensitivities of the cone types are shown. Note that the sensitivities of the L- and M-cones are largely overlapping whereas the S-cones are spectrally more isolated. Owing to the spectral overlap, at each wavelength there exists a unique combination of L, M, S sensitivities. However, wavelength information is lost in the process that determines the cone responses. For each cone type, the response is obtained by summing up the wavelength-by-wavelength product of the light spectrum with the spectral sensitivity over the spectral window, resulting in three numbers (one for each cone type). The perceived color of an object is determined by the relative magnitude of these three numbers that the object “produces,” but not exclusively so. The visual system also makes spatial comparisons, which make the perceived color of an object dependent on neighboring colors as well.
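This wavelength-by-wavelength summation amounts to R_c = Σ_λ E(λ) S_c(λ) for each cone type c in {L, M, S}. The following Python sketch is our illustration, not material from the book: the Gaussian sensitivity curves (with assumed peak wavelengths) and the flat light spectrum are hypothetical placeholders standing in for measured data.

```python
import numpy as np

# Hypothetical wavelength grid: 400-700 nm in 10 nm steps.
wavelengths = np.arange(400, 701, 10).astype(float)

# Placeholder cone sensitivity curves, modeled as Gaussians with assumed
# peak wavelengths (real L, M, S curves come from psychophysical data).
peaks = np.array([560.0, 530.0, 420.0])  # assumed L, M, S peaks in nm
lms_sensitivity = np.exp(-((wavelengths - peaks[:, None]) ** 2) / (2 * 40.0**2))

# Hypothetical incoming light: an equal-energy (flat) spectrum.
spectrum = np.ones_like(wavelengths)

# Sum the wavelength-by-wavelength product of the light spectrum with each
# cone's spectral sensitivity: three numbers, one per cone type.
lms_response = lms_sensitivity @ spectrum
print(lms_response)  # the L, M, S triplet underlying the perceived color
```

Note how the wavelength dimension disappears in the matrix product: two physically different spectra producing the same triplet are indistinguishable, which is exactly the loss of wavelength information described above.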
Figure 2.2 (a) Relative spectral sensitivity of the three cone types. (b) Spectral luminous efficiency functions V(λ) for photopic vision and V′(λ) for scotopic vision, with sensitivities normalized to their maximum. Source: Data for 2° observer, after Reference 2.
A quantity often used in vision is the spectral luminous efficiency function, which is denoted by the symbol V(λ) for photopic vision and V′(λ) for scotopic vision. It represents the spectral sensitivity of the eye. For photopic vision, V(λ) is the spectral envelope obtained from a weighted average of the three cone sensitivities, and for scotopic vision it is the spectral sensitivity of the rods. Note that the latter is shifted toward the blue end of the spectrum.
If each photoreceptor were to be connected to individual brain cells, one can imagine that a neural cable of considerable thickness would be required. It makes sense therefore that, before signals are sent to the brain, the output signals of the cones are spatially pooled and combined. Also, from an information theory point of view it makes sense to compress the amount of visual information, given the limited bandwidth of the visual pathway [3]. The rods and cones are connected to subsequent layers of horizontal cells, bipolar cells, amacrine cells, and ganglion cells. Interestingly, the incoming light has to first pass these layers in reverse order to reach the layer containing the photoreceptors. The incoming light and the nerve signals thus travel in opposite directions. All neurons have inputs and outputs forming a complex structure in the retinal layer. The output of a neuron is influenced by inputs that can be excitatory (stimulating the output) or inhibitory (suppressing the output). The horizontal and amacrine cells make it possible to combine information from photoreceptors at different spatial locations. A single ganglion cell may thus receive inputs from many photoreceptors. The area on the retina that contributes to the stimulation of a ganglion cell is known as the receptive field. Likewise, neural cells along the visual pathway also have their receptive fields, but these are not necessarily equal to the receptive fields of ganglion cells. The axons of the ganglion cells together form the optic nerve, the connection between the eyes and the brain. When excited, the ganglion cells will fire sharply peaked output signals (pulses or spikes) to the optic nerve. To summarize, the light that is initially absorbed in the cone photoreceptors is transformed to electrical pulse signals that encode the visual information.
The next processing stage to consider upstream along the visual pathway is the lateral geniculate nucleus, or LGN for short. It is the place where two streams of visual information meet: one stream coming from the left part of the visual field (projected on the right part of each retina) and another coming from the right part of the visual field (projected on the left part of each retina). The LGN can be thought of as a relay station, where signals from the retina pass and are sent to the primary visual cortex (V1) in the back of the head. The left and right "halves" of V1 thus receive information from the right and left halves of the visual field, respectively. Properties of cells within the LGN are very much like those of the retinal ganglion cells, including their receptive field organization. Important for the understanding of the (color) vision process is the notion of opponent cells, usually in a center-surround configuration. The so-called on-cells are excited by light stimulation in the central part of the receptive field, whereas they are inhibited by stimulation in the outer part of it (surrounding the center). Off-cells have the opposite spatial characteristics, that is, inhibition by light stimulation in the center of the receptive field and excitation in the surround. Cells with a center-surround configuration play an important role in vision, since they are capable of detecting spatial transitions in light intensity (such as edges) and color. Two types of chromatic cone opponent cells have been reported, sometimes called red–green and blue–yellow cells [4, 5]. Such cells compare signals from different cone types. In the case of the red–green on-cell, abbreviated to red-on, the cell is excited by stimulation of the L-cones and inhibited by stimulation of the M-cones.
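Such center-surround organization is commonly modeled computationally as a difference of Gaussians: a narrow excitatory center minus a broad inhibitory surround. The sketch below is our illustration, not a model from this chapter; the kernel size and widths are arbitrary choices. Filtering the difference of an L-cone and an M-cone image with such a kernel would give a crude red-green opponent response.

```python
import numpy as np

def on_center_kernel(size=15, sigma_center=1.0, sigma_surround=3.0):
    """Difference-of-Gaussians model of an on-center receptive field:
    excitation in the center, inhibition in the surround (parameters
    are illustrative, not physiological measurements)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_center**2)) / (2 * np.pi * sigma_center**2)
    surround = np.exp(-r2 / (2 * sigma_surround**2)) / (2 * np.pi * sigma_surround**2)
    return center - surround

kernel = on_center_kernel()
print(kernel.sum())  # near zero: uniform stimulation cancels, edges do not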
From the LGN, nerve signals are sent to the visual cortex, which can be thought of as divided into a number of functionally distinct areas (V1–V5). The idea is that cells within such an area are predominantly responsible for analyzing different properties of the retinal image, such as shape, motion, orientation, and color [6]. Area V4 is considered an area that is specialized in color processing, although its role as "color center" is under debate. A recent review of the research of the past 25 years on cortical processing of color signals has put more emphasis on the role of area V1 [7]. Since the different areas in the visual cortex are interconnected and feature both forward and backward loops, it is indeed hard to imagine that a single brain area would take care of all the color processing. We have also learned that color cannot be considered a completely isolated visual property, since it is always in interaction with shape, texture, contrast, and so on, which thus requires information exchange between specialized brain areas. It is clear, however, that the visual information in one area depends on the presence of information in a preceding area. Opponent cells were found in the LGN and also in V1. Another type of opponent cell, the double opponent cell, was found in the primary visual cortex. These cells are capable of both spatial and chromatic opponency and are optimally excited when the color in the center of the receptive field is the opposite of the color in the surround. To make it even more complex, these cells also show temporal opponent characteristics [8]. Using noninvasive imaging techniques such as PET (positron emission tomography) and fMRI (functional magnetic resonance imaging), many studies have reported on the mapping of brain activity, and many will follow. This will hopefully lead to a more complete understanding of the processes underlying color vision and perception, and of how it integrates into higher order processes involving, for instance, emotion and behavior.
The dynamic range of the human visual system is very impressive, covering a light intensity range of about 10¹². This is achieved by adaptation to the ambient light level, a process in which the sensitivity to light is adjusted. Two variants of adaptation we are commonly aware of are light adaptation and dark adaptation, occurring whenever we change from a low to a high light intensity situation or vice versa. Light adaptation is a relatively fast process, in the order of seconds, whereas dark adaptation takes minutes to complete. Perhaps somewhat less noticeable is the process of chromatic adaptation, in which the sensitivities of the primary color channels (L, M, S) are individually adjusted. This has the effect of white-balancing because any color dominance is counterbalanced by the sensitivity readjustments. Chromatic adaptation is a continuous and spatially localized process, which may bring specific appearance effects when making eye movements after a period of fixation. Studies into the temporal characteristics of chromatic adaptation have shown that the underlying visual processes are characterized by both a fast and a slow component and are located at the receptor level as well as the cortical level [9, 10]. Figure 2.3 demonstrates the effect of chromatic adaptation.
Figure 2.3 Demonstration of chromatic adaptation (inspired by the work of John Sadowski). Stare at the black dot in the image (a) for about 20 s, without blinking or moving your eyes. Then quickly look at the black spot in the center of the image (b). The image will appear as having natural colors for a brief period because of the aftereffect of chromatic adaptation.
The spectral distribution of daylight changes during the day. Despite these changes, the color appearance of objects is remarkably stable, a phenomenon known as color constancy. Grass remains green throughout the day, whereas from a physical point of view the more reddish light toward the end of the day would predict the grass to appear brownish. Color constancy is considered a basic property of the visual system and has been intensively studied in the past few decades. There exist different approaches to solving the problem of color constancy, which focus on the question of how to disentangle the product of illumination and surface reflection that enters our eye. Reviews of human color constancy studies are presented by Smithson [11] and Foster [12]. An overview of the computational approach to color constancy by illuminant estimation is presented in Chapter 8. Contrary to what the term constancy may suggest, there is abundant psychophysical evidence, coming from different experimental paradigms, showing that human color constancy is not perfect. The degree of color constancy can be quantified using a constancy index ranging between 0 (no constancy at all) and 1 (perfect constancy). Foster [12] tabulated values of the constancy index for some 30 different experimental studies, showing widely varying values. Imperfect constancy implies that a change in the color of the illuminant is not fully discounted by the visual system, which results in noticeable shifts in object colors. Figure 2.4 presents a demonstration of color constancy. Figure 2.4b shows the original scene, and Figure 2.4a shows a simulated change in the color of the global illuminant acting on the whole image. Although we easily perceive the global shift toward a purplish color, the fruit colors stay reasonably constant. If, on the other hand, the simulated change in the illuminant is locally restricted to the apple in the center of the fruit basket (Fig. 2.4c), color constancy is lost and the apple appears purple. This demonstrates the different effects of local versus global changes in the illumination.
Figure 2.4 (a) Global change in illumination, (b) original image (standard image from ISO 12640:1997), and (c) local change in illumination. Note the very different appearance of the color of the apple for the global and the local illuminant change, although physically they are identical.
How can we explain the different appearances of the apple in the images (a) and (c) in Figure 2.4 while the physical light distributions reflected from the apples are identical? The key to the explanation is the fact that for the global change in illumination, ratios across object boundaries within the individual L-, M-, S-cone signals stay the same, whereas for the local illuminant change these ratios change. The latter results in the perception of a completely different color, as if the apple had been replaced by a different object. Ratios across borders or edges also play an important role in the retinex theory [13, 14]. According to the theory, the visual system independently processes three images, each image belonging to one cone type (L, M, or S). Within each cone image, lightness values (so-called designators) are calculated from spatial comparisons of the reflectance at a specific point to the maximum reflectance in the image. The combination of the three lightness values occupies a point in a three-dimensional space and determines the color. Retinex theory was shown to correlate well with visual perception and received a lot of attention from vision researchers (both in a positive and a negative way). Hurlbert [15] showed that several other lightness algorithms, all having the retinex algorithm as their precursor, are formally connected by one and the same mathematical formula. We refer to Chapter 5 where the role of color ratios for computational color constancy is discussed.
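A small worked example with hypothetical cone values makes the ratio argument concrete:

```python
# Response of one cone type to two neighboring surfaces (hypothetical values).
cone_a, cone_b = 0.8, 0.2
print(cone_a / cone_b)              # cross-edge ratio: 4.0

# A *global* illuminant change scales both sides of the edge by the same
# factor k, so the ratio is unchanged and the object color looks stable.
k = 0.5
print((k * cone_a) / (k * cone_b))  # still 4.0

# A *local* change scales only one side: the ratio changes, and the visual
# system reports a different object color (the purple apple of Fig. 2.4c).
print((k * cone_a) / cone_b)        # 2.0
```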
An alternative explanation of color constancy has a physiological basis. A well-known and often used chromatic adaptation model is the coefficient rule of von Kries [16]. It states that the sensitivities of the three cone types are regulated by cone-specific gain factors that are inversely proportional to the level of cone stimulation. To illustrate, let us assume that we are in a room in which we adapt to neutral (white) illumination that stimulates the L-, M-, and S-cones in equal amounts. Within the room are several colored objects and also a white object. Now we change the room illumination from neutral toward blue such that the S-cone system is stimulated twice as much, whereas the L- and M-cone stimulation remains unaffected. According to the von Kries coefficient law, the sensitivity of the S-cone system will be reduced by a factor of 2 to effectively rebalance the L-, M-, S-cone stimulation. For the white object, which takes on the illuminant color, this will result in unchanged cone stimulations, implying that von Kries adaptation permits perfect color constancy for the white object. For the colored objects in the room, however, perfect color constancy is not guaranteed because the interaction between the illuminant spectrum and the surface reflectance may result in S-cone ratios being different from 2.
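A minimal sketch of von Kries scaling follows, assuming the illuminant is characterized by the LMS response it evokes on a white surface. The numbers mirror the blue-room example above and are illustrative only.

```python
import numpy as np

def von_kries_adapt(lms, lms_white):
    """Coefficient rule: rescale each cone channel by a gain inversely
    proportional to its stimulation, here estimated from a white surface."""
    return np.asarray(lms, float) / np.asarray(lms_white, float)

# Bluish illuminant: S-cone stimulation is doubled relative to neutral.
white_under_blue = [1.0, 1.0, 2.0]

# The white object is corrected back to neutral exactly...
print(von_kries_adapt([1.0, 1.0, 2.0], white_under_blue))  # [1. 1. 1.]

# ...but a colored object whose S-cone signal scaled by 1.8 rather than 2
# (illuminant-reflectance interaction) keeps a residual shift after
# adaptation: color constancy is imperfect for it.
neutral_lms = [0.6, 0.4, 0.5]          # hypothetical response under neutral light
blue_lms = [0.6, 0.4, 0.5 * 1.8]       # response under the blue light
print(von_kries_adapt(blue_lms, white_under_blue))  # [0.6 0.4 0.45] != neutral_lms
```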
Helson [17] proposed an adaptation model in which the visual system is adapted to a medium gray level. Objects with reflectances above that of the adaptation level take on the color of the illuminant, whereas objects with reflectances below that of the adaptation level take on the complementary color. This effect is known as the Helson–Judd Effect.
The perceived color of an object is determined not only by the light coming from that object but also by the light coming from neighboring objects in the scene. Colors seen in complete isolation, such as a patch of color on a black background presented on a color display, can appear as if they are self-luminous and emit light. When put in the context of other colors, however, the appearance is different and depends on the exact definition of the surrounding colors. Two important spatial interactions that influence color perception are mentioned here: contrast and assimilation. In contrast effects, the difference between a color and its surround is enhanced so that the two look more different. The effect can be interpreted as an induction effect, whereby the color complementary to that of the surround is induced into the center. Different surrounds may give dramatically different effects, as demonstrated in Figure 2.5.
Figure 2.5 Simultaneous color contrast: the center squares are physically identical but appear different because of a difference in surround color.
The effect of assimilation, on the other hand, is the opposite of the contrast effect: with assimilation, the difference between a color region and the adjacent color appears smaller, so that the color seems shifted toward that of the surrounding color. Figure 2.6 demonstrates how the perceived color of text may change completely. The color of the stripes covering the text appears to spread out into the text; in other words, the surround induces its color into the target color.
Figure 2.6 Demonstration of chromatic assimilation (after Reference 18). (a) shows four lines of text, the first two and the last two having the same color. When the text is placed on differently colored backgrounds and "behind" thin colored stripes (b), the color of the stripes seems to spread into the color of the words. Physically, the colors of the text in (a) and the uncovered parts of the text in (b) are identical.
The demonstrations in Figures 2.5 and 2.6 are dependent on viewing distance, or more precisely, on the visual angles that the details subtend on the retina. We already mentioned that the number of S-cones is much less than that of the L- and M-cones; therefore they sample the retinal image at a lower spatial resolution. This has consequences also for the spatial resolution of the blue–yellow channel. Figure 2.7 shows how the contrast sensitivity of the achromatic channel and the two chromatic channels of the visual system depends on the spatial frequency. Fine details (higher spatial frequencies) are best detected by the luminance channel, whereas the two chromatic channels are better equipped to detect more coarse details (lower spatial frequencies). This property of the visual system is used successfully in image compression techniques. Since the chromatic channels cannot detect (at a certain viewing distance) the high spatial frequency contents of a color image, this information can be removed or compressed without visually degrading the image.
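A sketch of this idea follows; the opponent transform and the naive decimation are simplified illustrative choices of ours (production codecs such as JPEG use a YCbCr transform with proper filtering):

```python
import numpy as np

def subsample_chroma(rgb, factor=2):
    """Keep the achromatic channel at full resolution and decimate the two
    chromatic channels, whose high spatial frequencies the visual system
    cannot resolve anyway. The opponent transform is illustrative only."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    luminance = (r + g + b) / 3.0       # achromatic (black-white) channel
    red_green = r - g                   # chromatic channel 1
    yellow_blue = (r + g) / 2.0 - b     # chromatic channel 2
    return luminance, red_green[::factor, ::factor], yellow_blue[::factor, ::factor]

image = np.random.rand(8, 8, 3)
lum, rg, yb = subsample_chroma(image)
print(lum.shape, rg.shape, yb.shape)    # (8, 8) (4, 4) (4, 4)
```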
Figure 2.7 Contrast sensitivity functions for luminance and chromatic contrast, as a function of spatial frequency. Source: Replotted from Figures 7 and 9 in Reference 19. Solid lines represent fits to the data. Note the difference between the low pass characteristic of the chromatic channels and the bandpass characteristic of the achromatic channel.
Spatial effects can occur only when some form of spatial comparison is performed by the visual system. We already noted the importance of center-surround cells for vision because they allow the detection of intensity and color edges. Mathematically, these edge detectors are obtained by taking spatial derivatives, as presented in Chapter 6.
A number of studies have focused on the question of how many colors can be perceived by humans. There is no single answer to this question, since it depends on the criteria used for counting discriminable colors. As a result, estimates vary from the order of 10³ to 10⁶. If we go out to buy a can of red paint to match the color of a tomato we saw earlier that day, chances are very high that the two colors will not match. Humans are far better at seeing differences between colors (relative color) than at memorizing absolute colors. Early measurements of chromatic discrimination thresholds [20] laid the basis for the development of a perceptually uniform color space (CIELAB) and the derivation of mathematical formulae to quantify color differences [21]. The latter are abundantly used in industry.
There exist various tests to measure someone's chromatic discrimination ability. Even among normal trichromats, people with "normal" color vision, this ability may vary from person to person. There are different ways in which color vision may be impaired; usually the distinction is made between acquired and congenital color vision deficiencies. Aging causes the ocular media to become more yellow, which reduces color discrimination along the yellow–blue axis of color space [22]. Some diseases, alcohol consumption [23], medication, and drugs [24] can negatively affect color vision abilities. These are examples of acquired color vision deficiencies. With congenital deficiencies, abnormalities in the photopigments are inherited and are already present at birth. This affects about 8% of men and 0.45% of women. The spectral sensitivities of the photopigments can differ from those of normal trichromats in many different ways. The terms protan, deutan, and tritan are used to indicate that the L-, M-, and S-cones, respectively, are abnormal. We can indicate the severity of this abnormality by a number ranging between 0 (cone type missing) and 1 (normal). If the abnormality is somewhere between 0 and 1, we speak of anomalous trichromats. If one cone pigment is missing, only two functional cone types are left, resulting in dichromatic color vision. Depending on the cone type that is lacking (L, M, or S), dichromats are characterized as protanopes, deuteranopes, or tritanopes. Color discrimination for dichromats is strongly reduced, as illustrated in Figure 2.8.
Figure 2.8 (a) Original image. (b) Simulated appearance for a deuteranope (missing the M-cone photopigment). Simulated image obtained with the TNO color deficiency simulator.
It is a mistaken belief that color-deficient people are not able to see color, as the term color blind would suggest. Rather, they are less able to discriminate colors; some colors are confused, which can be shown graphically in color space (Fig. 2.9). Colors located on the so-called confusion lines cannot be distinguished, and hence appear equal. For the different types of deficiency, the confusion lines originate in different copunctal points.
Figure 2.9 CIE 1931 x, y chromaticity space showing confusion lines for a protan, deutan, and tritan. Colors located on such confusion lines are not distinguished by color deficients.
The different stages of color information processing along the human visual pathway have been highlighted. Color vision begins with the absorption of light in the three cone types at the retinal level. Cone responses are spatially compared and transformed into three opponent color signals (one achromatic and two chromatic), traveling along the optic nerve and via the LGN to the visual cortex, where the perception of color eventually takes place. We discussed important chromatic properties of the visual system, such as chromatic adaptation and color constancy, provided demonstrations of spatial interactions, and finally took a look at color deficiency.
Chapter 3: Color Image Formation
