Data registration refers to a family of techniques for matching or bringing similar objects or datasets into alignment. These techniques are widely used in applications such as video coding, tracking, object and face detection and recognition, surveillance and satellite imaging, medical image analysis and structure from motion. Registration methods are as numerous as their manifold uses, ranging from pixel-level and block- or feature-based methods to Fourier-domain methods.
This book is focused on providing algorithms and image and video techniques for registration and quality performance metrics. The authors provide various assessment metrics for measuring registration quality alongside analyses of registration techniques, introducing and explaining both familiar and state-of-the-art registration methodologies used in a variety of targeted applications.
Key features:
Page count: 410
Year of publication: 2015
Cover
Title Page
Copyright
Preface
Acknowledgements
Chapter 1: Introduction
1.1 The History of Image Registration
1.2 Definition of Registration
1.3 What is Motion Estimation
1.4 Video Quality Assessment
1.5 Applications
1.6 Organization of the Book
References
Chapter 2: Registration for Video Coding
2.1 Introduction
2.2 Motion Estimation Technique
2.3 Registration and Standards for Video Coding
2.4 Evaluation Criteria
2.5 Objective Quality Assessment
2.6 Conclusion
2.7 Exercises
References
Chapter 3: Registration for Motion Estimation and Object Tracking
3.1 Introduction
3.2 Optical Flow
3.3 Efficient Discriminative Features for Motion Estimation
3.4 Object Tracking
3.5 Evaluating Motion Estimation and Tracking
3.6 Conclusion
3.7 Exercise
References
Chapter 4: Face Alignment and Recognition Using Registration
4.1 Introduction
4.2 Unsupervised Alignment Methods
4.3 Supervised Alignment Methods
4.4 3D Alignment
4.5 Metrics for Evaluation
4.6 Conclusion
4.7 Exercise
References
Chapter 5: Remote Sensing Image Registration in the Frequency Domain
5.1 Introduction
5.2 Challenges in Remote Sensing Imaging
5.3 Satellite Image Registration in the Fourier Domain
5.4 Correlation Methods
5.5 Subpixel Shift Estimation in the Fourier Domain
5.6 FFT-Based Scale-Invariant Image Registration
5.7 Motion Estimation in the Frequency Domain for Remote Sensing Image Sequences
5.8 Evaluation Process and Related Datasets
5.9 Conclusion
5.10 Exercise – Practice
References
Chapter 6: Structure from Motion
6.1 Introduction
6.2 Pinhole Model
6.3 Camera Calibration
6.4 Correspondence Problem
6.5 Epipolar Geometry
6.6 Projection Matrix Recovery
6.7 Feature Detection and Registration
6.8 Reconstruction of 3D Structure and Motion
6.9 Metrics and Datasets
6.10 Conclusion
6.11 Exercise—Practice
References
Chapter 7: Medical Image Registration Measures
7.1 Introduction
7.2 Feature-Based Registration
7.3 Intensity-Based Registration
7.4 Transformation Spaces and Optimization
7.5 Conclusion
7.6 Exercise
References
Chapter 8: Video Restoration Using Motion Information
8.1 Introduction
8.2 History of Video and Film Restoration
8.3 Restoration of Video Noise and Grain
8.4 Restoration Algorithms for Video Noise
8.5 Instability Correction Using Registration
8.6 Estimating and Removing Flickering
8.7 Dirt Removal in Video Sequences
8.8 Metrics in Video Restoration
8.9 Conclusions
8.10 Exercise—Practice
References
Index
End User License Agreement
Chapter 1: Introduction
Figure 1.1 (a) Occluded objects A1 and A2, (b) single object A
Figure 1.2 Motion estimation predicts the contents of each macroblock based on its motion relative to the reference frame. The reference frame is searched to find the 16 × 16 block that best matches the macroblock
Figure 1.3 Steps of hand and fingers motion estimation
Figure 1.4 The output of a tracking system
Figure 1.5 Observing other vehicles with a single camera
Figure 1.6 (a) Intelligent CCTV system, (b) 3D indoor visual system
Figure 1.7 (a) Fragment of the satellite image, (b) 3D visualization of the part of the city, obtained as the result of the high-resolution satellite image processing. Source: http://commons.wikimedia.org/wiki/File:Pentagon-USGS-highres-cc.jpg
Figure 1.8 The 3D shape of a monument reconstructed using a sequence of 2D images
Chapter 2: Registration for Video Coding
Figure 2.1 Block matching algorithm
Figure 2.2 Full-Search algorithm using a (a) raster and a (b) spiral order
Figure 2.3 (a) The Three-Step Search (TSS) and (b) the New Three-Step Search (NTSS)
Figure 2.4 The 2D Logarithmic (TDL) Search converging on a position of minimum distortion
Figure 2.5 (a) Match is found at position (+2, +6) and (b) match is found at position (−3, +1)
Figure 2.6 (a) Example of the Cross Search Algorithm; arrows illustrate the different patterns used in the final stage. (b) The Orthogonal Search Algorithm (OSA) converging on a position of minimum distortion at (+6, +4)
Figure 2.7 Diamond Search procedure. The figure shows the Large Diamond Search Pattern (LDSP) and the Small Diamond Search Pattern (SDSP). It also shows an example path to motion vector (−4, 2) in five search steps: four LDSP steps and one SDSP step
Figure 2.8 Adaptive rood pattern (ARP)
Figure 2.9 Quad-tree structure example
Figure 2.10 (a, b) A three-level hierarchical search
Figure 2.11 Components of the compression process in H.264
Figure 2.12 Components in H.264 decoder
Figure 2.13 Possible partition of macroblocks and possible sub-partitions of a block
Figure 2.14 Multiple reference motion estimation
Figure 2.15 Three frames of the ‘Mobcal’ sequence (a), ‘Basketball’ sequence (b), ‘Garden’ sequence (c), ‘Foreman’ sequence (d) and ‘Yos’ sequence (e). Source: https://media.xiph.org/video/derf/
Figure 2.16 Sample frames from HMDB
Figure 2.17 Sample frame from LIVE Video Quality Database
Figure 2.18 An example of motion-compensated frame differences corresponding to 1 dB gain (right over left frame)
Figure 2.19 An example of 12° angular error gain (right over left frame)
Figure 2.20 Comparison of ‘Lena’ images with different types of distortions: (a) original ‘Lena’ image, 512 × 512, 8 bits/pixel; (b) multiplicative speckle noise contaminated image, MSE = 225; (c) blurred image, MSE = 225; (d) additive Gaussian noise contaminated image, MSE = 225; (e) mean shifted image, MSE = 225; (f) contrast stretched image, MSE = 225; (g) JPEG compressed image, MSE = 215; (h) impulsive salt-pepper noise contaminated image, MSE = 225
Figure 2.21 Diagram of structural similarity measurement system
Figure 2.22 Video quality assessment system based on structural information
Figure 2.23 Reduced-reference scheme
Chapter 3: Registration for Motion Estimation and Object Tracking
Figure 3.1 Flow diagram of an OF-based object detector [2]
Figure 3.2 Car tracking system based on optical flow estimation. (a, e) Original images containing the objects of interest; (b) motion field estimation using Horn–Schunck optical flow; (f) motion field estimation using Lucas–Kanade optical flow; (c, g) foreground segmentation; (d, h) object tracking
Figure 3.3 (a) Large displacement optical flow used for body part segmentation and pose estimation; (b) deformable object tracking using optical flow constraints
Figure 3.4 Human pose estimation using optical flow features. (a) Articulated pose tracking using optical flow for dealing with self-occlusion; (b) upper-body human pose estimation using flowing puppets, consisting of articulated shape models with dense optical flow [20]
Figure 3.5 Diagram of vehicle detection in surveillance airborne video described in [21]. Images are first stabilized by detecting invariant features, which are used to estimate the affine transformation using RANSAC. Then, dense optical flow is estimated and segmented using EM clustering
Figure 3.6 (a) KLT features used for stabilization. (b) Stabilized image. (c) Optical flow clustered according to EM. (d) EM clusters. (e) Resulting foreground segmentation. Images from Ref. [21]
Figure 3.7 Sampling patterns belonging to the feature-based descriptors: (a) SIFT and SURF [40]; (b) BRISK [45]; and (c) FREAK [46]
Figure 3.8 Face location and orientation tracking using KLT and minimum eigenvalue features [58, 59]
Chapter 4: Face Alignment and Recognition Using Registration
Figure 4.1 Effects of registration on recognition. In this picture, the subject is wrongly identified when directly using the input provided by the Viola and Jones face detector (box in (a)). In (b) the person is correctly identified after correcting the alignment and compensating the illumination
Figure 4.2 Dense grid registration. (a) Reference image with grid overlaying and extracted reference template. (b) Input image with warped resulting grid and registered input image [6]
Figure 4.3 Examples of supervised alignment methods for face recognition
Figure 4.4 (a) Typical keypoints used in generative models. (b) Training set samples used for learning a PDM model before and after applying procrustes algorithm [9]
Figure 4.5 Keypoints corresponding to eye, mouth corners and nose are extracted using dedicated classifiers and a filtering strategy to remove false positives
Chapter 5: Remote Sensing Image Registration in the Frequency Domain
Figure 5.1 An example of atmospheric and cloud interactions. Source: Visible Earth NASA
Figure 5.2 Examples of multitemporal and terrain/relief effects due to natural hazards. Source: Earth Imaging Journal
Figure 5.3 The correlation surface with a peak at the coordinates corresponding to the shift between the two pictures at the top. Source: Visible Earth database
Figure 5.4 (a) The 512 × 512 pentagon image. (b) and (c) The distribution of the difference in orientation ΔΦ between the original image and two circularly shifted versions. Source: http://commons.wikimedia.org/wiki/File:Pentagon-USGS-highres-cc.jpg
Figure 5.5 (a) Simple illustration of various fitted functions, (b) sample of the PC surface in the horizontal axis, (c) esinc-fitted function [65], (d) Gaussian-fitted function, (e) sinc-fitted function and (f) quadratic-fitted function
Figure 5.6 (a) and (c) Satellite images from the Visible Earth dataset; (b) and (d) the rotated versions; and (e) the 1D representations A (dashed line) and A_G (solid line). Source: Visible Earth database
Figure 5.7 (a) Cartesian, (b) polar and (c) log-polar grids
Figure 5.8 Average peak ratio for all the test sequences
Figure 5.9 Steps of the shape-adaptive PC algorithm using padding techniques. Source: Visible Earth database
Figure 5.10 Examples of simulated images for remote sensing registration [84]
Chapter 6: Structure from Motion
Figure 6.1 The main structure of the pipeline used in Structure from Motion
Figure 6.2 A representation of the concept of a pinhole camera
Figure 6.3 The camera orientation and position in the camera coordinate system
Figure 6.4 The camera orientation and position in the camera coordinate system
Figure 6.5 An example of epipolar geometry showing the two cameras observing the same scene from their centers of projection
Figure 6.6 An example showing the epipole projected into the other camera's image plane
Figure 6.7 An example of the feature detection and extraction process, and the obtained aligned images (panoramic view)
Figure 6.8 Example of block based auto-correlation over uniform areas, edges and corners
Figure 6.9 Examples of extracted SIFT features
Figure 6.10 The pixels in a level are the result of applying a Gaussian filter to the lower level and then subsampling to reduce the size
Figure 6.11 Example of calculating the Summed Area Table
Figure 6.12 An example of an integral image representation
Figure 6.13 Visualisation of how inference progressed in a keyframe-based optimisation
Figure 6.14 Visualisation of how inference progressed in a filter based optimisation
Figure 6.15 An example of image stitching that generates a panoramic view of 360 degrees, with the subjects moving around following the camera direction
Figure 6.16 The main steps of a general image stitching algorithm
Chapter 7: Medical Image Registration Measures
Figure 7.1 Dense spatial embedding of the floating image. Points within the image grid (in the shaded area) are interpolated, while points outside the image grid are assigned the unknown intensity symbol *
Figure 7.2 Joint intensity binning
Figure 7.3 Profiles of three variants of the mutual information measure against translations with constant direction. (a) Mutual information, (b) normalized mutual information and (c) pseudo-log-likelihood
Figure 7.4 Joint histogram of two brain magnetic resonance volumes using 256 bins for both images. (a) Unregistered images and (b) registered images
Figure 7.5 Joint histogram of brain magnetic resonance and computed tomography volumes using 256 bins for both images. (a) Unregistered images and (b) registered images
Figure 7.6 ‘Grey stripe’ images. The floating image is a binary image with size 40 × 30 pixels. The source is a gradation image with size 30 × 30 pixels in which each column has a different intensity. (a) Floating image and (b) source image
Figure 7.7 Profiles of mutual information and kernel pseudo log-likelihood against horizontal translations in the ‘grey stripe’ registration experiment. Mutual information is globally maximal for any integer translation (for non-integer translations, smaller values are observed due to floating intensity interpolation). (a) Mutual information and (b) Kernel pseudo-likelihood
Figure 7.8 Example of rigid CT-to-MRI registration using finite mixture pseudo-likelihood with C=5 components. Shown in panel (c) is the final maximum a posteriori tissue classification using the outcome of the EM algorithm. The CT information makes it possible to segment the skull, which is indistinguishable in intensity from brain tissues in the MRI volume. (a) MRI volume, (b) registered CT volume, and (c) 5-class segmentation
Figure 7.9 2D representation of trilinear interpolation weights. The center dot represents a transformed voxel. Outer dots represent grid neighbours
Figure 7.10 Profiles of three variants of the mutual information measure against translations with constant direction. (a) T1-weighted magnetic resonance image with Canny–Deriche contours overlaid, (b) T2-weighted image from another subject with T1-weighted image contours overlaid and (c) objective registered T2-weighted image
Chapter 8: Video Restoration Using Motion Information
Figure 8.1 Processing steps in image sequence restoration
Figure 8.2 Conversion process from film to PAL and NTSC
Figure 8.3 Examples of film grain (a) and image/video noise (b)
Figure 8.4 Examples of grain (top row) and video noise (bottom row) patterns
Figure 8.5 (a) Example of a spatial filter, (b) noise filter operating on the temporal axis
Figure 8.6 The arrangement of the temporal filtering system
Figure 8.7 A first-order recursive filter, where is a scaling factor
Figure 8.8 An example of motion-compensated filtering system.
Figure 8.9 Example frames of the Joan sequence indicating the motion of the camera
Figure 8.10 A close-up view of the Joan sequence comparing pairs of successive frames showing the global motion
Figure 8.11 Example of stabilization of flicker-like effects in image sequences through local contrast correction
Figure 8.12 Example frames with scratches, vertical and non-vertical
Figure 8.13 An example of a degraded frame is shown at the left, while the ground truth is denoted with dark spots on the right image indicating the locations of dirt
Figure 8.14 Example of dirt detection and removal
Figure 8.15 Example of intensity variation over time in locations with dirt artefacts
Figure 8.16 Median filtering on the co-sided pixels (dark coloured) and on the motion-compensated pixel locations (transparent) for both moving objects (left) and dirt artefacts (right)
Figure 8.17 Block diagram of dirt removal using motion compensation
Figure 8.18 Examples of video and film sequences used in restoration for evaluation
Chapter 3: Registration for Motion Estimation and Object Tracking
Table 3.1 Properties of the main datasets on tracking and activity recognition
Table 3.2 Properties of the main datasets on tracking and activity recognition (continued)
Chapter 5: Remote Sensing Image Registration in the Frequency Domain
Table 5.1 A list of instruments used for remote sensing data capturing and their properties
Chapter 7: Medical Image Registration Measures
Table 7.1 Some popular medical image registration measures expressed in terms of the normalized joint histogram (note that the dependence on T is omitted from the formulas for clarity). Measures marked with a star (*) are to be minimized. Measures marked with a dagger (†) are not invariant through swapping the rows and columns of the joint histogram
Table 7.2 Some usual 3D transformation stabilizers with associated differential forms and kernel functions. Note that the kernel function corresponding to linear elasticity is an approximation taken from Ref. [76] where
Chapter 8: Video Restoration Using Motion Information
Table 8.1 Summary of film and video specifications
Vasileios Argyriou
Kingston University, UK
Jesús Martínez del Rincón
Queen's University Belfast, UK
Barbara Villarini
University College London, UK
Alexis Roche
Siemens Healthcare / University Hospital Lausanne / École Polytechnique Fédérale Lausanne, Switzerland
This edition first published 2015
© 2015 John Wiley & Sons Ltd
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Image, video & 3D data registration : medical, satellite and video processing applications with quality metrics / [contributions by] Vasileios Argyriou, Faculty of Science, Engineering and Computing, Kingston University, UK, Jesús Martínez del Rincón, Queen's University, Belfast, UK, Barbara Villarini, University College London, UK, Alexis Roche, Siemens Medical Solutions, Switzerland.
pages cm
Includes bibliographical references and index.
ISBN 978-1-118-70246-8 (hardback)
1. Image registration. 2. Three-dimensional imaging. I. Argyriou, Vasileios. II. Title: Image, video and 3D data registration.
TA1632.I497 2015
006.6′93 – dc23
2015015478
A catalogue record for this book is available from the British Library.
Cover Image: Courtesy of Getty images
This book was motivated by the desire, shared with many others, to further research in computer vision and video processing, focusing on image and video techniques for registration and on quality performance metrics. A significant number of registration methods operate at different levels and in different domains (e.g. block or feature based, pixel level, Fourier domain), each applicable to specific problems. Image registration, or motion estimation in general, is the process of calculating the motion of a camera and/or the motion of the individual objects composing a scene. Registration is essential for many applications, such as video coding, tracking, object and face detection and recognition, surveillance and satellite imaging, structure from motion, simultaneous localization and mapping, medical image analysis, activity recognition for entertainment, behaviour analysis and video restoration.
In this book, we present the state of the art in registration organised by targeted application, providing an introduction to the particular problems and limitations of each domain, an overview of previous approaches and a detailed analysis of the most well-known current methodologies. Additionally, various assessment metrics for measuring the quality of registration are presented, showcasing the differences among targeted applications. For example, the important features in a medical image (e.g. MRI data) may differ from those in a picture of a human face, and the quality metrics are therefore adjusted accordingly. State-of-the-art metrics for quality assessment are analysed, explaining their advantages and disadvantages and providing visual examples. Information about common datasets used to evaluate these approaches is also discussed for each application.
The evolution of research related to registration and quality metrics has been significant over recent decades, with popular examples including simple block matching techniques, optical flow and feature-based approaches. Also, over the last few years, advances in hardware architectures have allowed real-time algorithms to be introduced. In the near future, high-resolution images and videos are expected to be processed in real time. Furthermore, new acquisition devices capturing modalities such as depth require traditional concepts in registration and quality assessment to be revisited; the new applications they enable are also discussed.
This book will provide:
an analysis of registration methodologies and quality metrics covering the most important research areas and applications;
an introduction to key research areas and the current work underway in these areas;
to an expert in a particular area, the opportunity to learn about approaches in different registration applications and either obtain ideas from them or apply his or her expertise to a new area, improving current approaches and introducing novel methodologies;
to new researchers an introduction up to an advanced level, and to specialists, ways to obtain or transfer ideas from different areas covered in this book.
We are deeply indebted to many of our colleagues who have given us valuable suggestions for improving the book. We acknowledge helpful advice from Professor Theo Vlachos and Dr. George Tzimiropoulos during the preparation of some of the chapters.
In the last few decades, the evolution of technology has brought rapid development in image acquisition and processing, leading to growing interest in related research topics and applications, including image registration. Registration is defined as the estimation of a geometrical transformation that aligns points from one viewpoint of a scene with the corresponding points in another viewpoint. Registration is essential in many applications, such as video coding, tracking, detection and recognition of objects and faces, surveillance and satellite imaging, structure from motion, simultaneous localization and mapping, medical image analysis, activity recognition for entertainment, behaviour analysis and video restoration. It is considered one of the most complex and challenging problems in image analysis, and no single registration algorithm is suitable for all related applications, owing to the extreme diversity and variety of scenes and scenarios. This book presents image, video and 3D data registration techniques for different applications, also discussing the related quality performance metrics and datasets. State-of-the-art registration methods are analysed according to the targeted application, including an introduction to the problems and limitations of each method. Additionally, various quality assessment metrics for registration are presented, indicating the differences among the related research areas. For example, the important features in a medical image (e.g. MRI data) may not be the same as in the picture of a human face, and the quality metrics are therefore adjusted accordingly. State-of-the-art metrics for quality assessment are analysed, explaining their advantages and disadvantages and providing visual examples separately for each of the considered application areas.
In image processing, one of the first appearances of the concept of registration was in Roberts' work in 1963 [1]. He located and recognized predefined polyhedral objects in scenes by aligning their edge projections with image projections. The first registration applied to images appeared in the remote sensing literature: using the sum of absolute differences as a similarity measure, Barnea and Silverman [2] and Anuta [3, 4] proposed automatic methods to register satellite images. Around the same time, Leese [5] and Pratt [6] proposed a similar approach using the cross-correlation coefficient as the similarity measure. In 1973, Fischler and Elschlager [7] used non-rigid registration for the first time to locate deformable objects in images; non-rigid registration was also used to align deformed images and to recognize handwritten letters. In the early 1980s, image registration entered biomedical image analysis, using data acquired from different scanners measuring anatomy. In medical imaging, registration was employed to align magnetic resonance (MR) and computed tomography (CT) brain images in attempts to build an atlas [8, 9].
Over the last few years, due to the advent of powerful and low-cost hardware, real-time registration algorithms have been introduced, significantly improving performance and accuracy. Consequently, novel quality metrics were introduced to allow unbiased comparative studies. This book provides an analysis of the most important registration methodologies and quality metrics, covering the most important research areas and applications. Registration approaches in different applications are presented side by side, allowing the reader to obtain ideas and supporting knowledge transfer from one application area to another.
During the last decades, automatic image registration has become essential in many image processing applications due to the significant amount of acquired data. By image registration, we mean the process of overlaying two or more images of the same scene captured at different times, from different viewpoints or by different sensors. It represents a geometrical transformation that aligns points of an object observed from one viewpoint with the corresponding points of the same or a different object captured from another viewpoint. Image registration is an important part of many image processing tasks that require information and data captured from different sources, such as image fusion, change detection and multichannel image restoration. Image registration techniques are used in different contexts and types of applications. It is widely used in computer vision (e.g. target localization, automatic quality control), in remote sensing (e.g. monitoring of the environment, change detection, multispectral classification, image mosaicing, geographic systems, super-resolution), in medicine (e.g. combining CT or ultrasound with MR data in order to obtain more information, monitoring the growth of tumours, verifying or improving treatments) and in cartography for updating maps. Image registration is also employed in video coding in order to exploit the temporal relationship between successive frames (i.e. motion estimation techniques are used to remove temporal redundancy, improving video compression and transmission).
In general, registration techniques can be divided into four main groups based on how the data have been acquired [10]:
Different viewpoints (multiview analysis)
: A scene is acquired from different viewpoints in order to obtain a larger/panoramic 2D view or a 3D representation of the observed scene.
Different times (multitemporal analysis)
: A scene is acquired in different times, usually on a regular basis, under different conditions, in order to evaluate changes among consecutive acquisitions.
Different sensors (multimodal analysis)
: A scene is acquired using different kinds of sensors. The aim is to integrate the information from different sources in order to reveal additional information and complex details of the scene.
Scene to model registration
: The image and the model of a scene are registered. The model can be a computer representation of the given scene, and the aim is to locate the acquired scene in the model or compare them.
It is not possible to define a universal method applicable to all registration tasks, due to the diversity of images and the different types of degradation and acquisition sources. Every method must take different aspects into account. However, in most cases, registration methods consist of the following steps:
Feature detection
: Salient objects, such as close-boundary regions, edges, corners, lines and intersections, are manually or automatically detected. These features can be represented using points such as centre of gravity and line endings, which are called control points (CPs).
Feature matching
: The correspondence between the detected features and the reference features is estimated. To establish the matching, feature descriptors and similarity measures, together with spatial relationships among the features, are used.
Transform model estimation
: According to the matched features, parameters of mapping functions are computed. These parameters are used to align the sensed image with the reference image.
Image resampling and transformation
: The sensed image is transformed using the mapping functions. Appropriate interpolation techniques can be used in order to calculate image values in non-integer coordinates.
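As an illustration of the transform model estimation step, the following sketch (not taken from the book; the function name and toy control points are illustrative) fits a 2D affine transform to matched control points by least squares, using NumPy:

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares affine transform: solves dst ≈ A·src + t
    from matched control points (CPs)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Design matrix [x y 1]; solve for the 2x3 affine parameters.
    X = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return params.T  # 2x3 matrix [A | t]

# Control points related by a pure translation of (+3, -1)
src = [(10, 10), (40, 12), (25, 30), (5, 45)]
dst = [(13, 9), (43, 11), (28, 29), (8, 44)]
M = estimate_affine(src, dst)
print(np.round(M, 3))
```

The recovered matrix has the identity in its linear part and the translation (+3, −1) in its last column; with noisy matches, the same least-squares fit returns the best affine approximation, which can then drive the resampling step.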
Video processing differs from image processing in that most of the observed objects in the scene are not static. Understanding how objects move helps to transmit, store and manipulate video efficiently. Motion estimation is the research area of image and video processing that deals with these problems, and it is linked to the feature-matching stage of registration algorithms. Motion estimation is the process by which the temporal relationship between two successive frames in a video sequence is determined; it is a registration method used in video coding and other applications to exploit redundancy, mainly in the temporal domain.
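A minimal example of motion estimation by exhaustive block matching over a small search window, minimising the sum of absolute differences (a sketch under simple assumptions; the function name and synthetic frames are illustrative, not the book's code):

```python
import numpy as np

def block_match(ref, cur, top, left, bsize=8, search=4):
    """Find the displacement (dy, dx) into `ref` that best predicts
    the block of `cur` at (top, left), minimising the SAD."""
    block = cur[top:top + bsize, left:left + bsize]
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks falling outside the reference frame.
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue
            sad = np.abs(ref[y:y + bsize, x:x + bsize] - block).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

# Synthetic pair: the current frame is the reference shifted by (2, -1).
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (32, 32)).astype(float)
cur = np.roll(ref, shift=(-2, 1), axis=(0, 1))
print(block_match(ref, cur, 8, 8))
```

For this interior block the SAD is exactly zero at displacement (2, −1), which is the motion vector a video coder would transmit for that macroblock.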
When an object in a 3D environment moves, the luminance of its 2D projection changes, either due to non-uniform lighting or due to motion. Assuming uniform lighting, the changes can only be interpreted as movement. Under this assumption, the aim of motion estimation techniques is to model the motion field accurately. An efficient method produces more accurate motion vectors, allowing a higher degree of correlation to be removed.
Integer-pixel registration may be adequate in many applications, but some problems require sub-pixel accuracy, either to improve the compression ratio or to provide a more precise representation of the actual scene motion. Although sub-pixel motion estimation requires additional computational power and execution time, the gains justify its use, which is essential for most multimedia applications.
In a typical video sequence, there is no 3D information about the scene contents. The 2D projection approximating a 3D scene is known as a ‘homography’, and the velocity of the 3D objects corresponds to the velocity of the luminance intensity on the 2D projection, known as ‘optical flow’. Another term is ‘motion field’, a 2D matrix of motion vectors describing how each pixel or block of pixels moves. In general, a motion field is a set of motion vectors; the term is related to ‘optical flow’, with the latter used to describe dense motion fields.
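As a minimal illustration of how such a motion vector can be estimated from the brightness-constancy assumption, the following numpy sketch solves the Lucas–Kanade least-squares system for a single window (the method itself is covered in Chapter 3); the synthetic frames are hypothetical:

```python
import numpy as np

def lucas_kanade_window(prev, curr):
    """Estimate one motion vector (u, v) for a window via Lucas-Kanade.

    Solves the least-squares system built from the optical-flow
    constraint Ix*u + Iy*v + It = 0 over all pixels in the window.
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    # Spatial and temporal derivatives (simple finite differences).
    Ix = np.gradient(prev, axis=1)
    Iy = np.gradient(prev, axis=0)
    It = curr - prev
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# A smooth intensity ramp shifted one pixel to the right between frames:
x = np.arange(16, dtype=np.float64)
prev = np.tile(x, (16, 1))
curr = np.tile(x - 1.0, (16, 1))  # same content moved right by 1 pixel
u, v = lucas_kanade_window(prev, curr)
print(round(u, 2), round(v, 2))
```

A dense motion field is obtained by repeating this estimate over overlapping windows centred at each pixel.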
Finding the motion between two successive frames of a video sequence is an ill-posed problem, because the intensity variations do not exactly match the motion of the objects. Another problematic phenomenon is occlusion; in this case it is convenient to assume that the occluded objects are separate objects until they are observed as a single object (Figure 1.1). Additionally, motion estimation assumes that motion within an object is smooth and uniform, owing to spatial correlation.
Figure 1.1 (a) Occluded objects A1 and A2, (b) single object A
The concept of motion estimation is used in many applications and is analysed in the following chapters, which provide details of state-of-the-art algorithms and allow the reader to apply this information in different contexts.
The main target in the design of modern multimedia systems is to improve the video quality perceived by the user. Video quality assessment is a difficult task because many factors can interfere with the final result.
In order to obtain quality improvement, the availability of an objective quality metric that accurately represents human perception is crucial. Many methods and measures have been proposed aiming to provide objective criteria that give accurate and repeatable results while taking into account the subjective experience of a human observer. Objective quality assessment methods based on subjective measurements use either a perceptual model of the human visual system (HVS) or a combination of relevant parameters tuned with subjective tests [11, 12].
Objective measurements are used in many image and video processing applications, since they are easy to apply in comparative studies. One of the most popular metrics is the peak signal-to-noise ratio (PSNR), which is based on the mean square error between the original and the distorted data. The computation of this value is trivial, but it has significant limitations. For example, it does not correlate well with perceived quality, and in many cases the original undistorted data (e.g. images, videos) may not be available.
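Since PSNR recurs throughout the book, a minimal sketch of its computation, assuming 8-bit image data and using only numpy:

```python
import numpy as np

def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images.

    PSNR = 10 * log10(peak^2 / MSE), where MSE is the mean
    squared error between the original and the distorted data.
    """
    original = np.asarray(original, dtype=np.float64)
    distorted = np.asarray(distorted, dtype=np.float64)
    mse = np.mean((original - distorted) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 1 grey level on an 8-bit image gives MSE = 1
# and therefore PSNR = 10 * log10(255^2) ≈ 48.13 dB.
a = np.full((8, 8), 100, dtype=np.uint8)
b = a + 1
print(round(psnr(a, b), 2))
```

Note how the value depends only on the pixel-wise error: two distortions with the same MSE receive the same PSNR even if one is far more visible, which is exactly the limitation mentioned above.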
At the end of each chapter, a description of the metrics used to assess the quality of the presented registration methods is provided for each of the discussed applications, highlighting the key factors that affect overall quality, the related problems and solutions, and examples illustrating these concepts.
Registration techniques are required in many applications based on video processing. As mentioned in the previous section, motion estimation is a registration task employed to determine the temporal relationship between video frames. One of its most important applications is in video coding systems.
Video CODECs (COder/DECoder) comprise an encoder and a decoder. The encoder compresses (encodes) video data resulting in a file that can be stored or streamed economically. The decoder decompresses (decodes) encoded video data (whether from a stored file or streamed), enabling video playback.
Compression is a reversible conversion of data to a format that requires fewer bits, usually performed so that the data can be stored or transmitted more efficiently. The size of the data in compressed form relative to the original size is known as the compression ratio. If the inverse of the process, ‘decompression’, produces an exact replica of the original data, then the compression is lossless. Lossy compression, usually applied to image and video data, does not allow reproduction of an exact replica of the original data but results in higher compression ratios.
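For example, a lossless round trip and its compression ratio can be checked directly with Python's standard zlib module (the input here is an arbitrary, highly redundant byte string):

```python
import zlib

original = b"abcabcabc" * 1000  # highly redundant data
compressed = zlib.compress(original)
ratio = len(original) / len(compressed)
print(f"compression ratio: {ratio:.1f}:1")

# Lossless: decompression yields an exact replica of the original.
assert zlib.decompress(compressed) == original
```

The more redundant the input, the higher the achievable ratio; truly random data would barely compress at all.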
Neighbouring pixels within an image or a video frame are highly correlated (spatial redundancy), and neighbouring areas within successive video frames are also highly correlated (temporal redundancy).
A video signal consists of a sequence of images. Each image can be compressed individually without using the other video frames (intra-frame coding) or can exploit the temporal redundancy considering the similarity among consecutive frames (inter-frame coding), obtaining a better performance. This is achieved in two steps:
Motion estimation
: A region (usually a block) of the current frame is compared with neighbouring regions of the adjacent frames. The aim is to find the best match, typically expressed in the form of motion vectors (
Figure 1.2
).
Motion compensation
: The matching region from the reference frame is subtracted from the current block.
Figure 1.2 Motion estimation predicts the contents of each macroblock based on the motion relative to the reference frame. The reference frame is searched to find the 16 × 16 block that best matches the macroblock
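The two steps above can be sketched in a few lines of Python. This is a minimal illustration of exhaustive (full) search with the sum of absolute differences (SAD) as the matching criterion, not one of the optimised algorithms of Chapter 2; the frame contents, block size and search range are arbitrary assumptions:

```python
import numpy as np

def full_search(ref, cur, bsize=8, srange=4):
    """Full-search block matching: one motion vector per block,
    minimising the sum of absolute differences (SAD)."""
    h, w = cur.shape
    vectors = {}
    for by in range(0, h - bsize + 1, bsize):
        for bx in range(0, w - bsize + 1, bsize):
            block = cur[by:by + bsize, bx:bx + bsize].astype(np.int32)
            best, best_sad = (0, 0), np.inf
            for dy in range(-srange, srange + 1):
                for dx in range(-srange, srange + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + bsize > h or x + bsize > w:
                        continue  # candidate outside the reference frame
                    cand = ref[y:y + bsize, x:x + bsize].astype(np.int32)
                    sad = np.abs(block - cand).sum()
                    if sad < best_sad:
                        best, best_sad = (dy, dx), sad
            vectors[(by, bx)] = best
    return vectors

# Reference frame with a bright square; the current frame shows the
# same square displaced by 2 rows and 3 columns.
ref = np.zeros((16, 16), dtype=np.uint8)
ref[2:10, 1:9] = 200
cur = np.zeros((16, 16), dtype=np.uint8)
cur[4:12, 4:12] = 200
mv = full_search(ref, cur)

# Motion compensation for the block at (8, 8): subtract the matched
# reference block; a perfect match yields an all-zero residual.
dy, dx = mv[(8, 8)]
residual = cur[8:16, 8:16].astype(np.int32) \
    - ref[8 + dy:16 + dy, 8 + dx:16 + dx].astype(np.int32)
print(mv[(8, 8)], int(np.abs(residual).sum()))
```

Only the motion vectors and the (sparse) residuals need to be encoded, which is where the compression gain of inter-frame coding comes from.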
Motion estimation considers images of the same scene acquired at different times, and for this reason it is regarded as an image registration task. In Chapter 2, the most popular motion estimation methods for video coding are presented.
Motion estimation is utilised not only in video coding applications but also to improve the resolution and quality of video. Given multiple, shifted, low-resolution images, image processing methods can be used to obtain high-resolution images.
Furthermore, digital videos acquired by consumer camcorders or high-speed cameras, which can be used in industrial applications and to track high-speed objects, are often degraded by linear space-varying blur and additive noise. The aim of video restoration is to estimate each image or frame as it would appear without the effects of sensor and optics degradations. Image and video restoration is essential when we want to extract still images from videos. This is because blurring and noise may not be visible to the human eye at usual frame rates, but they can become rather evident when observing a ‘freeze-frame’. Restoration is also used when historical film material is encoded in a digital format. Especially if it is encoded with block-based encoders, many artefacts may be present in the coded frames. These artefacts are removed using sophisticated techniques based on motion estimation. In Chapter 8, video registration techniques used in restoration applications are presented.
Medical images are increasingly employed in health care for different kinds of tasks, such as diagnosis, planning, treatment, guided treatment and monitoring disease progression. For all these studies, multiple images are acquired from subjects at different times and, in most cases, using different imaging modalities and sensors. Especially with the growing number of imaging systems, different types of data are produced. In order to improve and gain information, proper integration of these data is highly desirable, and registration is fundamental to this integration process. One example of registering different types of data is epilepsy surgery. The patients usually undergo various data acquisition processes, including MR, CT, digital subtraction angiography (DSA), ictal and interictal single-photon emission computed tomography (SPECT) studies, magnetoencephalography (MEG), electroencephalography (EEG) and positron emission tomography (PET). Another example is radiotherapy treatment, in which both CT and MR are employed. The benefits for surgeons of registering all these data are therefore significant. Registration methods are also applied to monitor the growth of a tumour or to compare a patient's data with anatomical atlases.
Motion estimation is also used in medical applications, operating like a doctor's assistant or guide. For example, motion estimation is used to indicate the right direction for the laser by displaying the optical flow (OF) during interstitial laser therapy (ILT) of a brain tumour. The predicted OF velocity vectors are superimposed on the grey-scale images and are used to predict the amount and the direction of heat deposition.
Another growing application of registration is the recognition of faces, lips and expressions using motion estimation. A significant amount of effort has been put into sign language recognition: the motion and position of the hand and fingers are estimated, and patterns are used to recognize the words and their meanings (see Figure 1.3), an application particularly useful for deaf people [13].
Figure 1.3 Steps of hand and fingers motion estimation
The main problems of medical image data analysis and the application of registration techniques are discussed in detail in Chapter 7.
Registration methods find many applications in systems that increase the safety of people and vehicles. Video processing and motion estimation can be used to protect humans from both active and passive accidents. The tracking of vehicles has many potential applications, including road traffic monitoring (see Figure 1.4), digital rear-view mirrors, and the monitoring of car parks and other high-risk or high-security sites. The benefits are equally wide ranging: tracking cars on roads could make it easier to detect accidents, potentially cutting the time it takes for the emergency services to arrive.
Figure 1.4 The output of a tracking system
Another application of motion interpretation in image sequences is a driver assistance system for vehicles driving on the highway. The aim is to develop a digital rear-view mirror that could inform the driver when a lane change is unsafe. A single camera-based sensor can be used to retrieve information about the environment of the vehicle in question. Vehicles driving behind it limit its motion possibilities [14]; therefore, the application needs to estimate their motion relative to the first car. This problem is illustrated in Figure 1.5.
Figure 1.5 Observing other vehicles with a single camera
Tracking human motion can be useful for security and demographic applications. For example, an intelligent CCTV system (see Figure 1.6(a)) is able to ‘see’ and ‘understand’ the environment. In shopping centres, it can count the customers and provide useful information (e.g. the number of customers on a particular day or in the last 2 h). Robots can be utilised for indoor applications without prior knowledge of the building, being able to move and perform specific operations. This kind of application can be used in hospitals or offices where the passages can be easily identified (see Figure 1.6(b)).
Figure 1.6 (a) Intelligent CCTV system, (b) 3D indoor visual system
Visual tracking is an interesting area of computer vision with many practical applications. There are good reasons to track a wide variety of objects, including aeroplanes, missiles, vehicles, people, animals and microorganisms. While tracking a single object in images has received considerable attention, tracking multiple objects simultaneously is both more useful and more problematic. It is more useful since the objects to be tracked often exist in close proximity to other similar objects. It is more problematic since the objects of interest can touch, occlude and interact with each other; they can also enter and leave the scene.
The above-mentioned applications are based on machine vision technology, which utilises an imaging system and a computer to analyse the sequences and take decisions. There are two basic types of machine vision applications – inspection and control. In inspection applications, the machine vision optics and the imaging system enable the processor to ‘see’ objects precisely and thus make valid decisions about which parts pass and which parts must be scrapped. In control applications, sophisticated optics and software are used to direct the manufacturing process.
As these examples show, object tracking is an important task within the field of computer vision. In Chapter 3, the concept of optical flow for tracking and activity recognition is analysed, presenting the related registration methodologies.
Considering a security system, another important application is face tracking and recognition: a biometric system for automatically identifying and verifying a person from a video sequence or frame. It is now very common to find security cameras in airports, offices, banks, ATMs and universities, and in any place with an installed security system. Face recognition must first detect a face in an image. Features are then extracted and used to recognize the person, taking into account factors such as lighting, expression, ageing, transformation and pose. Registration methods are used especially for face alignment tasks; they are presented in Chapter 4, highlighting the main problems, the different approaches and the metrics for evaluating these tasks.
Military applications are probably one of the largest areas for computer vision, even though only a small part of the work is open to the public. The obvious examples are detection of enemy soldiers or vehicles and guidance of missiles to a designated target. More advanced systems for missile guidance send the missile to an area rather than a specific target, and target selection is made when the missile reaches the area based on locally acquired image data. Modern military concepts, such as ‘battlefield awareness’, imply that various sensors, including image sensors, provide a rich set of information about a combat scene, which can be used to support strategic decisions. In this case, automatic data processing is used to reduce complexity and to fuse information from multiple sensors increasing reliability.
Night and heat vision can be used by police to pursue criminals. Motion estimation techniques and object tracking systems use these vision systems to obtain better performance at night and in cold areas. Torpedoes, bombs and missiles use motion estimation to find and follow a target, and motion estimation combined with object tracking is also utilised by aeroplanes, ships and submarines.
Satellite object tracking is one of the military applications with the most research programmes. In this case, the systems track objects such as vehicles, trains, aeroplanes or ships. The main problem in these applications is the low quality of the images; these practical problems can be reduced when efficient algorithms are used.
Satellite high-resolution images can also be used to produce maps and 3D visualization of territory status of a city (see Figure 1.7). In this case, image alignment is essential to obtain accurate maps and visualizations; therefore, sub-pixel registration methods are required. Also, due to the size of the captured data, fast approaches are crucial for this type of military and satellite applications. In Chapter 5, a detailed analysis of satellite image registration is presented. Methods for evaluating the performance of these techniques in the context of achieving specific performance goals are also described.
Figure 1.7 (a) Fragment of the satellite image, (b) 3D visualization of part of the city, obtained as the result of high-resolution satellite image processing. Source: http://commons.wikimedia.org/wiki/File:Pentagon-USGS-highres-cc.jpg
A fundamental task in computer vision is image-based 3D reconstruction: the creation of 3D models of a real scene from 2D images. Three-dimensional digital models are used in applications such as animation, visualization and navigation. One of the most common techniques for image-based 3D reconstruction is structure from motion, owing to its conceptual simplicity. Structure from motion simultaneously estimates the 3D geometry of a scene (structure) and the camera location (3D motion). It is applied, for example, to reconstruct 3D archaeological buildings and statues from 2D images (see Figure 1.8).
Figure 1.8 The 3D shape of a monument reconstructed using a sequence of 2D images
In these approaches, the first step is to find correspondences of sparse features among consecutive images using feature extraction and matching techniques. In the second step, structure from motion is applied in order to obtain the 3D shape and the motion of the camera. In the final step, both reconstruction and motion are adjusted and refined. Structure from motion is also used in different scenarios, such as autonomous navigation and guidance, augmented reality, hand/eye motion capture, calibration, remote sensing and segmentation.
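The ‘structure’ part of these approaches ultimately reduces to triangulation: recovering a 3D point from its matched 2D projections once the camera matrices are known. A minimal linear (DLT) triangulation sketch in numpy, with hypothetical camera poses and a hypothetical scene point:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices; x1, x2: (u, v) image points.
    Builds the homogeneous system A X = 0 and solves it by SVD.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]            # null vector = homogeneous 3D point
    return X[:3] / X[3]   # dehomogenise

# Two hypothetical cameras: identity pose, and a 1-unit translation in x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = X_true[:2] / X_true[2]                 # projection in view 1
x2 = (X_true - [1, 0, 0])[:2] / X_true[2]   # projection in view 2
print(np.round(triangulate(P1, P2, x1, x2), 3))
```

In a real structure-from-motion pipeline the camera matrices are themselves unknowns; this sketch only illustrates the geometric core, with the subsequent bundle adjustment corresponding to the refinement step described above.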
In Chapter 6, methods for obtaining the 3D shape of an object or an area using motion information and registration techniques are presented. Registration methods for panoramic views in digital cameras and robotics are also described, focusing on issues related to performance and computational complexity.
This book provides an analysis of registration methodologies and quality metrics, covering the most important research areas and applications. It is organized as follows:
Chapter
1
An introduction to the concepts of image and video registration is presented, including examples of the related applications. A historical overview of image registration and the fundamentals of quality assessment metrics are also provided.
Chapter
2
An overview of block-matching motion estimation methods is presented, including traditional methods such as full search, three-step search and diamond search, as well as other state-of-the-art approaches. The same structure is used for hierarchical and shape-adaptive methods. The concepts of quality of service (QoS) and quality of experience (QoE) are discussed, and how image and video quality is influenced by the registration techniques is analysed. Quality metrics are presented, focusing on coding applications.
Chapter
3
