For several decades, researchers have tried to construct perception systems based on data recorded by video cameras. This work has produced various tools that have made recent advances in this area possible. Part 1 of this book deals with the problem of the calibration and self-calibration of video sensors. Part 2 is essentially concerned with the estimation of the relative object/sensor position when a priori information (a CAD model of the object) is introduced. Finally, Parts 3 and 4 discuss the inference of depth information and shape recognition in images.
Table of Contents
Introduction
Part 1
Chapter 1. Calibration of Vision Sensors
1.1. Introduction
1.2. General formulation of the problem of calibration
1.3. Linear approach
1.4. Non-linear photogrammetric approach
1.5. Results of experimentation
1.6. Bibliography
Chapter 2. Self-Calibration of Video Sensors
2.1. Introduction
2.2. Reminder and notation
2.3. Huang-Faugeras constraints and Trivedi’s equations
2.4. Kruppa equations
2.5. Implementation
2.6. Experimental results
2.7. Conclusion
2.8. Acknowledgement
2.9. Bibliography
Chapter 3. Specific Displacements for Self-calibration
3.1. Introduction: the interest of resorting to specific movements
3.2. Modeling: parametrization of specific models
3.3. Self-calibration of a camera
3.4. Perception of depth
3.5. Estimating a specific model on real data
3.6. Conclusion
3.7. Bibliography
Part 2
Chapter 4. Localization Tools
4.1. Introduction
4.2. Geometric modeling of a video camera
4.3. Localization of a voluminous object by monocular vision
4.4. Localization of a voluminous object by multi-ocular vision
4.5. Localization of an articulated object
4.6. Hand-eye calibration
4.7. Initialization methods
4.8. Analytical calculations of localization errors
4.9. Conclusion
4.10. Bibliography
Part 3
Chapter 5. Reconstruction of 3D Scenes from Multiple Views
5.1. Introduction
5.2. Geometry relating to the acquisition of multiple images
5.3. Matching
5.4. 3D reconstruction
5.5. 3D modeling
5.6. Examples of applications
5.7. Conclusion
5.8. Bibliography
Chapter 6. 3D Reconstruction by Active Dynamic Vision
6.1. Introduction: active vision
6.2. Reconstruction of 3D primitives
6.3. Reconstruction of a complete scene
6.4. Results
6.5. Conclusion
6.6. Appendix: calculation of the interaction matrix
6.7. Bibliography
Part 4
Chapter 7. Shape Recognition in Images
7.1. Introduction
7.2. State of the art
7.3. Principle of local quasi-invariants
7.4. Photometric approach
7.5. Geometric approach
7.6. Indexing of images
7.7. Conclusion
7.8. Bibliography
List of Authors
Index
First published in France in 2003 by Hermes Science/Lavoisier entitled Perception visuelle par imagerie video © LAVOISIER, 2003
First published in Great Britain and the United States in 2009 by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd, 2009
The rights of Michel Dhome to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Cataloging-in-Publication Data
[Perception visuelle par imagerie video. English] Visual perception through video imagery / edited by Michel Dhome.
p. cm.
Includes index.
ISBN 978-1-84821-016-5
1. Computer vision. 2. Visual perception. 3. Vision. I. Dhome, Michel.
TA1634.P4513 2007
006.3'7--dc22
2007028789
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN: 978-1-84821-016-5
Artificial vision, whose main objective is the automatic perception and interpretation of the world observed by a system containing one or several cameras, is a relatively new field of investigation. It raises a surprisingly large range of problems, most of which are not currently solved in a reliable way. Although a general theory is far from being reached, significant progress has been made recently, theoretically as well as methodologically.
In the visible domain, luminance images are the result of two physical processes: the first is linked to the reflectance properties of the surfaces of the observed objects, while the second corresponds to the projection of these same objects onto the light-sensitive plate of the sensor used. From a mathematical standpoint, in order to interpret the observed scene, we must solve an inverse problem, i.e. infer the surface geometry (3D) of the objects present from the purely 2D content of the acquired image or images.
This reputedly complex problem in the context of computer vision is solved by man with surprising ease. However, the operation of the human vision system is clearly not founded on a single concept: it is enough to examine the processes involved in short- and long-distance vision. In the first case, the disparity between the left and right retinal images allows man to obtain, by triangulation (stereoscopy), depth information about his close environment, which is vital in particular for grasping objects by hand. In the second case, when looking at a long distance, or even more so when contemplating a picture, stereoscopy is obviously of no help in interpreting the observed scene. Even under these conditions (total lack of direct 3D information), man is nevertheless able, in the vast majority of cases, to estimate the form and spatial position of the objects he observes. This requires mental processes able to infer 3D information from the 2D information extracted from a luminance image. These processes are based on the unconscious use of prior knowledge relating to the principle of retinal image formation and to the form of the 3D objects surrounding us. The surprising capabilities of the human vision system stem from the fact that this knowledge is continuously enriched from early childhood.
In the last few decades, researchers in the artificial vision community have attempted to develop perception systems that would work from data emanating from video cameras. This book presents a few tools emerging from recent advances in the field.
In Part 1, the reader will find three chapters dedicated to the calibration or self-calibration of video sensors. Chapter 1 presents an approach, greatly inspired by the world of photogrammetry, for the fine estimation of the intrinsic parameters of a video camera. It is based on the interpretation of images of a calibration test chart which is generally not known with great precision. Chapter 2 addresses the complex problem of self-calibration from a series of matched points between different images of a single scene. The recommended method is based on an elegant and simple decomposition of the Kruppa equations. Chapter 3 explores the self-calibration of cameras undergoing specific movements, which allow a simplified development of the main matrix.
Part 2 mainly concerns the estimation of the relative object/sensor position when prior knowledge (a CAD model of the object) is introduced. The reader will discover how to treat the problem of localizing a rigid object observed by a monocular system. The formalism presented is then extended to handle cases as varied as multi-ocular localization, hand-eye calibration, and the estimation of the posture of articulated objects such as robotic arms.
Part 3 addresses the inference of volume information in two chapters. Chapter 5 discusses the problem of reconstructing a fixed scene observed by a multi-ocular system. The notions of homologous points, epipolar geometry, fundamental matrix, essential matrix and trifocal tensor are first introduced, as well as different approaches for obtaining these entities. The problem of dense matching between image pairs is then addressed before a few reconstruction examples are shown. Chapter 6 discusses the notion of active dynamic vision, i.e., controlling the path of a camera mounted on a robotic arm in order to reconstruct the surrounding scene. An underlying problem involves the definition of the optimal movements necessary for the reconstruction of different primitives (points, straight lines, cylinders). Finally, perception strategies are proposed in order to ensure a complete reconstruction taking occlusions between objects into consideration.
The recognition of forms in images is the heart of Part 4. Chapter 7 proposes tools for identifying, in a database, the images containing visual elements identical to those contained in a query image, despite acquisition differences between images which may involve the point of view, the illumination conditions and the global composition of the observed scene. The methods presented are based on a common principle: the use of quasi-invariants associated with local descriptors. They are tested against large image databases.
Calibration of a vision system consists of determining a mathematical relation existing between the three-dimensional (3D) coordinates of the reference markers of a scene and the two-dimensional (2D) coordinates of these same reference markers projected and detected in an image.
Determining this relation is an uphill task in vision, particularly for reconstruction, where it is necessary to infer 3D information from data extracted from 2D images. In reality, the field of application is broader, and calibration proves to be essential whenever a relation between an image and the 3D world must be established: recognition and localization of objects, dimensional control of parts, reconstruction of the environment for the navigation of a robot, etc.
A complete analysis of the calibration of a vision system must take into account all the photometric, optic and electronic phenomena present in the image acquisition chain.
In general, a system that enables the calibration of a camera is made up of:
– a calibration test card (grid or standard object), generally consisting of reference markers from which 3D coordinates are accurately inferred in its local reference coordinate system;
– an image acquisition system for the digitization and storage of test card images;
– an algorithm for matching the 2D reference markers detected in the images with their 3D counterparts on the test card;
– an algorithm for calculating the perspective transformation matrix of the camera, relating the reference frame of the calibration test card to that of the image.
Generally, the problem of calibrating CCD cameras splits into two parts, namely geometric calibration (calculation of the projection matrix) and radiometric calibration (uniformity of brightness in an image). The first problem is widely covered, whereas the second is less studied.
In this chapter, we approach the problem of the geometric calibration of a video sensor in a didactic way and solve it first using linear methods and then using non-linear methods. The literature often speaks of weak or strong calibration; the distinction lies in whether the projection matrix is estimated as a whole (weak case) or each parameter forming this matrix is estimated individually (strong case). The remainder of this chapter deals with the problem of the strong calibration of vision sensors.
The calibration process is a supervised process often requiring the operator’s attention. It is a static process, carried out offline, before the camera is used for a given visual task. Once the camera is calibrated, its parameters must remain fixed throughout its use. Each time we wish to modify the focus, the focal length or even the lens aperture, the camera has to be recalibrated. Much work over the last 20 years has produced fairly complex and precise methods for integrating a vision system into a metrological chain. The reader can refer to the following link for an exhaustive view of the major publications in the field: http://iris.usc.edu/Vision-Notes/bibliography.
Let us consider an image acquisition system (Figure 1.1).
Calibrating the system consists of determining the transformation from R3 to R2 which makes it possible to express the image formation process analytically.
Figure 1.1. System of acquiring images
Figure 1.2. Geometry of the image formation system
Traditional geometric optics prefers models with thick or thin lenses. The difficulty of expressing vergence constraints in a simple way compels us to resort to the pin-hole model, in which all rays pass through the same point (the optical center). The photosensitive cell (image plane) is located at a distance f from this center, f being the focal length of the chosen lens.
Let us note that the image obtained is normally inverted when compared to a naked eye view. To overcome this problem, we artificially place the image plane in front of the optical center (from the physics point of view, this artifice is carried out by reading the CCD matrix in such a manner as to obtain the inversion of the image).
Figure 1.3. Image plane defined in front of the optical center
In the literature on this subject, we find several types of image projection: orthographic, scaled orthographic, para-perspective and perspective. It is this last family of projections that will hold our attention, as it is the best suited to the physical reality of vision sensors.
Figure 1.4. Notation of the different reference frames
In this section, we will use the following notations:
– Rc: camera frame (the −z axis coinciding with the optical axis);
– Rw: reference frame related to object modeling;
– (o, u, v): image frame (taking the effect of digitization into account).
This first change of frame makes it possible to express the coordinates of the test card or reference object (positioned in the scene) in the frame attached to the camera. As we can see, this change of frame is expressed by a transformation made up of a rotation and a translation.
(1.1)
(1.2)
with
– Pw the 3D coordinates of a test-card reference point, expressed in the modeling frame;
– Pc the coordinates of the same 3D point, expressed in the camera frame.
The rotation and translation matrices (R(3×3) and T(3×1)) are defined in the camera frame. Coupling R and T in the same matrix notation requires the use of homogeneous coordinates.
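As a reminder of the standard form such a change of frame takes (a sketch consistent with the notation above, not necessarily the exact expressions (1.1) and (1.2)):

$$P_c = R\,P_w + T, \qquad \begin{pmatrix} P_c \\ 1 \end{pmatrix} = \begin{pmatrix} R_{3\times3} & T_{3\times1} \\ 0\;\;0\;\;0 & 1 \end{pmatrix} \begin{pmatrix} P_w \\ 1 \end{pmatrix}.$$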
Changing from the camera frame to the image frame relies on the perspective projection equations.
Traditionally these equations are of the form:
(1.3)
In homogeneous coordinates, the system is written as:
(1.4)
It must be noted that when changing from R3 to R2, the homogeneous notation introduces a multiplicative factor s.
Demonstration
(1.5)
by substituting s:
(1.6)
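Under the usual pin-hole convention (an assumption here, since only the equation numbers are reproduced above), with (Xc, Yc, Zc) the coordinates of Pc in the camera frame, these projection relations can be sketched as:

$$x = f\,\frac{X_c}{Z_c}, \quad y = f\,\frac{Y_c}{Z_c}, \qquad s\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{pmatrix}, \quad \text{with } s = Z_c.$$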
Remark. It is necessary to take into account the difference in sampling step along the x and y coordinates, due on the one hand to the shape of the elementary pixels of the CCD matrix and on the other to the sampling of the video signal:
(1.7)
This leads to the traditional equations for changing frames (from camera coordinates to pixel coordinates):
(1.8)
– (u0, v0) are the coordinates (in pixels), in the image, of the intersection of the optical axis with the image plane (origin of the frame change);
– (dx, dy) are respectively the dimensions along x and y of an elementary pixel of the CCD matrix (see Figure 1.5).
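With these definitions, the frame change of (1.8) presumably takes the classical form (a reconstruction consistent with the symbols defined above):

$$u = u_0 + \frac{x}{d_x} = u_0 + \frac{f}{d_x}\,\frac{X_c}{Z_c}, \qquad v = v_0 + \frac{y}{d_y} = v_0 + \frac{f}{d_y}\,\frac{Y_c}{Z_c}.$$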
The complete system of image formation is thus expressed by the following relation:
(1.9)
where:
– (Xw, Yw, Zw) are the 3D coordinates of the calibration point belonging to the test card;
– (u, v) are the 2D pixel coordinates in the projection image of this point;
– (u0, v0, f, dx, dy) are called the intrinsic calibration parameters. They are specific to the acquisition system;
Figure 1.5. Coordinate system on the CCD matrix
– (R(11,…,33), T(x,y,z)) are called the extrinsic calibration parameters. They provide the localization of the test card in the camera frame at the time the picture is taken.
(1.10)
M is called the calibration matrix of the system.
Its 12 elements depend on the following parameters:
– 5 intrinsic parameters pertaining to the camera. Generally, we use the Mint matrix in the form:
(1.11)
By setting:
(1.12)
The dx/dy ratio represents the pixel aspect ratio; with dx known (supplied by the camera manufacturer), the estimation of the intrinsic parameters is reduced to the calculation of 4 parameters (fx, fy, u0, v0); a sketch of the corresponding matrices is given just after this list.
– 12 extrinsic parameters (9 for the rotation (R11 … R33) and 3 for the translation (Tx, Ty, Tz)), which do not depend on the camera itself.
In other words, there are a total of 16 parameters.
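A sketch of the matrices involved, under the usual convention fx = f/dx, fy = f/dy (an assumption consistent with the reduction to (fx, fy, u0, v0) above):

$$M_{int} = \begin{pmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad M = M_{int}\,\begin{pmatrix} R_{3\times3} & T_{3\times1} \end{pmatrix}, \qquad s\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = M \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}.$$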
To calibrate a vision system is to be capable of determining all the parameters intervening in the analytical expression of the image formation illustrated in (1.9).
Problem. Given n 3D-2D pairs (Xw, Yw, Zw; u, v) (expression (1.10)) between a test card and its image, determine the 16 parameters of image formation.
Resolving the calibration problem by a linear method has the advantage of not requiring initial values of the calibration parameters to find a solution. We will see that this comes at a price in practice and that the performance of such an approach is limited. Faugeras and Toscani [FAU 87] proposed this method in the 1980s. A similar approach, known as DLT (Direct Linear Transform), had been introduced earlier by the photogrammetric community.
Let the 3D-2D projection between a reference mark and its image be:
(1.13)
where (m11… m34) are the 12 unknown elements of the system to be solved.
By substituting s, we obtain:
(1.14)
Using the equations drawn from matrix expression (1.9), it is possible to rewrite the system to be solved (1.14) as:
(1.15)
Index i denotes the pairing between a 3D reference marker of the test card and the 2D marker detected in the image. Each pair provides 2 equations; the minimum number of pairs necessary to solve the problem is thus 6.
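As an illustration only (not the authors' implementation), the following is a minimal numerical sketch of this linear resolution in Python, assuming the classical homogeneous formulation in which M is recovered, up to scale, as the least-squares solution of the 2n × 12 system:

```python
import numpy as np

def linear_calibration(points_3d, points_2d):
    """Estimate the 3x4 projection matrix M from n >= 6 3D-2D pairs
    (hypothetical helper illustrating the linear approach)."""
    points_3d = np.asarray(points_3d, dtype=float)   # shape (n, 3): (Xw, Yw, Zw)
    points_2d = np.asarray(points_2d, dtype=float)   # shape (n, 2): (u, v)
    n = points_3d.shape[0]
    A = np.zeros((2 * n, 12))
    for i, ((X, Y, Z), (u, v)) in enumerate(zip(points_3d, points_2d)):
        # Each 3D-2D pair contributes two homogeneous equations in m11..m34.
        A[2 * i]     = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u]
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v]
    # Least-squares solution up to scale: right singular vector associated
    # with the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1].reshape(3, 4)
    # Fix the scale so that the third row of the left 3x3 block has unit norm,
    # in view of the decomposition into intrinsic/extrinsic parameters.
    M /= np.linalg.norm(M[2, :3])
    return M
```

The final normalization anticipates the decomposition described below, in which the normalized third row of the left 3 × 3 block is identified with the third row of the rotation matrix.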
By normalizing m3(1,2,3), we obtain the third row vector of the rotation matrix, R3(1,2,3):
(1.16)
(1.17)
(1.18)
(1.19)
(1.20)
(1.21)
(1.22)
Thus, the 16 parameters of image formation are completely determined.
This method is very easy to implement, and solving a linear system is inexpensive in computing time. However, the results obtained are not very stable. The stability of a method refers to its ability to give similar results (for the intrinsic parameters) for different poses of the test card.
Let us note that the preceding decomposition allows us to foresee a problem in the estimation of the rotation R. Indeed, we are never certain that:
(1.23)
In other words, it is never certain that the estimated matrix R is really conditioned like a rotation matrix. A palliative solution therefore consists of modifying the perspective projection matrix by adding an extra term to it, which can be interpreted as the non-orthogonality of the optical axis with respect to the CCD sensor and which makes it possible to ensure the orthogonality of the three rotation vectors.
In fact, in the linear approach, the rotation matrix is not parametrized in a satisfactory manner. It is advisable to use its decomposition according to the Euler angles (α, β, γ), respectively around the (x, y, z) axes of the camera frame. Under these conditions, the system to be solved is no longer linear. Moreover, no optical distortion phenomenon is taken into account in the image formation process.
Nevertheless, this method provides a very good initial estimate for the non-linear optimization process.
In this section, we approach the calibration of CCD cameras using the formalism of photogrammetry. The projection model used for the image formation process is again the pin-hole optical model, which is an approximation of the thin-lens optical model. This approach differs from the preceding one by a precise modeling of the optical distortion phenomena caused at the surface of the lenses, and by the implementation of a non-linear optimization process minimizing a reprojection criterion of the reference markers in the images, expressed in pixels.
The notation conventions in photogrammetry are slightly different from those used in artificial vision. Also, traditionally, the data are no longer expressed in the camera frame but in the world reference frame, which remains fixed irrespective of the position of the camera. We will thus transform the usual photogrammetric notations so as to recover the formalism of our community.
Let us consider the following notations:
– RW − XYZ is a direct (right-handed) 3D frame. It is the world reference frame, which will also be used as the modeling frame of the object.
– o − uv is the 2D image reference frame, whose origin is located at the top left corner of the image.
– RC − xyz is the 3D reference frame of the camera, whose origin is at the optical center c and whose z axis coincides with the optical axis. The x and y axes are respectively parallel to the u and v axes of o − uv.
The intrinsic parameters of the sensor to be determined are: the principal point (u0, v0), the focal length f, the pixel size of the CCD matrix (dx, dy) or their ratio and, finally, the optical distortion parameters introduced by the camera lens.
Figure 1.6. Pin-hole model, image geometry and coordinate systems
The extrinsic parameters are the rotation matrix R and the translation vector T between RC − xyz and RW − XYZ.
The method described in this chapter strictly follows the least squares approach: we seek to minimize the measurement errors in the image, i.e., the difference between a point detected in the image and the projection of the corresponding 3D test-card point.
Consider the perspective projection (assumed pin-hole) between a 3D object and its 2D image. The relation between a point and its projection in the image is described by the following expression:
(1.24)
where:
– (xi, yi, zi) is an image point defined in the camera frame (see Figure 1.6, where the image plane lies at the focal length f of the camera);
– λi is a scaling factor introduced when changing from R3 to R2;
– (Xi, Yi, Zi) are the coordinates of the test-card point defined in the world reference frame RW − XYZ;
– (Tx, Ty, Tz) is the translation vector;
– R is the rotation matrix expressed in the camera frame and parametrized according to the three Euler angles: a rotation α around the x axis, β around the y axis, and γ around the z axis:
(1.25)
By eliminating λi in (1.24) and removing the index i, we obtain the following expressions, called the collinearity equations in photogrammetry:
(1.26)
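In this notation, and under the assumption that the image plane lies at distance f from the optical center, these collinearity equations presumably take the classical form:

$$x = f\,\frac{r_{11}X + r_{12}Y + r_{13}Z + T_x}{r_{31}X + r_{32}Y + r_{33}Z + T_z}, \qquad y = f\,\frac{r_{21}X + r_{22}Y + r_{23}Z + T_y}{r_{31}X + r_{32}Y + r_{33}Z + T_z}.$$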
If we express (x, y) in the pixel coordinate system of the image, we obtain:
(1.27)
(1.28)
(1.29)
where, in expressions (1.27), (1.28) and (1.29):
– u, v are the image coordinates in the image reference frame;
– u0, v0 are the coordinates of the principal point in the image reference frame;
– a1, a2, a3 are the polynomial coefficients that model the radial distortion;
– p1, p2, p3 are the polynomial coefficients that model the tangential distortion;
– dx, dy represent the scale factors of the elementary pixel;
– parameter r is the radial distance from the principal point. Since r can take large values (depending on the size of the image), r4 and r6 sometimes become enormous; expression (1.28) can then lead to numerical instability during the estimation of the different parameters. A means of circumventing this difficulty is to rewrite the expression in the following way:
(1.30)
which assumes that the distortion is zero for a radial distance r0.
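As an indication of the family of models involved (a sketch under the usual conventions, not necessarily the exact expressions of (1.27)–(1.30)), the radial part of the correction and its r0-shifted variant can be written as:

$$\delta_x^{rad} = x\,(a_1 r^2 + a_2 r^4 + a_3 r^6), \qquad \delta_x^{rad} = x\,\big(a_1 (r^2 - r_0^2) + a_2 (r^4 - r_0^4) + a_3 (r^6 - r_0^6)\big),$$

with $r^2 = x^2 + y^2$; the second form vanishes at $r = r_0$, as stated above, and the tangential terms in p1, p2, p3 are handled analogously.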
By substituting (1.27), (1.28) and (1.29) into (1.26), and by introducing the corresponding notation, we obtain the following system:
(1.31)
where in (1.31), Φ is a vector of 15 parameters, i.e.:
Let us again consider the collinearity equations defined in expression (1.31):
(1.32)
The problem is now to determine the value of Φ which minimizes:
In (1.32), P (Φ) and Q (Φ) are non-linear functions of Φ and therefore minimization is a non-linear optimization problem.
A method for solving this problem is to linearize (1.32) around an initial value Φ0 (generally provided by the results of the linear resolution of the calibration problem described in section 1.3) and to calculate a correction ΔΦ. ΔΦ is then added to Φ0, which becomes the new initial value; the process is repeated until convergence of the system is obtained.
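A minimal sketch of this iterative scheme in Python (the residual function, its arguments and the finite-difference Jacobian are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def gauss_newton(residual, phi0, n_iter=20, eps=1e-6, tol=1e-10):
    """Iteratively linearize the residual around the current estimate and
    apply the least-squares correction, as described in the text.
    `residual(phi)` must return the vector of reprojection errors (pixels)."""
    phi = np.asarray(phi0, dtype=float)
    for _ in range(n_iter):
        L = residual(phi)                       # current value of the criterion
        # Numerical Jacobian A of the residual around phi (finite differences).
        A = np.zeros((L.size, phi.size))
        for j in range(phi.size):
            dphi = np.zeros_like(phi)
            dphi[j] = eps
            A[:, j] = (residual(phi + dphi) - L) / eps
        # Least-squares correction: solve A @ delta ~ -L.
        delta, *_ = np.linalg.lstsq(A, -L, rcond=None)
        phi = phi + delta
        if np.linalg.norm(delta) < tol:         # convergence reached
            break
    return phi
```

In the calibration context, residual(phi) would stack, for all reference markers, the differences between the detected pixel coordinates and those reprojected from the current parameter vector.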
Let there be n 3D reference markers and their corresponding markers in the image; we can write the system of 2n linearized equations in matrix form:
(1.33)
(1.34)
Here, L represents the current value of the criterion and A the Jacobian matrix of the system, evaluated around the current vector Φ0.
Let W be the weighting matrix of the measurements; the resolution of (1.34) in the least squares sense then amounts to estimating:
(1.35)
The solution of (1.35) is given by:
(1.36)
By replacing V by its expression from (1.34), the above equation becomes:
which leads to the solution for ΔΦ:
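Written out, this correction presumably takes the classical weighted normal-equation form (an assumption consistent with the W, A and L defined above):

$$\Delta\Phi = (A^{T} W A)^{-1} A^{T} W\,L.$$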
Measurement errors (both in the image and on the test card) are one of the main causes of bad calibration results. To mitigate this problem, it is possible to combine, in the same system, several images coming from the same camera but taken from different spatial positions (rotation and/or translation). In this case, the intrinsic parameters of the sensor are the same for all images and the calibration estimates the following vector of parameters:
The matrix A of (1.34) is therefore of the form:
The main idea comes from the following observation: high-quality calibration test cards are difficult to manufacture: the mechanical stability of the assembly for a precision < 0.05 mm is obtained only with specific materials, and a precise measurement of the 3D reference markers used is expensive. Moreover, the variability of the angular fields from one application to another leads to the use of different test cards adapted to the experimental conditions.
Self-calibration by bundle adjustment is a multi-image approach which jointly allows a re-estimation of the three-dimensional structure of the test card and an estimation of the traditional intrinsic and extrinsic parameters of the sensor. In other words, we calibrate the camera and reconstruct the test card simultaneously.
Let the collinearity equations be:
(1.37)
To simplify the writing, we removed the indices without loss of generality, but the collinearity expressions hold for all n reference markers (Xi, Yi, Zi) of the test card projecting into (uij, vij) in the m images.
If we wish to calibrate the sensor and to calculate the coordinates of the test-card reference markers, the parameter vector to be estimated takes the following form:
Problem. Find Φ which minimizes:
– Number of unknowns: the intrinsic parameters, the 6m extrinsic parameters (pose of the camera for each of the m images) and the 3n coordinates of the test-card points;
– Number of equations: 2 × m × n.
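As a purely illustrative count (the figures are assumptions, not taken from the text): with m = 15 images, n = 50 test-card points, about 10 intrinsic parameters and 6 extrinsic parameters per image, we get

$$P \approx 10 + 6\times 15 + 3\times 50 = 250 \ \text{unknowns}, \qquad N = 2\times 15\times 50 = 1{,}500 \ \text{equations},$$

so the redundancy N − P remains largely positive.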
Extrinsic parameters. Since the three-dimensional coordinates of the test-card reference markers are estimated within the process, the extrinsic geometry of the system is only fixed up to a scale factor. Indeed, it is always possible to find a larger test card observed from farther away which would give strictly the same image.
This loss of metric information is of little importance for the calibration of a single camera, where only the intrinsic parameters are of interest to the user. Nevertheless, to facilitate convergence, two reference markers of the test card will not be reconstructed and will impose the metric scale. We also impose that one coordinate of an arbitrary point among the n remains fixed, in order to lock the overall extrinsic geometry of the reconstructed scene.
It is obvious that the optimization of such a non-linear system requires initial conditions within the domain of convergence. In the experimental part, we will show that this constraint is satisfied without difficulty as soon as the test card is observed under sufficiently different orientations. This amounts to ensuring significant triangulation angles for the estimation of the test-card reference markers.
The values of each term of the initial calibration vector can be provided by the linear calibration approach or, more simply, by the manufacturer's data for the intrinsic parameters and by a localization algorithm such as that of Dementhon [DEM 95] for the extrinsic parameters.
From an estimate in the context of least squares (1.34) and (1.35), it is possible to calculate an estimate of the residual vector V:
(1.38)
as well as the estimate of the standard error of unit weight, which represents an a posteriori estimate of the (scalar) noise σ0 on the reference markers detected in the image.
(1.39)
where P is the total number of estimated parameters. The covariance matrix associated with the Φ parameters is given by:
(1.40)
Thus, for each parameter Φi, it is possible to calculate its determination precision (standard deviation):
(1.41)
A method for measuring the reliability (quality) of an adjustment in the context of least squares is to calculate the relative redundancy of the system [TOR 81], i.e.:
(1.42)
where, in the above expressions, r represents the redundancy, N is the total number of measurement equations and cii is the ith diagonal element of the covariance matrix CΦ. As we can observe, the relative redundancy for a multi-image calibration is much higher than for a single-image calibration. This results in a greater reliability of the multi-image calibration results compared to the single-image resolution: if m > 1, then qmulti > qsingle.
This first part presents an example of calibration on a 1/2" camera equipped with a 10 mm lens.
For this experiment, the polynomial distortion coefficients (a1, a2, a3, p1, p2) are initialized to zero. The test card (Xi, Yi, Zi, i ∈ [1, n]) is roughly measured and the point coordinates are only known to within a few centimeters; let us note that the multi-image approach makes it possible to use planar test cards, which greatly facilitates their manufacture and the obtaining of a rough model. The initial relative camera/test-card positions (Rj, Tj, j ∈ [1, m]) are estimated by Dementhon's algorithm [DEM 95] applied to planar objects.
The position of the principal point (u0, v0) is placed in the middle of the image and the focal length (fx, fy) is roughly estimated from the focal length of the lens (10 mm, for example) and the elementary pixel size of the CCD matrix (ranging from 9 to 15 μm depending on the sensor).
The series of analyzed images must form a beam of converging views in order to ensure sufficient triangulation angles for the reconstruction of the test card. Finally, the approach relies on an accurate detection of the test-card reference markers (see [LAV 98]).
The images presented in Figure 1.7 show a sample of the 15 shots taken to calibrate a camera equipped with a 10 mm lens. The test card is made from a planar plate equipped with retroreflective targets. The photographic device integrates annular high-frequency lighting, which makes it possible to obtain good quality images of the targets, irrespective of the observation angle of the object.
Figure 1.7. Partial sequence of 15 photos taken for self-calibration (768 × 576 pixels)
In order to highlight the convergence of the algorithm, we deliberately chose to initialize the focal length far from the solution (1,500 pixels instead of 1,000), i.e., with an error of 50%. Within a few iterations, the algorithm stabilizes around a minimum, which leads to residuals of about 0.025 pixels on each coordinate (see Table 1.1).
The curves in Figure 1.8 show the convergence of fx, u0 and v0; the minimum of expression (1.35) is attained after 12 iterations.
Remarks:
Table 1.1. Bundle adjustment: Jai M10 camera and 10 mm lens
Figure 1.8. Convergence of fx, u0, v0
Figure 1.9. Distortion according to radius (a) and residuals at convergence (b) (enlarged 1,000 times) for a set of measured reference markers (768 × 576 pixel format)
– At the solution, we have represented the convergence residuals (ex, ey) corresponding to expression (1.36). Each cross represents a measurement point resulting from one of the m images, and the associated residual vector is magnified by a factor of 1,000 in Figure 1.9b. The homogeneous distribution of the residual vectors and their random orientation reflect the non-correlation of the errors at the solution.
Flexibility of use. The approach described in this chapter proves very flexible to use and enables calibration with test cards adapted to the focal length of the lens used. The residuals obtained, compared with those of traditional approaches using a precisely measured test card, testify to the reliability of the algorithm.
This section deals with the specific case of fish-eye lenses. The principal deformation generated by a lens with a short focal length is a radial deformation. The shorter the focal length of the lens, the broader its angular field of observation. Rays converging on the CCD matrix have an increasingly large angle of incidence on the large front diopter of the lens and consequently move away from the paraxial optics assumption. The curvature radii of the lenses therefore induce a dominant radial phenomenon.
The strategy for modeling these strong deformations is intuitive; it consists of increasing the order of the distortion polynomial. To account for strong radial distortions, a polynomial of order 5 is necessary. On the other hand, several formulations of the criterion can be considered.
Formula (1.28), which was previously explained, changes into:
(1.43)
The distortion polynomial is then composed of five terms instead of three. As Li highlights in [LI 94], since r can take a maximum value of the order of the image size L, r2 … r10 can take very high values and expression (1.43) can then become numerically unstable.
To resolve this disadvantage, it is possible to rewrite expression (1.43) in the following way:
(1.44)
or further:
(1.45)
Expression (1.45) therefore forces the distortion to take a zero value at a fixed radial distance r0. In this equation, a variation of the focal length appears at the solution which, to a first approximation, will be equal to: .
Nevertheless, [LAV 00a] shows that this does not address the fact that these coefficients have numerically very small values compared, for example, with the focal length parameters. This aspect can, however, be taken into account by a third formulation of the criterion.
Normalization consists of compensating for the large values of rn by normalizing all distances with respect to the focal length f:
It becomes:
or, in terms of the distortion expression:
(1.46)
which we can rewrite by factoring out f:
(1.47)
This new expression has a form entirely similar to that defined in (1.28), up to the focal length factor. The total criterion then takes the following form:
(1.48)
Figure 1.10. Example of shots (format: 768 × 576 pixels)
The interest of this rewriting lies in the fact that any large value of r2i in the distortion polynomial is compensated by f2i. We will see in the experimental part that the values of the ai remain close to unity.
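In other words, the normalized radial correction presumably takes a form of the kind (a sketch consistent with the compensation of r2i by f2i described above):

$$\delta_x^{rad} = x\sum_{i=1}^{5} a_i \left(\frac{r}{f}\right)^{2i} = \frac{a_1}{f^{2}}\,x\,r^{2} + \frac{a_2}{f^{4}}\,x\,r^{4} + \dots,$$

so that the coefficients $a_i$ remain of the order of unity.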
Part of the test sequence used for calibration is presented in Figure 1.10. We can note the presence of a very strong radial distortion. An exhaustive comparison of the behavior of the three criteria is given in [LAV 00a]; the results presented in Table 1.2 use the normalized form of the collinearity expressions.
General notes
– The convergence of the calibration algorithm requires some precautions when calibrating a sensor displaying such a distortion. If all the parameters are released from the first iteration, the algorithm systematically diverges, choosing a solution which consists of sending the object to infinity. This is all the more true as we start far from the final solution, since the coefficients of radial and tangential distortion are initialized to zero.
– To constrain the parameters, we adopt an approach which locks the focal length and the principal point (fx, fy, u0, v0) as long as the average criterion has not dropped below a threshold, fixed here at 0.6 pixels. The algorithm thus initially estimates the distortions, the localizations and the geometry of the test card, and then optimizes all the parameters as soon as the threshold is crossed. This parameter locking is implemented within a Levenberg-Marquardt optimization procedure by acting on the derivatives of the system (starting value of fx, fy fixed at 400 pixels, i.e., 40% of the solution).
Table 1.2. Fish-eye calibration. Normalized criterion
Comments
– We observe in Table 1.2 that the orders of magnitude of the radial distortion coefficients are harmonized. The variation between the terms does not exceed a factor of 10.
– The distortion at the image edge is higher than 200 pixels (Figure 1.11).
– The decrease of the criterion proceeds without much difficulty. From 0.6 pixels onwards, the 185 parameters are estimated simultaneously. All the decreasing curves present the same “break” in the vicinity of the solution (0.25 pixels) before dropping towards the final solution; this break is not due to the release of parameters on our part, but to the choice of the Levenberg-Marquardt convergence strategy.
Figure 1.11. Convergence and distortion of the normalized criterion
– The residuals at the solution are about 0.05 pixels.
– In Figure 1.12, let us note the presence of some outliers (vectors with a large norm), which can be removed by simple filtering on the residual values.
Figure 1.12. Residuals at convergence (×1,000)
The calibration of short focal length lenses rests on three key elements: a precise detection of the markers in the images, an adapted optimization criterion, and a convergence strategy which gradually releases the parameters in order to avoid the pitfall of local minima.
The self-calibration approach by bundle adjustment offers the advantage of quickly obtaining a test card adapted to these specific lenses. Of the three criteria given, two hold our attention more particularly: the second, which consists of fixing a zero distortion at a distance r0 from the principal point, and the third, which introduces a normalization of the distortion coefficients. In terms of residuals at the solution and for a constant image size, the two criteria are appreciably equivalent; the second offers the advantage of maintaining a practically constant image size before and after correction of the distortion, while the third offers more favorable numerical conditioning.
The writing of a new hybrid criterion combining the advantages of the last two should bring a satisfactory solution to this problem. Figure 1.13 shows a sensor view before and after compensation of the distortion phenomena. The size of the final image is increased by 400 pixels in rows and columns, leading to a real observation field of the lens of 120 degrees.
Figure 1.13. Initial image (format: 768 × 576 pixels) and corrected image (format: 1,168 × 976 pixels)
This section deals with certain concepts related to the calibration of underwater cameras. [LAV 00b] establishes the relations governing the change of focal length and the modification of the distortion curves between use of the video sensor in air and in water. It is then possible to calibrate the sensor in air and to predict its behavior in a medium of arbitrary index.
The following laws must be verified for water:
1) When the camera is immersed in water, we must observe a multiplicative factor of 1.333 on the focal length value measured in air:
(1.49)
2) Let u be the distorted image of a point in air and du be the distortion correction made to obtain a perfect projection point.
Let u′ be the distorted image of the same point in water and du′ the distortion correction made to obtain a perfect projection point, then:
(1.50)
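For the first law, the relation presumably summarized by (1.49) is simply the scaling of the focal length by the refractive index of water:

$$f_{water} = n_{water}\,f_{air} \approx 1.333\,f_{air}.$$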
The experimental part consists of calibrating an underwater camera in air and then in water and of analyzing the calibration results taking into account the theoretical relations expressed in the preceding section.
We use an experimental underwater camera made with a Sony CCD sensor. The entire optical device is known (index, dimension and localization of each diopter).
We calibrated the sensor from the 12 images presented in Figure 1.14. The effect of radial distortion is significant, and the black circle visible at the edge of the image comes from the increase in the angular field of the camera in air; it disappears when the camera is immersed in water. From these experiments we draw the following conclusions:
– We used the normalized form of the radial distortion, with a polynomial of order 5 (see section 1.5.2.3).
Figure 1.14. Calibration in air (768 × 576 pixels)
– The radial distortion is represented in Figure 1.15a. It takes huge values and, beyond 320 pixels (black circle), it no longer corresponds to the physical measurements taken in the images. To represent the distortion-compensated view (Figure 1.15b), we increased the size of the image by 400 pixels (rows and columns). This image corresponds to the sixth view of the sequence (Figure 1.14).
Similarly, we calibrated the underwater camera in water from the analysis of the 12 views represented in Figure 1.16. As emphasized before, the black circle at the edge of the image has disappeared, which tends to show that the internal behavior of the sensor is modified by the index change. Let us finally note that the angular field is strongly reduced and that the distortion at the edge of the image seems less significant than in air.
Table 1.3. Bundle adjustment in air on an underwater camera
Figure 1.15. Radial distortion and corrected image (1,168 × 976 pixels)
Figure 1.16. Sequence of shots for calibration in water (768 × 576 pixels)
Notes
– At convergence, the focal length of the underwater device stabilized around 500 pixels (instead of 376), which leads to an angular field in water nearing 90 degrees.
– The curve of radial distortion is plotted in Figure 1.17a. It is considerably less significant than for the experiment in air.
Figure 1.17b presents one of the calibration views after compensation of distortions.
Figure 1.17. Radial distortion and corrected underwater image (968 × 776 pixels)
Table 1.4. Bundle adjustment. Underwater camera
Focal length. The theoretical laws of the air/water transition are verified experimentally:
– The distance between the image nodal point and the CCD matrix undergoes a multiplicative factor equal to the index of the medium in which the camera is immersed (see [LAV 00b]).
– Table 1.5 shows the ratio between the focal lengths in air and in water. If we take into account the determination uncertainties of the focal lengths estimated by the calibration process, this ratio is very close to 1.333.
Table 1.5. Comparison of the focal lengths in air and in water
Coordinates of the principal point (u0, v0). The position of the principal point (intersection of the optical axis and the CCD matrix) is relatively stable between the two experiments (Table 1.6).
Table 1.6. Comparison of the coordinates of the principal point in air and in water
Distortion. Figure 1.18a shows a joint representation of the distortion curves obtained from the series of measurements in air and in water. We can observe the importance of the radial effect during the experiment in air.
If it is assumed that this distortion is mainly radial, the curves must then verify:
Figure 1.18b shows the prediction of the distortion curves in water obtained from the data computed in air. As we can note, the superposition is extremely good. Since the field of view in air is larger than in water, the prediction of the distortion at the image edge in water relies on very precise measurements made in air, since this part of the image is perfectly visible there. This is not the case for the images taken in water, because of the experimental difficulty of obtaining correct views of the image edges.
It seems obvious that the use of an underwater camera can be freed from in situ
