Omnidirectional Vision (E-Book)

Description

Omnidirectional cameras, vision sensors that can capture 360° images, have in recent years enjoyed growing success in computer vision, robotics and the entertainment industry. In fact, modern omnidirectional cameras are compact, lightweight and inexpensive, and are thus being integrated into an increasing number of robotic platforms and consumer devices. However, the special format of the output data requires tools that are appropriate for camera calibration, signal analysis and image interpretation. This book is divided into six chapters written by world-renowned scholars. In a rigorous yet accessible way, the mathematical foundations of omnidirectional vision are presented, from image geometry and camera calibration to image processing for central and non-central panoramic systems. Special emphasis is given to fisheye cameras and catadioptric systems, which combine mirrors with lenses. The main applications of omnidirectional vision, including 3D scene reconstruction and robot localization and navigation, are also surveyed. Finally, the recent trend towards AI-infused methods (deep learning architectures) and other emerging research directions are discussed.


Page count: 368

Year of publication: 2023





Table of Contents

Cover

Table of Contents

Dedication Page

Title Page

Copyright Page

Acknowledgments

List of Acronyms

Preface

P.1. Omnidirectional vision: a historical perspective

P.2. Why this book?

P.3. Organization of the book

P.4. References

1 Image Geometry

1.1. Introduction

1.2. Image formation and point-wise approximation

1.3. Projection and back-projection

1.4. Central and non-central cameras

1.5. “Outer” geometry: calibrated cameras

1.6. “Inner” geometry: images of lines

1.7. Epipolar geometry

1.8. Conclusion

1.9. Acknowledgments

1.10. References

2 Models and Calibration Methods

2.1. Introduction

2.2. Projection models

2.3. Calibration methods

2.4. Conclusion

2.5. References

3 Reconstruction of Environments

3.1. Prerequisites

3.2. Pros and cons for using omnidirectional cameras

3.3. Adapt dense stereo to omnidirectional cameras

3.4. Reconstruction from only one central image

3.5. Reconstruction using stationary non-central camera

3.6. Reconstruction by a moving camera

3.7. Conclusion

3.8. References

4 Catadioptric Processing and Adaptations

4.1. Introduction

4.2. Preliminary concepts

4.3. Adapted image processing by differential calculus on quadratic surfaces

4.4. Adapted image processing by Riemannian geodesic metrics

4.5. Adapted image processing by spherical geodesic distance

4.6. Conclusion

4.7. References

5 Non-Central Sensors and Robot Vision

5.1. Introduction

5.2. Catadioptric sensors: reflector computation

5.3. Plenoptic vision as a unique form of non-central vision

5.4. Conclusion

5.5. References

6 Localization and Navigation with Omnidirectional Images

6.1. Introduction

6.2. Modeling image formation of omnidirectional cameras

6.3. Localization and navigation

6.4. Conclusion

6.5. References

Conclusion and Perspectives

C.1. Epilogue

C.2. Prospects and challenges ahead

C.3. References

List of Authors

Index

End User License Agreement

List of Tables

Chapter 2

Table 2.1. Ad hoc projection models of central catadioptric cameras (summary o...

Table 2.2. Main calibration methods and their characteristics: use of pattern ...

Chapter 4

Table 4.1. Parameters ξ and φ of the equivalence sphere model. e is the eccent...

List of Illustrations

Chapter 1

Figure 1.1. Central catadioptric systems based on planar mirrors

Figure 1.2. A camera, composed of an optical center – in red – and an image pl...

Figure 1.3. Same situation as in Figure 1.2, but with an elliptical mirror. On...

Figure 1.4. Investigating central and non-central catadioptric cameras

Figure 1.5. Illustration of the three-point pose problem

Figure 1.6. Scale ambiguity of relative motion estimation with two central cam...

Figure 1.7. Two examples of a catadioptric line image. Left: with a central ca...

Figure 1.8. Epipolar geometry of two perspective cameras.

Figure 1.9. Epipolar geometry of two para-catadioptric cameras. Left: the two ...

Figure 1.10. Epipolar geometry of two non-central cameras; here, catadioptric ...

Figure 1.11. Classical stereo rectification for perspective images. Upper left...

Figure 1.12. Generic stereo rectification. Upper left: epipolar segments are “...

Chapter 2

Figure 2.1. Illustration of the perspective projection. (a) Conventional camer...

Figure 2.2. Panoramic catadioptric vision. (a) Single catadioptric camera (V-S...

Figure 2.3. Diagram of the unified central projection model

Figure 2.4. Spherical vision: (a) Compact polydioptric spherical camera (Ricoh...

Figure 2.5. Examples of measurements in the image for calibration. (a) Checker...

Chapter 3

Figure 3.1. Visibility constraints for surface reconstruction in a 3D Delaunay...

Figure 3.2. A set of three tetrahedra whose boundary is not manifold. There is...

Figure 3.3. A few omnidirectional systems (among many others) used for environ...

Figure 3.4. Bi-cameras among many others. From left to right: Ricoh Theta S, S...

Figure 3.5. Rectifications such that the pairs of corresponding epipolar curve...

Figure 3.6. Sphere/plane sweeping. The bold spheres are input images. The virt...

Figure 3.7. Stationary non-central cameras for reconstruction from videos. Eve...

Figure 3.8. From local to global models (Lhuillier 2011). Note that 208 1632 ×...

Figure 3.9. 3D modeling for VR. A helmet-held GoPro Max 360 camera records two...

Chapter 4

Figure 4.1. Central catadioptric image formation using the unified projection ...

Figure 4.2. Geometry and image formation on a two-sheet hyperboloid surface: (...

Figure 4.3. Geometry and image formation on a spherical surface: (a) Polar coo...

Figure 4.4. Geometry and image formation on a paraboloid surface. Left: (r, φ...

Figure 4.5. Segmentation of an object of interest in a synthetic hyperboloid i...

Figure 4.6. Embedding a 2D image as a surface in a three-dimensional space

Figure 4.7. Adapted Gaussian smoothing. On the top row: smoothing using differ...

Figure 4.8. Adapted Difference of Gaussians filtering: (a) Classical approach:...

Figure 4.9. Harris corner detection using the adapted Gaussian kernel with dif...

Figure 4.10. Image formation and neighborhood dependency. (a) Orthographic ima...

Figure 4.11. Spherical coordinates and local basis

Figure 4.12. Gradient norm results on a synthetic image. (a and b) Classical p...

Figure 4.13. Harris corner detection. (a) Test Image, (b) classical Harris cor...

Figure 4.14. Matching by ZNCC. (a) Classical method: 65 total matchings and 53...

Figure 4.15. Corner detection and matching. (a and b) Two images from the real...

Chapter 5

Figure 5.1. Central and non-central projection

Figure 5.2. (a) Appositional, (b) superpositional and (c) neural superposition...

Figure 5.3. Caustic for a catadioptric device made of a source, a reflector an...

Figure 5.4. Geometric construction.

Figure 5.5. Incident ray imaged into the camera after reflection from mirror...

Figure 5.6. Caustic surface with source at infinity and incident plane at infi...

Figure 5.7. Two-planes parameterization to encode rays by intersections of inc...

Figure 5.8. Set of trajectories built from the plenoptic estimation (red squar...

Figure 5.9. Histogram of the translational errors. The average length of the t...

Chapter 6

Figure 6.1. Representation of the first central omnidirectional catadioptric c...

Figure 6.2. Representation of an omnidirectional catadioptric camera system wi...

Figure 6.3. xz-Cut representation of the two projections model for general cen...

Figure 6.4. xz-cut representation of general distortion model

Figure 6.5. Graphical representation of a multi-camera omnidirectional system ...

Figure 6.6. xz-cut of a non-central catadioptric with quadric mirrors. vi deno...

Figure 6.7. θ denotes the angle of perspective camera projection rays. ϕ is th...

Figure 6.8. Relationships used in Chahl and Srinivasan (1997) to derive a fami...

Figure 6.9. On the left, the setup considered in Hicks and Bajcsy (1999) is sh...

Figure 6.10. Relationship between the image plane and an object surface and th...

Figure 6.11. Camera setup proposed in Gaspar et al. (2002) for maintaining con...

Figure 6.12. Graphical representation of the varying entering pupil model in G...

Figure 6.13. Application of the localization method proposed in Miraldo and Ar...

Figure 6.14. Results of the vanishing points and vanishing lines modeling in M...

Figure 6.15. Spherical catadioptric system proposed in Hong et al. (1991), and...

Figure 6.16. Camera setup used in Winters et al. (2000) and the comparison bet...

Figure 6.17. On the left, we show the setup of two fisheye systems in Li and I...

Figure 6.18. Hierarchical approach for the localization of a robot based on a ...

Figure 6.19. Example of a mapping with the memorized trajectories scheme propo...

Figure 6.20. On the left, we show the scheme of the problem studied in Lukiers...

Figure 6.21. Basic pipeline for visual odometry

Figure 6.22. Vehicle with a catadioptric omnidirectional system used in Scaram...

Figure 6.23. Vehicle with an omnidirectional multi-camera system and the resul...

Figure 6.24. Representation of the motion model used in Scaramuzza et al. (200...

Figure 6.25. Multi-camera system used in Kazik et al. (n.d.).

Figure 6.26. Vehicle with a multi-camera system used in Lee and Faundorfer (20...

Figure 6.27. Sensor setup, example of an omnidirectional image with extracted ...

Figure 6.28. Examples of fisheye images and the resulting trajectory for the m...

Figure 6.29. The top picture shows the setup of four fisheye cameras used in S...

Figure 6.30. In the top, we show the hybrid projection model proposed in Seok ...

Figure 6.31. On the left, we show the simultaneous localization and mapping pi...

Figure 6.32. Different resolutions of the spherical polyhedron used to project...

Figure 6.33. The proposed three-stage DNN (unary feature extraction, spherical...

Figure 6.34. Fernandez-Labrador et al. (2020) propose a method for mapping the...

Figure 6.35. SLAM pipeline proposed in Won et al. (2020b) using the multiple o...

Figure 6.36. Three different control schemes are considered in the switching m...

Figure 6.37. At the top, we show the control scheme used in Vidal et al. (2004...

Figure 6.38. At the top, we show the multi-robot system used in Gava et al. (2...



The editors dedicate this book to their families. They are grateful for their support throughout the course of this long project.

SCIENCES

Image, Field Director – Laure Blanc-Féraud

Sensors and Image Processing, Subject Head – Cédric Demonceaux

Omnidirectional Vision

From Theory to Applications

Coordinated by

Pascal Vasseur

Fabio Morbidi

First published 2023 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK

www.iste.co.uk

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

www.wiley.com

© ISTE Ltd 2023. The rights of Pascal Vasseur and Fabio Morbidi to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.

Library of Congress Control Number: 2023938467

British Library Cataloguing-in-Publication Data: A CIP record for this book is available from the British Library.

ISBN 978-1-78945-143-6

ERC codes: PE6 Computer Science and Informatics; PE6_2 Computer systems, parallel/distributed systems, sensor networks, embedded systems, cyber-physical systems; PE6_11 Machine learning, statistical data processing and applications using signal processing (e.g. speech, image, video)

Acknowledgments

This book would not have been possible without the support provided by our host institution, the University of Picardie Jules Verne, Amiens, France. Fabio Morbidi is indebted to his colleagues, E. Mouaddib and G. Caron, for the many refreshing discussions about omnidirectional vision over a beer.

The editors express their sincere thanks to the authors who have contributed to the six main chapters of this book, for their professionalism, responsiveness, valuable comments and patience, which finally led to the publication of this collective work after multiple delays due to the Covid-19 pandemic and our busy schedules. We learnt a lot from all of them, and this shared journey strengthened our bonds of mutual esteem and friendship.

Finally, the editors wish to extend their heartfelt thanks to the team at ISTE Ltd for the smooth and efficient book production process, and to L. Blanc-Féraud (director of the “Image” field) and C. Demonceaux (subject head for “Sensors and Image Processing”) for their guidance and encouragement.

Pascal VASSEUR and Fabio MORBIDI

Amiens, August 2023

List of Acronyms

AR: Augmented reality

BEV: Bird’s eye view

CCD: Charge coupled device

CNN: Convolutional neural network

DNN: Deep neural network

DoF: Degrees of freedom

DoG: Difference of Gaussians

EKF: Extended Kalman filter

FoV: Field of view

GCM: General camera model

GEM: Generalized essential matrix

GPS: Global positioning system

GPU: Graphical processing unit

IMU: Inertial measurement unit

LM: Levenberg–Marquardt

MAP: Maximum a posteriori

MSCKF: Multi-state constraint Kalman filter

PnP: Perspective-n-point

RANSAC: RANdom SAmple Consensus

RGB-D: Red, green, blue – depth

RGBA: Red, green, blue, alpha

SfM: Structure-from-motion

SGM: Semi-global matching

SLAM: Simultaneous localization and mapping

SSD: Sum of squared differences

SURF: Speeded up robust features

SVD: Singular value decomposition

UAV: Unmanned aerial vehicle

VR: Virtual reality

WTA: Winner takes all

Preface

Fabio MORBIDI and Pascal VASSEUR

MIS Laboratory, University of Picardie Jules Verne, Amiens, France

P.1. Omnidirectional vision: a historical perspective

“Charge-coupled devices” (CCDs) were invented by W. Boyle and G.E. Smith at Bell Labs in 1969. A CCD is a sensor that converts an incoming 2D light pattern into an electrical signal that, in turn, is transformed into an image. Although CCDs could capture an image, they could not store it. Digital cameras, invented in 1975 at Eastman Kodak in Rochester, New York, by S.J. Sasson, fixed this problem. The first digital camera, equipped with a Fairchild Semiconductor 100-by-100-pixel CCD, was able to display photos on a TV screen (Goodrich 2022).

An omnidirectional camera (also known as a 360° camera) is a camera whose field of view (FoV) covers approximately the entire sphere, or at least a full circle in the horizontal plane (the adjective “omnidirectional” combines two words, “omni” and “directional”; “omni” comes from the Latin word “omnis”, meaning “all”). A conventional camera has an FoV that ranges from a few degrees to, at most, 180°: this means that it can capture, at most, light falling onto the camera focal point through a hemisphere. By contrast, an ideal omnidirectional camera captures light from all directions falling onto the focal point, covering a full sphere. In practice, however, most omnidirectional cameras span only part of the full sphere, and many cameras that are dubbed omnidirectional cover only approximately a hemisphere, or the full 360° along the equator of the sphere, the top and bottom hemispheres excluded (in this case, the term panoramic camera is preferred). If the full sphere is covered, the captured light beams do not exactly intersect in a single focal point, i.e. the system is non-central. Human vision is an example of a system with a wide FoV: humans have slightly over a 210° horizontal FoV (without eye movements) (Strasburger 2020), while some birds and insects have a complete or nearly complete 360° visual field. The vertical range of the visual field in humans is around 150°. A large FoV has proven to be a very important asset in the preservation of certain species, and it likely plays a crucial role in the evolution of animal vision (Burkhardt 2005).

Over the last few decades, with the same ground being plowed many times by different researchers, various camera designs have been proposed to capture 360° images: cameras with a single lens (fisheye), cameras with two lenses (dual or twin fisheye), cameras with more than two lenses (polydioptric), camera rigs, pan-tilt-zoom cameras and cameras with rotating mechanisms, and catadioptric systems combining mirrors (cata-) and lenses (-dioptric). Some of these cameras capture 360° images in a single shot, while others build an omnidirectional image by stitching together different regions of the FoV acquired over a prolonged period of time. Fisheye cameras (which use lens systems with very short focal lengths and strong refractive power) and catadioptric systems (first patented in 1970; Rees 1970) are in the first group. Pan-tilt-zoom cameras and cameras with rotating mechanisms belong to the second group.

Although the fundamental concepts have been around since at least the 1970s (Cao et al. 1986; Yagi and Kawato 1990; Ishiguro et al. 1992), modern omnidirectional vision dates back to the late 1990s: in fact, the seminal works on image formation and geometry by Nayar (1997); Baker and Nayar (1999); and Svoboda et al. (1998) marked the beginning of an independent field of investigation. Another milestone in the history of omnidirectional vision is the unifying theory for central panoramic systems developed by Geyer and Daniilidis (2000), and subsequently extended by Barreto (2006) and other researchers (Khomutenko et al. 2016; Usenko et al. 2018). This pioneering work has been the harbinger of a burgeoning array of papers in computer vision (image processing and descriptors adapted to spherical signals, calibration, epipolar and multi-view geometry, structure-from-motion, 3D reconstruction, etc.) and robotics (image-based localization, simultaneous localization and mapping (SLAM), visual servoing, etc.). Finally, the series of OMNIVIS workshops (“Omnidirectional Vision, Camera Networks and Non-Classical Cameras”) held annually between 2000 and 2011, in conjunction with major computer vision conferences, contributed to shaping the community and bringing together researchers interested in non-conventional vision. This tradition continued with the OmniCV workshops (“Omnidirectional Computer Vision”), organized every year since 2020 (CVPR’23 marked the fourth edition).
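As a concrete illustration of the unified central projection model mentioned above, the following is a minimal sketch of its standard two-step formulation (project a scene point onto a unit sphere centered at the single viewpoint, then reproject from a center offset by a parameter ξ); the intrinsic matrix K, the parameter names and the numerical values are illustrative assumptions and do not reproduce this book’s notation.

```python
import numpy as np

def unified_projection(X, xi, K):
    """Project a 3D point with the unified central model (sketch).

    X  : (3,) point in camera coordinates.
    xi : mirror/lens parameter (xi = 0 reduces to the pinhole model).
    K  : (3, 3) intrinsic matrix of the underlying perspective camera.
    """
    # Step 1: project the point onto the unit sphere centered at the viewpoint.
    Xs = X / np.linalg.norm(X)
    # Step 2: perspective projection from a center shifted by xi along the z-axis.
    m = np.array([Xs[0] / (Xs[2] + xi), Xs[1] / (Xs[2] + xi), 1.0])
    # Step 3: apply the intrinsics to obtain pixel coordinates.
    u = K @ m
    return u[:2] / u[2]

# Example with a para-catadioptric-like configuration (xi = 1); values are illustrative only.
K = np.array([[300.0, 0.0, 320.0], [0.0, 300.0, 240.0], [0.0, 0.0, 1.0]])
print(unified_projection(np.array([0.5, -0.2, 2.0]), xi=1.0, K=K))
```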

Today, with the miniaturization of image sensors and optical components (lenses, prisms and mirrors), omnidirectional cameras have risen to prominence in consumer electronics (smartphone attachments, surveillance systems, perception systems in autonomous vehicles, etc.), and they have transformed our everyday lives and made them easier. Applications include mobile robotics, videoconferencing, art (panoramic photography), real estate (remote tours), vehicle parking assistance, virtual and augmented reality, tele-operated systems (for enhanced situational awareness), forensics, astronomy and entertainment. Several multinational electronics companies (Samsung, Ricoh, GoPro) have invested in the field of omnidirectional vision, which has experienced a renaissance over the past ten years, and are actively producing and supporting hardware. This has fostered academic research and contributed to the growth of the community.

P.2. Why this book?

While innumerable computer vision books have appeared in the last two decades, for example, the books by Forsyth and Ponce (2011), Hartley and Zisserman (2004) and Ma et al. (2004), to mention only the most popular ones, relatively few books or monographs have been dedicated to omnidirectional vision. In fact, we are aware of only three research-oriented books (Benosman and Kang 2000; Sturm et al. 2011; Puig and Guerrero 2013), a survey paper (Ishiguro 2005) and two dedicated chapters in robotics textbooks (Chapter 11.3 in Corke 2011; Chapter 4.2 in Siegwart et al. 2011). Several indicators suggest that the field of omnidirectional vision is now mature: it is thus time to review the core principles (image formation, mathematical modeling, camera calibration, etc.), critically assess the key achievements and present some of the main applications, with an eye on the most recent trends and research directions.

Obviously, the field is too vast and dynamic to be fully covered in a single book. Therefore, a precise editorial choice has been made, and some “trendy topics” have been intentionally left out. A notable omission is the growing body of research on machine learning applied to omnidirectional vision (Ai et al. 2022), which is only briefly mentioned in Chapters 3 and 6. Moreover, we pass over the recent progress made in the field of graph image processing (Cheung et al. 2018). This book brings together the contributions of 10 renowned international scientists with multidisciplinary interests in image processing, computer vision, vehicle engineering and robotics. It is intended for a general audience: young beginners interested in discovering the field, professionals, instructors and experienced scientists in academia.

P.3. Organization of the book

This book consists of a preface, six chapters and a conclusion, and it is organized as follows:

Preface

provides a brief history of omnidirectional vision, defines the position and scope of the book and presents its general structure.

Chapter 1

reviews basic geometric concepts relevant to omnidirectional vision. These include the image formation process, with a special focus on catadioptric cameras. A brief discussion on how camera models approximate the image formation process is also provided.

Chapter 2

presents the geometric models behind the formation of an omnidirectional image and critically assesses the different existing techniques for the estimation of the intrinsic parameters of an omnidirectional camera.

Chapter 3

describes different techniques for the reconstruction of 3D environments from images captured by static or moving omnidirectional cameras.

Chapter 4

is devoted to image processing, adapted to the spherical signals provided by catadioptric cameras.

Chapter 5

presents a special class of omnidirectional cameras, the so-called non-central vision sensors, and provides an overview of their main geometric properties and applications.

Chapter 6

deals with the application of omnidirectional cameras to robot localization and navigation.

Conclusion and Perspectives

concludes the book. The main contributions are summarized and some prospects for future research are discussed.

It is not knowledge, but the act of learning, not possession but the act of getting there, which grants the greatest enjoyment. When I have clarified and exhausted a subject, then I turn away from it, in order to go into darkness again. The never-satisfied man is so strange; if he has completed a structure, then it is not in order to dwell in it peacefully, but in order to begin another. I imagine the world conqueror must feel thus, who, after one kingdom is scarcely conquered, stretches out his arms for others.

Extract from a letter of Carl Friedrich GAUSS to Farkas BOLYAI, dated 2 September 1808.

June 2023

P.4. References

Ai, H., Cao, Z., Zhu, J., Bai, H., Chen, Y., Wang, L. (2022). Deep learning for omnidirectional vision: A survey and new perspectives [Online]. Available at: https://arxiv.org/abs/2205.10468.

Baker, S. and Nayar, S. (1999). A theory of single-viewpoint catadioptric image formation. Int. J. Comput. Vision, 35(2), 175–196.

Barreto, J. (2006). A unifying geometric representation for central projection systems. Comput. Vis. Image Und., 103(3), 208–217.

Benosman, R. and Kang, S. (eds) (2000). Panoramic Vision: Sensors, Theory, and Applications. Springer-Verlag, New York.

Burkhardt Jr., R.W. (2005). Patterns of Behavior: Konrad Lorenz, Niko Tinbergen, and the Founding of Ethology. The University of Chicago Press.

Cao, Z., Oh, S., Hall, E. (1986). Dynamic omnidirectional vision for mobile robots. J. Robotic Syst., 3(1), 5–17.

Cheung, G., Magli, E., Tanaka, Y., Ng, M. (2018). Graph spectral image processing. Proc. IEEE, 106(5), 907–930.

Corke, P. (2011). Robotics, Vision and Control: Fundamental Algorithms in MATLAB, volume 73. Springer-Verlag, Berlin/Heidelberg.

Forsyth, D. and Ponce, J. (2011). Computer Vision: A Modern Approach, 2nd edition. Pearson Education, Upper Saddle River.

Geyer, C. and Daniilidis, K. (2000). A unifying theory for central panoramic systems and practical implications. In Proc. 6th Eur. Conf. Comput. Vis. Springer, Berlin/Heidelberg.

Goodrich, J. (2022). The first digital camera was Kodak’s biggest secret: The toaster-sized device displayed photos on a TV screen. The Institute, 60–61.

Hartley, R. and Zisserman, A. (2004). Multiple View Geometry in Computer Vision, 2nd edition. Cambridge University Press.

Ishiguro, H. (2005). Omnidirectional vision. In Handbook of Pattern Recognition and Computer Vision, Chen, C.H. and Wang, P.S.P. (eds). World Scientific, Singapore.

Ishiguro, H., Yamamoto, M., Tsuji, S. (1992). Omni-directional stereo. IEEE Trans. Pattern Anal. Mach. Intell., 14(2), 257–262.

Khomutenko, B., Garcia, G., Martinet, P. (2016). An enhanced unified camera model. IEEE Robot. Autonom. Lett., 1(1), 137–144.

Ma, Y., Soatto, S., Košecká, J., Sastry, S.S. (2004). An Invitation to 3D Computer Vision: From Images to Geometric Models. Springer-Verlag, New York.

Nayar, S. (1997). Catadioptric omnidirectional camera. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. IEEE, New York.

Puig, L. and Guerrero, J.J. (2013). Omnidirectional Vision Systems: Calibration, Feature Extraction and 3D Information. Springer-Verlag, London/Heidelberg.

Rees, D.W. (1970). Panoramic television viewing system. United States Patent, No. 3,505,465.

Siegwart, R., Nourbakhsh, I., Scaramuzza, D. (2011). Introduction to Autonomous Mobile Robots, 2nd edition. MIT Press, Cambridge, MA.

Strasburger, H. (2020). Seven myths on crowding and peripheral vision. i-Perception, 11(3), 1–46.

Sturm, P., Ramalingam, S., Tardif, J.-P., Gasparini, S., Barreto, J. (2011). Camera models and fundamental concepts used in geometric computer vision. Foundations and Trends in Computer Graphics and Vision, 6(1–2), 1–183.

Svoboda, T., Pajdla, T., Hlaváč, V. (1998). Epipolar geometry for panoramic cameras. In Proc. Eur. Conf. Comput. Vis. Springer, Berlin/Heidelberg.

Usenko, V., Demmel, N., Cremers, D. (2018). The double sphere camera model. In Proc. IEEE Int. Conf. 3D Vision. IEEE, New York.

Yagi, Y. and Kawato, S. (1990). Panorama scene analysis with conic projection. In Proc. IEEE/RSJ Int. Conf. Intel. Robots Syst., volume 1. IEEE, New York.

1 Image Geometry

Peter STURM

Inria Grenoble Rhône-Alpes, Montbonnot-Saint-Martin, France

This chapter reviews basic geometrical concepts relevant for omnidirectional vision. These comprise the image formation process, with a special emphasis on catadioptric cameras, and a brief discussion on how camera models approximate the image formation process. The distinction between central (or single-viewpoint) and non-central cameras is explained, together with a discussion of the respective advantages. The chapter also reviews the basic building blocks of structure-from-motion, ranging from projection and back-projection to pose and ego-motion estimation. It is then shown that line images are particularly important when studying omnidirectional cameras for calibration as well as for image matching through the epipolar geometry. Dense matching is usually sped up by pre-processing images during a rectification process, which is explained with a particular emphasis on omnidirectional images.

The concepts discussed in this chapter are illustrated in short videos accessible on the Internet (creative commons license CC-BY-NC-SA). URLs are provided in the text.

1.1. Introduction

Geometry is important in various aspects of omnidirectional vision, from sensor and camera design and modeling, through image analysis, to structure-from-motion. Probably the most fundamental issue concerns design and modeling: how do we build image acquisition devices that have certain desired characteristics? Foremost among these is of course the desire to acquire images with a very wide field of view, be it panoramic, hemispheric, fully spherical, or somewhere in between1. The potential interests are clear – such a wide field of view allows us to visualize or analyze a scene more completely2 and to detect obstacles or objects to interact with, all around a robot. It also turns out that ego-motion estimation is generally more stable and accurate with a large field of view (Nelson and Aloimonos 1988). Besides such practical interests, there may also be others, such as esthetics.

Various technical solutions have been developed to acquire omnidirectional images. The first ones relied on rotating a regular or tailor-made camera about itself and “stitching together” the images acquired during the rotation so as to form an image representing an extended field of view, an approach nowadays provided as a basic feature on most consumer-grade digital cameras. The obvious disadvantages are that image acquisition is not instantaneous, making it difficult to operate in dynamic contexts or to use it for omnidirectional video acquisition, and that it requires image processing that may not succeed for all types of scenes. Alternative solutions were thus developed, especially of two types. One consists of developing lens designs capable of delivering the sought-after large fields of view, in particular fisheye objectives with fields of view nowadays even exceeding 180°. The other, quite popular in robotics in the last decades, is to use mirrors to enhance the field of view of a camera, leading to systems known as catadioptric cameras3.

Besides an extended field of view, various other design objectives have been pursued by scientists and engineers. Among the most important is the aim of achieving a single effective viewpoint or optical center – in the following we will also speak of central cameras. An example is given in Figure 1.1.

Figure 1.1. Central catadioptric systems based on planar mirrors

NOTES ON FIGURE 1.1.– Left: A camera pointed at a planar mirror acquires the same image (besides issues such as loss of sharpness or color richness due to imperfect reflection) as would a virtual camera situated and oriented symmetrically on the other side of the mirror. Right: Several camera–mirror pairs are arranged such that the corresponding virtual cameras have the same optical centers. The arrangement on the right-hand side of the figure allows us to produce a panoramic image as if it were taken from a single effective viewpoint inside the pyramid of mirrors – it thus corresponds to a central camera (Iwerks 1964; Nalwa 1996). To get a complete panoramic field of view, an adequate number of camera–mirror pairs must be used, depending on the camera’s individual fields of view.
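The virtual-camera construction described in the figure notes can be written down compactly: reflecting the optical center and the camera axes across the mirror plane gives the pose of the equivalent virtual camera. The sketch below is only illustrative; the plane parameterization (unit normal n, offset d, with points X on the mirror satisfying n·X + d = 0), the pose convention and the numerical values are assumptions, not taken from this chapter.

```python
import numpy as np

def reflect_camera(C, R, n, d):
    """Pose of the virtual camera obtained by reflecting a real camera
    (center C, rotation R with rows = camera axes in world coordinates)
    across the mirror plane {X : n.X + d = 0}, n being a unit normal (sketch).
    """
    # Householder-type reflection across the plane.
    H = np.eye(3) - 2.0 * np.outer(n, n)
    C_virtual = H @ C - 2.0 * d * n   # reflect the optical center
    R_virtual = R @ H                 # reflect the camera axes (the frame becomes left-handed, as expected for a mirror image)
    return C_virtual, R_virtual

# Illustrative example: camera at the origin looking along +z, mirror plane z = 1.
C = np.zeros(3)
R = np.eye(3)
n = np.array([0.0, 0.0, 1.0])
C_v, R_v = reflect_camera(C, R, n, d=-1.0)
print(C_v)   # -> [0, 0, 2]: the virtual camera sits symmetrically behind the mirror
```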

Why is this property interesting? Firstly, as noted by Baker and Nayar (1999), images acquired by central cameras make it possible to synthesize perspectively correct images, that is, images as if acquired from the same viewpoint by a perspective camera. This property has been used to “navigate” in panoramic images by panning a virtual viewing direction and displaying a perspective image generated from the panorama according to the current viewing direction (Chen 1995). Obviously, this idea is applicable to any combination of real and virtual camera, provided they are both central and that the real camera offers a sufficient field of view to synthesize images for the virtual camera. Examples are shown in the accompanying videos 01_Perspective_Rendering.mp4 and 02_Panoramic_Rendering.mp4, which illustrate how to synthesize perspective and panoramic images from an image acquired with an omnidirectional catadioptric camera.
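The perspective re-rendering just described boils down to back-projecting every pixel of the desired virtual pinhole view to a viewing ray and looking that direction up in the omnidirectional image. The sketch below assumes, purely for illustration, that the source panorama is stored in equirectangular (longitude × latitude) form; a central catadioptric image would require its own lookup, but the structure of the computation is the same.

```python
import numpy as np

def render_perspective(pano, R, f, width, height):
    """Synthesize a perspective view from an equirectangular panorama (sketch).

    pano : (H, W, 3) equirectangular image covering 360 x 180 degrees.
    R    : (3, 3) rotation giving the virtual camera orientation.
    f    : focal length of the virtual pinhole camera, in pixels.
    """
    H, W = pano.shape[:2]
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    # Back-project every pixel of the virtual view to a viewing direction.
    rays = np.stack([u - width / 2.0,
                     v - height / 2.0,
                     np.full_like(u, f, dtype=float)], axis=-1)
    rays = rays / np.linalg.norm(rays, axis=-1, keepdims=True)
    rays = rays @ R.T
    # Convert directions to longitude/latitude and sample the panorama (nearest neighbor).
    lon = np.arctan2(rays[..., 0], rays[..., 2])        # in [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))   # in [-pi/2, pi/2]
    px = ((lon / (2 * np.pi) + 0.5) * (W - 1)).astype(int)
    py = ((lat / np.pi + 0.5) * (H - 1)).astype(int)
    return pano[py, px]

# Usage (illustrative): a random "panorama" rendered with an identity orientation.
pano = np.random.rand(256, 512, 3)
view = render_perspective(pano, np.eye(3), f=300.0, width=320, height=240)
print(view.shape)   # (240, 320, 3)
```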

Another advantage of having a single effective viewpoint is that this opens the way to directly applying the rich toolkit of Structure-from-Motion (SfM) methods originally developed for perspective cameras. A final aspect mentioned here is that the usage of central cameras enables us to perform dense stereovision very efficiently, in a manner analogous to that originally developed for perspective images, through a generalized rectification process followed by “scan-line matching”. All these aspects are further developed later in this chapter.

Other design objectives besides a large field of view and a single effective viewpoint concern, for instance, the types of distortion inevitably generated by omnidirectional cameras. Some cameras are built to conform to an equiangular (sometimes also called equidistant) distortion profile: this is the case if the angle spanned by the optical axis and the viewing direction associated with a pixel in the image is proportional to the pixel’s distance from the distortion center in the image. Other distortion profiles may be interesting, such as ones corresponding to area-preserving projections (Hicks and Perline 2002). The choice of distortion profile or other design feature may depend on the practical application of a camera, for instance, the requirement that certain parts of the scene appear in higher resolution than others. A general framework for specifying such properties and developing a dedicated mirror shape achieving them as closely as possible was proposed by Swaminathan et al. (2004) and Hicks (2005).
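As a small numerical aside (in generic notation, not the book’s), the equiangular (equidistant) profile mentioned above makes the image radius proportional to the angle θ between a viewing direction and the optical axis, r = f·θ, whereas the perspective profile grows as r = f·tan θ. The short sketch below contrasts the two; the focal length and angles are illustrative assumptions.

```python
import numpy as np

def radius_equidistant(theta, f):
    # Equiangular / equidistant profile: image radius grows linearly with the angle.
    return f * theta

def radius_perspective(theta, f):
    # Perspective (pinhole) profile, only defined for angles below 90 degrees.
    return f * np.tan(theta)

f = 300.0
for deg in (10, 45, 80):
    t = np.deg2rad(deg)
    print(deg, radius_equidistant(t, f), radius_perspective(t, f))
```

The comparison makes the design trade-off visible: the perspective radius diverges as the angle approaches 90°, while the equidistant radius keeps growing linearly, which is what allows such profiles to cover very wide fields of view on a finite sensor.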

Various other design objectives have of course been explored, such as on optical properties (ease of focusing, color fidelity, etc.) or the volume of a camera: to circumvent the classical disadvantage of catadioptric cameras concerning their bulk, “folded” catadioptric systems composed of several nested mirrors were proposed (Nayar and Peri 1999).

Let us close this discussion by mentioning that while a single effective viewpoint is an attractive property, as explained above, designing cameras that are non-central on purpose may be beneficial in other ways: it obviously gives more degrees of freedom to optimize certain properties, and non-central cameras may also allow us to determine the absolute scale of ego-motion and 3D reconstruction, which is not possible with central cameras. Non-central cameras are further discussed in section 1.4.

In this chapter, we discuss geometrical aspects of omnidirectional cameras. When discussing “the geometry” of a certain type of camera, we usually refer to some model of the image formation process: how a camera generates an image of the scene it is pointed at. The actual image formation process carried out by a real camera is relatively complex; light emitted by objects in the scene travels through space and, if entering the camera aperture, travels across the camera’s optical elements (lenses, mirrors) and finally hits the optical sensor, producing an electrical charge (for digital cameras) that is eventually converted into greylevels or colors for displaying an image to a human or for processing it by a computer. Most models of this process used in computer vision are simplified representations of it. In this chapter, we are concerned with purely geometric models, whose main basic operation is to determine, given the location of a point-like object in the scene, where the object’s image will be observed on the sensor. All higher level geometric operations in computer vision, such as computing the motion of a camera from two or more images of a scene, are derived from this basic operation.

1.1.1. Outline of this chapter

In the next section, we briefly outline the general image formation process and explain in which ways the usual camera models, even the classical perspective or pinhole model, are a simplification thereof. Two fundamental concepts, projection and back-projection, are very briefly introduced in the subsequent section, followed by a short discussion on central and non-central cameras. When talking about 3D geometry for computer vision, a useful distinction is between what happens “outside” cameras from their “inner workings”. By “outside”, we essentially mean the geometric relations between the 3D scene and one or more cameras and more particularly, the question of how to represent and infer information on the relative positioning between cameras and/or entities in the scene. This is the subject of a section, where it is addressed, for conciseness, for the case of fully calibrated cameras. The subsequent two sections then consider the geometry of the inner workings of cameras and the epipolar geometry of a pair of cameras, with an attention to the subject of rectification for stereo matching.

This chapter contains no equations, only geometry; algebraic formulations of some of the material covered can be found in other chapters of this book, and another good starting point may be Sturm et al. (2011).

1.2. Image formation and point-wise approximation

The generation of a digital image of a scene, by means of a camera, is a complex process. The image is ultimately created by photons that cause an electrical response in the camera’s sensor, which is based on CCD, CMOS or another technology. We may distinguish what happens inside a camera, to the photons that enter its aperture, from what is going on outside: these photons result from a potentially infinitely complex interplay in which light is emitted, reflected, refracted, etc., by objects in the scene and “bounced” repeatedly from one object to another. Not to speak of phenomena such as atmospheric or submarine diffraction caused by particles suspended in the air or water.

All of these aspects of the image formation process have been studied, to various degrees, in photogrammetry, computer vision and computer graphics, guided by the motivation to synthesize images that are as realistic as possible, to enable a meaningful analysis of imagery acquired in bad weather or otherwise “unusual” conditions such as under water, or even to “look around the corner” (Torralba and Freeman 2012), that is, to use an acquired image to infer something about an object hidden from the camera’s field of view through its reflection or shadows produced on other objects.

Most works in computer vision rely on different levels of approximation of the image formation process. A first such approximation is to ignore the multiple bounces light undergoes in the scene: we implicitly assume that each point on the surface of an object in the scene emits/reflects light, and only photons reaching the camera on a direct path are considered. Sometimes, the “light emitted” by a point is modeled by a “color” associated with that point or, more generally, by a reflectance model which represents how light impinging on that point from a direct light source is reflected in different directions. In any such case, whenever the camera aperture is of finite extent (i.e. is not assumed to be a single point), the camera captures an entire volume of light emitted/reflected by any individual scene point. Unless the camera optics are perfect and the point is perfectly in focus, this set of light rays hits the camera sensor on a finite area, that is, the “image” of the scene point is not confined to an infinitesimal image point. This is of course one of the well-known causes of blur. Image formation models that mimic this have been proposed in computer vision and graphics, for example, through the definition of point spread functions6.

A second level of approximation comes essentially down to ignoring blur and to assuming that light emitted by a single scene point manifests itself in a single point in the final image. Besides the above sketched cause for blur, this approximation also ignores the fact that in a real digital camera, light is captured through a finite set of photosensitive elements, each one capturing light within a finite area7.

Most works tagged as “geometry” in computer vision use such an approximation: the image of a scene point is again a point. Camera models are then conceived to essentially allow one to compute, given the location and orientation of a camera and the location of a scene point, the location of the image point associated with that scene point. Such camera models are the basis for various tasks such as estimating the location and orientation of an object relative to a camera (pose estimation), estimating the motion of a camera just by analyzing different images taken during that motion, estimating a 3D model of the scene, etc. We will come back to these tasks in later sections. Before doing so, we first investigate, in the next section, camera models and how they represent the mapping of a scene point to an image point.

1.3. Projection and back-projection

The simplest and most widely used camera model in computer vision is the so-called pinhole model, which performs a perspective projection8. It consists of two elements: the entire optics is represented by a single point, the so-called optical center, and the image sensor is represented by a mathematical plane, the image plane. The basic operation, the projection of a scene point to the image, is performed as follows: first, one creates the mathematical line that connects the optical center and the scene point; second, one determines the point where that line intersects the image plane: this is the image point.

These simple geometrical operations can be expressed in a similarly simple algebraic manner (see, for instance, Sturm et al. 2011).
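For readers who want to see this construction written out, here is a minimal sketch of the pinhole projection in generic notation (the chapter itself deliberately avoids equations); the intrinsic matrix K, the pose convention and all numerical values are illustrative assumptions.

```python
import numpy as np

def pinhole_project(X, C, R, K):
    """Perspective (pinhole) projection of a scene point (sketch, generic notation).

    X : (3,) scene point in world coordinates.
    C : (3,) optical center in world coordinates.
    R : (3, 3) rotation from world to camera coordinates.
    K : (3, 3) intrinsic matrix of the camera.
    """
    # Express the point in camera coordinates: this encodes the line through
    # the optical center and the scene point.
    Xc = R @ (X - C)
    # Intersect that line with the image plane (homogeneous division) and
    # map to pixel coordinates via the intrinsics.
    u = K @ Xc
    return u[:2] / u[2]

# Illustrative values only.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
print(pinhole_project(np.array([1.0, 0.5, 4.0]), C=np.zeros(3), R=np.eye(3), K=K))
```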

As explained in section 1.2, this model of course represents multiple approximations of the functioning of a real camera. Even so, it has been observed that it models “regular” cameras (e.g. consumer cameras) sufficiently well, that is, when it is employed for tasks such as 3D reconstruction and motion estimation, the results are of acceptable accuracy. However, the approximation becomes insufficient when radial or other distortions become noticeable, for instance in wide field of view cameras, or whenever the maximum possible accuracy is sought. This can be taken care of by extending the perspective camera model accordingly, as has been studied in photogrammetry for more than a century, resulting in more complex camera models (both geometrically and algebraically). However, even these classical extensions of the perspective model are not sufficient for omnidirectional cameras; we will consider these further below.

Let us now study an important concept, the reciprocal operation to projection, which we call back-projection. Back-projection starts from an image point and tries to answer the question where the original scene point could possibly be located. In general, unless additional information is available, the answer corresponds to a (half-) line. Back-projection for the perspective model is straightforward: we determine the line connecting the image point and the optical center, followed by “clipping” it to a half-line, for instance, at the optical center or at some minimal viewing distance in front of it. Both operations have a simple algebraic expression, much like for projection.
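Under the same illustrative assumptions as the previous sketch, back-projection is simply the inverse mapping: the image point defines a direction through the optical center, which is then clipped to a half-line.

```python
import numpy as np

def pinhole_backproject(u, C, R, K, t_min=0.0):
    """Back-project an image point to a half-line of possible scene points (sketch).

    Returns the origin and unit direction of the half-line in world coordinates;
    candidate scene points are origin + t * direction for t >= 0,
    with the clipping applied at the optical center or at a minimal distance t_min.
    """
    # Direction in camera coordinates: undo the intrinsics on the homogeneous pixel.
    d_cam = np.linalg.inv(K) @ np.array([u[0], u[1], 1.0])
    # Rotate into world coordinates and normalize.
    d_world = R.T @ d_cam
    d_world = d_world / np.linalg.norm(d_world)
    origin = C + t_min * d_world
    return origin, d_world

# Illustrative values only; the ray passes through the point projected in the previous sketch.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
origin, direction = pinhole_backproject([445.0, 302.5], C=np.zeros(3), R=np.eye(3), K=K)
print(origin, direction)
```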

We now move toward the case of a catadioptric camera: a system composed of a perspective camera and a mirror into which this camera gazes9. Suppose that the entire geometry of the system is known: position of the image plane and the optical center, as well as the shape and position of the mirror. Let us first consider the case of a general mirror shape, that is, it is not constrained to be a surface of revolution or otherwise specific. Suppose that the mirror shape is expressed by a scalar function that takes 3D point coordinates as argument and returns the distance of the 3D point to the closest point on the mirror surface. Let us further suppose that we have a function that maps 3D points lying on the mirror surface to the associated mirror’s tangent plane10.
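To make these two ingredients concrete, the sketch below represents the mirror by a signed distance (implicit) function, obtains the tangent plane through the surface normal (the normalized gradient of that function) and reflects an incoming ray about the normal; the spherical mirror shape and all numerical values are illustrative assumptions, not a model used in this chapter.

```python
import numpy as np

def mirror_sphere(X, center=np.array([0.0, 0.0, 2.0]), radius=0.5):
    # Signed distance from X to an (illustrative) spherical mirror surface.
    return np.linalg.norm(X - center) - radius

def surface_normal(sdf, X, eps=1e-6):
    # The tangent plane at X is defined by the unit normal, i.e. the normalized
    # gradient of the signed distance function, estimated here by finite differences.
    grad = np.array([(sdf(X + eps * e) - sdf(X - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
    return grad / np.linalg.norm(grad)

def reflect(d, n):
    # Specular reflection of a ray direction d about the unit normal n.
    return d - 2.0 * np.dot(d, n) * n

# Illustrative use: a ray from the optical center hits the mirror and is reflected.
X_hit = np.array([0.0, 0.0, 1.5])      # point on the sphere's surface
n = surface_normal(mirror_sphere, X_hit)
d_in = np.array([0.0, 0.0, 1.0])       # incoming ray direction
print(reflect(d_in, n))                # -> approximately [0, 0, -1]
```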