Computer Vision in Vehicle Technology

Description

A unified view of the use of computer vision technology for different types of vehicles

Computer Vision in Vehicle Technology focuses on computer vision as an on-board technology, bringing together the fields of research that computer vision is progressively penetrating: the automotive sector, unmanned aerial vehicles, and underwater vehicles. It also serves as a reference for researchers on current developments and challenges in vehicle-related applications of computer vision, such as advanced driver assistance (pedestrian detection, lane departure warning, traffic sign recognition), autonomous driving and robot navigation (with visual simultaneous localization and mapping), and unmanned aerial vehicles (obstacle avoidance, landscape classification and mapping, fire risk assessment).

The overall role of computer vision for the navigation of different vehicles, as well as technology to address on-board applications, is analysed.

Key features:

  • Presents the latest advances in the field of computer vision and vehicle technologies in a highly informative and understandable way, including the basic mathematics for each problem.
  • Provides a comprehensive summary of state-of-the-art computer vision techniques in vehicles from the points of view of navigation and addressable applications.
  • Offers a detailed description of the open challenges and business opportunities for the immediate future in the field of vision-based vehicle technologies.

This is essential reading for computer vision researchers, as well as engineers working in vehicle technologies, and students of computer vision.




Table of Contents

Cover

Title Page

Copyright

List of Contributors

Preface

Abbreviations and Acronyms

Chapter 1: Computer Vision in Vehicles

1.1 Adaptive Computer Vision for Vehicles

1.2 Notation and Basic Definitions

1.3 Visual Tasks

1.4 Concluding Remarks

Acknowledgments

Chapter 2: Autonomous Driving

2.1 Introduction

2.2 Autonomous Driving in Cities

2.3 Challenges

2.4 Summary

Acknowledgments

Chapter 3: Computer Vision for MAVs

3.1 Introduction

3.2 System and Sensors

3.3 Ego-Motion Estimation

3.4 3D Mapping

3.5 Autonomous Navigation

3.6 Scene Interpretation

3.7 Concluding Remarks

Chapter 4: Exploring the Seafloor with Underwater Robots

4.1 Introduction

4.2 Challenges of Underwater Imaging

4.3 Online Computer Vision Techniques

4.4 Acoustic Imaging Techniques

4.5 Concluding Remarks

Acknowledgments

Chapter 5: Vision-Based Advanced Driver Assistance Systems

5.1 Introduction

5.2 Forward Assistance

5.3 Lateral Assistance

5.4 Inside Assistance

5.5 Conclusions and Future Challenges

Acknowledgments

Chapter 6: Application Challenges from a Bird's-Eye View

6.1 Introduction to Micro Aerial Vehicles (MAVs)

6.2 GPS-Denied Navigation

6.3 Applications and Challenges

6.4 Conclusions

Chapter 7: Application Challenges of Underwater Vision

7.1 Introduction

7.2 Offline Computer Vision Techniques for Underwater Mapping and Inspection

7.3 Acoustic Mapping Techniques

7.4 Concluding Remarks

Chapter 8: Closing Notes

References

Index

End User License Agreement

List of Illustrations

Chapter 1: Computer Vision in Vehicles

Figure 1.1 (a) Quadcopter. (b) Corners detected from a flying quadcopter using a modified FAST feature detector.

Figure 1.2 The 10 leading causes of death in the world. Chart provided online by the World Health Organization (WHO). Road injury ranked number 9 in 2011

Figure 1.3 Two screenshots for real-view navigation.

Figure 1.4 Examples of benchmark data available for a comparative analysis of computer vision algorithms for motion and distance calculations. (a) Image from a synthetic sequence provided on EISATS with accurate ground truth. (b) Image of a real-world sequence provided on KITTI with approximate ground truth

Figure 1.5 Laplacians of smoothed copies of the same image using cv::GaussianBlur and cv::Laplacian in OpenCV, with values 0.5, 1, 2, and 4 for the smoothing parameter. Linear scaling is used for better visibility of the resulting Laplacians.

Figure 1.6 (a) Image of a stereo pair (from a test sequence available on EISATS). (b) Visualization of a depth map using the color key shown at the top for assigning distances in meters to particular colors. A pixel is shown in gray if there was low confidence for the calculated disparity value at this pixel.

Figure 1.7 Resulting disparity maps for stereo data when using only one scanline for DPSM with the SGM smoothness constraint and an MCEN data-cost function. From top to bottom and left to right: left-to-right horizontal scanline, lower-left to upper-right diagonal scanline, top-to-bottom vertical scanline, and upper-left to lower-right diagonal scanline. Pink pixels are for low-confidence locations (here identified by inhomogeneous disparity locations).

Figure 1.8 Normalized cross-correlation results when applying the third-eye technology for stereo matchers iSGM and linBPM for four real-world trinocular sequences of Set 9 of EISATS.

Figure 1.9 (a) Reconstructed cloud of points. (b) Reconstructed surface based on a single run of the ego-vehicle.

Figure 1.10 Visualization of optical flow using the color key shown around the border of the image for assigning a direction to particular colors; the length of the flow vector is represented by saturation, where value “white” (i.e., undefined saturation) corresponds to “no motion.” (a) Calculated optical flow using the original Horn–Schunck algorithm. (b) Ground truth for the image shown in Figure 1.4a.

Figure 1.11 Face detection, eye detection, and face tracking results under challenging lighting conditions. Typical Haar-like features, as introduced in Viola and Jones (2001b), are shown in the upper right. The illustrated results for challenging lighting conditions require additional efforts.

Figure 1.12 Two examples for Set 7 of EISATS illustrated by preprocessed depth maps following the described method (Steps 1 and 2). Ground truth for segments is provided by Barth et al. (2010) and shown on top in both cases. Resulting segments using the described method are shown below in both cases;

Chapter 2: Autonomous Driving

Figure 2.1 The way people think of usage and design of Autonomous Cars has not changed much over the last 60 years: (a) the well-known advert from the 1950s, (b) a design study published in 2014

Figure 2.2 (a) CMU's first demonstrator vehicle Navlab 1. The van had five racks of computer hardware, including three Sun workstations, video hardware and GPS receiver, and a Warp supercomputer. The vehicle achieved a top speed of 32 km/h in the late 1980s. (b) Mercedes-Benz's demonstrator vehicle VITA, built in cooperation with Dickmanns from the University of the Armed Forces in Munich. Equipped with a bifocal vision system and a small transputer system with 10 processors, it was used for Autonomous Driving on highways around Stuttgart in the early 1990s, reaching speeds up to 100 km/h

Figure 2.3 (a) Junior by CMU's Robotics Lab, winner of the Urban Challenge 2007. (b) A Google car prototype presented in 2014 that neither features a steering wheel nor gas or braking pedals. Both cars base their environment perception on a high-end laser scanner

Figure 2.4 (a) Experimental car BRAiVE built by Broggi's team at the University of Parma. Equipped with only stereo cameras, this car drove 17 km along roads around Parma in 2013. (b) Mercedes S500 Intelligent Drive demonstrator named “Bertha.” In August 2013, it drove autonomously about 100 km from Mannheim to Pforzheim, following the historic route driven by Bertha Benz 125 years earlier. Close-to-market radar sensors and cameras were used for environment perception

Figure 2.5 The Bertha Benz Memorial Route from Mannheim to Pforzheim (103 km). The route comprises rural roads, urban areas (e.g., downtown Heidelberg), and small villages and contains a large variety of different traffic situations such as intersections with and without traffic lights, roundabouts, narrow passages with oncoming vehicles, pedestrian crossings, cars parked on the road, and so on

Figure 2.6 System overview of the Bertha Benz experimental vehicle

Figure 2.7 Landmarks that are successfully associated between the mapping image (a) and online image (b) are shown.

Figure 2.8 Given a precise map (shown later), the expected markings (blue), stop lines (red), and curbs (yellow) are projected onto the current image. Local correspondence analysis yields the residuals that are fed to a Kalman filter in order to estimate the vehicle's pose relative to the map.

Figure 2.9 Visual outline of a modern stereo processing pipeline. Dense disparity images are computed from sequences of stereo image pairs. Red pixels are measured close to the ego-vehicle, while green pixels are far away. From these data, the Stixel World is computed. This medium-level representation achieves a reduction of the input data from hundreds of thousands of single depth measurements to a few hundred Stixels only. Stixels are tracked over time in order to estimate the motion of other objects. The arrows show the motion vectors of the tracked objects, pointing 0.5 seconds in advance. This information is used to extract both static infrastructure and moving objects for subsequent processing tasks. The free space is shown in gray

Figure 2.10 A cyclist taking a left turn in front of our vehicle: (a) shows the result when using 6D-Vision point features and (b) shows the corresponding Stixel result

Figure 2.11 Results of the Stixel computation, the Kalman filter-based motion estimation, and the motion segmentation step. The left side shows the arrows on the base points of the Stixels denoting the estimated motion state. The right side shows the corresponding labeling result obtained by graph-cut optimization. Furthermore, the color scheme encodes the different motion classes (right headed, left headed, with us, and oncoming). Uncolored regions are classified as static background

Figure 2.12 ROIs overlaid on the gray-scale image. In the monocular case (upper row left), about 50,000 hypotheses have to be tested by a classifier, in the stereo case (upper row right) this number reduces to about 5000. If each Stixel is assumed to be the center of a vehicle at the distance given by the Stixel World (lower row left), only 500 ROIs have to be checked, as shown on the right

Figure 2.13 Intensity and depth images with corresponding gradient magnitude for pedestrian (top) and nonpedestrian (bottom) samples. Note the distinct features that are unique to each modality, for example, the high-contrast pedestrian texture due to clothing in the gray-level image compared to the rather uniform disparity in the same region. The additional exploitation of depth can reduce the false-positive rate significantly. In Enzweiler et al. (2010), an improvement by a factor of five was achieved

Figure 2.14 ROC curve illustrating the performance of a pedestrian classifier using intensity only (red) versus a classifier additionally exploiting depth (blue). The depth cue reduces the false-positive rate by a factor of five

Figure 2.15 Full-range (0–200m) vehicle detection and tracking example in an urban scenario. Green bars indicate the detector confidence level

Figure 2.16 Examples of hard to recognize traffic lights. Note that these examples do not even represent the worst visibility conditions

Figure 2.17 Two consecutive frames of a stereo image sequence (left). The disparity result obtained from a single image pair is shown in the second column from the right. It shows strong disparity errors due to the wiper blocking parts of one image. The result from temporal stereo is visually free of errors (right) (see Gehrig et al. (2014))

Figure 2.18 Scene labeling pipeline: input image (a), SGM stereo result (b), Stixel representation (d), and the scene labeling result (c)

Figure 2.19 Will the pedestrian cross? Head and body orientation of a pedestrian can be estimated from onboard cameras of a moving vehicle. means motion to the left (body), is toward the camera (head)

Chapter 3: Computer Vision for MAVs

Figure 3.1 A micro aerial vehicle (MAV) equipped with digital cameras for control and environment mapping. The depicted MAV has been developed within the SFLY project (see Scaramuzza et al. 2014)

Figure 3.2 The system diagram of the autonomous Pixhawk MAV using a stereo system and an optical flow camera as main sensors

Figure 3.3 The state estimation work flow for a loosely coupled visual-inertial fusion scheme

Figure 3.4 A depiction of the involved coordinate systems for the visual-inertial state estimation

Figure 3.5 Illustration of monocular pose estimation. The new camera pose is computed from 3D points triangulated from at least two subsequent images

Figure 3.6 Illustration of stereo pose estimation. At each time index, 3D points can be computed from the left and right images of the stereo pair. The new camera pose can be computed directly from the 3D points triangulated from the previous stereo pair

Figure 3.7 Concept of the optical flow sensor depicting the geometric relations used to compute metric optical flow

Figure 3.8 The PX4Flow sensor to compute MAV movements using the optical flow principle. It consists of a digital camera, gyroscopes, a range sensor, and an embedded processor for image processing

Figure 3.9 The different steps of a typical structure from motion (SfM) pipeline to compute 3D data from image data. The arrows on the right depict the additional sensor data provided from a MAV platform and highlight for which steps in the pipeline it can be used

Figure 3.10 A 3D map generated from image data of three individual MAVs using MAVMAP. (a) 3D point cloud including MAVs' trajectories (camera poses are shown in red). (b) Detailed view of a part of the 3D map from a viewpoint originally not observed from the MAVs

Figure 3.11 Environment represented as a 3D occupancy grid suitable for path planning and MAV navigation. Blue blocks are the occupied parts of the environment

Figure 3.12 Live view from a MAV with basic scene interpretation capabilities. The MAV detects faces and pre-trained objects (e.g., the exit sign) and marks them in the live view

Chapter 4: Exploring the Seafloor with Underwater Robots

Figure 4.1 (a) Example of backscattering due to the reflection of rays from the light source on particles in suspension, hindering the identification of the seafloor texture. (b) Image depicting the effects produced by light attenuation of the water, resulting in an evident loss of luminance in the regions farthest from the focus of the artificial lighting. (c) Example of an image acquired in shallow waters showing sunflickering patterns. (d) Image showing a generalized blurred appearance due to the small-angle forward-scattering phenomenon

Figure 4.2 Refracted sunlight creates illumination patterns on the seafloor, which vary in space and time following the dynamics of surface waves

Figure 4.3 Scheme of underwater image formation with natural light as main illumination source. The signal reaching the camera is composed of two main components: attenuated direct light coming from the observed object and water-scattered natural illumination along this propagation path. Attenuation is due to both scattering and absorption

Figure 4.4 Absorption and scattering coefficients of pure seawater. Absorption (solid line (a)) and scattering (dotted line (b)) coefficients for pure seawater, as determined and given by Smith and Baker (1981) and

Figure 4.5 Image dehazing. Example of underwater image restoration in low to extreme low visibility conditions

Figure 4.6 Loop-closure detection. As the camera moves, there is an increasing uncertainty related to both the camera pose and the environment map. At a later instant, the camera revisits a region of the scene previously visited at an earlier instant. If the visual observations at the two instants can be associated, the resulting information not only can be used to reduce the pose and map uncertainties at the current instant but also can be propagated to reduce the uncertainties at prior instants

Figure 4.7 BoW image representation. Images are represented by histograms of generalized visual features

Figure 4.8 Flowchart of OVV and image indexing. Periodically, the vocabulary is updated with new visual features extracted from the most recent frames. The complete set of features in the vocabulary is then merged until convergence. The obtained vocabulary is used to index the most recent images. Also, the previously indexed frames are re-indexed to reflect the changes in the vocabulary

Figure 4.9 Sample 2D FLS image of a chain in turbid waters

Figure 4.10 FLS operation. The sonar emits an acoustic wave spanning its beam width in the azimuth and elevation directions. Returned sound energy is sampled as a function of range and azimuth angle and can be interpreted as the mapping of 3D points onto the zero-elevation plane (shown in red)

Figure 4.11 Sonar projection geometry. A 3D point is mapped onto a point on the image plane along the arc defined by the elevation angle. Under an orthographic approximation, this is equivalent to considering that all scene points rest on the zero-elevation plane (in red)

Figure 4.12 Overall Fourier-based registration pipeline

Figure 4.13 Example of the denoising effect obtained by intensity averaging. (a) Single frame gathered with a DIDSON sonar (Sou 2015) operating at its lower frequency (1.1 MHz). (b) Fifty registered frames from the same sequence blended by averaging the overlapping intensities. Note how the SNR increases and small details pop out.

Chapter 5: Vision-Based Advanced Driver Assistance Systems

Figure 5.1 Typical coverage of cameras. For the sake of clarity of the illustrations, the actual cone-shaped volumes that the sensors see are shown as triangles

Figure 5.2 Forward assistance

Figure 5.3 Traffic sign recognition

Figure 5.4 The main steps of pedestrian detection together with the main processes carried out in each module

Figure 5.5 Different approaches in Intelligent Headlamp Control (Lopez et al. (2008a)). On the top, traditional low beams that reach low distances. In the middle, the beams are dynamically adjusted to avoid glaring the oncoming vehicle. On the bottom, the beams are optimized to maximize visibility while avoiding glaring by the use of LED arrays

Figure 5.6 Enhanced night vision. Thanks to infrared sensors the system is capable of distinguishing hot objects (e.g., car engines, pedestrians) from the cold road or surrounding natural environment

Figure 5.7 Intelligent active suspension.

Figure 5.8 Lane Departure Warning (LDW) and Lane Keeping System (LKS)

Figure 5.9 Parking Assistance. Sensors' coverages are shown as 2D shapes to improve visualization

Figure 5.10 Drowsiness detection based on PERCLOS and an NIR camera

Figure 5.11 Summary of the relevance of several technologies in each ADAS: in increasing relevance as null, low, useful, and high

Chapter 6: Application Challenges from a Bird's-Eye View

Figure 6.1 A few examples of MAVs. From left to right: the senseFly eBee, the DJI Phantom, the hybrid XPlusOne, and the FESTO BioniCopter

Figure 6.2 (a) Autonomous MAV exploration of an unknown, indoor environment using RGB-D sensor (image courtesy of Shen et al. (2012)). (b) Autonomous MAV exploration of an unknown, indoor environment using a single onboard camera (image courtesy of Faessler et al. (2015b))

Figure 6.3 Probabilistic depth estimate in SVO. Very little motion is required by the MAV (marked in black at the top) for the uncertainty of the depth filters (shown as magenta lines) to converge.

Figure 6.4 Autonomous recovery after throwing the quadrotor by hand: (a) the quadrotor detects free fall and (b) starts to control its attitude to be horizontal. Once it is horizontal, (c) it first controls its vertical velocity and then (d) its vertical position. The quadrotor uses its horizontal motion to initialize its visual-inertial state estimation and uses it (e) to first brake its horizontal velocity and then (f) lock to the current position.

Figure 6.5 (a) A quadrotor is flying over a destroyed building. (b) The reconstructed elevation map. (c) A quadrotor flying in an indoor environment. (d) The quadrotor executing autonomous landing. The detected landing spot is marked with a green cube. The blue line is the trajectory that the MAV flies to approach the landing spot. Note that the elevation map is local and of fixed size; its center lies always below the quadrotor's current position.

Chapter 7: Application Challenges of Underwater Vision

Figure 7.1 Underwater mosaicing pipeline scheme. The Topology Estimation, Image Registration, and Global Alignment steps can be performed iteratively until no new overlapping images are detected

Figure 7.2 Topology estimation scheme. (a) Final trajectory obtained by the scheme proposed in Elibol et al. (2010). The first image frame is chosen as a global frame, and all images are then translated in order to have positive values on the axes. The x and y axes are in pixels, and the scale is approximately 150 pixels per meter. The plot is expressed in pixels instead of meters since the uncertainty of the sensor used to determine the scale (an acoustic altimeter) is not known. The red lines join the time-consecutive images, while the black ones connect non-time-consecutive overlapping image pairs. The total number of overlapping pairs is 5412. (b) Uncertainty in the final trajectory. Uncertainty of the image centers is computed from the covariance matrix of the trajectory (Ferrer et al. 2007). The uncertainty ellipses are drawn with a 95% confidence level. (c) Mosaic built from the estimated trajectory

Figure 7.3 Geometric registration of two different views (a and b) of the same underwater scene by means of a planar transformation, rendering the first image on top (c) and the second image on top (d)

Figure 7.4 Main steps involved in the pairwise registration process. The feature extraction step can be performed in both images of the pair, or only in one. In this last case, the features are identified in the second image after an optional image warping based on a transformation estimation

Figure 7.5 Example of error accumulation from registration of sequential images. The same benthic structures appear in different locations of the mosaic due to error accumulation (trajectory drift)

Figure 7.6 Photomosaic built from six images of two megapixels. The mosaic shows noticeable seams in (a), where the images have only been geometrically transformed and sequentially rendered on the final mosaic canvas, the last image on top of the previous one. After applying a blending algorithm, the artifacts (image edges) disappear from the resulting mosaic (b).

Figure 7.7 2.5D map of a Mid-Atlantic Ridge area, resulting from the combination of a bathymetry and a blended photomosaic of the generated high-resolution images. The obtained scene representation provides scientists with a global view of the area of interest as well as with detailed optical information acquired at a close distance to the seafloor. Data courtesy of Javier Escartin (CNRS/IPGP, France)

Figure 7.8 (a) Trajectory used for mapping an underwater chimney at a depth of about 1700 m in the Mid-Atlantic Ridge (pose frames in red/green/blue corresponding to the x/y/z axes). We can see the camera pointing always toward the object in a forward-looking configuration. The shape of the object shown was recovered using our approach presented in Campos et al. (2015). Note the difference in the level of detail when compared with a 2.5D representation of the same area obtained using a multibeam sensor in (b). The trajectory followed in (b) was downward-looking, hovering over the object, but for the sake of comparison we show the same trajectory as in (a). Finally, (c) shows the original point cloud, retrieved through optical-based techniques, that was used to generate the surface in (a). Note the large levels of both noise and outliers that this data set contains.

Figure 7.9 A sample of surface processing techniques that can be applied to the reconstructed surface. (a) Original; (b) remeshed; (c) simplified

Figure 7.10 Texture mapping process, where the texture filling a triangle in the 3D model is extracted from the original images. Data courtesy of Javier Escartin (CNRS/IPGP, France)

Figure 7.11 Seafloor classification example on a mosaic image of a reef patch in the Red Sea, near Eilat, covering approximately 3 × 6 m. (a) Original mosaic. (b) Classification image using five classes: Brain Coral (green), Favid Coral (purple), Branching Coral (yellow), Sea Urchin (pink), and Sand (gray).

Figure 7.12 Ship hull inspection mosaic. Data gathered with HAUV using DIDSON FLS.

Figure 7.13 Harbor inspection mosaic. Data gathered from an Autonomous Surface Craft with BlueView P900-130 FLS.

Figure 7.14 Cap de Vol shipwreck mosaic: (a) acoustic mosaic and (b) optical mosaic

Computer Vision in Vehicle Technology

Land, Sea, and Air

 

Edited by

 

Antonio M. López

Computer Vision Center (CVC) and Universitat Autònoma de Barcelona, Spain

 

Atsushi Imiya

Chiba University, Japan

 

Tomas Pajdla

Czech Technical University, Czech Republic

 

Jose M. Álvarez

National Information Communications Technology Australia (NICTA), Canberra Research Laboratory, Australia

 

 

 

 

 

This edition first published 2017

© 2017 John Wiley & Sons Ltd

Registered office

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought

Library of Congress Cataloging-in-Publication Data

Names: López, Antonio M., 1969- editor. | Imiya, Atsushi, editor. | Pajdla, Tomas, editor. | Álvarez, J. M. (Jose M.), editor.

Title: Computer vision in vehicle technology : land, sea and air / Editors Antonio M. López, Atsushi Imiya, Tomas Pajdla, Jose M. Álvarez.

Description: Chichester, West Sussex, United Kingdom : John Wiley & Sons, Inc., [2017] | Includes bibliographical references and index.

Identifiers: LCCN 2016022206 (print) | LCCN 2016035367 (ebook) | ISBN 9781118868072 (cloth) | ISBN 9781118868041 (pdf) | ISBN 9781118868058 (epub)

Subjects: LCSH: Computer vision. | Automotive telematics. | Autonomous vehicles-Equipment and supplies. | Drone aircraft-Equipment and supplies. | Nautical instruments.

Classification: LCC TL272.53 .L67 2017 (print) | LCC TL272.53 (ebook) | DDC 629.040285/637-dc23

LC record available at https://lccn.loc.gov/2016022206

A catalogue record for this book is available from the British Library.

Cover image: jamesbenet/gettyimages; groveb/gettyimages; robertmandel/gettyimages

ISBN: 9781118868072

List of Contributors

Ricard Campos, Computer Vision and Robotics Institute, University of Girona, Spain

Arturo de la Escalera, Laboratorio de Sistemas Inteligentes, Universidad Carlos III de Madrid, Spain

Armagan Elibol, Department of Mathematical Engineering, Yildiz Technical University, Istanbul, Turkey

Javier Escartin, Institute of Physics of Paris Globe, The National Centre for Scientific Research, Paris, France

Uwe Franke, Image Understanding Group, Daimler AG, Sindelfingen, Germany

Friedrich Fraundorfer, Institute for Computer Graphics and Vision, Graz University of Technology, Austria

Rafael Garcia, Computer Vision and Robotics Institute, University of Girona, Spain

David Gerónimo, ADAS Group, Computer Vision Center, Universitat Autònoma de Barcelona, Spain

Nuno Gracias, Computer Vision and Robotics Institute, University of Girona, Spain

Ramon Hegedus, Max Planck Institute for Informatics, Saarbruecken, Germany

Natalia Hurtos, Computer Vision and Robotics Institute, University of Girona, Spain

Reinhard Klette, School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, New Zealand

Antonio M. López, ADAS Group, Computer Vision Center (CVC) and Computer Science Department, Universitat Autònoma de Barcelona (UAB), Spain

Laszlo Neumann, Computer Vision and Robotics Institute, University of Girona, Spain

Tudor Nicosevici, Computer Vision and Robotics Institute, University of Girona, Spain

Ricard Prados, Computer Vision and Robotics Institute, University of Girona, Spain

Davide Scaramuzza, Robotics and Perception Group, University of Zurich, Switzerland

ASM Shihavuddin, École Normale Supérieure, Paris, France

David Vázquez, ADAS Group, Computer Vision Center, Universitat Autònoma de Barcelona, Spain

Preface

This book was born following the spirit of the Computer Vision in Vehicle Technology (CVVT) Workshop. At the time of finishing this book, the 7th CVVT Workshop is being held at CVPR'2016 in Las Vegas. Previous CVVT Workshops were held at CVPR'2015 in Boston (http://adas.cvc.uab.es/CVVT2015/), ECCV'2014 in Zurich (http://adas.cvc.uab.es/CVVT2014/), ICCV'2013 in Sydney (http://adas.cvc.uab.es/CVVT2013/), ECCV'2012 in Firenze (http://adas.cvc.uab.es/CVVT2012/), ICCV'2011 in Barcelona (http://adas.cvc.uab.es/CVVT2011/), and ACCV'2010 in Queenstown (http://www.media.imit.chiba-u.jp/CVVT2010/). Throughout these years, many invited speakers, co-organizers, contributing authors, and sponsors have helped to keep CVVT alive and exciting. We are enormously grateful to all of them! Of course, we also want to give special thanks to the authors of this book, who kindly accepted the challenge of writing their respective chapters.

Antonio M. López would also like to thank the past and current members of the Advanced Driver Assistance Systems (ADAS) group of the Computer Vision Center at the Universitat Autònoma de Barcelona. He also would like to thank his current public funding sources, in particular, the Spanish MEC project TRA2014-57088-C2-1-R, the Spanish DGT project SPIP2014-01352, and the Generalitat de Catalunya project 2014-SGR-1506. Finally, he would like to thank NVIDIA Corporation for the generous donations of different graphical processing hardware units, and especially for their kind support of the ADAS group activities.

Tomas Pajdla has been supported by EU H2020 Grant No. 688652 UP-Drive and Institutional Resources for Research of the Czech Technical University in Prague.

Atsushi Imiya was supported by IMIT Project Pattern Recognition for Large Data Sets from 2010 to 2015 at Chiba University, Japan.

Jose M. Álvarez was supported by the Australian Research Council through its Special Research Initiative in Bionic Vision Science and Technology grant to Bionic Vision Australia. National Information Communications Technology Australia (NICTA) is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program.

The book is organized into seven self-contained chapters related to CVVT topics, and a final short chapter with overall closing remarks. Briefly, Chapter 1 gives a quick overview of the main ideas that link computer vision with vehicles. Chapters 2–7 are more specialized and divided into two blocks. Chapters 2–4 focus on the use of computer vision for the self-navigation of vehicles. In particular, Chapter 2 focuses on land (autonomous cars), Chapter 3 on air (micro aerial vehicles), and Chapter 4 on sea (underwater robotics). Analogously, Chapters 5–7 focus on the use of computer vision as a technology to solve specific applications beyond self-navigation. In particular, Chapter 5 focuses on land (ADAS), and Chapters 6 and 7 on air and sea, respectively. Finally, Chapter 8 concludes and points out new research trends.

Antonio M. López

Computer Vision Center (CVC) and Universitat Autònoma de Barcelona, Spain

Abbreviations and Acronyms

ACC  adaptive cruise control
ADAS  advanced driver assistance system
AUV  autonomous underwater vehicle
BA  bundle adjustment
BCM  brightness constancy model
BoW  bag of words
CAN  controller area network
CLAHE  contrast limited adaptive histogram equalization
COTS  crown of thorns starfish
DCT  discrete cosine transform
DOF  degree of freedom
DVL  Doppler velocity log
EKF  extended Kalman filter
ESC  electronic stability control
FCA  forward collision avoidance
FEM  finite element method
FFT  fast Fourier transform
FIR  far infrared
FLS  forward-looking sonar
GA  global alignment
GDIM  generalized dynamic image model
GLCM  gray-level co-occurrence matrix
GPS  global positioning system
GPU  graphical processing unit
HDR  high dynamic range
HOG  histogram of oriented gradients
HOV  human-operated vehicle
HSV  hue saturation value
IR  infrared
KPCA  kernel principal component analysis
LBL  long baseline
LBP  local binary patterns
LCA  lane change assistance
LDA  linear discriminant analysis
LDW  lane departure warning
LHC  local homogeneity coefficient
LKS  lane keeping system
LMedS  least median of squares
MEX  MATLAB executable
MLS  moving least squares
MR  maximum response
MST  minimum spanning tree
NCC  normalized chromaticity coordinates
NDT  normal distribution transform
NIR  near infrared
OVV  online visual vocabularies
PCA  principal component analysis
PDWMD  probability density weighted mean distance
PNN  probabilistic neural network
RANSAC  random sample consensus
RBF  radial basis function
ROD  region of difference
ROI  region of interest
ROV  remotely operated vehicle
SDF  signed distance function
SEF  seam-eliminating function
SIFT  scale-invariant feature transform
SLAM  simultaneous localization and mapping
SNR  signal-to-noise ratio
SSD  sum of squared differences
SURF  speeded-up robust features
SVM  support vector machine
TJA  traffic jam assist
TSR  traffic sign recognition
TV  total variation
UDF  unsigned distance function
USBL  ultra-short baseline
UUV  unmanned underwater vehicle
UV  underwater vehicle

Chapter 1: Computer Vision in Vehicles

Reinhard Klette

School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand

This chapter is a brief introduction to academic aspects of computer vision in vehicles. It summarizes basic notation and definitions used in computer vision and discusses a few visual tasks of relevance for vehicle control and environment understanding.

1.1 Adaptive Computer Vision for Vehicles

Computer vision designs solutions for understanding the real world by using cameras. See Rosenfeld (1969), Horn (1986), Hartley and Zisserman (2003), or Klette (2014) for examples of monographs or textbooks on computer vision.

Computer vision operates today in vehicles including cars, trucks, airplanes, unmanned aerial vehicles (UAVs) such as multi-copters (see Figure 1.1 for a quadcopter), satellites, or even autonomous driving rovers on the Moon or Mars.

Figure 1.1 (a) Quadcopter. (b) Corners detected from a flying quadcopter using a modified FAST feature detector.

Courtesy of Konstantin Schauwecker

In our context, the ego-vehicle is that vehicle where the computer vision system operates in; ego-motion describes the ego-vehicle's motion in the real world.

1.1.1 Applications

Computer vision solutions are today in use in manned vehicles for improved safety or comfort, in autonomous vehicles (e.g., robots) for supporting motion or action control, and, unfortunately, also in UAVs misused for killing people remotely. UAV technology also has good potential for helping to save lives, for creating three-dimensional (3D) models of the environment, and so forth. Underwater robots and unmanned sea-surface vehicles are further important applications of vision-augmented vehicles.

1.1.2 Traffic Safety and Comfort

Traffic safety is a dominant application area for computer vision in vehicles. Currently, about 1.24 million people die annually worldwide due to traffic accidents (WHO 2013); that is, on average, 2.4 people die per minute in traffic accidents. How does this compare to the numbers Western politicians are using for obtaining support for their “war on terrorism?” Computer vision can play a major role in solving the true real-world problems (see Figure 1.2). Traffic-accident fatalities can be reduced by controlling traffic flow (e.g., by triggering automated warning signals at pedestrian crossings or intersections with bicycle lanes) using stationary cameras, or by having cameras installed in vehicles (e.g., for detecting safe distances and adjusting speed accordingly, or for detecting obstacles and constraining trajectories).

Figure 1.2 The 10 leading causes of death in the world. Chart provided online by the World Health Organization (WHO). Road injury ranked number 9 in 2011

Computer vision is also introduced into modern cars for improving driving comfort. Surveillance of blind spots, automated distance control, or compensation of unevenness of the road are just three examples for a wide spectrum of opportunities provided by computer vision for enhancing driving comfort.

1.1.3 Strengths of (Computer) Vision

Computer vision is an important component of intelligent systems for vehicle control (e.g., in modern cars, or in robots). The Mars rovers “Curiosity” and “Opportunity” operate based on computer vision; “Opportunity” has already operated on Mars for more than ten years. The visual system of human beings provides an existence proof that vision alone can deliver nearly all of the information required for steering a vehicle. Computer vision aims at creating comparable automated solutions for vehicles, enabling them to navigate safely in the real world. Additionally, computer vision can work constantly “at the same level of attention,” applying the same rules or programs; a human is not able to do so, due to becoming tired or distracted.

A human applies accumulated knowledge and experience (e.g., supporting intuition), and it is a challenging task to embed a computer vision solution into a system able to have, for example, intuition. Computer vision offers many more opportunities for future developments in a vehicle context.

1.1.4 Generic and Specific Tasks

There are generic visual tasks such as calculating distance or motion, measuring brightness, or detecting corners in an image (see Figure 1.1b). In contrast, there are specific visual tasks such as detecting a pedestrian, understanding ego-motion, or calculating the free space a vehicle may move in safely in the next few seconds. The borderline between generic and specific tasks is not well defined.

Solutions for generic tasks typically aim at creating one self-contained module for potential integration into a complex computer vision system. But there is no general-purpose corner detector and also no general-purpose stereo matcher. Adaptation to given circumstances appears to be the general way for an optimized use of given modules for generic tasks.

Solutions for specific tasks are typically structured into multiple modules that interact in a complex system.

Example 1.1.1 Specific Tasks in the Context of Visual Lane Analysis

Shin et al. (2014) review visual lane analysis for driver-assistance systems or autonomous driving. In this context, the authors discuss specific tasks such as “the combination of visual lane analysis with driver monitoring..., with ego-motion analysis..., with location analysis..., with vehicle detection..., or with navigation....” They illustrate the latter example by an application shown in Figure 1.3: lane detection and road sign reading, the analysis of GPS data and electronic maps (e-maps), and two-dimensional (2D) visualization are combined into a real-view navigation system (Choi et al. 2010).

Figure 1.3 Two screenshots for real-view navigation.

Courtesy of the authors of Choi et al. (2010)

1.1.5 Multi-module Solutions

Designing a multi-module solution for a given task does not need to be more difficult than designing a single-module solution. In fact, finding solutions for some single modules (e.g., for motion analysis) can be very challenging. Designing a multi-module solution requires:

1. that modular solutions are available and known,

2. tools for evaluating those solutions in dependence on a given situation (or scenario; see Klette et al. (2011) for a discussion of scenarios), for being able to select (or adapt) solutions,

3. conceptual thinking for designing and controlling an appropriate multi-module system,

4. a system optimization, including more extensive testing on various scenarios than for a single module (due to the increased combinatorial complexity of multi-module interactions), and

5. control of the multiple modules (e.g., when many designers separately insert processors for controlling various operations in a vehicle, no control engineer should be surprised if the vehicle becomes unstable).

1.1.6 Accuracy, Precision, and Robustness

Solutions can be characterized as being accurate, precise, or robust. Accuracy means a systematic closeness to the true values for a given scenario. Precision also considers the occurrence of random errors; a precise solution should lead to about the same results under comparable conditions. Robustness means approximate correctness for a set of scenarios that includes particularly challenging ones: in such cases, it would be appropriate to specify the defining scenarios accurately, for example, by using video descriptors (Briassouli and Kompatsiaris 2010) or data measures (Suaste et al. 2013). Ideally, robustness should address any possible scenario in the real world for a given task.

1.1.7 Comparative Performance Evaluation

An efficient way for a comparative performance analysis of solutions for one task is to have different authors test their own programs on identical benchmark data. But we not only need to evaluate the programs; we also need to evaluate the benchmark data used (Haeusler and Klette 2010, 2012) in order to identify their challenges or relevance.

Benchmarks need to come with measures for quantifying performance such that we can compare accuracy on individual data or robustness across a diversity of different input data.
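To make the idea of such a measure concrete, the following minimal C++/OpenCV sketch computes one widely used stereo measure: the percentage of disparity estimates deviating from the ground truth by more than a fixed threshold (a bad-pixel rate in the style of the KITTI stereo evaluation). The file names, the 3-pixel threshold, and the assumption that disparities are stored as 16-bit images scaled by 256 are illustrative choices only, not taken from this chapter.

// Bad-pixel rate of an estimated disparity map against ground truth.
// Assumptions (illustrative only): 16-bit disparity images scaled by 256,
// value 0 marks pixels without ground truth, threshold of 3 pixels.
#include <opencv2/opencv.hpp>
#include <cmath>
#include <cstdint>
#include <iostream>

double badPixelRate(const cv::Mat& est, const cv::Mat& gt, double thresholdPx = 3.0)
{
    CV_Assert(est.size() == gt.size() && est.type() == CV_16UC1 && gt.type() == CV_16UC1);
    long evaluated = 0, bad = 0;
    for (int y = 0; y < gt.rows; ++y) {
        for (int x = 0; x < gt.cols; ++x) {
            const double dGt = gt.at<uint16_t>(y, x) / 256.0;   // 0 means "no ground truth here"
            if (dGt <= 0.0) continue;
            const double dEst = est.at<uint16_t>(y, x) / 256.0;
            ++evaluated;
            if (std::abs(dEst - dGt) > thresholdPx) ++bad;
        }
    }
    return evaluated > 0 ? 100.0 * bad / evaluated : 0.0;
}

int main()
{
    // File names are placeholders for an estimated and a ground-truth disparity map.
    cv::Mat est = cv::imread("disparity_estimate.png", cv::IMREAD_UNCHANGED);
    cv::Mat gt  = cv::imread("disparity_ground_truth.png", cv::IMREAD_UNCHANGED);
    if (est.empty() || gt.empty()) { std::cerr << "missing input images\n"; return 1; }
    std::cout << "bad-pixel rate: " << badPixelRate(est, gt) << " %" << std::endl;
    return 0;
}

Averaging such a rate over many frames gives a single summarizing number per benchmark sequence, which is exactly the kind of figure whose limitations are discussed next.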

Figure 1.4 illustrates two possible ways of generating benchmarks: one by using computer graphics to render sequences with accurately known ground truth, and the other by using high-end sensors (in the illustrated case, ground truth is provided by a laser range-finder).

Figure 1.4 Examples of benchmark data available for a comparative analysis of computer vision algorithms for motion and distance calculations. (a) Image from a synthetic sequence provided on EISATS with accurate ground truth. (b) Image of a real-world sequence provided on KITTI with approximate ground truth

But those evaluations need to be considered with care, since not everything is comparable. Evaluations depend on the benchmark data used; a few summarizing numbers may not really be relevant for particular scenarios possibly occurring in the real world. For some input data we simply cannot answer how a solution performs; for example, in the middle of a large road intersection, we cannot answer which lane-border detection algorithm performs best for this scenario.

1.1.8 There Are Many Winners

We are not so naive as to expect an all-time “winner” when comparatively evaluating computer vision solutions. Vehicles operate in the real world (whether on Earth, the Moon, or Mars), which is so diverse that not all possible events can be modeled in the constraints underlying a designed program. Particular solutions perform differently in different scenarios, and a winning program for one scenario may fail for another. We can only evaluate how particular solutions perform for particular scenarios. In the end, this might support an optimization strategy based on adaptation to the current scenario that a vehicle experiences at a given time.

1.2 Notation and Basic Definitions

The following basic notations and definitions (Klette 2014) are provided.

1.2.1 Images and Videos

An image $I$ is defined on a set

(1.1) $\Omega = \{ (x, y) : 1 \le x \le N_{\mathrm{cols}} \,\wedge\, 1 \le y \le N_{\mathrm{rows}} \}$

of pairs of integers (pixel locations), called the image carrier, where $N_{\mathrm{cols}}$ and $N_{\mathrm{rows}}$ define the number of columns and rows, respectively. We assume a left-hand coordinate system with the coordinate origin in the upper-left corner of the image, the $x$-axis pointing to the right, and the $y$-axis pointing downward. A pixel of an image $I$ combines a location $(x, y)$ in the carrier $\Omega$ with the value of $I$ at this location.

A scalar image takes values in a given set of scalars (e.g., gray levels or real numbers). A vector-valued image has scalar values in a finite number of channels or bands. A video or image sequence consists of frames, for consecutive time indices, all being images on the same carrier $\Omega$.

Example 1.2.1 Three Examples

In case of an RGB color image , we have pixels .

A geometrically rectified gray-level stereo image or frame consists of two channels and , usually called left and right images; this is implemented in the multi-picture object (mpo) format for images (CIPA 2009).

For a sequence of gray-level stereo images, we have pixel in frame , which is the combined representation of pixels and in and , respectively, at pixel location and time .
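As a minimal illustration of these definitions, the following C++/OpenCV sketch accesses a scalar image, a three-channel color image, and a rectified stereo pair; the file names and the pixel location are placeholders, and OpenCV's cv::Mat addresses a location (x, y) as row y and column x, in agreement with the left-hand coordinate system described above.

// Pixel access illustrating the image carrier and the upper-left origin:
// location (x, y) corresponds to column x and row y in cv::Mat indexing.
#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // File names and the pixel location are placeholders.
    cv::Mat gray  = cv::imread("frame.png", cv::IMREAD_GRAYSCALE);  // scalar image
    cv::Mat color = cv::imread("frame.png", cv::IMREAD_COLOR);      // vector-valued image (3 channels, BGR order)
    cv::Mat left  = cv::imread("left.png",  cv::IMREAD_GRAYSCALE);  // geometrically rectified stereo pair
    cv::Mat right = cv::imread("right.png", cv::IMREAD_GRAYSCALE);
    if (gray.empty() || color.empty() || left.empty() || right.empty()) return 1;

    const int x = 10, y = 20;                            // a pixel location in the image carrier
    const uchar     value = gray.at<uchar>(y, x);        // scalar value at (x, y): row y, column x
    const cv::Vec3b bgr   = color.at<cv::Vec3b>(y, x);   // channel values at the same location

    // In a rectified pair, corresponding pixels lie in the same image row y;
    // their column difference is the (still unknown) disparity.
    std::cout << "gray value: " << int(value)
              << "  R,G,B: " << int(bgr[2]) << "," << int(bgr[1]) << "," << int(bgr[0])
              << "  same carrier: " << (left.size() == right.size()) << std::endl;
    return 0;
}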

1.2.1.1 Gauss Function

The zero-mean Gauss function is defined as follows:

(1.2) $G_\sigma(x, y) = \dfrac{1}{2 \pi \sigma^2} \exp\!\left( - \dfrac{x^2 + y^2}{2 \sigma^2} \right)$

A convolution of an image $I$ with the Gauss function produces smoothed images

(1.3) $L(x, y, \sigma) = (G_\sigma * I)(x, y)$

also known as Gaussians, for $\sigma > 0$. (We stay with the symbol $L$ here, as introduced by Lindeberg (1994) for “layer”; a given context will prevent confusion with the left image of a stereo pair.)
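The following short C++/OpenCV sketch computes such smoothed layers for the values 0.5, 1, 2, and 4 of the smoothing parameter and then applies the Laplacian to each layer, in the spirit of Figure 1.5, which names cv::GaussianBlur and cv::Laplacian. The input file name, the kernel size of 0 (so that OpenCV derives it from the smoothing parameter), and the min-max normalization used as the "linear scaling" are illustrative choices.

// Smoothed layers L(x, y, sigma) via Gaussian convolution, followed by the
// Laplacian of each layer (the operations named in the caption of Figure 1.5).
#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

int main()
{
    cv::Mat image = cv::imread("frame.png", cv::IMREAD_GRAYSCALE);  // placeholder input image
    if (image.empty()) return 1;

    const std::vector<double> sigmas = {0.5, 1.0, 2.0, 4.0};
    for (double sigma : sigmas) {
        cv::Mat layer, laplacian, display;
        // Kernel size (0, 0) lets OpenCV derive the kernel size from sigma.
        cv::GaussianBlur(image, layer, cv::Size(0, 0), sigma);
        cv::Laplacian(layer, laplacian, CV_32F);
        // Linear scaling of the result for visibility, as in the figure caption.
        cv::normalize(laplacian, display, 0, 255, cv::NORM_MINMAX, CV_8U);
        cv::imwrite("laplacian_sigma_" + std::to_string(sigma) + ".png", display);
    }
    return 0;
}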

1.2.1.2 Edges

Step-edges in images are detected based on first- or second-order derivatives, such as values of the gradient $\nabla I = \left( \partial I / \partial x, \; \partial I / \partial y \right)^\top$ or the Laplacian given by

(1.4) $\Delta I = \dfrac{\partial^2 I}{\partial x^2} + \dfrac{\partial^2 I}{\partial y^2}$

Local maxima of first-order derivative or gradient magnitudes, or zero-crossings of Laplacian values