While 3D vision has existed for many years, the use of 3D cameras and video-based modeling by the film industry has induced an explosion of interest in 3D acquisition technology, 3D content and 3D displays. As such, 3D video has become one of the new technology trends of this century. The chapters in this book cover a large spectrum of areas connected to 3D video, which are presented both theoretically and technologically, while taking into account both physiological and perceptual aspects. Stepping away from traditional 3D vision, the authors, all currently involved in these areas, provide the necessary elements for understanding the underlying computer-based science of these technologies. They consider applications and perspectives previously unexplored due to technological limitations. This book guides the reader through the production process of 3D videos: from acquisition, through data treatment and representation, to 3D diffusion. Several types of camera systems are considered (multiscopic or multiview), which lead to different acquisition, modeling and storage-rendering solutions. The application of these systems is also discussed to illustrate varying performance benefits, making this book suitable for students, academics, and also those involved in the film industry.
Page count: 674
Year of publication: 2013
Table of Contents
Foreword
Notations
Acknowledgments
Introduction
PART 1. 3D ACQUISITION OF SCENES
Chapter 1: Foundation
1.1. Introduction
1.2. A short history
1.3. Stereopsis and 3D physiological aspects
1.4. 3D computer vision
1.5. Conclusion
1.6. Bibliography
Chapter 2: Digital Cameras: Definitions and Principles
2.1. Introduction
2.2. Capturing light: physical fundamentals
2.3. Digital camera
2.4. Cameras, human vision and color
2.5. Improving current performance
2.6. Conclusion
2.7. Bibliography
Chapter 3: Multiview Acquisition Systems
3.1. Introduction: what is a multiview acquisition system?
3.2. Binocular systems
3.3. Lateral or directional multiview systems
3.4. Global or omnidirectional multiview systems
3.5. Conclusion
3.6. Bibliography
Chapter 4: Shooting and Viewing Geometries in 3DTV
4.1. Introduction
4.2. The geometry of 3D viewing
4.3. The geometry of 3D shooting
4.4. Geometric impact of the 3D workflow
4.5. Specification methodology for multiscopic shooting
4.6. OpenGL implementation
4.7. Conclusion
4.8. Bibliography
Chapter 5: Camera Calibration: Geometric and Colorimetric Correction
5.1. Introduction
5.2. Camera calibration
5.3. Radial distortion
5.4. Image rectification
5.5. Colorimetric considerations in cameras
5.6. Conclusion
5.7. Bibliography
PART 2. DESCRIPTION/RECONSTRUCTION OF 3D SCENES
Chapter 6: Feature Points Detection and Image Matching
6.1. Introduction
6.2. Feature points
6.3. Feature point descriptors
6.4. Image matching
6.5. Conclusion
6.6. Bibliography
Chapter 7: Multi- and Stereoscopic Matching, Depth and Disparity
7.1. Introduction
7.2. Difficulties, primitives and stereoscopic matching
7.3. Simplified geometry and disparity
7.4. A description of stereoscopic and multiscopic methods
7.5. Methods for explicitly accounting for occlusions
7.6. Conclusion
7.7. Bibliography
Chapter 8: 3D Scene Reconstruction and Structuring
8.1. Problems and challenges
8.2. Silhouette-based reconstruction
8.3. Industrial application
8.4. Temporally structuring reconstructions
8.5. Conclusion
8.6. Bibliography
Chapter 9: Synthesizing Intermediary Viewpoints
9.1. Introduction
9.2. Viewpoint synthesis by interpolation and extrapolation
9.3. Inpainting uncovered zones
9.4. Conclusion
9.5. Bibliography
PART 3. STANDARDS AND COMPRESSION OF 3D VIDEO
Chapter 10: Multiview Video Coding (MVC)
10.1. Introduction
10.2. Specific approaches to stereoscopy
10.3. Multiview approaches
10.4. Conclusion
10.5. Bibliography
Chapter 11: 3D Mesh Compression
11.1. Introduction
11.2. Compression basics: rate-distortion trade-off
11.3. Multiresolution coding of surface meshes
11.4. Topological and progressive coding
11.5. Mesh sequence compression
11.6. Quality evaluation: classic and perceptual metrics
11.7. Conclusion
11.8. Bibliography
Chapter 12: Coding Methods for Depth Videos
12.1. Introduction
12.2. Analyzing the characteristics of a depth map
12.3. Depth coding methods
12.4. Conclusion
12.5. Bibliography
Chapter 13: Stereoscopic Watermarking
13.1. Introduction
13.2. Constraints of stereoscopic video watermarking
13.3. State of the art for stereoscopic content watermarking
13.4. Comparative study
13.5. Conclusions
13.6. Bibliography
PART 4. RENDERING AND 3D DISPLAY
Chapter 14: HD 3DTV and Autostereoscopy
14.1. Introduction
14.2. Technological principles
14.3. Design of mixing filters
14.4. View generation and interleaving
14.5. Future developments
14.6. Conclusion
14.7. Bibliography
Chapter 15: Augmented and/or Mixed Reality
15.1. Introduction
15.2. Real-time pose computation
15.3. Model acquisition
15.4. Conclusion
15.5. Bibliography
Chapter 16: Visual Comfort and Fatigue in Stereoscopy
16.1. Introduction
16.2. Visual comfort and fatigue: definitions and indications
16.3. Signs and symptoms of fatigue and discomfort
16.4. Sources of visual fatigue and discomfort
16.5. Application to 3D content and technologies
16.6. Predicting visual fatigue and discomfort: first models
16.7. Conclusion
16.8. Bibliography
Chapter 17: 2D–3D Conversion
17.1. Introduction
17.2. The 2D–3D conversion workflow
17.3. Preparing content for conversion
17.4. Conversion stages
17.5. 3D–3D conversion
17.6. Conclusion
17.7. Bibliography
PART 5. IMPLEMENTATION AND OUTLETS
Chapter 18: 3D Model Retrieval
18.1. Introduction
18.2. General principles of shape retrieval
18.3. Global 3D shape descriptors
18.4. 2D view oriented methods
18.5. Local 3D shape descriptors
18.6. Similarity between 3D shapes
18.7. Shape recognition in 3D video
18.8. Evaluation of the performance of indexing methods
18.9. Applications
18.10. Conclusion
18.11. Bibliography
Chapter 19: 3D HDR Images and Videos: Acquisition and Restitution
19.1. Introduction
19.2. HDR and 3D acquisition
19.3. 3D HDR restitution
19.4. Conclusion
19.5. Bibliography
Chapter 20: 3D Visualization for Life Sciences
20.1. Introduction
20.2. Scientific visualization
20.3. Medical imaging
20.4. Molecular modeling
20.5. Conclusion
20.6. Bibliography
Chapter 21: 3D Reconstruction of Sport Scenes
21.1. Introduction
21.2. Automatic selection of a region of interest (ROI)
21.3. The Hough transform
21.4. Matching image features to the geometric model
21.5. Conclusion
21.6. Bibliography
Chapter 22: Experiments in Live Capture and Transmission of Stereoscopic 3D Video Images
22.1. Introduction
22.2. Retransmissions of various shows
22.3. Retransmissions of surgical operations
22.4. Retransmissions of “steadicam” interviews
22.5. Retransmission of a transatlantic video presentation
22.6. Retransmissions of bicycle races
22.7. Conclusion
22.8. Bibliography
Conclusion
List of Authors
Index
First published 2013 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd 27-37 St George’s Road London SW19 4EU UK
John Wiley & Sons, Inc. 111 River Street Hoboken, NJ 07030 USA
www.iste.co.uk
www.wiley.com
© ISTE Ltd 2013
The rights of Laurent Lucas, Céline Loscos and Yannick Remion to be identified as the author of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2013947317
British Library Cataloguing-in-Publication Data A CIP record for this book is available from the British Library
Foreword
The concept of giving a sense of 3D to flat representations (drawings, paintings, photos and films) has been progressively and deliberately re-examined since the beginning of time. The rock paintings of Altamira (Spain) and Font-de-Gaume (France), for example, provide a fascinating rendering of the muscular systems of large herbivores. In the Lascaux cave (France), the shape of the rocks has been used to support and even accentuate the painting’s form. All ancient art everywhere has, in some way or another, used depth and perspective in its representations, often awkwardly, confusedly or erroneously, often using more or less shared social codes, but always with the objective of understanding the real world beyond the limits of flat representation.
Formalized understanding of the mechanisms of Quattrocento perspective has largely enabled artists to move away from flat media to new, more accurate methods which have been used widely, often with competing artistic objectives and technical abilities. Complete perspective has therefore become an inseparable part of all pictures to the point of no longer even being a point of discussion: whether boring or shocking, controversial or exposing, it is no longer obvious because it is expected.
The dawn of photography, which by definition respects the canons of perspective, the undoubted problem of traditional representation, allowed artists to move away from this new norm which, over three centuries, had governed real-life representation. Artists can escape the unseen since, for example, space is no longer merely confined to perspective. Braque and Picasso, Klee and Bacon have shown us that this space is not only a matter of geometry but is also richer and holds several mysteries. Beyond perspective, it allows us to see background images and their convergence.
However, perspective, outside this small artistic field where it has somewhat faded, plays a vital role in our vision, logic and society. Unsurprisingly, the world of photography, like that of painting, quickly sought media which go beyond flat representation. Since the 19th Century, ingenious inventions have provided a third dimension to photography and then, with the dawn of cinema and its younger sibling television, it was not long before 3D made an impact, well before the Second World War. Binocular stereovision is the most natural input method for this mode, reliant on various means of separating the optical paths: orthogonal polarizations, color decompositions, color flickers through wheels and mirrors, and lens networks. Kerr or Pockels cells and liquid crystals will be examined later as part of this.
3D has not yet finished developing. Propelled by undeniable economic and social success, it has suffered from a lack of exploration, followed by newfound success. The literature is evidence of this, and of the fact that we are on the brink of a new dawn. Current technologies, however, are undeniably better than ever. Acquisition, projection, archiving and transmission technologies have come to fruition after long being suspended or in development. This has also been an opportunity and a major development for production companies and commercial film distribution organizations, since virtual and augmented reality production has reached previously unseen levels of quality, performance and productivity which are indispensable for ambitious and demanding production sets. The public, expectant and demanding, desires new experiences, which can be seen as evidence of the success of these new methods.
Finally, all these factors, which have made this dawn of 3D cinema possible, have played an important role in 3D television because these two fields, cinema and video, have a shared future. The war between them, which raged for 50 years over an audience seen to favor one or the other, is now over. We only have to think of the success of films on television or the continuation of television series through films. The public is omnivorous, consuming all kinds of images, no longer knowing whether they come from a dark room, a small screen or even a video game. This requires an abundance of pixels, bright, life-like colors, multi-sensory interaction, and interactive and 3D animation, particularly when their counterparts exist in real life but are transformed by video, as discussed in this book.
It is this which allows us to trace the progression of 3D, which has affected the entire chain of production for digital images. This book aims to examine ongoing events and describe their development, with a formal representation of theoretical tools in order to understand the approaches studied. References are provided to allow the reader to further study the developments that these numerous techniques relate to. Another aspect relates to examining all points in the technical chain which today governs 3D television. We will also examine technical tools such as cameras, screens and software. In addition, matching, detection and compression will be studied.
As a complete and complex work, 3D Video is a welcome tribute to the current efforts and achievements which have accompanied the emergence of this new addition to our homes, the 3D image.
Henri MAÎTRE
September 2013
Notations
Spaces, sets
space dimension
real d-dimensional vector space
integer d-dimensional vector space
compact interval of real numbers
discrete interval of integers
boolean set
set of the first n natural integers
set of the b – a integers ranging from a to b – 1
Objects
i, j, k, l, m, n: integer numbers
x, y, z: coordinates (integer or real)
t, u, v, λ, μ: real numbers
D, Δ: real lines
P, Π: real planes
v, w: vectors
A, B, C, …: points in the real affine space of dimension 2 or 3
AB: bi-point vector ranging from A to B
M, A, B: matrices
R, T: rotation and translation matrices
f, g, h: mappings, functions
Φ, Ψ: operators on other sets
d, d: rotation and translation function
G, Γ: graphs
angles
ε: threshold
Multiviews
A set of N signals (known as views within the context of this book) of the same dimension d and sizes will be considered both as a table of N signals and as a single signal of higher dimension d + 1.
global indexation space of a set of N views (images or volumes) of dimension d
multi-signal: N indexed views with values in ε
digital view number
d-dimensional signal
multiview sample index: position and digital image
different equivalent expressions for reaching the value in ε of the sample at position p in view i; the last (double-level index) should be avoided
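The two readings of a multiview set given above (a table of N views, or a single signal with one extra dimension) can be illustrated with a minimal Python sketch. The function names (`sample`, `stack_views`, `flat_index`) and the storage order (view index first, then row-major position) are assumptions made for illustration only; they are not taken from the book.

```python
from typing import List, Sequence


def sample(views: Sequence, i: int, p: Sequence[int]):
    """Table-of-views reading: pick view i, then walk to position p inside it."""
    node = views[i]
    for coord in p:
        node = node[coord]
    return node


def stack_views(views: Sequence, sizes: Sequence[int]) -> List:
    """Flatten N equally sized d-dimensional views into one buffer.

    The buffer plays the role of a single signal with one extra dimension
    (assumed layout: view index first, then row-major position).
    """
    flat: List = []

    def walk(node, depth: int) -> None:
        if depth == len(sizes):
            flat.append(node)  # reached a sample value
            return
        if len(node) != sizes[depth]:
            raise ValueError("all views must share the same sizes")
        for child in node:
            walk(child, depth + 1)

    for view in views:
        walk(view, 0)
    return flat


def flat_index(i: int, p: Sequence[int], sizes: Sequence[int]) -> int:
    """Offset of sample p of view i inside the stacked buffer."""
    if len(p) != len(sizes):
        raise ValueError("position must have one coordinate per dimension")
    offset = i
    for coord, size in zip(p, sizes):
        if not 0 <= coord < size:
            raise IndexError("position outside the view")
        offset = offset * size + coord
    return offset


# Three toy 2x4 "images" whose values encode (view, row, column).
sizes = (2, 4)
views = [[[100 * i + 10 * r + c for c in range(4)] for r in range(2)]
         for i in range(3)]
stacked = stack_views(views, sizes)

# Both readings reach the same sample.
assert sample(views, 2, (1, 3)) == stacked[flat_index(2, (1, 3), sizes)]
```

Either access path returns the same value; which one is more convenient depends on whether an algorithm works view by view or across all views at once.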
Acknowledgments
We are very grateful to those who have contributed to this book through their work and research. We would particularly like to express our gratitude to Henri Maître, who has overseen the compilation of this book and has generously given us his help and support. We would also like to express our appreciation to ISTE Ltd. and John Wiley & Sons, who have greatly assisted us throughout the production of this book. Lastly, we would like to thank all those people and organizations who have allowed us to use their data and/or illustrations within this book.
Several pieces of research data shown in this book have been made possible because of the financial support from the following organizations:
– Regional organizations from three areas: projects including CIA (CPER Nord-Pas-de-Calais region in France), CREATIS (CPER Champagne-Ardenne region in France) and RUBI3 (Brittany with the Images et Réseaux (Images and Networks) cluster – 2010-13).
– The Agence nationale de la recherche (ANR) (National Agency for Research, France): including projects such as SEMANTIC-3D (AAP RNRT 2002-06), CAM-RELIEF (AAP RIAM 2008-10), FAR3D (AAP CSOSG 2008-2010), COLLAVIZ (AAP COSINUS 2009-12), PERSEE (AAP Blanc 2009-13) and 3D FaceAnalyzer (AAP Blanc Inter. 2011-13).
– Competition clusters and/or Fonds unique interministériel (FUI) (Interministerial Fund, France): including the projects FUTURIM@GE (Images et Réseaux (Images and Networks) – 2008-10), Terra Numerica (Cap Digital – 2006-09) and 3DLive (Cap Digital, Imaginove, Images et Réseaux – 2009-12).
– The Fonds national pour la société numérique (FSN) (National Fund for Digital Society, France): including the RECOVER3D project (2012-2014) for future investments.
– The European Commission: including projects such as 3D Média (FEDER 2008-13), 3D-ConTourNet (ICT COST Action IC1105), HDRi (ICT COST Action IC1005), Ed-cine (FP6 IP), Apidis (FP7 Strep) and JEDI (ITEA2, co-financed by DGCIS for the French Ministry of Industry).
We would like to thank these institutions for their past and ongoing support.
Lastly, we would also like to recognize those who, by their patience, understanding and encouragement, have allowed us to bring this project to completion.
Laurent LUCAS, Céline LOSCOS and Yannick REMION
September 2013
Introduction
The extension of visual content to 3D, and the dynamic capture of scenes in 3D to generate an image on a remote site in real time, were long considered mere science fiction. Today they are a reality, collectively referred to by terms such as 3D television (3DTV), free viewpoint TV (FTV) and, more generally, 3D video. This new type of image creates the illusion of a real environment and is the result of years of continually improving research and development.
Numerous experts believe that 3D represents the future of media, such as television and the Internet, and will in turn improve the quality of visual experiences for the end user. The whole chain of content production must be reconsidered, beginning with recording techniques, since those designed specifically for 3D are far more numerous and varied than those normally used in a conventional 2D context. The same can also be said of other aspects, such as, for example:
– the description and representation of scenes according to more or less informative structures, ranging from multiview or multiview-plus-depth videos to 3D digital reconstructed models;
– 3D reconstruction which extracts 3D models in various forms from videos acquired from multiple viewpoints, such as static or animated meshes;
– the compression of representations of scenes created by capture (stereoscopic or multiview videos) or reconstruction (3D models);
– 3D display, with or without adaptation/enhancement of content and/or intermediate view synthesis.
The democratization of these technologies requires specifically designed display devices. Stereoscopic and autostereoscopic screens are moving strongly in this direction, yet their use for displaying 3D content still poses a number of problems, showing that all these techniques must be perfected further if they are not to be rejected by the end user because of poor quality and/or eyestrain.
3D videos therefore cover a multitude of aspects, collectively linking a series of recorded videos to full depth 3D visualizations, potentially using estimations of depth in video sources. The developments examined here are therefore based on methods and tools from highly varied fields, such as applied mathematics, computer imaging, computer graphics, virtual reality, signal processing as well as psychophysics and the psychology of human vision.
In this highly multidisciplinary context, the objective of this book on 3D video is twofold: in addition to summarizing current knowledge of the subject, it aims to provide:
– for students: a solid base enabling readers to carry out activities relating to this topic and to learn the underlying concepts overall;
– for researchers: as complete a reference for this subject as possible which precisely indicates current research and understanding in this field as well as future trends and perspectives.
Its organization into five parts reflects a desire to cover all phases of 3D video by bringing together formal presentations of theoretical tools and developments of more technical or technological aspects. It should be noted that all figures are also available in color at http://www.iste.co.uk/lucas/3D.zip.
The first part of this book runs through the basics of 3D video and the recording of the multiview videos that characterize it. Chapter 1 begins with the fundamental aspects of this technology, presenting historical and mathematical aspects relating to 3D computer vision and the physiology of human vision. Chapters 2 and 3 look at technological and methodological problems in relation to capturing images; Chapter 3 does so more specifically within the multiview context that characterizes 3D video. The specification of geometric elements relating to the recording and display of 3D media is then examined in Chapter 4. Chapter 5 concludes Part 1 of this book, focusing on the problems of geometric and colorimetric camera calibration.
Figure I.1. Organization of this book: the numbered chips correspond to the different chapters
Part 2 focuses on the description and reconstruction of 3D scenes. Chapters 6 and 7 analyze the problems of local feature detection, stereo-matching and stereo-correlation through dense depth estimation. Chapter 8 then presents different scene reconstruction methods, notably using silhouettes, providing an overview of the technical principles used to structure previously constructed 3D information. Finally, Chapter 9 provides an outline of intermediate view synthesis using images with depth information. Direct and inverse projection approaches are examined alongside a description of methods for filling uncovered areas.
In Part 3, the field of compression and transmission standards for 3D content is covered. Chapter 10 in particular introduces 3D video formats as well as specific techniques for stereoscopic and multiview stream coding. In Chapter 11, the multiresolution compression of meshes and mesh sequences is examined in terms of standardization and visual perception. Chapter 12 then focuses on depth video coding while Chapter 13 presents the problem of protecting stereoscopic videos by watermarking the 3D stream.
In Part 4, aspects relating to 3D rendering and display are covered. This begins, in Chapter 14, with the implementation and use of autostereoscopy and is followed, in Chapter 15, by techniques relating to augmented reality. In Chapter 16, psychophysical effects relating to problems of eyestrain and visual discomfort are discussed, with a specific examination of the flaws in 3D content and technologies which generate these unusual stimuli. Chapter 17 focuses on the delicate problem of 2D-to-3D conversion, which sits between technology and the arts and for which human intervention remains indispensable.
The practical implementation of all these technologies and their applications are considered in Part 5, the final part of this book. Aspects relating to data mining (Chapter 18), high dynamic range videos (Chapter 19), biomedical visualization (Chapter 20) and sport scene reconstruction (Chapter 21) are covered. This final part is concluded by an overview of experiments in live recording and transmission of 3D stereoscopic videos (Chapter 22).
Introduction written by Laurent LUCAS, Céline LOSCOS and Yannick REMION.
Audiovisual production has, for a number of decades, used an increasing number of ever more sophisticated technologies to play 3D and 4D real and virtual content in long takes. Grouped under the term “3D video”, these technologies (motion capture (Mocap), augmented reality (AR), free viewpoint TV (FTV) and 3DTV) complement one another and are jointly incorporated into modern productions. It is now common practice to propose AR scenes in FTV or 3DTV, either virtual or real, whether this relates to actors, sets or extras, giving virtual characters (both actors and extras) realistic movements and expressions obtained by Mocap, and even credible behavior managed by artificial intelligence.
With the success of films such as The Matrix in 1999 and Avatar in 2009 (see Figure 1.1), the acronym “3D” has become a major marketing tool for large audiovisual producers. The first, The Matrix, popularized a multiview sensor system containing 120 still cameras and two video cameras allowing slow-motion virtual traveling, an effect known today as bullet time. This system has since been subject to various improvements which today allow not only the reproduction of this type of effect (FTV), but also complete or partial 3D reconstruction of scene content. The success of Avatar marked the renaissance of 3D cinema, a prelude to 3DTV, even if it is not yet possible to free viewers from wearing 3D glasses. Glasses-free, or “autostereoscopic”, 3D display is undeniably advantageous in comparison to glasses-based technology due to its convincing immersive 3D vision, non-invasiveness and only slightly higher production costs in relation to 2D screens. Unfortunately, the need for multiple viewpoints (generally between five and nine) to yield immersion involves a spatial mix of these multiple images which limits their individual resolution. As a result, in contrast to stereoscopy with glasses, autostereoscopic visualization is not yet available in full HD. The induced loss of detail in relation to this current standard further limits its use. The principal challenge of autostereoscopy currently concerns the conversion of the overall dedicated tool chain to full HD.
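The resolution penalty described above can be made concrete with a rough calculation. The sketch below assumes that the panel's pixels are shared evenly among the views, which ignores sub-pixel interleaving and the actual optical filter design; the function names are hypothetical and the full HD figure of 1920 × 1080 pixels is the only external value used.

```python
FULL_HD = (1920, 1080)  # width and height in pixels


def pixels_per_view(panel_width: int, panel_height: int, n_views: int) -> int:
    """Pixels available to each view when n_views share one panel evenly.

    Assumption: an even split, ignoring how the views are interleaved.
    """
    if n_views < 1:
        raise ValueError("at least one view is required")
    return (panel_width * panel_height) // n_views


def panel_pixels_for_full_hd_views(n_views: int) -> int:
    """Panel pixel count needed for every view to keep full HD detail."""
    if n_views < 1:
        raise ValueError("at least one view is required")
    return n_views * FULL_HD[0] * FULL_HD[1]


if __name__ == "__main__":
    for n in (5, 9):
        share = pixels_per_view(*FULL_HD, n)
        needed = panel_pixels_for_full_hd_views(n)
        print(f"{n} views on a full HD panel: {share} pixels per view; "
              f"{needed} panel pixels needed for full HD per view")
```

Under this even-split assumption, a full HD panel of 2,073,600 pixels leaves 414,720 pixels per view with five views and 230,400 with nine, which is why each view falls well short of full HD.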
Figure 1.1. Multiview system used to film The Matrix © Warner Bros. Entertainment Inc. a): 120 still cameras and two video cameras enabling time slicing (bullet time effect); b): stereoscopic filming; c): omnidirectional 3D capture for Avatar © 20th Century Fox by James Cameron
This profusion of technologies, a veritable 3D race, is probably the result of the rapid trivialization of the effects presented to the public, despite the fact that the technologies used have not yet been fully perfected. This race therefore evidently raises further challenges. All these techniques have one point in common: they rely on multiview capture of real scenes and more or less complex processing of the resulting recorded media. They also raise a series of problems relating to the volume of data at each stage of the media chain: capture, coding [ALA 07], storage and transmission [SMO 07], concluding with display. It is therefore essential to be able to summarize the characteristics of this data and of the systems that handle it, in order to consolidate the foundations of this technological explosion.
