3D Shape Analysis

Hamid Laga
Description

An in-depth description of the state of the art of 3D shape analysis techniques and their applications.

This book discusses the different topics that come under the title of "3D shape analysis". It covers the theoretical foundations and the major solutions that have been presented in the literature. It also establishes links between the solutions proposed by the different communities that have studied 3D shape, such as mathematics and statistics, medical imaging, computer vision, and computer graphics.

The first part of 3D Shape Analysis: Fundamentals, Theory, and Applications reviews background concepts such as methods for the acquisition and representation of 3D geometries, and the fundamentals of geometry and topology. It specifically covers stereo matching, structured light, and intrinsic vs. extrinsic properties of shape. Parts 2 and 3 present a range of mathematical and algorithmic tools, e.g. global descriptors, keypoint detectors, and local feature descriptors, that are commonly used for the detection, registration, recognition, classification, and retrieval of 3D objects. Both parts place strong emphasis on recent techniques motivated by the spread of commodity devices for 3D acquisition. Part 4 demonstrates the use of these techniques in a selection of 3D shape analysis applications: 3D face recognition, object recognition in 3D scenes, and 3D shape retrieval. It also discusses examples of semantic applications and cross-domain 3D retrieval, i.e. how to retrieve 3D models using various types of modalities, e.g. sketches and/or images. The book concludes with a summary of the main ideas and a discussion of future trends.

3D Shape Analysis: Fundamentals, Theory, and Applications is an excellent reference for graduate students, researchers, and professionals in different fields of mathematics, computer science, and engineering. It is also ideal for courses in computer vision and computer graphics, as well as for those seeking 3D industrial/commercial solutions.




Table of Contents

Cover

Preface

Acknowledgments

1 Introduction

1.1 Motivation

1.2 The 3D Shape Analysis Problem

1.3 About This Book

1.4 Notation

Part I: Foundations

2 Basic Elements of 3D Geometry and Topology

2.1 Elements of Differential Geometry

2.2 Shape, Shape Transformations, and Deformations

2.3 Summary and Further Reading

3 3D Acquisition and Preprocessing

3.1 Introduction

3.2 3D Acquisition

3.3 Preprocessing 3D Models

3.4 Summary and Further Reading

Part II: 3D Shape Descriptors

4 Global Shape Descriptors

4.1 Introduction

4.2 Distribution‐Based Descriptors

4.3 View‐Based 3D Shape Descriptors

4.4 Spherical Function‐Based Descriptors

4.5 Deep Neural Network‐Based 3D Descriptors

4.6 Summary and Further Reading

5 Local Shape Descriptors

5.1 Introduction

5.2 Challenges and Criteria

5.3 3D Keypoint Detection

5.4 Local Feature Description

5.5 Feature Aggregation Using Bag of Feature Techniques

5.6 Summary and Further Reading

Part III: 3D Correspondence and Registration

6 Rigid Registration

6.1 Introduction

6.2 Coarse Registration

6.3 Fine Registration

6.4 Summary and Further Reading

7 Nonrigid Registration

7.1 Introduction

7.2 Problem Formulation

7.3 Mathematical Tools

7.4 Isometric Correspondence and Registration

7.5 Nonisometric (Elastic) Correspondence and Registration

7.6 Summary and Further Reading

8 Semantic Correspondences

8.1 Introduction

8.2 Mathematical Formulation

8.3 Graph Representation

8.4 Energy Functions for Semantic Labeling

8.5 Semantic Labeling

8.6 Examples

8.7 Summary and Further Reading

Part IV: Applications

9 Examples of 3D Semantic Applications

9.1 Introduction

9.2 Semantics: Shape or Status

9.3 Semantics: Class or Identity

9.4 Semantics: Behavior

9.5 Semantics: Position

9.6 Summary and Further Reading

10 3D Face Recognition

10.1 Introduction

10.2 3D Face Recognition Tasks, Challenges and Datasets

10.3 3D Face Recognition Methods

10.4 Summary

11 Object Recognition in 3D Scenes

11.1 Introduction

11.2 Surface Registration‐Based Object Recognition Methods

11.3 Machine Learning‐Based Object Recognition Methods

11.4 Summary and Further Reading

12 3D Shape Retrieval

12.1 Introduction

12.2 Benchmarks and Evaluation Criteria

12.3 Similarity Measures

12.4 3D Shape Retrieval Algorithms

12.5 Summary and Further Reading

13 Cross‐domain Retrieval

13.1 Introduction

13.2 Challenges and Datasets

13.3 Siamese Network for Cross‐domain Retrieval

13.4 3D Shape‐centric Deep CNN

13.5 Summary and Further Reading

14 Conclusions and Perspectives

References

Index

End User License Agreement

List of Tables

Chapter 01

Table 1.1 List of notations used throughout the book.

Chapter 04

Table 4.1 A summary of the characteristics of the global descriptors presented in this chapter.

Chapter 08

Table 8.1 Potential functions used in the literature for semantic labeling.

Chapter 10

Table 10.1 Examples of 3D face verification approaches.

Table 10.2 Some of the key 3D face identification approaches.

Chapter 12

Table 12.1 Some of the 3D shape benchmarks that are currently available in the literature.

Table 12.2 A selection of distance measures used in 3D model retrieval algorithms.

Table 12.3 Performance evaluation of handcrafted descriptors on the test set of the Princeton Shape Benchmark [53].

Table 12.4 Examples of local descriptor‐based 3D shape retrieval methods and their performance on commonly used datasets.

Table 12.5 Comparison between handcrafted descriptors and Multi‐View CNN‐based descriptors [68] on ModelNet [73].

Table 12.6 Performance comparison on ShapeNet Core55 normal dataset of some learning‐based methods.

Chapter 13

Table 13.1 Examples of state‐of‐the‐art methods proposed for cross‐domain comparison.

Table 13.2 Examples of Siamese network‐based methods and their performance on SHREC 2013 [462] and SHREC 2014 [80].

List of Illustrations

Chapter 01

Figure 1.1 Complexity of the shape similarity problem. (a) Nonrigid deformations. (b) Partial similarity. (c) Semantic similarity.

Figure 1.2 Structure of the book and dependencies between the chapters.

Chapter 02

Figure 2.1 An example of a parameterized curve and its differential properties.

Figure 2.2 Reparameterizing a curve with a diffeomorphism results in another curve of the same shape. (a) A parametric curve. (b) A reparameterization function. (c) The reparametrized curve.

Figure 2.3 Example of an open surface parameterized by a 2D domain.

Figure 2.4 Illustration of (a) a spherical parameterization of a closed genus‐0 surface, and (b) how parameterization provides correspondence. Here, the two surfaces, which bend and stretch, are in correct correspondence. In general, however, the analysis process should find the optimal reparameterization that brings one surface into full correspondence with the other.

Figure 2.5 Illustration of the normal vector, normal plane, and normal curve.

Figure 2.6 Curvatures provide information about the local shape of a 3D object.

Figure 2.7 Representations of complex 3D objects: (a) triangular mesh‐based representation and (b) depth map‐based representation.

Figure 2.8 Examples of nonmanifold surfaces. (a) A nonmanifold vertex attached to two fans of triangles, (b) a nonmanifold vertex attached to patches, and (c) a nonmanifold edge attached to three faces.

Figure 2.9 A graphical illustration of the half‐edge data structure.

Figure 2.10 Illustration of the cotangent weights used to discretize the Laplace–Beltrami operator.

Figure 2.11 Similarity transformations: examples of 2D objects with different orientations, scales, and locations, but with identical shapes.

Figure 2.12 PCA‐based normalization for translation, scale, and rotation of 3D objects that undergo nonrigid deformations.

Figure 2.13 PCA‐based normalization for translation, scale, and rotation of partial 3D scans. The 3D model is courtesy of SHREC'15: Range Scans based 3D Shape Retrieval [8].

Figure 2.14 Comparison between (a) PCA‐based normalization, and (b) planar‐reflection symmetry‐based normalization of 3D shapes.

Figure 2.15 Example of nonrigid deformations that affect the shape of a 3D object. A 3D object can bend and/or stretch; bending alone is referred to as an isometric deformation. Fully elastic shapes can bend and stretch at the same time.

Chapter 03

Figure 3.1 An illustration of contact 3D acquisition systems. (a) A coordinate measuring machine (CMM). (b) An arm‐based 3D scanner.

Figure 3.2 An illustration of (a) triangulation and (b) stereo‐based 3D acquisition systems. A triangulation‐based system is composed of a light projector and a camera, while a stereo‐based system uses two cameras. In both cases, the depth of a point on the 3D object is inferred by triangulation.

Figure 3.3 An illustration of a structured light 3D acquisition sensor. The projector shines a single pattern or a set of patterns onto the surface of an object. The camera then records the patterns on the surface. The 3D shape of the surface is estimated by comparing the projected patterns and the distorted patterns acquired by the camera.

Figure 3.4 An illustration of different temporal coded patterns for structured light sensors. (a) Sequential binary pattern. (b) Sequential Gray pattern. (c) Phase shift with three projection patterns. (d) Hybrid pattern consisting of Gray pattern and phase shift. (Panel (d) is based on Figure 3 from Ref. [36]. ©1998, SPIE.)

Figure 3.5 An illustration of different spatial coded patterns for structured light sensors. (a) A De Bruijn sequence. (b) A color stripe pattern generated by the De Bruijn sequence. (c) An M‐array with three symbols.

Figure 3.6 Comparison between Laplacian smoothing and Taubin smoothing. (a) Original shape. (b) Input shape after adding random noise. (c) Laplacian smoothing of the 3D shape in (b). (d) Taubin smoothing of the 3D shape in (b). Observe how Laplacian smoothing results in volume shrinkage.

Figure 3.7 The concept of spherical mapping.

Figure 3.8 Example of genus‐0 surfaces (a) and their conformal spherical parameterizations (b). The color at each point on the surfaces indicates the mean curvature of the surface at that point.

Chapter 04

Figure 4.1 (a) Global descriptors describe whole objects with a single real‐valued vector. (b) Local descriptors describe the shape of regions (or neighborhoods) around feature points. In both cases, the descriptors form a space, which is not necessarily Euclidean, with a meaningful distance metric.

Figure 4.2 Some shape functions based on angles (A3), lengths (D1, D2, Ac, G), and areas (D3), see the text for their detailed description. Here, we use a 2D contour for illustration purposes only. The functions are computed on 3D models.

Figure 4.3 Illustration of the (a) D1 and (b) D2 shape distributions computed from six different 3D models.

Figure 4.4 Comparing the light field descriptors (LFDs) of the two 3D models in (a). The LFD of the first shape (top row, first shape) is compared to every LFD of the second shape (top row, second shape), obtained by all possible rotations of the camera system around the second shape (b–d). The dissimilarity between the two shapes is the minimum over all possible rotations of the camera system.

Figure 4.5 A set of light field descriptors for a 3D model. Each descriptor is obtained with a system of cameras placed at the vertices of a hemisphere of a regular dodecahedron.

Figure 4.6 Examples of methods that have been used to convert a 3D shape into a set of spherical functions, which in turn are used to build compact descriptors, see the text for more details.

Figure 4.7 Illustration of the typical architecture of a Convolutional Neural Network (CNN) for the analysis of 2D images.

Figure 4.8 Illustration of the MVCNN architecture for view‐based 3D shape analysis. A set of cameras is placed around the 3D model. Each camera captures a single image that is fed to a view‐based CNN. The outputs of the view‐based CNNs are aggregated using view pooling and then fed into another CNN that produces the class scores.

Figure 4.9 Illustration of orientation pooling strategies in volumetric CNN architectures. (a) Volumetric CNN (VCNN) with a single‐orientation input. (b) Multi‐Orientation Volumetric Convolutional Neural Network (MO‐VCNN), which takes various orientations of the 3D input, extracts features with a shared network (CNN1), and then passes the pooled features through another network (CNN2) to make a prediction.

Chapter 05

Figure 5.1 An illustration of keypoint detection (a) and local feature description (b).

Figure 5.2 Curvature on the Bunny model. (a) Mean curvature. (b) Gaussian curvature. Light gray denotes low curvature values and dark gray denotes high curvature values.

Figure 5.3 Keypoints on the Armadillo model detected by (a) LSP and (b) ISS methods.

Figure 5.4 Keypoints detected on the Chicken model at two scales. (a) Keypoints with a neighborhood size of 20 mm. (b) Keypoints with a neighborhood size of 5 mm. The sampling rate of the model is 2 mm.

Figure 5.5 The process of the intraoctave phase. A set of Gaussian filters is applied to an octave mesh to obtain several multidimensional filtering maps, which are further used to produce several scalar scale maps. These scale maps are normalized, and the normalized scale maps are finally used to produce the 3D keypoints in an octave.

Figure 5.6 Keypoints detected on the Armadillo model by Castellani et al. [106], Mian et al. [93], and Unnikrishnan and Hebert [109].

Figure 5.7 The scheme of the keypoint detection method in [110]. (a) Original lion vase model. (b) Dense 2D normal map. (c) Keypoints detected in the 2D normal map. (d) Color‐coded keypoints on the 3D model. Larger spheres represent keypoints at coarser scales.

Figure 5.8 Keypoints detected on the Armadillo model by the MeshDoG [91] and IGSS [120] methods.

Figure 5.9 Keypoints detected on the Armadillo models with different poses by Hu and Hua [122].

Figure 5.10 An illustration of signature‐based methods. (a) Milk drop. (b) Splash [127]. (c) Point signature [128].

Figure 5.11 The local reference frame and neighborhood discretization for methods based on histograms of spatial distributions. (a) Spin image [95]. (b) 3D shape context [96]. (c) Intrinsic shape signature [103].

Figure 5.12 An illustration of the RoPS feature descriptor with one rotation. (a) Object. (b) Local surface. (c) Rotated surface. (d) Projection. (e) Distribution matrix. (f) Statistics. (g) Subfeature.

Figure 5.13 Bag‐of‐features encoding. Local features are first extracted from the 3D models in the training set. A codebook is then constructed by grouping these features into clusters using some clustering technique. The centroids (or centers) of the clusters, also called key shapes, form the codewords. A 3D model is represented using the histogram of occurrences of the key shapes in that 3D model.

Figure 5.14 Illustration of the VLAT encoding. The features extracted from a 3D model are first assigned to their closest key‐shapes (cluster) in the codebook. Then, we measure the deviation of the covariance of the features in each cluster from the covariance of the local features of the 3D model. The measured deviation forms the descriptor of the 3D shape with respect to the cluster.

Chapter 06

Figure 6.1 The search for the corresponding point of the secondary point. (a) The distance between the primary and secondary points is measured. (b) The corresponding point of the secondary point should lie on a spherical surface centered at the corresponding point of the primary point, with a radius equal to that distance.

Figure 6.2 The search for the corresponding point of the auxiliary control point. (a) The auxiliary point is orthogonally projected onto the line connecting the two control points. (b) The candidate corresponding point of the auxiliary point should lie on a circle that is perpendicular to the corresponding line and centered at the point corresponding to the projection, with a radius equal to the distance between the auxiliary point and that line.

Figure 6.3 An illustration of congruent 4‐points. For 4‐points on a surface (a), if a second surface matches the first one, with the corresponding 4‐points (b) lying in the overlapping area, then the two sets of 4‐points are congruent: their pairwise distances and intersection ratios are equal.

Figure 6.4 Congruent 4‐points extraction. (a) Given 4‐points on a surface, two intersection ratios are obtained. (b) Four possible intermediate points can be produced for each pair of points using different assignments of the two ratios. (c) If two intermediate points coincide, the corresponding 4‐points are probably a rigidly transformed copy of the original 4‐points.

Figure 6.5 Point cloud registration achieved by the 4PCS method. (a,b) Two point clouds of building facades acquired from two different viewpoints. (c) The registration result achieved by the 4PCS method.

Chapter 07

Figure 7.1 Examples of (a) 3D models that undergo nonrigid deformations, (b) correspondences when the 3D models undergo only stretching, and (c) correspondences when the 3D models stretch and bend at the same time.

Figure 7.2 Examples of deformation paths between nonrigid shapes. In each example, the source and target shapes are rendered in light gray. The intermediate shapes, along the deformation path, are rendered in a semitransparent dark gray. (a) A deformation path when the correspondences are not optimal, (b) a deformation path, which is not optimal with respect to a physically motivated metric, and (c) the shortest deformation path, or a geodesic, under a physically motivated metric.

Figure 7.3 Examples of correspondences computed using the Möbius voting approach. (a) 3D shapes undergoing isometric motion, (b) 3D shapes undergoing nearly isometric motion, and (c) 3D shapes undergoing large elastic deformations.

Figure 7.4 Interpolation of a straight cylinder bending to a curved one. (a) Linear interpolation in the space of surfaces. (b) Geodesic path by SRNF inversion.

Figure 7.5 A spherical wavelet decomposition of a complex surface. Observe that at low resolutions, high frequency components disappear and the surface looks very similar to a sphere.

Figure 7.6 Comparison between the single resolution vs. multiresolution‐based SRNF inversion procedure. (a) Ground truth surface, (b) SRNF inversion using single‐resolution‐based gradient descent, and (c) multiresolution SRNF inversion.

Figure 7.7 Examples of correspondence between complex 3D shapes which undergo (a) isometric (bending only) and (b) elastic deformations (bending and stretching). The correspondences have been computed using the SRNF framework, which finds a one‐to‐one mapping between the source and target surfaces. For clarity, only a few correspondences are shown in this figure.

Figure 7.8 Examples of correspondences and geodesics between complex 3D shapes which undergo elastic deformations. In each row, the leftmost shape is the source and the rightmost one is the target. Both correspondences and geodesics are computed using the SRNF framework presented in this chapter. (a) A geodesic between two carpal bones, which stretch. (b) A geodesic between two human body shapes, which bend and stretch. (c) A geodesic between two human body shapes with a large isometric (bending) deformation.

Figure 7.9 Example of coregistration: the input human body shapes are simultaneously coregistered to the Karcher mean computed using Algorithm 7.1. (a) Input 3D human body shapes. (b) Computed mean human body shape.

Chapter 08

Figure 8.1 Partwise correspondences between 3D shapes in the presence of significant geometrical and topological variations. (a) Man‐made 3D shapes that differ in geometry and topology. The partwise correspondences are color coded. (b) A set of 3D models with significant shape differences, yet they share many semantic correspondences.

Figure 8.2 An example of a surface represented as a graph of nodes and edges. First, the surface is decomposed into patches. Each patch is represented with a node. Adjacent patches are connected with an edge. The geometry of each node is represented with a unary descriptor. The geometric relation between two adjacent nodes is represented with a binary descriptor.

Figure 8.3 Unsupervised learning of the statistics of the semantic labels. (a) Training set. (b) Presegmentation of individual shapes. (c) Embedding in the descriptor space. (d) Clustering. (e) Statistical model per semantic label.

Figure 8.4 Effects of the unary term on the quality of the segmentation.

Figure 8.5 Semantic labeling by using (a) only the unary term of Eq. 8.1 and (b) both the unary and binary terms of Eq. 8.1. The result in (b) is obtained using the approach of Kalogerakis et al. [209], i.e. using the unary and binary terms defined in the first row of Table 8.1.

Figure 8.6 Supervised labeling using the approach of Kalogerakis et al. [209]. Observe that different training sets lead to different labeling results. The bottom row image is from Reference [209].

Figure 8.7 Effect of the intermesh term on the quality of the semantic labeling. Results are obtained using the approach of van Kaick et al. [203], i.e. using the energy functions of the second row of Table 8.1. (a) Semantic labeling without the intermesh term (i.e. using the energy of Eq. 8.1). (b) Semantic labeling using the intermesh term (i.e. using the energy of Eq. 8.2).

Chapter 09

Figure 9.1 (a) The scanning configuration that produced the raw (b) and segmented (c) depth images. The images are used for cow health monitoring.

Figure 9.2 (a) This image shows three views of scanned people. The scans were used to collect statistics about human body shape and size. (b) This image shows a depth image of a collection of parts from an industrial circuit breaker. Potentially graspable objects and gripper positions are marked in white.

Figure 9.3 (a) This image shows the labeling of pixels based on a binocular stereo process [234], where white pixels are traversable areas, gray pixels are obstacle boundaries, and black pixels are obstacles. (b) An approach to laser‐based 3D scanning of road surfaces from a moving vehicle.

Figure 9.4 Five frames of a person speaking a passphrase, with the infrared intensity image below and a cosine‐shaded depth image above. More information can be found in [235].

Figure 9.5 Six common facial expressions captured in both color and depth. More information can be found in [238].

Figure 9.6 (a) An intensity and processed depth image of a person walking, which is used for person reidentification. More information can be found in [239]. (b) A volumetric (3D) shape from stereo image data reconstructed from multiple color images, along with a semantic labeling (medium gray: building, dark gray: ground, bright gray: vegetation). More information can be found in [240].

Figure 9.7 The left and middle intensity and depth images are of the same person, whereas the right image is of a second person wearing a mask imitating the first person. More information can be found in [241].

Figure 9.8 Examples of the color images and associated 3D models of objects that are recognized using specialized keypoints extracted from both the models and a scene containing these shapes. More information can be found in [93].

Figure 9.9 Examples of six activities and one frame from the color and depth videos, with the associated 3D skeleton overlaid. More information can be found in [242].

Figure 9.10 Examples of the 10 hand gestures from the Sheffield KInect Gesture (SKIG) Dataset, showing both an example of the gesture and a frame of the corresponding depth image. More information can be found in [243].

Figure 9.11 Examples of (a) a depth image of a hand pose and (b) the corresponding data glove used to verify the estimated finger and hand positions. These were used in experiments investigating the recognition of some characters from Chinese number counting and American Sign Language. More information can be found in [244].

Figure 9.12 Examples of intensity and corresponding depth images for people carrying different weights. One can see how their posture changes.

Figure 9.13 Several frames of a punch interaction between two people showing the intensity image, the depth image, and the stick figures extracted from the depth image [246]. Other approaches to skeleton or stick figure extraction use RGB images, point clouds, or various combinations of these data.

Figure 9.14 An example of a (a) fight punching movement, (b) its depth image, and (c) associated skeleton, as captured by a Kinect sensor. See [248] for more information.

Figure 9.15 (a) Example of 3D tracking of a seated person reaching to different locations [249]. (b) 3D pose fitted to a set of 3D points captured by a Kinect sensor and used in an action analysis project. See [250] for more information.

Chapter 10

Figure 10.1 (a) A conventional 3D face verification system: the face description of the probe is compared to the gallery representation. If the matching score between the two faces is greater than a predefined threshold, the individual is accepted as the claimed identity; otherwise, they are considered an imposter. (b) A conventional 3D face identification system: the 3D face description of the unknown subject is compared to the descriptions of all 3D faces in the gallery. The unknown subject takes the identity of the gallery face with the highest matching score.

Figure 10.2 Conventional 3D face recognition pipeline.

Figure 10.3 The conventional LBP operator computed at a pixel. The value (gray level) of each pixel in the first ring neighborhood is compared to the value of the central pixel. If the value of the pixel is greater than that of the central one, the corresponding binary bit is set to one; otherwise, it is set to zero. The resulting decimal value is used in the matching process.

Chapter 11

Figure 11.1 A general scheme of hypothesize‐and‐test based methods. This scheme consists of three modules, namely: a feature matching module, a hypothesis generation module, and a hypothesis verification module.

Figure 11.2 An illustration of active space violations. (a) Consistent surfaces: the space along the line of sight from the depth sensor to the surface of the scene is clear. (b) Free space violation (FSV): the transformed model blocks the visibility of the scene from the depth sensor. (c) Occupied space violation (OSV): a region of the scene that is visible to the depth sensor is not observed.

Figure 11.3 Illustration of the different terms for global verification. (a) A scene and model hypotheses, where the scene point cloud is shown in dark gray and the active model hypotheses are superimposed onto the scene. (b) The model inliers (shown in light gray) and the model outliers (shown in dark gray). (c) The scene points with a single hypothesis, scene points with multiple hypotheses, and unexplained scene points are shown in different gray levels. (d) A segmented scene, where each gray level represents a segment label.

Figure 11.4 The training process for Hough forest‐based 3D object detection. The point cloud is first divided into individual supervoxels. A local patch is then extracted for each supervoxel. The parameters of the split function on each branch node of the tree in the forest are learned using the features of the local patches.

Figure 11.5 The online object detection process for Hough forest‐based 3D object detection. Ground points are first removed from the scene to obtain individual segments of non‐ground objects. Local patches are extracted from the remaining point cloud and then passed down to a leaf node of each tree using the splitting information. The offset vectors in the leaf nodes are finally used to vote for the object center.

Figure 11.6 The volumetric representation of a chair model at different resolutions. (a–c) The chair voxelized at three different resolutions.

Chapter 12

Figure 12.1 Some examples of topological noise. When the arm of the 3D human touches the body or the foot, the tessellation of the 3D model changes and results in a different topology. All the models, however, have the same shape (up to isometric deformations), and thus a good 3D retrieval system should be invariant to this type of topological noise. The 3D models are from the SHREC'2016 dataset [414].

Figure 12.2 Global and local‐based 3D shape retrieval pipelines. Dashed lines indicate the offline processes while solid lines indicate online processes.

Chapter 13

Figure 13.1 Sample of sketches taken from the SHREC 2013 dataset [462]. Observe that the same object can be sketched differently by different users.

Figure 13.2 Photo synthesis from a 3D model.

Figure 13.3 Sketch synthesis from a 3D model.

Figure 13.4 Joint embedding using a Siamese network.

Figure 13.5 Example of five modalities from the same shape class (airplane in this case). The 3D shape is placed at the center of interest; comparison between the other modalities is made feasible through the 3D shape.

Figure 13.6 Cross‐domain shape retrieval pipeline using a 3D shape‐centric method.

Figure 13.7 Embedding space construction. Given the pairwise similarities between 3D shapes in a collection, the aim is to find a (Euclidean) embedding space where the distances between shapes are preserved.

Figure 13.8 Learning shapes from synthesized photos. The learning is based on a regression model: a CNN is given a large amount of training data along with the embedding coordinates induced by each shape, and thereby learns to map images onto the embedding space.

Figure 13.9 The testing phase. The CNN maps input images onto the embedding space. The dissimilarity between different entities is computed as a distance in the embedding space.

Figure 13.10 A visualization of some 3D shapes in the embedding space and a set of projected images and sketches [459]. The red (dark gray) symbols denote images from the wild, and the green (light gray) ones denote hand‐drawn sketches. The symbols indicate the class membership of the objects.


3D Shape Analysis

Fundamentals, Theory, and Applications

Hamid Laga

Murdoch University and University of South Australia, Australia

 

Yulan Guo

National University of Defense Technology, China

 

Hedi Tabia

ETIS UMR 8051, Paris Seine University, University of Cergy-Pontoise, ENSEA, CNRS, France

 

Robert B. Fisher

University of Edinburgh, United Kingdom

 

Mohammed Bennamoun

The University of Western Australia, Australia

 

Copyright

This edition first published 2019

© 2019 John Wiley & Sons Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Hamid Laga, Yulan Guo, Hedi Tabia, Robert B. Fisher, and Mohammed Bennamoun to be identified as the authors of this work has been asserted in accordance with law.

Registered Office

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office

111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data

Names: Laga, Hamid, author.

Title: 3D shape analysis : fundamentals, theory, and applications / Hamid Laga, Murdoch University and University of South Australia, Australia, Yulan Guo, National University of Defense Technology, China, Hedi Tabia, ETIS UMR 8051, Paris Seine University, University of Cergy-Pontoise, ENSEA, CNRS, France, Robert B. Fisher, University of Edinburgh, United Kingdom, Mohammed Bennamoun, The University of Western Australia, Australia.

Description: 1st edition. | Hoboken, NJ, USA : Wiley, 2019. | Includes bibliographical references and index.

Identifiers: LCCN 2018033203 | ISBN 9781119405108 (hardcover) | ISBN 9781119405191 (epub)

Subjects: LCSH: Three-dimensional imaging. | Pattern recognition systems. | Shapes-Computer simulation. | Machine learning.

Classification: LCC TA1560 .L34 2019 | DDC 006.6/93-dc23

LC record available at https://lccn.loc.gov/2018033203

Cover design by Wiley

Cover image: © KTSDESIGN/SCIENCE PHOTO LIBRARY/Getty Images

Preface

The primary goal of this book is to provide an in‐depth review of 3D shape analysis, which is an important problem and a building block for many applications. This book covers a wide range of basic, intermediate, and advanced topics relating to both the theoretical and practical aspects of 3D shape analysis. It provides a comprehensive overview of the key developments that have occurred over the past two decades in this exciting and continuously expanding field.

This book is organized into 14 chapters, which include an introductory chapter (Chapter 1) and a Conclusions and Perspectives chapter (Chapter 14). The remaining chapters (Chapters 2-13) are structured into four parts. The first part, which is composed of two chapters, introduces the reader to the background concepts of geometry and topology (Chapter 2) that are relevant to most aspects of 3D shape analysis. It also provides a comprehensive overview of the techniques that are used to capture, create, and preprocess 3D models (Chapter 3). Understanding these techniques will not only help the reader understand the various challenges faced in 3D shape analysis, but will also motivate the use of 3D shape analysis techniques to improve algorithms for 3D reconstruction, a long‐standing problem in computer vision and computer graphics.

The second part, which is composed of two chapters, presents a wide range of mathematical and algorithmic tools that are used for shape description and comparison. In particular, Chapter 4 presents various global descriptors that have been proposed in the literature to characterize the overall shape of a 3D object using its geometry and/or topology. Chapter 5 covers the key algorithms and techniques that are used to detect local features and to characterize the shape of local regions using local descriptors. Both local and global descriptors can be used for shape‐based retrieval, recognition, and classification of 3D models. Local descriptors can also be used to compute correspondences between, and to register, 3D objects. This is the focus of the third part of the book, which covers the three commonly studied aspects of the registration and correspondence problem, namely: rigid registration (Chapter 6), nonrigid registration (Chapter 7), and semantic correspondence (Chapter 8).

The last part, which is composed of five chapters, focuses on the application aspects. Specifically, Chapter 9 reviews some of the semantic applications of 3D shape analysis. Chapter 10 focuses on a specific type of 3D object, human faces, and reviews some techniques which are used for 3D face recognition and classification. Chapter 11 focuses on the problem of recognizing objects in 3D scenes. Nowadays, cars, robots, and drones are equipped with 3D sensors, which capture their environments. Tasks such as navigation, target detection and identification, and object tracking require the analysis of the 3D information that is captured by these sensors. Chapter 12 focuses on a classical problem of 3D shape analysis, i.e. how to retrieve 3D objects of interest from a collection of 3D models. It provides a comparative analysis and discusses the pros and cons of various descriptors and similarity measures. Chapter 13, on the other hand, treats the same problem of shape retrieval, but this time using multimodal queries. This is one of the emerging fields of 3D shape analysis, and it aims to narrow the gap between the different visual representations of the 3D world (e.g. images, 2D sketches, 3D models, and videos). Finally, Chapter 14 summarizes the book and discusses some of the open problems and future challenges of 3D shape analysis.

The purpose of this book is not to provide a complete and detailed survey of the 3D shape analysis field. Rather, it succinctly covers the key developments of the field in the past two decades and shows their applications in various 3D vision and graphics problems. It is intended for advanced graduate students, postgraduate students, and researchers working in the field. It can also serve as a reference for practitioners and engineers working on the various applications of 3D shape analysis.

May 2018

Hamid Laga

Yulan Guo

Hedi Tabia

Robert B. Fisher

Mohammed Bennamoun

Acknowledgments

The completion of this book would not have been possible without the help, advice, and support of many people. This book is written based on the scientific contributions of a number of colleagues and collaborators. Without their groundbreaking contributions to the field, this book would have never matured into this form.

Some of the research presented in this book was developed in collaboration with our PhD students, collaborators, and supervisors. We were very fortunate to work with them and are very grateful for their collaboration. Particularly, we would like to thank (in alphabetical order) Ian H. Jermyn, Sebastian Kurtek, Jonathan Li, Stan Miklavcic, Jacob Montiel, Michela Mortara, Masayuki Nakajima (Hamid Laga's PhD advisor), David Picard, Michela Spagnuolo, Ferdous Sohel, Anuj Srivastava, Antonio Verdone Sanchez, Jianwei Wan, Guan Wang, Hazem Wannous, and Ning Xie. We would also like to thank all our current and previous colleagues at Murdoch University, Tokyo Institute of Technology, National University of Defense Technology, Institute of Computing Technology, Chinese Academy of Sciences, the University of South Australia, the ETIS laboratory, the Graduate School in Electrical Engineering Computer Science and Communications Networks (ENSEA), the University of Western Australia, and The University of Edinburgh.

We are also very grateful to the John Wiley team for helping us create this book.

This work was supported in part by funding from the Australian Research Council (ARC), particularly ARC DP150100294 and ARC DP150104251, National Natural Science Foundation of China (Nos. 61602499 and 61471371), and the National Postdoctoral Program for Innovative Talents of China (No. BX201600172).

Lastly, this book would not have been possible without the incredible support and encouragement of our families.

Hamid Laga dedicates this book to his parents, sisters, and brothers whose love and generosity have always inspired him; to his wife Lan and son Ilyan whose daily encouragement and support in all matters make it all worthwhile.

Yulan Guo dedicates this book to his parents, wife, and son. The book shared its time of gestation, birth, and growth with his son, and it is the first and best gift for his son's birth.

Hedi Tabia dedicates this book to his family.

Robert B. Fisher dedicates this book to his wife, Miesbeth, who helped make the home a happy place to do the writing. It is also dedicated to his PhD supervisor, Jim Howe, who got him started, and to Bob Beattie, who introduced him to computer vision.

Mohammed Bennamoun dedicates this book to his parents: Mostefa and Rabia, to his children: Miriam, Basheer, and Rayaane and to his seven siblings.

1 Introduction

1.1 Motivation

Shape analysis is an old topic that has been studied for many centuries by scholars from many different backgrounds, including philosophers, psychologists, mathematicians, biologists, and artists. However, in the past two decades, we have seen a renewed interest in the field, motivated by recent advances in 3D acquisition, modeling, and visualization technologies, and by the substantial increase in computation and storage power. Nowadays, 3D scanning devices are accessible not only to domain‐specific experts but also to the general public. Users can scan the real world at high resolution using devices that are as cheap as video cameras, edit the 3D data using 3D modeling software, share the data across the web, and host them in online repositories that are growing in size and in number. Such repositories can include millions of everyday objects, cultural heritage artifacts, buildings, as well as medical, scientific, and engineering models.

The increase in the availability of 3D data comes with new challenges in terms of storage, classification, and retrieval of such data. It also brings unprecedented opportunities for solving long‐standing problems. First, the rich variability of 3D content in existing shape repositories makes it possible to directly reuse existing 3D models, in whole or in part, to construct new 3D models with rich variations. In many situations, 3D designers and content creators will no longer need to scan or model a 3D object or scene from scratch. They can query existing repositories, retrieve the desired models, and fine‐tune their geometry and appearance to suit their needs. This concept of content reuse is not specific to 3D models but has been naturally borrowed from other types of media. For instance, one can translate sentences into different languages by performing cross‐language search. Similarly, one can create an image composite or a visual art piece by querying images, copying parts of them, and pasting them into one's own work.

Second, these large amounts of 3D data can be used to learn computational models that effectively reason about properties and relationships of shapes without relying on hard‐coded rules or explicitly programmed instructions. For instance, they can be used to learn 3D shape variation in medical data in order to model physiological abnormalities in anatomical organs, model their natural growth, and learn how shape is affected by disease progression. They can also be used to model 3D shape variability using statistical models, which, in turn, can be used to facilitate 3D model creation with minimal user interaction.

Finally, data‐driven methods facilitate high‐level shape understanding by discovering geometric and structural patterns among collections of shapes. These patterns can serve as strong priors not only in various geometry processing applications but also in solving long‐standing computer vision problems, ranging from low‐level 3D reconstruction to high‐level scene understanding.

These technological developments and the opportunities they bring have motivated researchers to take a fresh look at the 3D shape analysis problem. Although most of the recent developments are application‐driven, many of them aim to answer fundamental, sometimes philosophical, questions such as: What is shape? Can we mathematically formulate the concept of shape? How to compare the shape of objects? How to quantify and localize shape similarities and differences? This book synthesizes the critical mass of 3D shape analysis research that has accumulated over the past 15 years. This rapidly developing field is both profound and broad, with a wide range of applications and many open research questions that are yet to be answered.

1.2 The 3D Shape Analysis Problem

Shape is the external form, outline or surface, of someone or something as opposed to other properties such as color, texture, or material composition.

Source: Wikipedia and Oxford dictionaries.

Humans can easily abstract the form of an object, describe it with a few geometrical attributes or even with words, relate it to the form of another object, and group together, in multiple ways and using various criteria, different objects to form clusters that share some common shape properties. Shape analysis is the general term used to refer to the process of automating these tasks, which are trivial to humans but very challenging to computers. It has been investigated under the umbrella of many applications and has multiple facets. Below, we briefly summarize a few of them.

3D shape retrieval, clustering, and classification. Similar to other types of multimedia information, e.g. text documents, images, and videos, the demand for efficient clustering and classification tools that can organize, automatically or semi‐automatically, the continuously expanding collections of 3D models is growing. Likewise, users, whether they are experts, e.g. graphics designers who increasingly rely on the reuse of existing 3D content, or novices, will benefit from a search engine that enables them to search for 3D data of interest in the same way they search for text documents or images.

Correspondence and registration. This problem, which can be summarized as the ability to say which part of an object matches which part of another object, and the ability to align one object onto another, arises in many domains of computer vision, computer graphics, and medical imaging. One of the most popular examples is the 3D reconstruction problem, where a 3D object is usually scanned by multiple sensors positioned at different locations around the object. To build the complete 3D model of the object, one needs to merge the partial scans produced by each sensor. This operation requires a correct alignment, i.e. registration, step that brings all the acquired 3D data into a common coordinate frame. Note also that, in many cases, 3D objects move and deform, in a nonrigid way, during the scanning process, which makes the alignment even more complex. Another example is in computer graphics, where a 3D designer creates a triangulated 3D mesh model, hereinafter referred to as the reference, and assigns to each of its triangular faces some attributes, e.g. color and material properties. The designer can then create additional models with the same attributes; instead of setting them manually, the attributes can be transferred automatically from the reference model if there is a mechanism that finds, for each point on the reference model, its corresponding points on the other models.
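Below is a minimal, illustrative sketch of the rigid case, in the spirit of the iterative closest point (ICP) family of fine‐registration methods covered in Chapter 6. It assumes numpy and scipy are available; the function and variable names are ours, not the book's.

    # Point-to-point ICP (sketch): repeatedly match each source point to its
    # closest target point, then solve for the best rotation R and
    # translation t in closed form via SVD (the Kabsch solution).
    import numpy as np
    from scipy.spatial import cKDTree

    def icp(source, target, iters=50):
        """Align source (N, 3) onto target (M, 3); returns R, t, aligned points."""
        src = np.asarray(source, dtype=float).copy()
        tgt = np.asarray(target, dtype=float)
        R_total, t_total = np.eye(3), np.zeros(3)
        tree = cKDTree(tgt)
        for _ in range(iters):
            _, idx = tree.query(src)               # closest target point per source point
            matched = tgt[idx]
            mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
            H = (src - mu_s).T @ (matched - mu_t)  # cross-covariance matrix
            U, _, Vt = np.linalg.svd(H)
            D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
            R = Vt.T @ D @ U.T                     # optimal rotation, reflection excluded
            t = mu_t - R @ mu_s
            src = src @ R.T + t                    # apply the incremental transform
            R_total, t_total = R @ R_total, R @ t_total + t
        return R_total, t_total, src

Such a scheme converges to the correct alignment only from a reasonably good initial pose, which is why a coarse registration stage (cf. Section 6.2) is usually run first.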

Detection and recognition. This includes the detection of low‐level features such as corners or regions of high curvature, as well as the localization and recognition of parts in 3D objects, or of objects in 3D scenes. The latter became very popular in the past few years with the availability of cheap 3D scanning devices. In fact, instead of trying to localize and recognize objects in a scene from 2D images, one can develop algorithms that operate on 3D scans of the scene, eventually acquired using commodity devices. This has the advantage that 3D data are less affected than 2D images by the occlusions and ambiguities that are inherent to the loss of dimensionality when projecting the 3D world onto 2D images. 3D face and 3D action recognition are, among others, examples of applications that have benefited from the recent advances in 3D technologies.
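As a concrete illustration of the low‐level end of this task, the following sketch flags the points whose local neighborhood deviates most from a plane, one simple saliency cue for corner‐like features. It assumes numpy and scipy; the measure and its selection rule are simplified choices of ours, and the detectors covered in Chapter 5 are considerably more elaborate.

    # Keypoint candidates via "surface variation": the smallest eigenvalue of
    # a neighborhood's covariance relative to the total variance. Flat patches
    # score near zero; corners and creases score high.
    import numpy as np
    from scipy.spatial import cKDTree

    def detect_keypoints(points, k=20, top=100):
        tree = cKDTree(points)
        _, nbrs = tree.query(points, k=k)          # k nearest neighbors per point
        saliency = np.empty(len(points))
        for i, idx in enumerate(nbrs):
            nb = points[idx] - points[idx].mean(axis=0)
            evals = np.linalg.eigvalsh(nb.T @ nb)  # eigenvalues in ascending order
            saliency[i] = evals[0] / max(evals.sum(), 1e-12)
        return np.argsort(saliency)[-top:]         # indices of the most salient points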

Measurement and characterization of the geometrical and topological properties of objects on one hand, and of the spatial relations between objects on the other hand. This includes the identification of similar regions and the finding of recurrent patterns within and across 3D objects.

Summarization and exploration of collections of 3D models. Given a set of objects, one would like to compute a representative 3D model, e.g. the average or median shape, as well as other summary statistics such as covariances and modes of variation of their shapes. One would also like to characterize the collection using probability distributions and to sample from these distributions new instances of shapes that enrich the collection. In other words, one needs to manipulate 3D models in the same way one manipulates numbers.
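For intuition, here is a minimal sketch of such summary statistics, assuming numpy and, crucially, that the shapes are already registered and in point‐to‐point correspondence (the hard part, which is the subject of Part III of this book):

    # Mean shape and principal modes of variation over a collection of
    # corresponding point clouds, in the spirit of a point distribution model.
    import numpy as np

    def shape_statistics(shapes, n_modes=3):
        """shapes: (S, N, 3) array of S point clouds in correspondence."""
        X = shapes.reshape(len(shapes), -1)   # one flat vector per shape
        mean = X.mean(axis=0)
        _, svals, Vt = np.linalg.svd(X - mean, full_matrices=False)
        modes = Vt[:n_modes]                  # directions of largest variance
        variances = svals[:n_modes] ** 2 / len(shapes)
        return mean.reshape(-1, 3), modes, variances

New shape instances can then be sampled as the mean plus a random linear combination of the modes, with coefficients drawn according to the corresponding variances.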

Implementing these representative analysis tasks requires solving a set of challenges, each of which has been the subject of important research and contributions. The first challenge is the mathematical representation of the shape of objects. 3D models, acquired with laser scanners or created using modeling software, can be represented as point clouds, polygon soup models, or volumetric images. Such representations are suitable for storage and visualization but not for high‐level analysis tasks. For instance, scanning the same object from two different viewpoints or using different devices will often result in two different point clouds, but the shape remains the same. The challenge is in designing mathematical representations that capture the essence of shape. A good representation should be independent of (or invariant to) the pose of the 3D object, the way it is scanned or modeled, and the way it is stored. It is also important to ensure that two different shapes cannot have the same representation.
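One classical, if imperfect, answer to the pose part of this invariance requirement is normalization, e.g. the PCA‐based normalization discussed in Chapter 2. A minimal sketch, assuming numpy:

    # Pose normalization of a point cloud: remove translation, scale, and
    # rotation by centering, rescaling, and aligning the principal axes of
    # the covariance with the coordinate axes. PCA alignment is only defined
    # up to axis flips, which is one known weakness of this approach.
    import numpy as np

    def normalize_pose(points):
        centered = points - points.mean(axis=0)                        # remove translation
        centered = centered / np.linalg.norm(centered, axis=1).mean()  # remove scale
        _, vecs = np.linalg.eigh(centered.T @ centered)                # ascending eigenvalues
        return centered @ vecs[:, ::-1]                                # largest variance first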

Figure 1.1 Complexity of the shape similarity problem. (a) Nonrigid deformations. (b) Partial similarity. (c) Semantic similarity.

Second, almost every shape analysis task requires a measure that quantifies shape similarities and differences. This measure, called a dissimilarity, distance, or metric, is essential to many tasks. It can be used to compare the 3D shape of different objects and to localize similar parts in and across 3D models. It can also be used to detect and recognize objects in 3D scenes. Shape similarity is, however, one of the most ambiguous concepts in shape analysis since it depends not only on the geometry of the objects being analyzed but also on their semantics, their context, the application, and human perception. Figure 1.1 shows a few examples that illustrate the complexity of the shape similarity problem. In Figure 1.1a, we consider human body shapes of the same person but in different poses. One can consider these models as similar since they are of the same person. One may also treat them as different since they differ in pose. On the other hand, the 3D objects of Figure 1.1b are only partially similar. For instance, one part of the centaur model can be treated as similar to the upper body of the human body shape, while the other part is similar to the 3D shape of a horse. Also, one can consider that the candles of Figure 1.1c are similar despite the significant differences in their geometry and topology. A two‐year‐old child can easily match together the parts of the candles that have the same functionality, despite the fact that they have different geometry, structure, and topology.
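To make the purely geometric end of this spectrum concrete, here is a minimal sketch of one of the simplest dissimilarity measures, the D2 shape distribution covered in Chapter 4 (a histogram of distances between random point pairs). It assumes numpy, with point clouds standing in for sampled surfaces; parameter values are illustrative.

    # D2 descriptor: histogram of random pairwise distances, normalized by
    # the mean distance for scale invariance; two shapes are then compared
    # with an L1 distance between their histograms.
    import numpy as np

    def d2_histogram(points, n_pairs=10000, bins=64, seed=0):
        rng = np.random.default_rng(seed)
        i = rng.integers(0, len(points), size=n_pairs)
        j = rng.integers(0, len(points), size=n_pairs)
        d = np.linalg.norm(points[i] - points[j], axis=1)
        hist, _ = np.histogram(d / d.mean(), bins=bins, range=(0, 4))
        return hist / hist.sum()

    def d2_dissimilarity(points_a, points_b):
        return np.abs(d2_histogram(points_a) - d2_histogram(points_b)).sum()

Note that such a measure captures none of the partial or semantic similarity illustrated in Figure 1.1; it only quantifies gross geometric agreement.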

Finally, these problems, i.e. representation and dissimilarity, which are interrelated (although many state‐of‐the‐art papers treat them separately), are the core components of and the building blocks for almost every 3D shape analysis system.

1.3 About This Book

The field of 3D shape analysis is being actively studied by researchers from at least four different domains: mathematics and statistics, image processing and computer vision, computer graphics, and medical imaging. As a result, a critical mass of research has accumulated over the past 15 years, and almost every major conference in these fields includes tracks dedicated to 3D shape analysis. This book provides an in‐depth description of the major developments in this continuously expanding field of research. It can serve as a complete reference for graduate students, researchers, and professionals in different fields of mathematics, computer science, and engineering. It can also be used for intermediate‐level courses in computer vision and computer graphics, or for self‐study. It is organized into four main parts:

The first part, which is composed of two chapters, provides an in‐depth review of the background concepts that are relevant to most aspects of 3D shape analysis. It begins in Chapter 2 with the basic elements of geometry and topology, which are needed in almost every 3D shape analysis task. This chapter looks at elements of differential geometry and at how 3D models are represented. While most of this material is covered in many courses and textbooks, putting it in the broader context of shape analysis will help the reader appreciate the benefits and power of these fundamental mathematical tools.

Chapter 3 reviews the techniques that are used to capture, create, and preprocess 3D models. Understanding these techniques will not only help the reader understand the various challenges faced in 3D shape analysis, but will also motivate the use of 3D shape analysis techniques to improve algorithms for 3D reconstruction, a long‐standing problem in computer vision and computer graphics.

The second part, which is composed of two chapters, presents a range of mathematical and algorithmic tools that are used for shape description and comparison. In particular, Chapter 4 presents the different descriptors that have been proposed in the literature to characterize the global shape of a 3D object using its geometry and/or topology. Early works on 3D shape analysis, in particular classification and retrieval, were based on global descriptors. Although they lack discriminative power, they are the foundation of modern and powerful 3D shape descriptors.

Chapter 5, on the other hand, covers the algorithms and techniques used for the detection of local features and the characterization of the shape of local regions using local descriptors. Many of the current 3D reconstruction, recognition, and analysis techniques are built on the extraction and matching of feature points. Thus, these are fundamental techniques required in most of the subsequent chapters of the book.

The third part of the book, which is composed of three chapters, focuses on the important problem of computing correspondences and registrations between 3D objects. In fact, almost every task, from 3D reconstruction to animation, and from morphing to attribute transfer, requires accurate correspondence and registration. We will consider the three commonly studied aspects of the problem, which are rigid registration (Chapter 6), nonrigid registration (Chapter 7), and semantic correspondence (Chapter 8). In the first case, we are given two pieces of geometry (which can be partial scans or full 3D models), and we seek the rigid transformations (rotations and translations) that align one piece onto the other. This problem appears mainly in 3D scanning, where a 3D object is often scanned by multiple scanners. Each scan produces a set of incomplete point clouds that should be aligned and fused together to form a complete 3D model.

3D models can not only undergo rigid transformations but also nonrigid deformations. Think, for instance, of the problem of scanning a human body. During the scanning process, the body can not only move but also bend. Once it is fully captured, we would like to transfer its properties (e.g. color, texture, and motion) onto another 3D human body of a different shape. This requires finding correspondences and registration between these two 3D objects, which bend and stretch. This is a complex problem since the space of solutions is large and requires efficient techniques to explore it. Solutions to this problem will be discussed in Chapter 7.

Semantic correspondence is even more challenging; think of the problem of finding correspondences between an office chair and a dining chair. While humans can easily match parts across these two models, the problem is very challenging for computers since the two models differ both in geometry and in topology. We will review in Chapter 8 the methods that solve this problem using supervised learning, and the methods that use structure and context to infer high‐level semantic concepts.

The last part of the book demonstrates the use of the fundamental techniques described in the earlier chapters in a selection of 3D shape analysis applications. In particular, Chapter 9 reviews some of the semantic applications of 3D shape analysis. It also illustrates the range of applications involving 3D data that have been annotated with some sort of meaning (i.e. semantics or labels).

Chapter 10 focuses on a specific class of 3D objects: human faces. With the widespread availability of commodity 3D scanning devices, several recent works use the 3D geometry of the face for various purposes, including recognition, gender classification, age estimation, and the detection of diseases and abnormalities. This chapter will review the most relevant works in this area.

Chapter 11 focuses on the problem of recognizing objects in 3D scenes. Nowadays, cars, robots, and drones are all equipped with 3D sensors that capture their environments. Tasks such as navigation, target detection and identification, object tracking, and so on require the analysis of the 3D information that is captured by these sensors.

Chapter 12 focuses on a classical problem of 3D shape analysis, which is how to retrieve 3D objects of interest from a collection of 3D models. Chapter 13, on the other hand, treats the same problem of shape retrieval but this time by using multimodal queries. This is a very recent problem that has received a lot of interest with the emergence of deep‐learning techniques that enable embedding different modalities into a common space.

The book concludes in Chapter 14 with a summary of the main ideas and a discussion of the future trends in this very active and continuously expanding field of research.

Figure 1.2 Structure of the book and dependencies between the chapters.

Readers can proceed sequentially through each chapter. Some readers may want to go straight to topics of their interest. In that case, we recommend following the reading chart of Figure 1.2, which illustrates the interdependencies between the different chapters.

1.4 Notation

Table 1.1 summarizes the different notations used throughout the book.

Table 1.1 List of notations used throughout the book.

Symbol          Description
ℕ               Natural numbers
ℝ               Real numbers
ℝ_{>0}          Strictly positive real numbers
ℝ_{≥0}          Nonnegative real numbers
ℝ², ℝ³, ℝⁿ      2D, 3D, and n-D Euclidean space, respectively