With this hands-on guide to 3D deep learning, developers working with 3D computer vision will be able to put their knowledge to work and get up and running in no time.
Complete with step-by-step explanations of essential concepts and practical examples, this book lets you explore and gain a thorough understanding of state-of-the-art 3D deep learning. You'll see how to use PyTorch3D for basic 3D mesh and point cloud data processing, including loading and saving PLY and OBJ files, projecting 3D points onto camera coordinates using perspective or orthographic camera models, rendering point clouds and meshes to images, and much more. As you implement some of the latest 3D deep learning algorithms, such as differentiable rendering, NeRF, SynSin, and Mesh R-CNN, you'll realize how coding for these deep learning models becomes easier using the PyTorch3D library.
By the end of this deep learning book, you’ll be ready to implement your own 3D deep learning models confidently.
Design and develop your computer vision model with 3D data using PyTorch3D and more
Xudong Ma
Vishakh Hegde
Lilit Yolyan
BIRMINGHAM—MUMBAI
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Dinesh Chaudhary
Content Development Editor: Joseph Sunil
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Ponraj Dhandapani
Marketing Coordinator: Shifa Ansari
First published: November 2022
Production reference: 1211022
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80324-782-3
www.packt.com
"To my wife and family, for their support and encouragement at every step". - Vishakh Hegde
"To my family and friends, whose love and support have been my biggest motivation". - Lilit Yolyan
Xudong Ma is a Staff Machine Learning Engineer with Grabango Inc. in Berkeley, California. He was a Senior Machine Learning Engineer at Facebook (Meta) Oculus and worked closely with the PyTorch3D team on 3D facial tracking projects. He has many years of experience working on computer vision, machine learning, and deep learning and holds a Ph.D. in Electrical and Computer Engineering.
Vishakh Hegde is a Machine Learning and Computer Vision researcher. He has over 7 years of experience in the field, during which he has authored multiple well-cited research papers and published patents. He holds a master's degree from Stanford University, specializing in applied mathematics and machine learning, and a BS and MS in Physics from IIT Madras. He previously worked at Schlumberger and Matroid. He is a Senior Applied Scientist at Ambient.ai, where he helped build their weapon detection system, which is deployed at several Global Fortune 500 companies. He is now leveraging his expertise and passion for solving business challenges to build a technology startup in Silicon Valley. You can learn more about him on his website.
I would like to thank the computer vision researchers whose breakthrough research I got to write about. I want to thank the reviewers for their feedback and the wonderful team at Packt Publishing for giving me the chance to be creative. Finally, I want to thank my wife and family for all their support and encouragement when I most needed it.
Lilit Yolyan is a machine learning researcher working on her Ph.D. at YSU. Her research focuses on building computer vision solutions for smart cities using remote sensing data. She has 5 years of experience in the field of computer vision and has worked on a complex driver safety solution to be deployed by many well-known car manufacturing companies.
Eya Abid is a Master of Engineering student specializing in deep learning and computer vision. She is an AI instructor with NVIDIA and works on quantum machine learning at CERN.
I would like to dedicate this work first to my family, friends, and whoever helped me through this process. A special dedication to Aymen, to whom I am forever grateful.
Ramesh Sekhar is the CEO and co-founder of Dapster.ai, a company that builds affordable and easily deployable robots that perform the most arduous tasks in warehouses. Ramesh has worked at companies such as Symbol, Motorola, and Zebra and specializes in building products at the intersection of computer vision, AI, and robotics. He has a BS in Electrical Engineering and an MS in Computer Science. Ramesh founded Dapster.ai in 2020. Dapster's mission is to build robots that positively impact human beings by performing dangerous and unhealthy tasks. Their vision is to unlock better jobs, fortify supply chains, and better negotiate the challenges arising from climate change.
Utkarsh Srivastava is an AI/ML professional, trainer, YouTuber, and blogger. He loves to tackle and develop ML, NLP, and computer vision algorithms to solve complex problems. He started his data science career as a blogger on his blog (datamahadev.com) and YouTube channel (datamahadev), followed by working as a senior data science trainer at an institute in Gujarat. Additionally, he has trained and counseled 1,000+ working professionals and students in AI/ML. Utkarsh has completed 40+ freelance training and development work/projects in data science and analytics, AI/ML, Python development, and SQL. He hails from Lucknow and is currently settled in Bangalore, India, as an analyst at Deloitte USI Consulting.
I would like to thank my mother, Mrs. Rupam Srivastava, for her continuous guidance and support throughout my hardships and struggles. Thanks also to the Supreme Para-Brahman.
Mason McGough is a Sr. R&D Engineer and Computer Vision Specialist at Lowe's Innovation Labs. He has a passion for imaging and has spent over a decade solving computer vision problems across a broad range of industrial and academic disciplines, including geology, bioinformatics, game development, and retail. Most recently, he has been exploring the use of digital twins and 3D scanning for retail stores.
I wish to thank Andy Lykos, Joseph Canzano, Alexander Arango, Oleg Alexander, Erin Clark, and my family for their support.
This first part of the book defines the basic concepts of 3D data and image processing that are essential to our later discussions. It makes the book self-contained, so readers do not need any other resources to get started with learning about PyTorch3D.
This part includes the following chapters:
Chapter 1, Introducing 3D Data Processing
Chapter 2, Introducing 3D Computer Vision and Geometry

In this chapter, we are going to discuss some basic concepts that are fundamental to 3D deep learning and that will be used frequently in later chapters. We will start by setting up our development environment and installing all the necessary software packages, including Anaconda, Python, PyTorch, and PyTorch3D. We will then learn about the most frequently used ways to represent 3D data – for example, point clouds, meshes, and voxels – and the many ways we can manipulate them and convert them between formats. We will then move on to 3D data file formats, such as PLY and OBJ files, and discuss 3D coordinate systems. Finally, we will discuss camera models, which describe how 3D data is mapped to 2D images.
After reading this chapter, you will be able to debug 3D deep learning algorithms easily by inspecting output data files. With a solid understanding of coordinate systems and camera models, you will be ready to build on that knowledge and learn about more advanced 3D deep learning topics.
In this chapter, we’re going to cover the following main topics:
Setting up a development environment and installing Anaconda, PyTorch, and PyTorch3D
3D data representation
3D data formats – PLY and OBJ files
3D coordinate systems and conversion between them
Camera models – perspective and orthographic cameras

In order to run the example code snippets in this book, you will need a computer, ideally with a GPU. However, running the code snippets with only CPUs is possible.
The recommended computer configuration includes the following:
A GPU such as a GTX or RTX series card with at least 8 GB of memory
Python 3
The PyTorch and PyTorch3D libraries

The code snippets for this chapter can be found at https://github.com/PacktPublishing/3D-Deep-Learning-with-Python.
Let us first set up a development environment for all the coding exercises in this book. We recommend using a Linux machine for all the Python code examples in this book:
We will first set up Anaconda. Anaconda is a widely used Python distribution that comes bundled with the powerful CPython implementation. One advantage of using Anaconda is its package management system, which enables users to create virtual environments easily. The individual edition of Anaconda is free for solo practitioners, students, and researchers. To install Anaconda, we recommend visiting the website, anaconda.com, for detailed instructions. The easiest way to install Anaconda is usually by running a script downloaded from the website. After setting up Anaconda, run the following command to create a virtual environment with Python 3.7:

$ conda create -n python3d python=3.7
This command will create a virtual environment with Python version 3.7. In order to use this virtual environment, we first need to activate it with the following command:

$ source activate python3d
Install PyTorch. Detailed instructions on installing PyTorch can be found on its web page at www.pytorch.org/get-started/locally/. For example, I will install PyTorch 1.9.1 on my Ubuntu desktop with CUDA 11.1, as follows:

$ conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
Install PyTorch3D. PyTorch3D is an open source Python library for 3D computer vision, recently released by Facebook AI Research. PyTorch3D provides many utility functions to easily manipulate 3D data. Designed with deep learning in mind, it can handle almost all 3D data in mini-batches, including cameras, point clouds, and meshes. Another key feature of PyTorch3D is its implementation of a very important 3D deep learning technique called differentiable rendering. However, the biggest advantage of PyTorch3D as a 3D deep learning library is its close ties to PyTorch.

PyTorch3D has some dependencies, and detailed instructions on how to install them can be found on the PyTorch3D GitHub home page at github.com/facebookresearch/pytorch3d. After all the dependencies have been installed by following the instructions from the website, installing PyTorch3D can be easily done by running the following command:
$ conda install pytorch3d -c pytorch3d
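Once everything is installed, a quick sanity check (a minimal sketch; the exact versions printed will depend on your installation) confirms that both libraries import correctly and that PyTorch can see the GPU:

import torch
import pytorch3d

# Print the installed versions and check whether CUDA is available
print(torch.__version__)
print(pytorch3d.__version__)
print(torch.cuda.is_available())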
Now that we have set up the development environment, let's go ahead and start learning about 3D data representations.
In this section, we will learn about the most frequently used representations of 3D data. Choosing a data representation is a particularly important design decision for many 3D deep learning systems. For example, point clouds do not have grid-like structures, so convolutions usually cannot be applied to them directly. Voxel representations do have grid-like structures; however, they tend to consume a large amount of computer memory. We will discuss the pros and cons of these representations in more detail in this section. The most widely used 3D data representations are point clouds, meshes, and voxels.
A 3D point cloud is a very straightforward representation of 3D objects, where each point cloud is just a collection of 3D points, and each 3D point is represented by one three-dimensional tuple (x, y, z). The raw measurements of many depth cameras are usually 3D point clouds.
From a deep learning point of view, 3D point clouds are an unordered and irregular data type. Unlike regular images, where we can define neighboring pixels for each individual pixel, there is no clear and regular definition of neighboring points for each point in a point cloud – that is, convolutions usually cannot be applied to point clouds. Thus, special types of deep learning models need to be used for processing point clouds, such as PointNet: https://arxiv.org/abs/1612.00593.
Another issue for point clouds as training data for 3D deep learning is the heterogeneous data issue – that is, within one training dataset, different point clouds may contain different numbers of 3D points. One approach to avoiding this heterogeneous data issue is to force all the point clouds to have the same number of points. However, this may not always be possible – for example, the number of points returned by a depth camera may differ from frame to frame.
The heterogeneous data may create some difficulties for mini-batch gradient descent in training deep learning models. Most deep learning frameworks assume that each mini-batch contains training examples of the same size and dimensions. Such homogeneous data is preferred because it can be most efficiently processed by modern parallel processing hardware, such as GPUs. Handling heterogeneous mini-batches in an efficient way needs some additional work. Luckily, PyTorch3D provides many ways of handling heterogeneous mini-batches efficiently, which are important for 3D deep learning.
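For example, here is a minimal sketch (the point counts are arbitrary) of how PyTorch3D's Pointclouds structure stores a heterogeneous mini-batch, exposing both padded and packed views of the same data:

import torch
from pytorch3d.structures import Pointclouds

# Two point clouds with different numbers of points in one mini-batch
points_a = torch.rand(100, 3)
points_b = torch.rand(60, 3)
batch = Pointclouds(points=[points_a, points_b])

# Padded view: one (2, 100, 3) tensor, with the smaller cloud zero-padded
print(batch.points_padded().shape)
# Packed view: all points concatenated into one (160, 3) tensor
print(batch.points_packed().shape)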
Meshes are another widely used 3D data representation. Like points in point clouds, each mesh contains a set of 3D points called vertices. In addition, each mesh also contains a set of polygons called faces, which are defined on vertices.
In most data-driven applications, meshes are a result of post-processing raw measurements from depth cameras. Alternatively, they may be manually created during the process of 3D asset design. Compared to point clouds, meshes contain additional geometric information, encode topology, and have surface-normal information. This additional information becomes especially useful in training deep learning models. For example, graph convolutional neural networks usually treat meshes as graphs and define convolutional operations using the vertex-neighboring information.
Just like point clouds, meshes also have similar heterogeneous data issues. Again, PyTorch3D provides efficient ways for handling heterogeneous mini-batches for mesh data, which makes 3D deep learning efficient.
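As a sketch (with toy geometry made up for illustration), a heterogeneous mini-batch of meshes can be built with PyTorch3D's Meshes structure in the same spirit:

import torch
from pytorch3d.structures import Meshes

# A single triangle and a square made of two triangles, in one mini-batch
verts1 = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces1 = torch.tensor([[0, 1, 2]])
verts2 = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                       [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
faces2 = torch.tensor([[0, 1, 2], [0, 2, 3]])
batch = Meshes(verts=[verts1, verts2], faces=[faces1, faces2])

print(batch.verts_padded().shape)  # (2, 4, 3): padded to the largest mesh
print(batch.faces_packed().shape)  # (3, 3): all faces concatenated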
Another important 3D data representation is voxel representation. A voxel is the counterpart of a pixel in 3D computer vision. A pixel is defined by dividing a rectangle in 2D into smaller rectangles, where each small rectangle is one pixel. Similarly, a voxel is defined by dividing a 3D cube into smaller cubes, where each small cube is called a voxel. The process is shown in the following figure:
Figure 1.1 – Voxel representation is the 3D counterpart of 2D pixel representation, where a cubic space is divided into small volume elements
Voxel representations usually use Truncated Signed Distance Functions (TSDFs) to represent 3D surfaces. A Signed Distance Function (SDF) can be defined at each voxel as the (signed) distance between the center of the voxel to the closest point on the surface. A positive sign in an SDF indicates that the voxel center is outside an object. The only difference between a TSDF and an SDF is that the values of a TSDF are truncated, such that the values of a TSDF always range from -1 to +1.
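As a minimal sketch (the volume size and truncation distance below are made-up values for illustration), converting an SDF volume into a TSDF under this convention is just a scale and a clamp:

import torch

# A hypothetical SDF volume: the signed distance at each voxel center
sdf = torch.randn(32, 32, 32)

# Scale by a chosen truncation distance and clamp, so all values lie in [-1, +1]
truncation_distance = 0.1
tsdf = torch.clamp(sdf / truncation_distance, min=-1.0, max=1.0)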
Unlike point clouds and meshes, voxel representation is ordered and regular. This property is similar to that of pixels in images and enables the use of convolutional filters in deep learning models. One potential disadvantage of voxel representation is that it usually requires more computer memory, although this can be reduced by using techniques such as hashing. Nevertheless, voxel representation is an important 3D data representation.
There are 3D data representations other than the ones mentioned here. For example, multi-view representations use multiple images taken from different viewpoints to represent a 3D scene. RGB-D representations use an additional depth channel to represent a 3D scene. However, in this book, we will not be diving too deep into these 3D representations. Now that we have learned the basics of 3D data representations, we will dive into a few commonly used file formats for point clouds and meshes.
In this section, we are going to discuss the two most frequently used data file formats for representing point clouds and meshes: the PLY file format and the OBJ file format. We are going to discuss the formats themselves and how to load and save them using PyTorch3D. PyTorch3D provides excellent utility functions, so loading from and saving to these file formats is efficient and easy.

The PLY file format was developed in the mid-1990s by a group of researchers from Stanford University and has since evolved into one of the most widely used 3D data file formats. The file format has both an ASCII version and a binary version. The binary version is preferred in cases where file size and processing efficiency are needed, while the ASCII version makes it quite easy to debug. Here, we will discuss the basic format of PLY files and how to use both Open3D and PyTorch3D to load and visualize 3D data from PLY files.
An example, a cube.ply file, is shown in the following code snippet:
ply
format ascii 1.0
comment created for the book 3D Deep Learning with Python
element vertex 8
property float32 x
property float32 y
property float32 z
element face 12
property list uint8 int32 vertex_indices
end_header
-1 -1 -1
1 -1 -1
1 1 -1
-1 1 -1
-1 -1 1
1 -1 1
1 1 1
-1 1 1
3 0 1 2
3 5 4 7
3 6 2 1
3 3 7 4
3 7 3 2
3 5 1 0
3 0 2 3
3 5 7 6
3 6 1 5
3 3 4 0
3 7 2 6
3 5 0 4

As seen here, each PLY file contains a header part and a data part. The first line of every ASCII PLY file is always ply, which indicates that this is a PLY file. The second line, format ascii 1.0, shows that the file is of the ASCII type, with a version number. Any line starting with comment is considered a comment line, and thus anything following comment is ignored when the PLY file is loaded by a computer. The element vertex 8 line means that the first type of data in the PLY file is vertex and we have eight vertices. property float32 x means that each vertex has a property named x of the float32 type. Similarly, each vertex also has y and z properties. Here, each vertex is one 3D point. The element face 12 line means that the second type of data in this PLY file is of the face type and we have 12 faces. property list uint8 int32 vertex_indices shows that each face is a list of vertex indices. The header part of the PLY file always ends with an end_header line.
The first part of the data part of the PLY file consists of eight lines, where each line is the record for one vertex. The three numbers in each line represent the three x, y, and z properties of the vertex. For example, the three numbers -1, -1, -1 specify that the vertex has an x coordinate of -1, y coordinate of -1, and z coordinate of -1.
The second part of the data part of the PLY file consists of 12 lines, where each line is the record for one face. The first number in the sequence indicates the number of vertices that the face has, and the following numbers are the vertex indices. The vertex indices are determined by the order in which the vertices are declared in the PLY file.
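As a minimal sketch (assuming the cube.ply file shown above sits in the current working directory), this file can be loaded with PyTorch3D's load_ply utility:

from pytorch3d.io import load_ply

# verts is a (V, 3) float tensor of vertex coordinates;
# faces is an (F, 3) long tensor of vertex indices
verts, faces = load_ply("cube.ply")
print(verts.shape)  # torch.Size([8, 3])
print(faces.shape)  # torch.Size([12, 3])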