Description

Computer vision is a crucial component of many modern businesses, including automobiles, robotics, and manufacturing, and its market is growing rapidly. This book helps you explore Detectron2, Facebook's next-gen library providing cutting-edge detection and segmentation algorithms. It’s used in research and practical projects at Facebook to support computer vision tasks, and its models can be exported to TorchScript or ONNX for deployment.
The book provides you with step-by-step guidance on using existing models in Detectron2 for computer vision tasks (object detection, instance segmentation, keypoint detection, semantic segmentation, and panoptic segmentation). You’ll get to grips with the theories and visualizations of Detectron2’s architecture and learn how each module in Detectron2 works. As you advance, you’ll build your practical skills by working on two real-life projects (preparing data, training models, fine-tuning models, and deployment) for object detection and instance segmentation tasks using Detectron2. Finally, you’ll deploy Detectron2 models into production and develop Detectron2 applications for mobile devices.
By the end of this deep learning book, you’ll have gained sound theoretical knowledge and useful hands-on skills to help you solve advanced computer vision tasks using Detectron2.

This e-book can be read in Legimi apps or in any app that supports the following format:

EPUB

Page count: 333

Publication year: 2023




Hands-On Computer Vision with Detectron2

Develop object detection and segmentation models with a code and visualization approach

Van Vung Pham

BIRMINGHAM—MUMBAI

Hands-On Computer Vision with Detectron2

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Dhruv J. Kataria

Content Development Editor: Shreya Moharir

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Jyoti Chauhan

Marketing Coordinators: Shifa Ansari, Vinishka Kalra

First published: April 2023

Production reference: 1290323

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80056-162-5

www.packtpub.com

To my father, Pham Van Hung, and my mother, Pham Thi Doai, for their sacrifices and love. To my loving wife, Thi Hong Hanh Le, for her unwavering support during this exciting and also time-consuming endeavor, and my children, Le Tra My Pham and Le Ha Chi Pham, for checking on me about my progress, and to my little one, Liam Le Pham, who was born while I was writing this book and brought new excitement and source of energy for me to complete it.

Foreword

I have known and worked with Van Vung Pham for more than 10 years and was also his academic advisor for his doctoral degree. Vung won several data visualization, computer vision, and machine learning challenges during his Ph.D. program, including using Detectron2 to detect and classify road damage. In this book, Hands-On Computer Vision with Detectron2, Vung takes you on a learning journey that starts with common computer vision tasks. He then walks you through the steps for developing computer vision applications using stunning deep-learning models with simple code by utilizing pre-trained models on the Detectron2 Model Zoo.

Existing models, trained on huge datasets covering the most common object types, can meet the needs of common computer vision tasks. However, this book also focuses on developing computer vision applications in a custom domain for specific business requirements. For this, Vung provides the steps to collect and prepare data, train models, and fine-tune models on brain tumor datasets for object detection and instance segmentation tasks to illustrate how to develop computer vision applications for custom business domains.

In his presentations and examples, Vung provides code that can be conveniently executed on Google Colab and visualizations to help illustrate theoretical concepts. The ability to execute the code on Google Colab helps eliminate the burden of hardware and software setup, so you can get started quickly and conveniently. The visualizations allow you to easily grasp complicated computer vision concepts, better understand deep learning architectures for computer vision tasks, and become an expert in this area.

Beyond developing deep learning models for computer vision tasks, you will learn how to deploy the trained models to various environments. Vung explains different model formats, such as TorchScript and ONNX formats, and their respective execution platforms and environments, such as C++ servers, web browsers, or mobile and edge devices.

Become a developer and an expert in developing and deploying computer vision applications with Detectron2.

– Tommy Dang

iDVL director and assistant professor, Texas Tech University

Contributors

About the author

Van Vung Pham is a passionate research scientist in machine learning, deep learning, data science, and data visualization. He has years of experience and numerous publications in these areas. He is currently working on projects that use deep learning to predict road damage from pictures or videos taken from roads. One of the projects uses Detectron2 and Faster R-CNN to predict and classify road damage and achieves state-of-the-art results for this task. Dr. Pham obtained his Ph.D. from the Computer Science Department at Texas Tech University, Lubbock, Texas, USA. He is currently an assistant professor in the Computer Science Department at Sam Houston State University, Huntsville, Texas, USA.

I want to thank the people who have been close and supported me, especially my wife, Hanh, my parents, my children, and my Ph.D. advisor (Dr. Tommy Dang from Texas Tech University).

About the reviewers

Yiqiao Yin is a senior data scientist at LabCorp, an S&P 500 company, developing AI-driven solutions for drug diagnostics and development. He has a BA in mathematics and a BSc in finance from the University of Rochester. He was a PhD student in statistics at Columbia University and has a wide range of research interests in representation learning: feature learning, deep learning, computer vision, and natural language processing. He has held professional positions as an enterprise-level data scientist at Bayer, a EURO STOXX 50 company; a quantitative researcher at AQR, working on alternative quantitative strategies for portfolio management and factor-based trading; and an equity trader at T3 Trading on Wall Street.

Nikita Dalvi is a highly skilled and experienced technical professional, currently pursuing a master’s degree in computing and data science at Sam Houston State University. With a background in information and technology, she has honed her skills in programming languages such as Java and Python over the past five years, having worked with prestigious organizations such as Deloitte and Tech Mahindra. Driven by her passion for programming, she has taught herself new languages and technologies over the years and stayed up to date with the latest industry trends and best practices.

Table of Contents

Preface

Part 1: Introduction to Detectron2

1

An Introduction to Detectron2 and Computer Vision Tasks

Technical requirements

Computer vision tasks

Object detection

Instance segmentation

Keypoint detection

Semantic segmentation

Panoptic segmentation

An introduction to Detectron2 and its architecture

Introducing Detectron2

Detectron2 architecture

Detectron2 development environments

Cloud development environment for Detectron2 applications

Local development environment for Detectron2 applications

Connecting Google Colab to a local development environment

Summary

2

Developing Computer Vision Applications Using Existing Detectron2 Models

Technical requirements

Introduction to Detectron2’s Model Zoo

Developing an object detection application

Getting the configuration file

Getting a predictor

Performing inferences

Visualizing the results

Developing an instance segmentation application

Selecting a configuration file

Getting a predictor

Performing inferences

Visualizing the results

Developing a keypoint detection application

Selecting a configuration file

Getting a predictor

Performing inferences

Visualizing the results

Developing a panoptic segmentation application

Selecting a configuration file

Getting a predictor

Performing inferences

Visualizing the results

Developing a semantic segmentation application

Selecting a configuration file and getting a predictor

Performing inferences

Visualizing the results

Putting it all together

Getting a predictor

Performing inferences

Visualizing the results

Performing a computer vision task

Summary

Part 2: Developing Custom Object Detection Models

3

Data Preparation for Object Detection Applications

Technical requirements

Common data sources

Getting images

Selecting an image labeling tool

Annotation formats

Labeling the images

Annotation format conversions

Converting YOLO datasets to COCO datasets

Converting Pascal VOC datasets to COCO datasets

Summary

4

The Architecture of the Object Detection Model in Detectron2

Technical requirements

Introduction to the application architecture

The backbone network

Region Proposal Network

The anchor generator

The RPN head

The RPN loss calculation

Proposal predictions

Region of Interest Heads

The pooler

The box predictor

Summary

5

Training Custom Object Detection Models

Technical requirements

Processing data

The dataset

Downloading and performing initial explorations

Data format conversion

Displaying samples

Using the default trainer

Selecting the best model

Evaluation metrics for object detection models

Selecting the best model

Inferencing thresholds

Sample predictions

Developing a custom trainer

Utilizing the hook system

Summary

6

Inspecting Training Results and Fine-Tuning Detectron2’s Solvers

Technical requirements

Inspecting training histories with TensorBoard

Understanding Detectron2’s solvers

Gradient descent

Stochastic gradient descent

Momentum

Variable learning rates

Fine-tuning the learning rate and batch size

Summary

7

Fine-Tuning Object Detection Models

Technical requirements

Setting anchor sizes and anchor ratios

Preprocessing input images

Sampling training data and generating the default anchors

Generating sizes and ratios hyperparameters

Setting pixel means and standard deviations

Preparing a data loader

Calculating the running means and standard deviations

Putting it all together

Summary

8

Image Data Augmentation Techniques

Technical requirements

Image augmentation techniques

Why image augmentations?

What are image augmentations?

How to perform image augmentations

Detectron2’s image augmentation system

Transformation classes

Augmentation classes

The AugInput class

Summary

9

Applying Train-Time and Test-Time Image Augmentations

Technical requirements

The Detectron2 data loader

Applying existing image augmentation techniques

Developing custom image augmentation techniques

Modifying the existing data loader

Developing the MixUp image augmentation technique

Developing the Mosaic image augmentation technique

Applying test-time image augmentation techniques

Summary

Part 3: Developing a Custom Detectron2 Model for Instance Segmentation Tasks

10

Training Instance Segmentation Models

Technical requirements

Preparing data for training segmentation models

Getting images, labeling images, and converting annotations

Introduction to the brain tumor segmentation dataset

The architecture of the segmentation models

Training custom segmentation models

Summary

11

Fine-Tuning Instance Segmentation Models

Technical requirements

Introduction to PointRend

Using existing PointRend models

Training custom PointRend models

Summary

Part 4: Deploying Detectron2 Models into Production

12

Deploying Detectron2 Models into Server Environments

Technical requirements

Supported file formats and runtimes

Development environments, file formats, and runtimes

Exporting PyTorch models using the tracing method

When the tracing method fails

Exporting PyTorch models using the scripting method

Mixing tracing and scripting approaches

Deploying models using a C++ environment

Deploying custom Detectron2 models

Detectron2 utilities for exporting models

Exporting a custom Detectron2 model

Summary

13

Deploying Detectron2 Models into Browsers and Mobile Environments

Technical requirements

Deploying Detectron2 models using ONNX

Introduction to ONNX

Exporting a PyTorch model to ONNX

Loading an ONNX model to the browser

Exporting a custom Detectron2 model to ONNX

Developing mobile computer vision apps with D2Go

Introduction to D2Go

Using existing D2Go models

Training custom D2Go models

Model quantization

Summary

Index

Other Books You May Enjoy

Preface

Computer vision has become a critical success factor in many modern businesses, such as automobile, robotics, manufacturing, and biomedical image processing, and its market is growing rapidly. This book will help you explore Detectron2, the next-generation library that provides cutting-edge detection and segmentation algorithms. Many research and practical projects at Facebook (now Meta) use it as a library to support computer vision tasks, and its models can be exported to TorchScript and Open Neural Network Exchange (ONNX) formats for deployment into server production environments (such as a C++ runtime), browsers, and mobile devices.

By utilizing code and visualizations, this book will guide you in using existing models in Detectron2 for computer vision tasks (object detection, instance segmentation, keypoint detection, semantic segmentation, and panoptic segmentation). It also covers theories and visualizations of Detectron2’s architecture and how each module in Detectron2 works. This book walks you through two complete hands-on, real-life projects (preparing data, training models, fine-tuning models, and deployment) for object detection and instance segmentation of brain tumors using Detectron2.

The data preparation section discusses common sources of datasets for computer vision applications and tools to collect and label data. It also describes common image data annotation formats and provides code to convert from different formats to the one Detectron2 supports. The training model section guides you through the steps to prepare the configuration file, load pre-trained weights for transfer learning (if necessary), and modify the default trainer to meet custom business requirements.
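The annotation conversions described above are mostly coordinate bookkeeping. As an illustration (a generic sketch, not the book's own helper code), converting a normalized YOLO box to COCO's absolute [x_min, y_min, width, height] layout looks like this:

```python
def yolo_to_coco_bbox(yolo_box, img_width, img_height):
    """Convert one normalized YOLO box (cx, cy, w, h) into a COCO
    [x_min, y_min, width, height] box in absolute pixels."""
    cx, cy, w, h = yolo_box
    abs_w, abs_h = w * img_width, h * img_height
    x_min = cx * img_width - abs_w / 2    # YOLO stores the box center
    y_min = cy * img_height - abs_h / 2   # COCO stores the top-left corner
    return [x_min, y_min, abs_w, abs_h]

print(yolo_to_coco_bbox((0.5, 0.5, 0.5, 0.5), 640, 480))
# [160.0, 120.0, 320.0, 240.0]
```

The key difference is that YOLO normalizes coordinates to the image size and anchors them at the box center, while COCO uses absolute pixels anchored at the top-left corner.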

The fine-tuning model section includes inspecting training results using TensorBoard and optimizing Detectron2’s solvers. It also provides a primer on common and cutting-edge image augmentation techniques and shows how to use Detectron2’s existing image augmentation techniques or build and apply custom ones at training and testing time. There are also techniques for fine-tuning object detection models, such as computing appropriate configurations for generating anchors (the sizes and ratios of the anchors) and the means and standard deviations of the pixel values from custom datasets. For the instance segmentation task, this book also discusses the use of PointRend to improve the quality of the boundaries of the detected instances.
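To give a taste of the anchor analysis mentioned above, here is a minimal, hypothetical sketch that derives one representative anchor size (square root of the box area) and aspect ratio (height over width) from ground-truth box dimensions using medians; the book's actual analysis and Detectron2's anchor generator configuration are more elaborate:

```python
import statistics

def suggest_anchor_hyperparams(gt_boxes):
    """gt_boxes: list of (width, height) pairs of ground-truth boxes
    in pixels. Returns a representative anchor size (sqrt of the box
    area) and aspect ratio (height / width), using simple medians."""
    sizes = [(w * h) ** 0.5 for w, h in gt_boxes]
    ratios = [h / w for w, h in gt_boxes]
    return statistics.median(sizes), statistics.median(ratios)

print(suggest_anchor_hyperparams([(10, 10), (20, 20), (40, 40)]))
# (20.0, 1.0)
```

In practice, one would compute several sizes and ratios (for example, by clustering) rather than a single median of each, because the anchor generator accepts lists of candidate values.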

This book also covers the steps for deploying Detectron2 models into production and developing Detectron2 applications for mobile devices. Specifically, it describes the model formats and platforms that Detectron2 supports, such as the TorchScript and ONNX formats, and provides the code to convert Detectron2 models into these formats using the tracing and scripting approaches. Additionally, code snippets illustrate how to deploy Detectron2 models into C++ and browser environments. Finally, this book discusses D2Go, a platform to train, fine-tune, and quantize computer vision models so they are deployable to mobile and edge devices with low computation resources.

Through this book, you will find that Detectron2 is a valuable framework for anyone looking to build robust computer vision applications.

Who this book is for

If you are a deep learning application developer, researcher, or software developer with some prior knowledge about deep learning, this book is for you to get started and develop deep learning models for computer vision applications. Even if you are an expert in computer vision and curious about the features of Detectron2, or you would like to learn some cutting-edge deep learning design patterns, you will find this book helpful. Some HTML, Android, and C++ programming skills are advantageous if you want to deploy computer vision applications using these platforms.

What this book covers

Chapter 1, An Introduction to Detectron2 and Computer Vision Tasks, introduces Detectron2, its architectures, and the computer vision tasks that Detectron2 can perform. Additionally, this chapter provides the steps to set up environments for developing computer vision applications using Detectron2.

Chapter 2, Developing Computer Vision Applications Using Existing Detectron2 Models, guides you through the steps to develop applications for computer vision tasks using state-of-the-art models in the Detectron2 Model Zoo. Thus, you can quickly develop practical computer vision applications without having to train custom models.

Chapter 3, Data Preparation for Object Detection Applications, discusses the steps to prepare data for training models using Detectron2. Additionally, this chapter covers the techniques to convert standard annotation formats to the data format required by Detectron2 in case the existing datasets come in different formats.

Chapter 4, The Architecture of the Object Detection Model in Detectron2, dives deep into the architecture of Detectron2 for the object detection task. This chapter is essential for understanding the common terminology used when designing deep neural networks for vision systems.

Chapter 5, Training Custom Object Detection Models, provides steps to prepare data, train an object detection model, select the best model, and perform inferencing object detection tasks. Additionally, it details the development process of a custom trainer by extending the default trainer and incorporating a hook into the training process.
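The hook mechanism mentioned above follows a common callback pattern. This toy sketch (deliberately simplified, not Detectron2's actual trainer or HookBase API) shows the core idea: a trainer invokes registered hooks around each training step, so custom behavior can be added without rewriting the training loop:

```python
class HookBase:
    """Base class for training hooks: the trainer calls back into
    registered hooks before and after every training step."""
    def before_step(self, trainer):
        pass
    def after_step(self, trainer):
        pass

class LossLoggerHook(HookBase):
    """A hook that records the loss after each step."""
    def __init__(self):
        self.history = []
    def after_step(self, trainer):
        self.history.append(trainer.latest_loss)

class ToyTrainer:
    """A stand-in trainer that drives the hook lifecycle."""
    def __init__(self, hooks):
        self.hooks = hooks
        self.latest_loss = None
    def train(self, losses):
        for loss in losses:               # stand-in for real training steps
            for hook in self.hooks:
                hook.before_step(self)
            self.latest_loss = loss       # a real trainer computes this
            for hook in self.hooks:
                hook.after_step(self)

logger = LossLoggerHook()
ToyTrainer([logger]).train([0.9, 0.7, 0.5])
print(logger.history)   # [0.9, 0.7, 0.5]
```

The same pattern supports checkpointing, evaluation, or early stopping: each concern becomes its own hook instead of more code inside the loop.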

Chapter 6, Inspecting Training Results and Fine-Tuning Detectron2’s Solvers, covers the steps to use TensorBoard to inspect training histories. It utilizes the code and visualization approach to explain the concepts behind Detectron2’s solvers and their hyperparameters. The related concepts include gradient descent, stochastic gradient descent, momentum, and variable learning rate optimizers.
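For a flavor of the solver concepts listed above, a single SGD-with-momentum parameter update can be written in a few lines (a generic sketch of the textbook update rule, not Detectron2's solver code):

```python
def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """One update with classical momentum:
    v <- momentum * v - lr * grad, then w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Minimize f(w) = w**2 (gradient 2*w) for a few steps.
w, v = 1.0, 0.0
for _ in range(3):
    w, v = sgd_momentum_step(w, grad=2 * w, velocity=v)
# w moves from 1.0 toward the minimum at 0: 0.8, 0.46, then about 0.062
```

The velocity term accumulates past gradients, which is why momentum both accelerates progress along consistent directions and can overshoot, two behaviors the chapter's visualizations illustrate.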

Chapter 7, Fine-Tuning Object Detection Models, explains how Detectron2 processes its inputs and provides code to analyze the ground-truth boxes in a training dataset and find appropriate values for the anchor size and ratio configuration parameters. Additionally, this chapter provides the code to calculate the means and standard deviations of the input image pixels from the training dataset in a rolling manner. The rolling calculation of these hyperparameters is essential if the training dataset is large and does not fit in memory.
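The rolling (streaming) calculation of pixel means and standard deviations can be illustrated with Welford's online algorithm. This generic sketch is not the book's code, but it shows why the full dataset never needs to be held in memory:

```python
import math

class RunningStats:
    """Welford's online algorithm: streaming mean and (population)
    standard deviation, without keeping all values in memory."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n          # incremental mean update
        self.m2 += delta * (x - self.mean)   # running sum of squared deviations
    @property
    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n else 0.0

stats = RunningStats()
for pixel in [0, 50, 100, 150, 200, 250]:    # stand-in for streamed pixel values
    stats.update(pixel)
# stats.mean is 125.0; stats.std is about 85.39
```

Each image (or batch of pixels) updates the statistics and can then be discarded, so memory use stays constant regardless of dataset size.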

Chapter 8, Image Data Augmentation Techniques, introduces Detectron2’s image augmentation system with three main components: Transformation, Augmentation, and AugInput. It describes classes in these components and how they work together to perform image augmentation while training Detectron2 models.
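The division of labor between these components can be shown with a toy example (deliberately simplified; Detectron2's real classes carry more machinery, and AugInput bundles the image with its annotations). The key idea: an augmentation policy decides what to do and emits a deterministic transform, which is then applied consistently to both the image and its boxes:

```python
import random

class HFlipTransform:
    """A deterministic transform: the same horizontal flip is applied to
    the image and to box coordinates, so annotations stay aligned."""
    def __init__(self, width):
        self.width = width
    def apply_image(self, image):            # image: list of pixel rows
        return [row[::-1] for row in image]
    def apply_box(self, box):                # box: (x_min, y_min, x_max, y_max)
        x0, y0, x1, y1 = box
        return (self.width - x1, y0, self.width - x0, y1)

class RandomFlip:
    """An augmentation policy: decides *whether* to flip, then emits a
    reusable transform object."""
    def __init__(self, prob=0.5):
        self.prob = prob
    def get_transform(self, image, rng):
        if rng.random() < self.prob:
            return HFlipTransform(width=len(image[0]))
        return None                          # None stands in for identity

rng = random.Random(0)
image = [[1, 2, 3], [4, 5, 6]]
t = RandomFlip(prob=1.0).get_transform(image, rng)
flipped = t.apply_image(image)               # [[3, 2, 1], [6, 5, 4]]
box = t.apply_box((0, 0, 1, 2))              # (2, 0, 3, 2)
```

Separating the random decision (Augmentation) from the deterministic operation (Transformation) is what lets the exact same geometry be replayed on images, boxes, and masks.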

Chapter 9, Applying Train-Time and Test-Time Image Augmentations, introduces the steps to apply these existing classes to training. This chapter also explains how to modify existing codes to implement custom techniques that need to load data from different inputs. Additionally, this chapter details the steps for applying image augmentations during test time to improve accuracy.
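MixUp, one of the custom techniques named above, is conceptually just a weighted pixel-wise blend of two training images, with their labels combined using the same weights; a minimal sketch (pure Python on nested lists, not the chapter's implementation):

```python
def mixup(image_a, image_b, lam=0.7):
    """Blend two same-sized images pixel-wise: lam * a + (1 - lam) * b.
    During training, the two images' labels are mixed with the same
    weights lam and (1 - lam)."""
    return [
        [lam * pa + (1 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]

blended = mixup([[100, 200]], [[0, 100]], lam=0.5)   # [[50.0, 150.0]]
```

Mosaic is similar in spirit but spatial rather than additive: it tiles four images (and their boxes) into one training sample instead of averaging pixels.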

Chapter 10, Training Instance Segmentation Models, covers the steps to construct a dataset in the format supported by Detectron2 and train a model for a segmentation task. This chapter also utilizes the code and visualization approach to explain the architecture of an object segmentation application developed using Detectron2.

Chapter 11, Fine-Tuning Instance Segmentation Models, introduces PointRend, a project inside Detectron2 that helps improve the sharpness of the object’s boundaries. This chapter also covers the steps to use existing PointRend models and to train custom models using PointRend.

Chapter 12, Deploying Detectron2 Models into Server Environments, walks you through the steps in an export process to convert Detectron2 models into deployable artifacts. This chapter then provides the steps to deploy the exported models into the server environments.

Chapter 13, Deploying Detectron2 Models into Browsers and Mobile Environments, introduces the ONNX framework, which is extremely helpful when Detectron2 models need to be deployed into browsers or mobile environments. This chapter also describes D2Go, a framework for training and quantizing lightweight models that are especially suitable for deployment to mobile or edge devices.

To get the most out of this book

Detectron2, D2Go, and PyTorch are under active development, and therefore Detectron2 or D2Go may not be compatible with the PyTorch version you have or that Google Colab provides by default. The source code is fully tested using the following versions on Google Colab:

Software/hardware covered in the book:

Python 3.8 and 3.9
PyTorch 1.13
CUDA cu116
Detectron2 (commit 3ed6698)
D2Go (commit 1506551)

Operating system requirements: Google Colab

Chapter 1 of this book also provides installation instructions and information you need to start. Additionally, this book provides Code in Action videos where you can view the Python and commit versions of all the packages being used.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Note that some less important portions of the code are truncated inside the book for space efficiency and legibility. Therefore, simply copying and pasting code from the book may lead to execution errors. It is recommended to follow the complete code found in the book’s GitHub repository, detailed in the following section.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Hands-On-Computer-Vision-with-Detectron2. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Code in Action

The Code in Action videos for this book can be viewed at http://bit.ly/40DJdpd.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.”

A block of code is set as follows:

html, body, #map { height: 100%; margin: 0; padding: 0 }

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default] exten => s,1,Dial(Zap/1|30) exten => s,2,Voicemail(u100) exten => s,102,Voicemail(b100) exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css $ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Select System info from the Administration panel.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Hands-On Computer Vision with Detectron2, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there. You can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781800561625

Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly.

Part 1: Introduction to Detectron2

This first part introduces Detectron2, its architectures, and the computer vision tasks that Detectron2 can perform. In other words, it discusses why we need computer vision applications and what computer vision tasks Detectron2 can perform. Additionally, this part provides the steps to set up environments for developing computer vision applications using Detectron2 locally or on the cloud using Google Colab. Also, it guides you through the steps to build applications for computer vision tasks using state-of-the-art models in Detectron2. Specifically, it discusses the existing and pre-trained models in Detectron2’s Model Zoo and the steps to develop applications for object detection, instance segmentation, key-point detection, semantic segmentation, and panoptic segmentation using these models.

The first part covers the following chapters:

Chapter 1, An Introduction to Detectron2 and Computer Vision Tasks

Chapter 2, Developing Computer Vision Applications Using Existing Detectron2 Models