Accelerate Deep Learning Workloads with Amazon SageMaker

Vadim Dabravolski
Description

Over the past 10 years, deep learning has grown from an academic research field into a technology with wide-scale adoption across multiple industries. Deep learning models demonstrate excellent results on a wide range of practical tasks, underpinning emerging fields such as virtual assistants, autonomous driving, and robotics. In this book, you will learn about the practical aspects of designing, building, and optimizing deep learning workloads on Amazon SageMaker. The book also provides end-to-end implementation examples for popular deep learning tasks, such as computer vision and natural language processing. You will begin by exploring key Amazon SageMaker capabilities in the context of deep learning. Then, you will explore in detail the theoretical and practical aspects of training and hosting your deep learning models on Amazon SageMaker. You will learn how to train and serve deep learning models using popular open-source frameworks and understand the hardware and software options available to you on Amazon SageMaker. The book also covers various optimization techniques to improve the performance and cost characteristics of your deep learning workloads.

By the end of this book, you will be fluent in the software and hardware aspects of running deep learning workloads using Amazon SageMaker.

The e-book can be read in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 337

Publication year: 2022




Accelerate Deep Learning Workloads with Amazon SageMaker

Train, deploy, and scale deep learning models effectively using Amazon SageMaker

Vadim Dabravolski


BIRMINGHAM—MUMBAI

Accelerate Deep Learning Workloads with Amazon SageMaker

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Gebin George

Content Development Editor: Priyanka Soam

Technical Editor: Sweety Pagaria

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Rekha Nair

Production Designer: Aparna Bhagat

Marketing Coordinators: Shifa Ansari, Abeer Riyaz Dawe

First published: October 2022

Production reference: 1191022

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80181-644-1

www.packt.com

Contributors

About the Author

Vadim Dabravolski is a Solutions Architect and Machine Learning Engineer with over 15 years of experience in software engineering, focusing on data engineering and machine learning. During his tenure at AWS, Vadim helped many organizations migrate their existing ML workloads or engineer new workloads for the Amazon SageMaker platform. Vadim was involved in developing Amazon SageMaker capabilities and adopting them in practical scenarios.

Currently, Vadim works as an ML engineer, focusing on training and deploying large NLP models. His areas of interest include engineering distributed model training and evaluation, complex model deployment use cases, and optimizing inference characteristics of DL models.

About the reviewer

Brent Rabowsky is a manager and principal data science consultant at AWS, with over 10 years of experience in the field of ML. At AWS, he manages a team of data scientists and leverages his expertise to help AWS customers with their ML projects. Prior to AWS, Brent was on an ML and algorithms team at Amazon.com, and worked on conversational AI agents for a government contractor and a research institute. He also served as a technical reviewer of Data Science on AWS published by O’Reilly, and the following from Packt: Learn Amazon SageMaker, SageMaker Best Practices, and Getting Started with Amazon SageMaker Studio.

Table of Contents

Preface

Part 1: Introduction to Deep Learning on Amazon SageMaker

1

Introducing Deep Learning with Amazon SageMaker

Technical requirements

Exploring DL with Amazon SageMaker

Using SageMaker

Choosing Amazon SageMaker for DL workloads

Managed compute and storage infrastructure

Managed DL software stacks

Exploring SageMaker’s managed training stack

Step 1 – configuring and creating a training job

Step 2 – provisioning the SageMaker training cluster

Step 3 – SageMaker accesses the training data

Step 4 – SageMaker deploys the training container

Step 5 – SageMaker starts and monitors the training job

Step 6 – SageMaker persists the training artifacts

Using SageMaker’s managed hosting stack

Real-time inference endpoints

Creating and using your SageMaker endpoint

SageMaker asynchronous endpoints

SageMaker Batch Transform

Integration with AWS services

Data storage services

Amazon EFS

Amazon FSx for Lustre

Orchestration services

Security services

Monitoring services

Summary

2

Deep Learning Frameworks and Containers on SageMaker

Technical requirements

Exploring DL frameworks on SageMaker

TensorFlow containers

PyTorch containers

Hugging Face containers

Using SageMaker Python SDK

Using SageMaker DL containers

Container usage patterns

SageMaker toolkits

Developing for script mode

Developing an inference script for script mode

Extending the prebuilt containers

Developing a BYO container for inference

Problem overview

Developing the serving container

Summary

3

Managing SageMaker Development Environment

Technical requirements

Selecting a development environment for SageMaker

Setting up a local environment for SageMaker

Using SageMaker Studio

Debugging SageMaker code locally

Summary

4

Managing Deep Learning Datasets

Technical requirements

Selecting storage solutions for ML datasets

Amazon EBS – high-performance block storage

Amazon S3 – industry-standard object storage

Amazon EFS – general-purpose shared filesystem

Amazon FSx for Lustre – high-performance filesystem

SageMaker Feature Store – purpose-built ML storage

Processing data at scale

Augmenting image data using SageMaker Processing

Optimizing data storage and retrieval

Choosing a storage solution

Streaming datasets

Summary

Part 2: Building and Training Deep Learning Models

5

Considering Hardware for Deep Learning Training

Technical requirements

Selecting optimal compute instances

Reviewing specialized DL hardware

Choosing optimal instance types

Improving network throughput with EFA

Introducing EFA

Using EFA with custom training containers

Compiling models for GPU devices with Training Compiler

Introducing the XLA optimization library

Using SageMaker Training Compiler

Using Training Compiler

Summary

6

Engineering Distributed Training

Technical requirements

Engineering data parallel training

Coordination patterns – Parameter Server versus Allreduce

Engineering TensorFlow data parallel training

Engineering PyTorch data parallel training

Engineering SageMaker’s DDP jobs

Engineering model parallel training jobs

Engineering training with SDMP

Optimizing distributed training jobs

Cluster layout and computation affinity

Summary

7

Operationalizing Deep Learning Training

Technical requirements

Debugging training jobs

Using TensorBoard with SageMaker

Monitoring training with SageMaker Debugger

Profiling your DL training

Hyperparameter optimization

Using EC2 Spot Instances

Summary

Part 3: Serving Deep Learning Models

8

Considering Hardware for Inference

Technical requirements

Selecting hardware accelerators in AWS Cloud

Latency-throughput trade-offs

Cost

Supported frameworks and operators

G4 instance family – best price and performance ratio for inference

P3 instance family – performant and expensive for inference

AWS Inferentia

Amazon Elastic Inference

Compiling models for inference

Using TensorRT

Using Neuron SDK

Using SageMaker Neo

Summary

9

Implementing Model Servers

Technical requirements

Using TFS

Reviewing TFS concepts

Integrating TFS with SageMaker

Optimizing TFS

Implementing TFS serving

Using PTS

Integration with SageMaker

Optimizing PTS on SageMaker

Serving models with PTS

Using NVIDIA Triton

Integration with SageMaker

Optimizing Triton inference

Serving models with Triton on SageMaker

Summary

10

Operationalizing Inference Workloads

Technical requirements

Managing inference deployments

Considering model deployment options

Advanced model deployment techniques

Monitoring inference workloads

Using Amazon CloudWatch

Monitoring inference workload quality

Selecting your workload configuration

Using SageMaker Inference Recommender

Summary

Index

Other Books You May Enjoy

Preface

Deep Learning (DL) is a relatively new area of machine learning that demonstrates incredible results in tasks such as natural language understanding and computer vision. At times, DL models can be more accurate than humans.

Thanks to the proliferation of open-source frameworks, publicly available model architectures, and pretrained models, many people and organizations can successfully apply cutting-edge DL models to their practical use cases. However, developing, training, and deploying DL models also requires highly specialized and costly hardware, software stacks, expertise, and management capabilities, all of which can considerably slow down adoption.

This book focuses on how to engineer and manage deep learning workloads on Amazon SageMaker, which allows you to overcome these barriers. SageMaker is a broad AWS cloud machine learning platform with a variety of capabilities. This book does not intend to cover all available SageMaker capabilities in detail, but rather dives deep into the features relevant to DL workloads; we prioritized depth over breadth when writing it. The goal of this book is to provide you with practical guidelines on how to efficiently implement real-world use cases involving deep learning models on Amazon SageMaker.

Since cloud adoption and machine learning adoption are both accelerating, this book may be of interest to a wide audience, from beginners to experienced ML practitioners. Specifically, this book is for ML engineers who work on DL model development and training, and Solutions Architects who are in charge of designing and optimizing DL workloads.

It is assumed that you are familiar with the Python ecosystem, and the principles of Machine Learning and Deep Learning. Familiarity with AWS and practical experience working with it are also helpful.

The complexity of the chapters increases as we move from introductory and overview topics to advanced implementation and optimization techniques. You may skip certain chapters or select only the topics that are relevant to your task at hand.

Most chapters of this book have corresponding code examples, so you can develop practical experience working with Amazon SageMaker. It's recommended that you run the code samples yourself; however, you may also simply review them. We also provide commentary on the code samples as part of each chapter.

Please note that running the code examples will result in AWS charges. Make sure to check the Amazon SageMaker pricing page for details.

We welcome your feedback and suggestions on this book, and hope that you enjoy your learning journey.

Who this book is for

This book is written for DL and AI engineers who have a working knowledge of the DL domain and who want to learn and gain practical experience in training and hosting DL models in the AWS cloud using the Amazon SageMaker service capabilities.

What this book covers

Chapter 1, Introducing Deep Learning with Amazon SageMaker, will introduce Amazon SageMaker: how it simplifies infrastructure and workload management, and what the key principles of this AWS service and its main capabilities are. We will then focus on the managed training and hosting infrastructure it provides, as well as its integration with the rest of the AWS services.

Chapter 2, Deep Learning Frameworks and Containers on SageMaker, will review in detail how SageMaker extensively utilizes Docker containers. We will start by diving into pre-built containers for popular DL frameworks (TensorFlow, PyTorch, and MXNet). Then, we will consider how to extend pre-built SageMaker containers and how to bring your own (BYO) containers. For the latter case, we will review the technical requirements for training and serving containers in SageMaker.

Chapter 3, Managing SageMaker Development Environment, will discuss how to manage SageMaker resources programmatically using the CLI, SDKs, and CloudFormation. We will discuss how to organize an efficient development process using SageMaker Studio and Notebooks, as well as how to integrate with your favorite IDE. We will also review troubleshooting your DL code using SageMaker Local Mode.

Chapter 4, Managing Deep Learning Datasets, will review various SageMaker capabilities that allow you to organize and manage datasets, and discuss the storage options available on AWS, such as Amazon EBS, Amazon S3, Amazon EFS, Amazon FSx for Lustre, and SageMaker Feature Store, along with their application use cases. We will also cover processing data at scale and optimizing data storage and retrieval, including streaming datasets.

Chapter 5, Considering Hardware for Deep Learning Training, will consider the price-performance characteristics of the instances most suitable for DL models and cover the scenarios in which to use one instance type or another for optimal performance.

Chapter 6, Engineering Distributed Training, will focus on the common approaches to distributing your training processes and why you may need to do so for DL models. Then, we will provide an overview of both open-source training distribution frameworks and native SageMaker libraries for distributed training.

Chapter 7, Operationalizing Deep Learning Training, will discuss how to monitor and debug your DL training job using SageMaker Debugger and its Profiler as well as how to optimize for cost using Managed Spot Training, early stopping, and other strategies.

Chapter 8, Considering Hardware for Inference, will provide practical guidance on selecting hardware accelerators for inference in the AWS cloud, covering latency-throughput trade-offs, cost, and supported frameworks and operators. We will review options such as the G4 and P3 instance families, AWS Inferentia, and Amazon Elastic Inference, as well as compiling models for inference using TensorRT, the Neuron SDK, and SageMaker Neo.

Chapter 9, Implementing Model Servers, will focus on the software stack of DL inference, specifically on model servers. We will review the model servers provided by the popular TensorFlow and PyTorch solutions, as well as framework-agnostic options such as NVIDIA Triton and SageMaker Multi Model Server. We will discuss when to choose one option over another.

Chapter 10, Operationalizing Inference Workloads, will review key components of SageMaker managed hosting, such as model deployment options and advanced deployment techniques. We will also cover monitoring inference workloads using Amazon CloudWatch and selecting an optimal workload configuration with SageMaker Inference Recommender.

To get the most out of this book

Software/hardware covered in the book: a local SageMaker-compatible environment

Operating system requirements: Windows, macOS, or Linux

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Accelerate-Deep-Learning-Workloads-with-Amazon-SageMaker. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/FXLPc.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “In the following code block, we use the _build_tf_config() method to set up this variable.”

A block of code is set as follows:

estimator.fit({
    "train": "s3://unique/path/train_files/",
    "test": "s3://unique/path/test_files"
})
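As a side note, the keys in this dictionary are SageMaker input channel names; inside the training container, SageMaker exposes each channel's downloaded data through an SM_CHANNEL_<NAME> environment variable. A minimal, framework-free sketch of that naming convention (the S3 paths are just the placeholder values from the example above):

```python
# Sketch: how SageMaker-style channel names map to the SM_CHANNEL_*
# environment variable names seen inside a training container.
channels = {
    "train": "s3://unique/path/train_files/",
    "test": "s3://unique/path/test_files",
}

# SageMaker uppercases the channel name to build the variable name;
# inside the container, the variable's value is a local path such as
# /opt/ml/input/data/<channel>, not the S3 URI itself.
env_names = {name: f"SM_CHANNEL_{name.upper()}" for name in channels}
print(env_names)  # {'train': 'SM_CHANNEL_TRAIN', 'test': 'SM_CHANNEL_TEST'}
```

Your training script can read these variables to locate its data regardless of where the channels were staged from.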

Any command-line input or output is written as follows:

conda create -n sagemaker python=3.9

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

tensorboard --logdir ${tb_debug_path}

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “In the Create the default IAM role popup window, select Any S3 bucket.”

Tips or Important Notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Accelerate Deep Learning Workloads with Amazon SageMaker, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application. 

The perks don't stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781801816441

Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly.

Part 1: Introduction to Deep Learning on Amazon SageMaker

In the first part, we will start with a brief introduction to deep learning and Amazon SageMaker and then focus on the key SageMaker capabilities that will be used throughout the book.

This section comprises the following chapters:

Chapter 1, Introducing Deep Learning with Amazon SageMaker
Chapter 2, Deep Learning Frameworks and Containers on SageMaker
Chapter 3, Managing SageMaker Development Environment
Chapter 4, Managing Deep Learning Datasets