Learn Amazon SageMaker - Julien Simon - E-Book

Description

Amazon SageMaker enables you to quickly build, train, and deploy machine learning models at scale without managing any infrastructure. It helps you focus on the machine learning problem at hand and deploy high-quality models by eliminating the heavy lifting typically involved in each step of the ML process. This second edition will help data scientists and ML developers to explore new features such as SageMaker Data Wrangler, Pipelines, Clarify, Feature Store, and much more.
You'll start by learning how to use various capabilities of SageMaker as a single toolset to solve ML challenges and progress to cover features such as AutoML, built-in algorithms and frameworks, and writing your own code and algorithms to build ML models. The book will then show you how to integrate Amazon SageMaker with popular deep learning libraries, such as TensorFlow and PyTorch, to extend the capabilities of existing models. You'll also see how automating your workflows can help you get to production faster with minimum effort and at a lower cost. Finally, you'll explore SageMaker Debugger and SageMaker Model Monitor to detect quality issues in training and production.
By the end of this book, you'll be able to use Amazon SageMaker on the full spectrum of ML workflows, from experimentation, training, and monitoring to scaling, deployment, and automation.

You can read this e-book in Legimi apps or in any other app that supports the following formats:

EPUB
MOBI

Page count: 474

Publication year: 2021




Learn Amazon SageMaker

Second Edition

A guide to building, training, and deploying machine learning models for developers and data scientists

Julien Simon

BIRMINGHAM—MUMBAI

Learn Amazon SageMaker, Second Edition

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Ali Abidi

Senior Editor: David Sugarman

Content Development Editor: Joseph Sunil

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Aparna Nair

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Joshua Misquitta

First published: August 2020

Second edition published: November 2021

Production reference: 2191121

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80181-795-0

www.packt.com

Contributors

About the author

Julien Simon is a principal developer advocate for AI and Machine Learning (ML) at Amazon Web Services (AWS). He focuses on helping developers and enterprises bring their ideas to life. He frequently speaks at conferences, blogs on the AWS Blog and on Medium, and runs an AI/ML podcast.

Prior to joining AWS, Julien served as the CTO/VP of engineering in top-tier web start-ups over a period of 10 years, where he led large software and ops teams in charge of thousands of servers worldwide. In the process, he fought his way through a wide range of technical, business, and procurement issues, which helped him gain a deep understanding of physical infrastructure, its limitations, and how cloud computing can help.

About the reviewers

Antje Barth is a principal developer advocate for AI and ML at AWS, based in Düsseldorf, Germany. Antje is the co-author of the O'Reilly book, Data Science on AWS, the co-founder of the Düsseldorf chapter of Women in Big Data, and frequently speaks at AI and ML conferences and meetups around the world. She also chairs and curates content for O'Reilly AI Superstream events. Previously, Antje was an engineer at Cisco and MapR, focused on data center technologies, cloud computing, big data, and AI applications.

Brent Rabowsky is a principal data science consultant at AWS with over 10 years' experience in the field of ML. At AWS, he leverages his expertise to help AWS customers with their data science projects. Prior to AWS, he joined Amazon.com on an ML and algorithms team and previously worked on conversational AI agents for a government contractor and a research institute. He has also served as a technical reviewer of the books Data Science on AWS, by Chris Fregly and Antje Barth, published by O'Reilly, and SageMaker Best Practices, published by Packt.

Mia Champion is a HealthAI leader passionate about transformative technologies and strategic markets in the areas of life sciences, healthcare, ML/AI, and cloud computing. She has both a technical and entrepreneurial skillset that includes experience as a principal research scientist, cloud computing architect and developer, new business developer, and business strategist.

Table of Contents

Preface

Section 1: Introduction to Amazon SageMaker

Chapter 1: Introducing Amazon SageMaker

Technical requirements

Exploring the capabilities of Amazon SageMaker

The main capabilities of Amazon SageMaker

The Amazon SageMaker API

Setting up Amazon SageMaker on your local machine

Installing the SageMaker SDK with virtualenv

Installing the SageMaker SDK with Anaconda

A word about AWS permissions

Setting up Amazon SageMaker Studio

Onboarding to Amazon SageMaker Studio

Onboarding with the quick start procedure

Deploying one-click solutions and models with Amazon SageMaker JumpStart

Deploying a solution

Deploying a model

Fine-tuning a model

Summary

Chapter 2: Handling Data Preparation Techniques

Technical requirements

Labeling data with Amazon SageMaker Ground Truth

Using workforces

Creating a private workforce

Uploading data for labeling

Creating a labeling job

Labeling images

Labeling text

Transforming data with Amazon SageMaker Data Wrangler

Loading a dataset in SageMaker Data Wrangler

Transforming a dataset in SageMaker Data Wrangler

Exporting a SageMaker Data Wrangler pipeline

Running batch jobs with Amazon SageMaker Processing

Discovering the Amazon SageMaker Processing API

Processing a dataset with scikit-learn

Processing a dataset with your own code

Summary

Section 2: Building and Training Models

Chapter 3: AutoML with Amazon SageMaker Autopilot

Technical requirements

Discovering Amazon SageMaker Autopilot

Analyzing data

Feature engineering

Model tuning

Using Amazon SageMaker Autopilot in SageMaker Studio

Launching a job

Monitoring a job

Comparing jobs

Deploying and invoking a model

Using the SageMaker Autopilot SDK

Launching a job

Monitoring a job

Cleaning up

Diving deep on SageMaker Autopilot

The job artifacts

The data exploration notebook

The candidate generation notebook

Summary

Chapter 4: Training Machine Learning Models

Technical requirements

Discovering the built-in algorithms in Amazon SageMaker

Supervised learning

Unsupervised learning

A word about scalability

Training and deploying models with built-in algorithms

Understanding the end-to-end workflow

Using alternative workflows

Using fully managed infrastructure

Using the SageMaker SDK with built-in algorithms

Preparing data

Configuring a training job

Launching a training job

Deploying a model

Cleaning up

Working with more built-in algorithms

Regression with XGBoost

Recommendation with Factorization Machines

Using Principal Component Analysis

Detecting anomalies with Random Cut Forest

Summary

Chapter 5: Training CV Models

Technical requirements

Discovering the CV built-in algorithms in Amazon SageMaker

Discovering the image classification algorithm

Discovering the object detection algorithm

Discovering the semantic segmentation algorithm

Training with CV algorithms

Preparing image datasets

Working with image files

Working with RecordIO files

Working with SageMaker Ground Truth files

Using the built-in CV algorithms

Training an image classification model

Fine-tuning an image classification model

Training an object detection model

Training a semantic segmentation model

Summary

Chapter 6: Training Natural Language Processing Models

Technical requirements

Discovering the NLP built-in algorithms in Amazon SageMaker

Discovering the BlazingText algorithm

Discovering the LDA algorithm

Discovering the NTM algorithm

Discovering the seq2seq algorithm

Training with NLP algorithms

Preparing natural language datasets

Preparing data for classification with BlazingText

Preparing data for classification with BlazingText, version 2

Preparing data for word vectors with BlazingText

Preparing data for topic modeling with LDA and NTM

Using datasets labeled with SageMaker Ground Truth

Using the built-in algorithms for NLP

Classifying text with BlazingText

Computing word vectors with BlazingText

Using BlazingText models with FastText

Modeling topics with LDA

Modeling topics with NTM

Summary

Chapter 7: Extending Machine Learning Services Using Built-In Frameworks

Technical requirements

Discovering the built-in frameworks in Amazon SageMaker

Running a first example with XGBoost

Working with framework containers

Training and deploying locally

Training with script mode

Understanding model deployment

Managing dependencies

Putting it all together

Running your framework code on Amazon SageMaker

Using the built-in frameworks

Working with TensorFlow and Keras

Working with PyTorch

Working with Hugging Face

Working with Apache Spark

Summary

Chapter 8: Using Your Algorithms and Code

Technical requirements

Understanding how SageMaker invokes your code

Customizing an existing framework container

Setting up your build environment on EC2

Building training and inference containers

Using the SageMaker Training Toolkit with scikit-learn

Building a fully custom container for scikit-learn

Training with a fully custom container

Deploying a fully custom container

Building a fully custom container for R

Coding with R and plumber

Building a custom container

Training and deploying a custom container on SageMaker

Training and deploying with your own code on MLflow

Installing MLflow

Training a model with MLflow

Building a SageMaker container with MLflow

Building a fully custom container for SageMaker Processing

Summary

Section 3: Diving Deeper into Training

Chapter 9: Scaling Your Training Jobs

Technical requirements

Understanding when and how to scale

Understanding what scaling means

Adapting training time to business requirements

Right-sizing training infrastructure

Deciding when to scale

Deciding how to scale

Scaling a BlazingText training job

Monitoring and profiling training jobs with Amazon SageMaker Debugger

Viewing monitoring and profiling information in SageMaker Studio

Enabling profiling in SageMaker Debugger

Solving training challenges

Streaming datasets with pipe mode

Using pipe mode with built-in algorithms

Using pipe mode with other algorithms and frameworks

Simplifying data loading with MLIO

Training factorization machines with pipe mode

Distributing training jobs

Understanding data parallelism and model parallelism

Distributing training for built-in algorithms

Distributing training for built-in frameworks

Distributing training for custom containers

Scaling an image classification model on ImageNet

Preparing the ImageNet dataset

Defining our training job

Training on ImageNet

Updating batch size

Adding more instances

Summing things up

Training with the SageMaker data and model parallel libraries

Training on TensorFlow with SageMaker DDP

Training on Hugging Face with SageMaker DDP

Training on Hugging Face with SageMaker DMP

Using other storage services

Working with SageMaker and Amazon EFS

Working with SageMaker and Amazon FSx for Lustre

Summary

Chapter 10: Advanced Training Techniques

Technical requirements

Optimizing training costs with managed spot training

Comparing costs

Understanding Amazon EC2 Spot Instances

Understanding managed spot training

Using managed spot training with object detection

Using managed spot training and checkpointing with Keras

Optimizing hyperparameters with automatic model tuning

Understanding automatic model tuning

Using automatic model tuning with object detection

Using automatic model tuning with Keras

Using automatic model tuning for architecture search

Exploring models with SageMaker Debugger

Debugging an XGBoost job

Inspecting an XGBoost job

Debugging and inspecting a Keras job

Managing features and building datasets with SageMaker Feature Store

Engineering features with SageMaker Processing

Creating a feature group

Ingesting features

Querying features to build a dataset

Exploring other capabilities of SageMaker Feature Store

Detecting bias in datasets and explaining predictions with SageMaker Clarify

Configuring a bias analysis with SageMaker Clarify

Running a bias analysis

Analyzing bias metrics

Running an explainability analysis

Mitigating bias

Summary

Section 4: Managing Models in Production

Chapter 11: Deploying Machine Learning Models

Technical requirements

Examining model artifacts and exporting models

Examining and exporting built-in models

Examining and exporting built-in CV models

Examining and exporting XGBoost models

Examining and exporting scikit-learn models

Examining and exporting TensorFlow models

Examining and exporting Hugging Face models

Deploying models on real-time endpoints

Managing endpoints with the SageMaker SDK

Managing endpoints with the boto3 SDK

Deploying models on batch transformers

Deploying models on inference pipelines

Monitoring prediction quality with Amazon SageMaker Model Monitor

Capturing data

Creating a baseline

Setting up a monitoring schedule

Sending bad data

Examining violation reports

Deploying models to container services

Training on SageMaker and deploying on Amazon Fargate

Summary

Chapter 12: Automating Machine Learning Workflows

Technical requirements

Automating with AWS CloudFormation

Writing a template

Deploying a model to a real-time endpoint

Modifying a stack with a change set

Adding a second production variant to the endpoint

Implementing canary deployment

Implementing blue-green deployment

Automating with AWS CDK

Installing the CDK

Creating a CDK application

Writing a CDK application

Deploying a CDK application

Building end-to-end workflows with AWS Step Functions

Setting up permissions

Implementing our first workflow

Adding parallel execution to a workflow

Adding a Lambda function to a workflow

Building end-to-end workflows with Amazon SageMaker Pipelines

Defining workflow parameters

Processing the dataset with SageMaker Processing

Ingesting the dataset in SageMaker Feature Store with SageMaker Processing

Building a dataset with Amazon Athena and SageMaker Processing

Training a model

Creating and registering a model in SageMaker Pipelines

Creating a pipeline

Running a pipeline

Deploying a model from the model registry

Summary

Chapter 13: Optimizing Prediction Cost and Performance

Technical requirements

Autoscaling an endpoint

Deploying a multi-model endpoint

Understanding multi-model endpoints

Building a multi-model endpoint with scikit-learn

Deploying a model with Amazon Elastic Inference

Deploying a model with Amazon Elastic Inference

Compiling models with Amazon SageMaker Neo

Understanding Amazon SageMaker Neo

Compiling and deploying an image classification model on SageMaker

Exploring models compiled with Neo

Deploying an image classification model on a Raspberry Pi

Deploying models on AWS Inferentia

Building a cost optimization checklist

Optimizing costs for data preparation

Optimizing costs for experimentation

Optimizing costs for model training

Optimizing costs for model deployment

Summary

Other Books You May Enjoy

Section 1: Introduction to Amazon SageMaker

The objective of this section is to introduce you to the key concepts, help you download supporting data, and introduce you to example scenarios and use cases.

This section comprises the following chapters:

Chapter 1, Introducing Amazon SageMaker

Chapter 2, Handling Data Preparation Techniques