Automated Machine Learning on AWS - Trenton Potgieter - E-Book

Description

AWS provides a wide range of solutions to help automate a machine learning workflow with just a few lines of code. With this practical book, you'll learn how to automate a machine learning pipeline using the various AWS services.
Automated Machine Learning on AWS begins with a quick overview of what the machine learning pipeline/process looks like and highlights the typical challenges that you may face when building a pipeline. Throughout the book, you'll become well versed with various AWS solutions such as Amazon SageMaker Autopilot, AutoGluon, and AWS Step Functions to automate an end-to-end ML process with the help of hands-on examples. The book will show you how to build, monitor, and execute a CI/CD pipeline for the ML process and how the various CI/CD services within AWS can be applied to a use case with the Cloud Development Kit (CDK). You'll understand what a data-centric ML process is by working with Amazon Managed Workflows for Apache Airflow and building a managed Airflow environment. You'll also cover the key success criteria for an MLSDLC implementation and the process of creating a self-mutating CI/CD pipeline using AWS CDK from the perspective of the platform engineering team.
By the end of this AWS book, you'll be able to effectively automate a complete machine learning pipeline and deploy it to production.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 401




Automated Machine Learning on AWS

Fast-track the development of your production-ready machine learning applications the AWS way

Trenton Potgieter

BIRMINGHAM—MUMBAI

Automated Machine Learning on AWS

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Devika Battike

Senior Editor: Nathanya Dias

Content Development Editor: Nazia Shaikh

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Aparna Ravikumar Nair

Proofreader: Safis Editing

Indexer: Sejal Dsilva

Production Designer: Roshan Kawale

Marketing Coordinator: Abeer Dawe, Shifa Ansari

First published: April 2022

Production reference: 1100322

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80181-182-8

www.packt.com

Foreword

Virtually everyone struggles with operationalizing machine learning models. Training your first model can sometimes seem like an insurmountable challenge, until you realize that you also need an end-to-end pipeline to supply new data for inference and retraining the model when its performance inevitably degrades. Although AWS offers the broadest and deepest set of machine learning services, figuring out where to get started and how to tie all those options together normally requires months of painful experimentation. This book cuts through the uncertainty based on Trenton's first-hand experiences working with both the most sophisticated technology companies in the world as well as organizations new to machine learning.

I've worked with hundreds of companies around the world trying to get value from artificial intelligence and machine learning. The problem is that machine learning can mean very different things even within the same company, much less across different organizations or industries. Some teams are just starting to invest in AI and machine learning and want to build their first model, while other teams in the same organization want to scale up sophisticated experimentation and monitoring frameworks to support thousands of models in production. Most companies hire data scientists or machine learning engineers with skill mismatches in the hope that they'll figure it out. Trenton has the rare advantage of seeing how large organizations have successfully scaled up their modeling pipelines as well as where they've faltered. Even more importantly, he has hard-won experience helping them solve those challenges.

The machine learning space evolves so quickly that focusing on any single algorithm, package, or platform can lead to outdated content. Trenton avoids this trap by translating timeless software engineering concepts like continuous integration and continuous delivery to the machine learning space. Unlike many approaches, however, he punctuates each concept with hands-on examples to illustrate how everything works in practice, so that you don't need to struggle to translate theory into real-life applications.

For example, data scientists often view automated machine learning with disdain due to previous exposure to automation that felt more like a straitjacket than an accelerant. People new to machine learning, as well as sophisticated data scientists, can overlook AutoML on AWS due to inexperience or ignorance of its benefits. Understanding when and why to use AutoML to get an initial benchmark on a new project or avoid manually selecting and tuning algorithms every time you retrain a model can reduce the time you spend on model training by an order of magnitude.

Even more importantly, learning how to think about the long-term maintenance of the machine learning pipelines will help you avoid painful decisions on whether to spend time refactoring existing models or deliver new projects. Software engineers have been leveraging CI/CD processes for over a decade at this point, but most machine learning practitioners aren't aware of best practices from the DevOps space. Most data scientists discover the need for this process only after they've built a few models and realized that reusable model assets and pipelines are required if they want to do anything beyond maintaining brittle modeling workflows by hand.

Finally, Trenton highlights concepts like source code-centric and data-centric machine learning that you would normally learn only by working at a top technology company that has overcome scaling challenges most companies don't experience early on in their machine learning journeys. Most people and organizations hit a wall after implementing a CI/CD pipeline and building their first model. They run up against the challenges of scheduling, tracking, and monitoring their machine learning pipelines. This book is the only example I'm aware of that offers prescriptive guidance on how to structure long-term machine learning pipelines and avoid the common pitfalls that machine learning teams typically encounter.

In short, the concepts in this book will help you move beyond the hopes and dreams of machine learning, to getting machine learning applications into production and delivering value.

Jonathan Dahlberg

Head of ML Solution Engineering

Snorkel AI

Contributors

About the author

Trenton Potgieter is a senior AI/ML specialist at AWS and has been working in the field of ML since 2011. At AWS, he assists multiple AWS customers to create ML solutions and has contributed to various use cases, broadly spanning computer vision, knowledge graphs, and ML automation using MLOps methodologies. Trenton plays a key role in evangelizing the AWS ML services and shares best practices through forums such as AWS blogs, whitepapers, reference architectures, and public-speaking events. He has also actively been involved in leading, developing, and supporting an internal AWS community of MLOps-related subject matter experts.

About the reviewer

Hemanth Boinpally is a Machine Learning Engineer at AWS. He has several years of experience working in data science and ML. He has worked with enterprise customers across different industries, such as healthcare, finance, logistics, and manufacturing. He enjoys providing end-to-end ML solutions for complex business problems. His expertise across the technology stack helps him collaborate with cross-functional teams to build successful ML products. This includes engaging with business stakeholders, the research and development of ML models, and operationalizing these models using MLOps principles. He has worked in areas such as model bias detection, interpretable models, NLP, CV, active learning, and deep learning.

Table of Contents

Preface

Section 1: Fundamentals of the Automated Machine Learning Process and AutoML on AWS

Chapter 1: Getting Started with Automated Machine Learning on AWS

Technical requirements

Overview of the ML process

Complexities in the ML process

An example of the end-to-end ML process

Introducing ACME Fishing Logistics

The case for ML

Getting insights from the data

Building the right model

Training the model

Evaluating the trained model

Exploring possible next steps

Tuning our model

Deploying the optimized model into production

Streamlining the ML process with AutoML

How AWS makes automating the ML development and deployment process easier

Summary

Chapter 2: Automating Machine Learning Model Development Using SageMaker Autopilot

Technical requirements

Introducing the AWS AI and ML landscape

Overview of SageMaker Autopilot

Overcoming automation challenges with SageMaker Autopilot

Getting started with SageMaker Studio

Preparing the experiment data

Starting the Autopilot experiment

Running the Autopilot experiment

Post-experimentation tasks

Using the SageMaker SDK to automate the ML experiment

Codifying the Autopilot experiment

Analyzing the Autopilot experiment with code

Deploying the best candidate

Cleaning up

Summary

Chapter 3: Automating Complicated Model Development with AutoGluon

Technical requirements

Introducing the AutoGluon library

Using AutoGluon for tabular data

Prerequisites

Creating the AutoML experiment with AutoGluon

Evaluating the experiment results

Using AutoGluon for image data

Prerequisites

Creating an image prediction experiment

Evaluating the experiment results

Summary

Section 2: Automating the Machine Learning Process with Continuous Integration and Continuous Delivery (CI/CD)

Chapter 4: Continuous Integration and Continuous Delivery (CI/CD) for Machine Learning

Technical requirements

Introducing the CI/CD methodology

Introducing the CI part of CI/CD

Introducing the CD part of CI/CD

Closing the loop

Automating ML with CI/CD

Taking a deployment-centric approach

Creating an MLOps methodology

Creating a CI/CD pipeline on AWS

Using the AWS CI/CD toolchain

Working with additional AWS developer tools

Creating a cloud-native CI/CD pipeline for a production ML model

Preparing the development environment

Creating the pipeline artifact repository

Developing the application artifacts

Summary

Chapter 5: Continuous Deployment of a Production ML Model

Technical requirements

Deploying the CI/CD pipeline

Codifying the pipeline construct

Creating the CDK application

Deploying the pipeline application

Building the ML model artifacts

Reviewing the modeling file

Reviewing the application file

Reviewing the model serving files

Reviewing the container build file

Committing the ML artifacts

Executing the automated ML model deployment

Cleanup

Summary

Section 3: Optimizing a Source Code-Centric Approach to Automated Machine Learning

Chapter 6: Automating the Machine Learning Process Using AWS Step Functions

Technical requirements

Introducing AWS Step Functions

Creating a state machine

Addressing state machine complexity

Using the Step Functions Data Science SDK for CI/CD

Building the CI/CD pipeline resources

Updating the development environment

Creating the pipeline artifact repository

Building the pipeline application artifacts

Deploying the CI/CD pipeline

Summary

Chapter 7: Building the ML Workflow Using AWS Step Functions

Technical requirements

Building the state machine workflow

Setting up the service permissions

Creating an ML workflow

Performing the integration test

Monitoring the pipeline's progress

Summary

Section 4: Optimizing a Data-Centric Approach to Automated Machine Learning

Chapter 8: Automating the Machine Learning Process Using Apache Airflow

Technical requirements

Introducing Apache Airflow

Introducing Amazon MWAA

Using Airflow to process the abalone dataset

Configuring the MWAA prerequisites

Configuring the MWAA environment

Summary

Chapter 9: Building the ML Workflow Using Amazon Managed Workflows for Apache Airflow

Technical requirements

Developing the data-centric workflow

Building and unit testing the data ETL artifacts

Building the Airflow DAG

Creating synthetic Abalone survey data

Executing the data-centric workflow

Cleanup

Summary

Section 5: Automating the End-to-End Production Application on AWS

Chapter 10: An Introduction to the Machine Learning Software Development Life Cycle (MLSDLC)

Technical requirements

Introducing the MLSDLC

Building the application platform

Examining the role of the application owner

Examining the role of the platform engineers

Examining the role of the frontend developers

Examining ML and data engineering roles

Creating a SageMaker Feature Store

Creating ML artifacts

Creating continuous training artifacts

Understanding the security lens

Securing the data

Securing the code

Securing the website

Summary

Chapter 11: Continuous Integration, Deployment, and Training for the MLSDLC

Technical requirements

Codifying the continuous integration stage

Building the integration artifacts

Building the test artifacts

Building the production artifacts

Automating the continuous integration process

Managing the continuous deployment stage

Reviewing the build phase

Reviewing the test phase

Reviewing the deploy and maintain phases

Reviewing the application user experience

Managing continuous training

Creating new Abalone survey data

Reviewing the continuous training process

Cleanup

Summary

Further reading

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Share Your Thoughts

Once you've read Automated Machine Learning on AWS, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

Section 1: Fundamentals of the Automated Machine Learning Process and AutoML on AWS

This section will educate you on the complexities of the machine learning process, what AutoML is, and how it can be used to streamline the process.

This section comprises the following chapters:

Chapter 1, Getting Started with Automated Machine Learning on AWS

Chapter 2, Automating Machine Learning Model Development Using SageMaker Autopilot

Chapter 3, Automating Complicated Model Development with AutoGluon