Machine Learning Engineering on AWS

Build, scale, and secure machine learning systems and MLOps pipelines in production

Joshua Arvin Lat

BIRMINGHAM—MUMBAI

Machine Learning Engineering on AWS

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Ali Abidi

Content Development Editor: Priyanka Soam

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Sejal Dsilva

Production Designer: Ponraj Dhandapani

Marketing Coordinator: Shifa Ansari

First published: October 2022

Production reference: 1290922

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80324-759-5

www.packt.com

Contributors

About the author

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of three Australian-owned companies and as Director of Software Development and Engineering for multiple e-commerce start-ups, experience that helped him grow into a more effective leader. Years ago, he and his team won first place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies for machine learning, engineering, security, and management.

About the reviewers

Raphael Jambalos manages the Cloud-Native Development Team at eCloudValley, Philippines. His team architects and implements solutions that leverage AWS services to deliver reliable applications. He is also a community leader for the AWS user group MegaManila, organizing monthly meetups and growing the community. In his free time, he loves to read books and write about tech on his blog (https://dev.to/raphael_jambalos). He holds five AWS certifications and is an AWS APN Ambassador for the Philippines. He was also a technical reviewer for the Packt book Machine Learning with Amazon SageMaker Cookbook.

Sophie Soliven is the General Manager of E-commerce Services and Dropship for BeautyMnl. As one of the pioneers and leaders of the company, she contributed to its growth from its humble beginnings to what it is today – the biggest homegrown e-commerce platform in the Philippines – by using a data-driven approach to scale its operations. She has obtained a number of certifications in data analytics and cloud computing, including Microsoft Power BI Data Analyst Associate, Tableau Desktop Specialist, and AWS Certified Cloud Practitioner. For the last couple of years, she has been sharing her knowledge and experience in data-driven operations at local and international conferences and events.

Table of Contents

Preface

Part 1: Getting Started with Machine Learning Engineering on AWS

1

Introduction to ML Engineering on AWS

Technical requirements

What is expected from ML engineers?

How ML engineers can get the most out of AWS

Essential prerequisites

Creating the Cloud9 environment

Increasing Cloud9’s storage

Installing the Python prerequisites

Preparing the dataset

Generating a synthetic dataset using a deep learning model

Exploratory data analysis

Train-test split

Uploading the dataset to Amazon S3

AutoML with AutoGluon

Setting up and installing AutoGluon

Performing your first AutoGluon AutoML experiment

Getting started with SageMaker and SageMaker Studio

Onboarding with SageMaker Studio

Adding a user to an existing SageMaker Domain

No-code machine learning with SageMaker Canvas

AutoML with SageMaker Autopilot

Summary

Further reading

2

Deep Learning AMIs

Technical requirements

Getting started with Deep Learning AMIs

Launching an EC2 instance using a Deep Learning AMI

Locating the framework-specific DLAMI

Choosing the instance type

Ensuring a default secure configuration

Launching the instance and connecting to it using EC2 Instance Connect

Downloading the sample dataset

Training an ML model

Loading and evaluating the model

Cleaning up

Understanding how AWS pricing works for EC2 instances

Using multiple smaller instances to reduce the overall cost of running ML workloads

Using spot instances to reduce the cost of running training jobs

Summary

Further reading

3

Deep Learning Containers

Technical requirements

Getting started with AWS Deep Learning Containers

Essential prerequisites

Preparing the Cloud9 environment

Downloading the sample dataset

Using AWS Deep Learning Containers to train an ML model

Serverless ML deployment with Lambda’s container image support

Building the custom container image

Testing the container image

Pushing the container image to Amazon ECR

Running ML predictions on AWS Lambda

Completing and testing the serverless API setup

Summary

Further reading

Part 2: Solving Data Engineering and Analysis Requirements

4

Serverless Data Management on AWS

Technical requirements

Getting started with serverless data management

Preparing the essential prerequisites

Opening a text editor on your local machine

Creating an IAM user

Creating a new VPC

Uploading the dataset to S3

Running analytics at scale with Amazon Redshift Serverless

Setting up a Redshift Serverless endpoint

Opening Redshift query editor v2

Creating a table

Loading data from S3

Querying the database

Unloading data to S3

Setting up Lake Formation

Creating a database

Creating a table using an AWS Glue Crawler

Using Amazon Athena to query data in Amazon S3

Setting up the query result location

Running SQL queries using Athena

Summary

Further reading

5

Pragmatic Data Processing and Analysis

Technical requirements

Getting started with data processing and analysis

Preparing the essential prerequisites

Downloading the Parquet file

Preparing the S3 bucket

Automating data preparation and analysis with AWS Glue DataBrew

Creating a new dataset

Creating and running a profile job

Creating a project and configuring a recipe

Creating and running a recipe job

Verifying the results

Preparing ML data with Amazon SageMaker Data Wrangler

Accessing Data Wrangler

Importing data

Transforming the data

Analyzing the data

Exporting the data flow

Turning off the resources

Verifying the results

Summary

Further reading

Part 3: Diving Deeper with Relevant Model Training and Deployment Solutions

6

SageMaker Training and Debugging Solutions

Technical requirements

Getting started with the SageMaker Python SDK

Preparing the essential prerequisites

Creating a service limit increase request

Training an image classification model with the SageMaker Python SDK

Creating a new Notebook in SageMaker Studio

Downloading the training, validation, and test datasets

Uploading the data to S3

Using the SageMaker Python SDK to train an ML model

Using the %store magic to store data

Using the SageMaker Python SDK to deploy an ML model

Using the Debugger Insights Dashboard

Utilizing Managed Spot Training and Checkpoints

Cleaning up

Summary

Further reading

7

SageMaker Deployment Solutions

Technical requirements

Getting started with model deployments in SageMaker

Preparing the pre-trained model artifacts

Preparing the SageMaker script mode prerequisites

Preparing the inference.py file

Preparing the requirements.txt file

Preparing the setup.py file

Deploying a pre-trained model to a real-time inference endpoint

Deploying a pre-trained model to a serverless inference endpoint

Deploying a pre-trained model to an asynchronous inference endpoint

Creating the input JSON file

Adding an artificial delay to the inference script

Deploying and testing an asynchronous inference endpoint

Cleaning up

Deployment strategies and best practices

Summary

Further reading

Part 4: Securing, Monitoring, and Managing Machine Learning Systems and Environments

8

Model Monitoring and Management Solutions

Technical prerequisites

Registering models to SageMaker Model Registry

Creating a new notebook in SageMaker Studio

Registering models to SageMaker Model Registry using the boto3 library

Deploying models from SageMaker Model Registry

Enabling data capture and simulating predictions

Scheduled monitoring with SageMaker Model Monitor

Analyzing the captured data

Deleting an endpoint with a monitoring schedule

Cleaning up

Summary

Further reading

9

Security, Governance, and Compliance Strategies

Managing the security and compliance of ML environments

Authentication and authorization

Network security

Encryption at rest and in transit

Managing compliance reports

Vulnerability management

Preserving data privacy and model privacy

Federated Learning

Differential Privacy

Privacy-preserving machine learning

Other solutions and options

Establishing ML governance

Lineage Tracking and reproducibility

Model inventory

Model validation

ML explainability

Bias detection

Model monitoring

Traceability, observability, and auditing

Data quality analysis and reporting

Data integrity management

Summary

Further reading

Part 5: Designing and Building End-to-end MLOps Pipelines

10

Machine Learning Pipelines with Kubeflow on Amazon EKS

Technical requirements

Diving deeper into Kubeflow, Kubernetes, and EKS

Preparing the essential prerequisites

Preparing the IAM role for the EC2 instance of the Cloud9 environment

Attaching the IAM role to the EC2 instance of the Cloud9 environment

Updating the Cloud9 environment with the essential prerequisites

Setting up Kubeflow on Amazon EKS

Running our first Kubeflow pipeline

Using the Kubeflow Pipelines SDK to build ML workflows

Cleaning up

Recommended strategies and best practices

Summary

Further reading

11

Machine Learning Pipelines with SageMaker Pipelines

Technical requirements

Diving deeper into SageMaker Pipelines

Preparing the essential prerequisites

Running our first pipeline with SageMaker Pipelines

Defining and preparing our first ML pipeline

Running our first ML pipeline

Creating Lambda functions for deployment

Preparing the Lambda function for deploying a model to a new endpoint

Preparing the Lambda function for checking whether an endpoint exists

Preparing the Lambda function for deploying a model to an existing endpoint

Testing our ML inference endpoint

Completing the end-to-end ML pipeline

Defining and preparing the complete ML pipeline

Running the complete ML pipeline

Cleaning up

Recommended strategies and best practices

Summary

Further reading

Index

Other Books You May Enjoy

Preface

There is a growing need for professionals with experience in working on machine learning (ML) engineering requirements as well as those with knowledge of automating complex MLOps pipelines in the cloud. This book explores a variety of AWS services, such as Amazon Elastic Kubernetes Service, AWS Glue, AWS Lambda, Amazon Redshift, and AWS Lake Formation, which ML practitioners can leverage to meet various data engineering and ML engineering requirements in production.

This machine learning book covers the essential concepts as well as step-by-step instructions that are designed to help you get a solid understanding of how to manage and secure ML workloads in the cloud. As you progress through the chapters, you’ll discover how to use several container and serverless solutions when training and deploying TensorFlow and PyTorch deep learning models on AWS. You’ll also delve into proven cost optimization techniques, along with data privacy and model privacy preservation strategies, as you explore best practices for using each AWS service.

By the end of this AWS book, you'll be able to build, scale, and secure your own ML systems and pipelines, which will give you the experience and confidence needed to architect custom solutions using a variety of AWS services for ML engineering requirements.

Who this book is for

This book is for ML engineers, data scientists, and AWS cloud engineers interested in working on production data engineering, machine learning engineering, and MLOps requirements using a variety of AWS services such as Amazon EC2, Amazon Elastic Kubernetes Service (EKS), Amazon SageMaker, AWS Glue, Amazon Redshift, AWS Lake Formation, and AWS Lambda. All you need to get started is an AWS account. Prior knowledge of AWS, machine learning, and the Python programming language will help you grasp the concepts covered in this book more effectively.

What this book covers

Chapter 1, Introduction to ML Engineering on AWS, focuses on helping you get set up, understand the key concepts, and get your feet wet quickly with several simplified AutoML examples.

Chapter 2, Deep Learning AMIs, introduces AWS Deep Learning AMIs and how they are used to help ML practitioners perform ML experiments faster inside EC2 instances. Here, we will also dive a bit deeper into how AWS pricing works for EC2 instances so that you will have a better idea of how to optimize and reduce the overall costs of running ML workloads in the cloud.
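
To give a rough sense of the kind of cost reasoning covered in that chapter, here is a small back-of-the-envelope sketch; the hourly rate and the spot discount used below are illustrative assumptions, not actual AWS prices:

# Back-of-the-envelope comparison of on-demand vs. spot training costs.
# Both rates below are illustrative placeholders; always check the current
# EC2 pricing page for your region and instance type.
on_demand_rate = 3.06     # assumed USD per hour for a GPU training instance
spot_discount = 0.70      # assumed discount; actual spot savings vary over time

training_hours = 8
on_demand_cost = on_demand_rate * training_hours
spot_cost = on_demand_rate * (1 - spot_discount) * training_hours

print(f"On-demand: ${on_demand_cost:.2f} vs. spot: ${spot_cost:.2f}")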

Chapter 3, Deep Learning Containers, introduces AWS Deep Learning Containers and how they are used to help ML practitioners perform ML experiments faster using containers. Here, we will also deploy a trained deep learning model inside an AWS Lambda function using Lambda’s container image support.
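
As a small preview of that deployment pattern, here is a minimal, illustrative sketch of a handler that could live inside such a Lambda container image; the event format and the predict() stand-in are assumptions for illustration, not the chapter's actual code:

# Minimal, illustrative Lambda handler for ML inference packaged in a
# container image. predict() is a stand-in for a real TensorFlow or
# PyTorch model loaded from the image.
import json


def predict(features):
    # Placeholder for a real model's inference logic.
    return sum(features)


def handler(event, context):
    # Entry point referenced by the image's CMD instruction (e.g., app.handler).
    payload = json.loads(event.get("body", "{}"))
    result = predict(payload.get("features", []))
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": result}),
    }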

Chapter 4, Serverless Data Management on AWS, presents several serverless solutions, such as Amazon Redshift Serverless and AWS Lake Formation, for managing and querying data on AWS.
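
For a flavor of querying data in S3 programmatically, here is a hedged boto3 sketch that runs a SQL statement through Amazon Athena; the database name, table name, and S3 result location are placeholders:

# Illustrative sketch of running an Athena query against data in S3 via boto3.
# Database, table, and bucket names are placeholders.
import time

import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT * FROM bookings LIMIT 10;",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://<BUCKET>/athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes, then fetch the first few result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    print(results["ResultSet"]["Rows"][:3])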

Chapter 5, Pragmatic Data Processing and Analysis, focuses on the different services available when working on data processing and analysis requirements, such as AWS Glue DataBrew and Amazon SageMaker Data Wrangler.

Chapter 6, SageMaker Training and Debugging Solutions, presents the different solutions and capabilities available when training an ML model using Amazon SageMaker. Here, we dive a bit deeper into the different options and strategies when training and tuning ML models in SageMaker.
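
As a quick preview of the SageMaker Python SDK used in that chapter, here is a minimal, hedged training sketch; the IAM role, S3 paths, instance type, and hyperparameters are placeholders, and a real job would also need the chosen algorithm's required hyperparameters:

# Illustrative sketch of launching a training job with the SageMaker Python SDK.
# The role ARN, bucket names, and hyperparameters are placeholders.
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::<ACCOUNT_ID>:role/<SAGEMAKER_EXECUTION_ROLE>"

# Resolve the container image of a built-in algorithm for the current region.
image_uri = sagemaker.image_uris.retrieve(
    "image-classification", region=session.boto_region_name
)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.p2.xlarge",
    output_path="s3://<BUCKET>/output",
    sagemaker_session=session,
)
estimator.set_hyperparameters(num_layers=18, epochs=5)  # plus the algorithm's required ones
estimator.fit({
    "train": "s3://<BUCKET>/train",
    "validation": "s3://<BUCKET>/validation",
})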

Chapter 7, SageMaker Deployment Solutions, focuses on the relevant deployment solutions and strategies when performing ML inference on the AWS platform.
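
To illustrate one of those options, here is a hedged sketch of deploying a pre-trained PyTorch model to a serverless inference endpoint with the SageMaker Python SDK; the artifact location, role ARN, and framework versions are assumptions for illustration:

# Illustrative sketch of a serverless inference deployment.
# The model artifact path, role ARN, and versions are placeholders.
from sagemaker.pytorch import PyTorchModel
from sagemaker.serverless import ServerlessInferenceConfig

model = PyTorchModel(
    model_data="s3://<BUCKET>/model.tar.gz",
    role="arn:aws:iam::<ACCOUNT_ID>:role/<SAGEMAKER_EXECUTION_ROLE>",
    framework_version="1.10",
    py_version="py38",
    entry_point="inference.py",  # the inference script prepared in the chapter
)

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,
    max_concurrency=5,
)

predictor = model.deploy(serverless_inference_config=serverless_config)
print(predictor.endpoint_name)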

Chapter 8, Model Monitoring and Management Solutions, presents the different monitoring and management solutions available on AWS.

Chapter 9, Security, Governance, and Compliance Strategies, focuses on the relevant security, governance, and compliance strategies needed to secure production environments. Here, we will also dive a bit deeper into the different techniques to ensure data privacy and model privacy.

Chapter 10, Machine Learning Pipelines with Kubeflow on Amazon EKS, focuses on using Kubeflow Pipelines, Kubernetes, and Amazon EKS to deploy an automated end-to-end MLOps pipeline on AWS.
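
For a taste of the Kubeflow Pipelines SDK used in that chapter, here is a minimal, self-contained pipeline definition written against the kfp v1-style API; the component logic is a placeholder rather than the chapter's actual workflow:

# Illustrative sketch of a tiny Kubeflow pipeline compiled with the kfp SDK (v1 API).
import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def say_hello(name: str) -> str:
    # Placeholder step; a real pipeline would run data prep, training, and so on.
    return f"Hello, {name}!"


hello_op = create_component_from_func(say_hello)


@dsl.pipeline(name="hello-pipeline", description="Minimal illustrative pipeline")
def hello_pipeline(name: str = "Kubeflow"):
    hello_op(name=name)


if __name__ == "__main__":
    # Compile to a YAML file that can be uploaded through the Kubeflow Pipelines UI.
    kfp.compiler.Compiler().compile(hello_pipeline, "hello_pipeline.yaml")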

Chapter 11, Machine Learning Pipelines with SageMaker Pipelines, focuses on using SageMaker Pipelines to design and build automated end-to-end MLOps pipelines. Here, we will apply, combine, and connect the different strategies and techniques we learned in the previous chapters of the book.

To get the most out of this book

You will need an AWS account and a stable internet connection to complete the hands-on solutions in this book. If you do not have an AWS account yet, check the AWS Free Tier page and click Create a Free Account: https://aws.amazon.com/free/.

Software/hardware covered in the book: Chrome, Firefox, Safari, Edge, Opera, or an alternative web browser

Operating system requirements: Windows, macOS, or Linux

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/jeBII.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “ENTRYPOINT is set to /opt/conda/bin/python -m awslambdaric. The CMD command is then set to app.handler. The ENTRYPOINT and CMD instructions define which command is executed when the container starts to run.”

A block of code is set as follows:

SELECT booking_changes, has_booking_changes, *
FROM dev.public.bookings
WHERE (booking_changes=0 AND has_booking_changes='True')
   OR (booking_changes>0 AND has_booking_changes='False');

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: kubeflow-eks-000
  region: us-west-2
  version: "1.21"
availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2c", "us-west-2d"]
managedNodeGroups:
- name: nodegroup
  desiredCapacity: 5
  instanceType: m5.xlarge
  ssh:
    enableSsm: true

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “After clicking the FILTER button, a drop-down menu should appear. Locate and select Greater than or equal to from the list of options under By condition. This should update the pane on the right side of the page and show the list of configuration options for the Filter values operation.”

Tips or Important Notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Machine Learning Engineering on AWS, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Part 1: Getting Started with Machine Learning Engineering on AWS

In this section, readers will be introduced to the world of ML engineering on AWS.

This section comprises the following chapters:

Chapter 1, Introduction to ML Engineering on AWS

Chapter 2, Deep Learning AMIs

Chapter 3, Deep Learning Containers