Description

The book starts with an overview of the deep learning (DL) life cycle and the emerging field of machine learning operations (MLOps), providing a clear picture of the four pillars of deep learning: data, model, code, and explainability, along with the role of MLflow in these areas.
From there onward, it guides you step by step in understanding the concept of MLflow experiments and usage patterns, using MLflow as a unified framework to track DL data, code and pipelines, models, parameters, and metrics at scale. You’ll also tackle running DL pipelines in a distributed execution environment with reproducibility and provenance tracking, and tuning DL models through hyperparameter optimization (HPO) with Ray Tune, Optuna, and HyperBand. As you progress, you’ll learn how to build a multi-step DL inference pipeline with preprocessing and postprocessing steps, deploy a DL inference pipeline for production using Ray Serve and AWS SageMaker, and finally create a DL explanation as a service (EaaS) using the popular Shapley Additive Explanations (SHAP) toolbox.
By the end of this book, you’ll have built the foundation and gained the hands-on experience you need to develop a DL pipeline solution from initial offline experimentation to final deployment and production, all within a reproducible and open source framework.

You can read the e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 323

Year of publication: 2022




Practical Deep Learning at Scale with MLflow

Bridge the gap between offline experimentation and online production

Yong Liu

BIRMINGHAM—MUMBAI

Practical Deep Learning at Scale with MLflow

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Dhruv Jagdish Kataria

Senior Editor: Tazeen Shaikh

Content Development Editor: Manikandan Kurup

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Rekha Nair

Production Designer: Jyoti Chauhan

Marketing Coordinators: Shifa Ansari and Abeer Riyaz Dawe

First published: July 2022

Production reference: 2200722

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80324-133-3

www.packt.com

To my father and the memory of my mother for their sacrificial love, prayers, and life-long support.

– Yong Liu

Foreword

I am thrilled to introduce this book on Practical Deep Learning at Scale with MLflow by Dr. Yong Liu. Deep learning has been revolutionizing many areas of computing in the past decade, but good resources for using it in production applications remain scarce. At the same time, practitioners have realized that designing machine learning (ML) applications to be operable, maintainable, and updateable is one of the hardest parts of using ML in production, leading to the new field of MLOps. Dr. Liu tackles these issues head-on by showing you how to build robust and maintainable deep learning applications using MLflow, a widely used open source MLOps framework, and multiple state-of-the-art methods and software tools.

Dr. Liu brings a wealth of experience in production machine learning that shines through in every chapter of the book. He has been working in large-scale computing since his Ph.D., has built large-scale production ML applications at Microsoft, Maana, and Outreach, and has published multiple research papers on deep learning. This means that each chapter recommends practical approaches that have worked in multiple organizations. Dr. Liu also presents all his material clearly, explaining the tradeoffs in each decision, illustrating the ideas through runnable code, and surveying multiple open source and commercial tools for each task.

As one of the original creators of MLflow, I was very excited that Dr. Liu chose MLflow as the MLOps framework for this book. When we started MLflow in 2018, there was no widely used open source MLOps framework, so we designed a highly extensible framework that can be integrated with a wide variety of other tools and services and customized to each organization's workflow. We've been thrilled with the fast growth of the MLflow open source community since then and with the powerful integrations that the community has contributed to libraries including PyTorch, SHAP, Delta Lake, and others. Dr. Liu's team was one of the early users of MLflow, so he is an expert on how to use the framework in practice. I hope that you enjoy learning from his experience and building groundbreaking applications using the latest techniques in deep learning.

Dr. Matei Zaharia

Chief Technologist, Databricks, and Co-Creator of MLflow

Contributors

About the author

Yong Liu has been working in big data science, machine learning, and optimization since his doctoral student years at the University of Illinois at Urbana-Champaign (UIUC) and later as a senior research scientist and principal investigator at the National Center for Supercomputing Applications (NCSA), where he led data science R&D projects funded by the National Science Foundation and Microsoft Research. He then joined Microsoft and AI/ML start-ups in the industry. He has shipped ML and DL models to production and has been a speaker at the Spark/Data+AI summit and NLP summit. He has recently published peer-reviewed papers on deep learning, linked data, and knowledge-infused learning at various ACM/IEEE conferences and journals.

I want to thank my wife and my two teenage kids for their support and encouragement while I was writing this book. I am also grateful to the collaborators, team members, and mentors at Outreach Corporation from whom I have learned a great deal.

About the reviewers

Dr. Pavel Dmitriev received a B.S. degree in applied mathematics from Moscow State University in 2002, and a Ph.D. degree in computer science from Cornell University in 2008. He previously worked as an engineer and a data scientist at Yahoo and Microsoft. He is currently a vice president of data science at Outreach, where he works on enabling data-driven decision-making in sales through machine learning and experimentation. Pavel's research has been presented at a number of international conferences, such as KDD, ICSE, WWW, CIKM, BigData, and SEAA. A certified yoga and meditation instructor, he actively works on improving physical and mental well-being in corporations through classes and workshops.

Hong Yung (Joey) Yip is a Ph.D. candidate in computer science at the Artificial Intelligence Institute (AIISC), University of South Carolina. His research interests lie in knowledge-infused learning, which intertwines AI and knowledge graphs to enhance the performance, interpretability, and explainability of neural networks in dynamic and real-time domains. He has co-authored papers published at top venues (WWW, ISWC, and IEEE). He previously interned at the National Library of Medicine, Bethesda, MD, developing scalable approaches for biomedical vocabulary alignment, and with Outreach Corporation, Seattle, WA, conceptualizing a Sales Engagement Graph framework for temporal pattern discovery and contextual understanding in sales processes.

Table of Contents

Preface

Section 1 – Deep Learning Challenges and MLflow Prime

Chapter 1: Deep Learning Life Cycle and MLOps Challenges

Technical requirements

Understanding the DL life cycle and MLOps challenges

Implementing a basic DL sentiment classifier

Understanding DL's full life cycle development

Understanding MLOps challenges

Understanding DL data challenges

Understanding DL model challenges

Understanding DL code challenges

Understanding DL explainability challenges

Summary

Further reading

Chapter 2: Getting Started with MLflow for Deep Learning

Technical requirements

Setting up MLflow

Setting up MLflow locally using miniconda

Setting up MLflow to interact with a remote MLflow server

Implementing our first DL experiment with MLflow autologging

Exploring MLflow's components and usage patterns

Exploring experiments and runs in MLflow

Exploring MLflow models and their usages

Exploring MLflow code tracking and its usages

Summary

Further reading

Section 2 – Tracking a Deep Learning Pipeline at Scale

Chapter 3: Tracking Models, Parameters, and Metrics

Technical requirements

Setting up a full-fledged local MLflow tracking server

Tracking model provenance

Understanding the open provenance tracking framework

Implementing MLflow model tracking

Tracking model metrics

Tracking model parameters

Summary

Further reading

Chapter 4: Tracking Code and Data Versioning

Technical requirements

Tracking notebook and pipeline versioning

Pipeline tracking

Tracking locally, privately built Python libraries

Tracking data versioning in Delta Lake

An example of tracking data using MLflow

Summary

Further reading

Section 3 – Running Deep Learning Pipelines at Scale

Chapter 5: Running DL Pipelines in Different Environments

Technical requirements

An overview of different execution scenarios and environments

Running locally with local code

Running remote code in GitHub locally

Running local code remotely in the cloud

Running remotely in the cloud with remote code in GitHub

Summary

Further reading

Chapter 6: Running Hyperparameter Tuning at Scale

Technical requirements

Understanding automatic HPO for DL pipelines

Types of hyperparameters and their challenges

How HPO works and which ones to choose

Creating HPO-ready DL models with Ray Tune and MLflow

Setting up Ray Tune and MLflow

Creating the Ray Tune trainable for the DL model

Creating the Ray Tune HPO run function

Running the first Ray Tune HPO experiment with MLflow

Running HPO with Ray Tune using Optuna and HyperBand

Summary

Further reading

Section 4 – Deploying a Deep Learning Pipeline at Scale

Chapter 7: Multi-Step Deep Learning Inference Pipeline

Technical requirements

Understanding patterns of DL inference pipelines

Understanding the MLflow Model Python Function API

Implementing a custom MLflow Python model

Implementing preprocessing and postprocessing steps in a DL inference pipeline

Implementing language detection preprocessing logic

Implementing caching preprocessing and postprocessing logic

Implementing response composition postprocessing logic

Implementing an inference pipeline as a new entry point in the main MLproject

Summary

Further reading

Chapter 8: Deploying a DL Inference Pipeline at Scale

Technical requirements

Understanding different deployment tools and host environments

Deploying locally for batch and web service inference

Batch inference

Model as a web service

Deploying using Ray Serve and MLflow deployment plugins

Deploying to AWS SageMaker – a complete end-to-end guide

Step 1: Build a local SageMaker Docker image

Step 2: Add additional model artifacts layers onto the SageMaker Docker image

Step 3: Test local deployment with the newly built SageMaker Docker image

Step 4: Push the SageMaker Docker image to AWS Elastic Container Registry

Step 5: Deploy the inference pipeline model to create a SageMaker endpoint

Step 6: Query the SageMaker endpoint for online inference

Summary

Further reading

Section 5 – Deep Learning Model Explainability at Scale

Chapter 9: Fundamentals of Deep Learning Explainability

Technical requirements

Understanding the categories and audience of explainability

Audience: who needs to know

Stage: when to provide an explanation in the DL life cycle

Scope: which prediction needs explanation

Input data format: what is the format of the input data

Output data format: what is the format of the output explanation

Problem type: what is the machine learning problem type

Objectives type: what is the motivation or goal to explain

Method type: what is the specific post-hoc explanation method used

Exploring the SHAP Explainability toolbox

Exploring the Transformers Interpret toolbox

Summary

Further reading

Chapter 10: Implementing DL Explainability with MLflow

Technical requirements

Understanding current MLflow explainability integration

Implementing a SHAP explanation using the MLflow artifact logging API

Implementing a SHAP explainer using the MLflow pyfunc API

Creating and logging an MLflow pyfunc explainer

Deploying an MLflow pyfunc explainer for an EaaS

Using an MLflow pyfunc explainer for batch explanation

Summary

Further reading

Other Books You May Enjoy

Section 1 – Deep Learning Challenges and MLflow Prime

In this section, we will learn about the five stages of the full life cycle of deep learning (DL), and understand the emerging field of machine learning operations (MLOps) and the role of MLflow. We will provide an overview of the challenges in the four pillars of a DL process: data, model, code, and explainability. Then, we will learn how to set up a basic local MLflow development environment and run our first MLflow experiment for a natural language processing (NLP) model built on top of PyTorch Lightning Flash. Finally, we will explain foundational MLflow concepts, such as experiments and runs, through this first MLflow experiment example.

This section comprises the following chapters:

Chapter 1, Deep Learning Life Cycle and MLOps Challenges

Chapter 2, Getting Started with MLflow for Deep Learning