Debugging Machine Learning Models with Python - Ali Madani - E-Book

Description

Debugging Machine Learning Models with Python is a comprehensive guide that navigates you through the entire spectrum of mastering machine learning, from foundational concepts to advanced techniques. It goes beyond the basics to arm you with the expertise essential for building reliable, high-performance models for industrial applications. Whether you're a data scientist, analyst, machine learning engineer, or Python developer, this book will empower you to design modular systems for data preparation, accurately train and test models, and seamlessly integrate them into larger technologies.
By bridging the gap between theory and practice, you'll learn how to evaluate model performance, identify and address issues, and harness recent advancements in deep learning and generative modeling using PyTorch and scikit-learn. Your journey to developing high-quality models in practice will also encompass causal and human-in-the-loop modeling and machine learning explainability. With hands-on examples and clear explanations, you'll develop the skills to deliver impactful solutions across domains such as healthcare, finance, and e-commerce.

You can read this e-book in Legimi apps or in any app that supports the following format:

EPUB

Number of pages: 447

Year of publication: 2023




Debugging Machine Learning Models with Python

Develop high-performance, low-bias, and explainable machine learning and deep learning models

Ali Madani

BIRMINGHAM—MUMBAI

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Niranjan Naikwadi

Publishing Product Manager: Anant Jain

Book Project Manager: Hemangi Lotlikar

Senior Editor: Rohit Singh

Technical Editor: Sweety Pagaria

Copy Editor: Safis Editing

Proofreader: Safis Editing

Indexer: Sejal Dsilva

Production Designer: Joshua Misquitta

DevRel Marketing Executive: Vinishka Kalra

First published: Sep 2023

Production reference: 1180823

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-80020-858-2

www.packtpub.com

To my mother, Fatemeh Bekali, and my father, Razi, whose sacrifices and unwavering support have been my foundation. To my loving partner, Parand, whose constant understanding and love have been my inspiration and strength.

– Ali Madani

Foreword

Ali Madani is a global expert in ML-based drug discovery, where he has led the development of multiple robust ML products with real-world applications in the life sciences. Ali is a skilled communicator and he is passionate about practical applications of ML development. He rose to popularity over social media through his educational series on applied ML, distilling complex state-of-the-art AI research topics into brief descriptions and diagrams, which could be easily understood by ML learners and non-technical professionals interested in the scientific and business applications of new technologies. Through his role as the Director of Machine Learning at Cyclica (acquired by Recursion Pharmaceuticals), Ali was involved in all phases of the ML product life cycle, from ideation to continuous development, field testing, and commercialization. He was a mentor to ML-oriented staff developing their technical skillsets as well as scientific-oriented staff and field experts seeking to reconcile their interpretation of ML model evaluations with real-world applications.

In this book, Debugging Machine Learning Models with Python, Ali shares his first-hand experience with readers, covering the practical elements of ML development that are critical for progressing ML technologies from first-pass data science experiments into refined, commercial ML solutions aimed at real-world performance. This book covers a broad spectrum of topics – from modularizing components of ML life cycles to correctly assessing the performance of ML models and devising improvement strategies. It extends beyond ML model training and testing, providing technical details on how to detect biases in your models and work toward fairness using techniques such as local and global ML explainability methods. You will also practice supervised, generative, and self-supervised deep learning modeling for different data modalities, such as images, texts, and graphs. Throughout, you will work with Python libraries such as scikit-learn, PyTorch, Transformers, Ray, imblearn, SHAP, AIF360, and many more to gain hands-on experience in implementing these techniques and concepts.

With this book, you’ll learn how to maximize the value of ML technologies, leading the way in developing best-in-class technologies in any domain. Here, Ali covers the engineering aspects of ML technology development, including data and model versioning to achieve reproducibility, data and concept drift detection to keep models reliable in production, and test-driven development to reduce the risk of untrustworthy ML models. You will also learn about different techniques for increasing the security and privacy of your data and models.

Stephen MacKinnon

Vice President, Digital Chemistry

Contributors

About the author

Ali Madani worked as the Director of Machine Learning at Cyclica Inc, leading the company’s AI technology development for drug discovery before its acquisition by Recursion Pharmaceuticals, where Ali continues to focus on the applications of machine learning for drug discovery. Ali completed his Ph.D. at the University of Toronto, focusing on machine learning modeling in a cancer setting, and attained a Master of Mathematics degree from the University of Waterloo. As a believer in industry-oriented education and pro-democratization of knowledge, Ali has actively educated students and professionals through international workshops and courses on basic and advanced high-quality machine learning modeling. When not immersed in machine learning modeling and teaching, Ali enjoys exercising, cooking, and traveling with his partner.

I would like to extend my heartfelt thanks to my partner, Parand, and my parents for their unwavering support and love. I’m also deeply grateful to my mentors throughout the years, whose wisdom and guidance have been invaluable. Thank you all for being an essential part of this journey.

About the reviewers

Krishnan Raghavan is an IT professional with over 20 years of experience in the field of software development and delivery excellence across multiple domains and technologies, ranging from C++ to Java, Python, Data Warehousing, and Big Data tools and technologies. In his free time, Krishnan likes to spend time with his wife and daughter and to read fiction, non-fiction, and technical books. Krishnan tries to give back to the community by being a part of the GDG – Pune Volunteer Group and helping the team organize events. Currently, he is unsuccessfully trying to learn to play the guitar.

You can connect with him at [email protected] or via LinkedIn.

I would like to thank my wife, Anita, and daughter, Ananya, for giving me the time and space to review this book.

Amreth Chandrasehar is a Director at Informatica responsible for ML Engineering, Observability, and SRE teams. Over the last few years, he has played a key role in Cloud migration, CNCF architecture, Generative AI, Observability, and machine learning adoption at various organizations. He is also a co-creator of the Conducktor Platform, serving T-Mobile’s 140+ million customers, and a Tech/Customer Advisory board member at various companies. He has also co-developed and open-sourced Kardio.io. Amreth has been invited to speak at several key conferences and has won several awards within the company. He was recently awarded a Gold Award at the 15th Annual 2023 Golden Bridge Business and Innovation Awards for his contributions to the field of Observability and Generative AI.

I would like to thank my wife, Ashwinya Mani, and my son, Athvik A, for their patience and support during my review of this book.

Table of Contents

Preface

Part 1: Debugging for Machine Learning Modeling

1

Beyond Code Debugging

Technical requirements

Machine learning at a glance

Types of machine learning modeling

Supervised learning

Unsupervised learning

Self-supervised learning

Semi-supervised learning

Reinforcement learning

Generative machine learning

Debugging in software development

Error messages in Python

Debugging techniques

Debuggers

Best practices for high-quality Python programming

Version control

Debugging beyond Python

Flaws in data used for modeling

Data format and structure

Data quantity and quality

Data biases

Model and prediction-centric debugging

Underfitting and overfitting

Inference in model testing and production

Data or hyperparameters for changing landscapes

Summary

Questions

References

2

Machine Learning Life Cycle

Technical requirements

Before we start modeling

Data collection

Data selection

Data exploration

Data wrangling

Structuring

Enriching

Data transformation

Cleaning

Modeling data preparation

Feature selection and extraction

Designing an evaluation and testing strategy

Model training and evaluation

Testing the code and the model

Model deployment and monitoring

Summary

Questions

References

3

Debugging toward Responsible AI

Technical requirements

Impartial modeling fairness in machine learning

Data bias

Algorithmic bias

Security and privacy in machine learning

Data privacy

Data poisoning

Adversarial attacks

Output integrity attacks

System manipulation

Secure and private machine learning techniques

Transparency in machine learning modeling

Accountable and open to inspection modeling

Data and model governance

Summary

Questions

References

Part 2: Improving Machine Learning Models

4

Detecting Performance and Efficiency Issues in Machine Learning Models

Technical requirements

Performance and error assessment measures

Classification

Regression

Clustering

Visualization for performance assessment

Summary metrics are not enough

Visualizations could be misleading

Don’t interpret your plots as you wish

Bias and variance diagnosis

Model validation strategy

Error analysis

Beyond performance

Summary

Questions

References

5

Improving the Performance of Machine Learning Models

Technical requirements

Options for improving model performance

Grid search

Random search

Bayesian search

Successive halving

Synthetic data generation

Oversampling for imbalanced data

Improving pre-training data processing

Anomaly detection and outlier removal

Benefitting from data of lower quality or relevance

Regularization to improve model generalizability

Summary

Questions

References

6

Interpretability and Explainability in Machine Learning Modeling

Technical requirements

Interpretable versus black-box machine learning

Interpretable machine learning models

Explainability for complex models

Explainability methods in machine learning

Local explainability techniques

Global explanation

Practicing machine learning explainability in Python

Explanations in SHAP

Explanations using LIME

Counterfactual generation using Diverse Counterfactual Explanations (DiCE)

Reviewing why having explainability is not enough

Summary

Questions

References

7

Decreasing Bias and Achieving Fairness

Technical requirements

Fairness in machine learning modeling

Proxies for sensitive variables

Sources of bias

Biases introduced in data generation and collection

Bias in model training and testing

Bias in production

Using explainability techniques

Fairness assessment and improvement in Python

Summary

Questions

References

Part 3: Low-Bug Machine Learning Development and Deployment

8

Controlling Risks Using Test-Driven Development

Technical requirements

Test-driven development for machine learning modeling

Unit testing

Machine learning differential testing

Tracking machine learning experiments

Summary

Questions

References

9

Testing and Debugging for Production

Technical requirements

Infrastructure testing

Infrastructure as Code tools

Infrastructure testing tools

Infrastructure testing using Pytest

Integration testing of machine learning pipelines

Integration testing using pytest

Monitoring and validating live performance

Model assertion

Summary

Questions

References

10

Versioning and Reproducible Machine Learning Modeling

Technical requirements

Reproducibility in machine learning

Data versioning

Model versioning

Summary

Questions

References

11

Avoiding and Detecting Data and Concept Drifts

Technical requirements

Avoiding drifts in your models

Avoiding data drift

Addressing concept drift

Detecting drifts

Practicing with alibi_detect for drift detection

Practicing with evidently for drift detection

Summary

Questions

References

Part 4: Deep Learning Modeling

12

Going Beyond ML Debugging with Deep Learning

Technical requirements

Introduction to artificial neural networks

Optimization algorithms

Frameworks for neural network modeling

PyTorch for deep learning modeling

Summary

Questions

References

13

Advanced Deep Learning Techniques

Technical requirements

Types of neural networks

Categorization based on data type

Convolutional neural networks for image shape data

Performance assessment

CNN modeling using PyTorch

Image data transformation and augmentation for CNNs

Using pre-trained models

Transformers for language modeling

Tokenization

Language embedding

Language modeling using pre-trained models

Modeling graphs using deep neural networks

Graph neural networks

GNNs with PyTorch Geometric

Summary

Questions

References

14

Introduction to Recent Advancements in Machine Learning

Technical requirements

Generative modeling

Generative deep learning techniques

Prompt engineering for text-based generative models

Generative modeling using PyTorch

Reinforcement learning

Reinforcement learning with human feedback (RLHF)

Self-supervised learning (SSL)

Self-supervised learning with PyTorch

Summary

Questions

References

Part 5: Advanced Topics in Model Debugging

15

Correlation versus Causality

Technical requirements

Correlation as part of machine learning models

Causal modeling to reduce risks and improve performance

Assessing causation in machine learning models

Causal inference

Causal modeling using Python

Using dowhy for causal effect estimation

Using bnlearn for causal inference through Bayesian networks

Summary

Questions

References

16

Security and Privacy in Machine Learning

Technical requirements

Encryption techniques and their use in machine learning

Implementing AES encryption in Python

Homomorphic encryption

Differential privacy

Federated learning

Summary

Questions

References

17

Human-in-the-Loop Machine Learning

Humans in the machine learning life cycle

Expert feedback collection

Human-in-the-loop modeling

Summary

Questions

References

Assessments

Chapter 1 – Beyond Code Debugging

Chapter 2 – Machine Learning Life Cycle

Chapter 3 – Debugging toward Responsible AI

Chapter 4 – Detecting Performance and Efficiency Issues in Machine Learning Models

Chapter 5 – Improving the Performance of Machine Learning Models

Chapter 6 – Interpretability and Explainability in Machine Learning Modeling

Chapter 7 – Decreasing Bias and Achieving Fairness

Chapter 8 – Controlling Risks Using Test-Driven Development

Chapter 9 – Testing and Debugging for Production

Chapter 10 – Versioning and Reproducible Machine Learning Modeling

Chapter 11 – Avoiding and Detecting Data and Concept Drifts

Chapter 12 – Going Beyond ML Debugging with Deep Learning

Chapter 13 – Advanced Deep Learning Techniques

Chapter 14 – Introduction to Recent Advancements in Machine Learning

Chapter 15 – Correlation versus Causality

Chapter 16 – Security and Privacy in Machine Learning

Chapter 17 – Human-in-the-Loop Machine Learning

Index

Other Books You May Enjoy

Part 1: Debugging for Machine Learning Modeling

In this part of the book, we will delve into the different aspects of machine learning development that extend beyond traditional paradigms. The first chapter illuminates the nuances between conventional code debugging and the specialized realm of machine learning debugging, emphasizing that the challenges in ML transcend mere code errors. The next chapter provides a comprehensive overview of the machine learning life cycle, highlighting the role of modularization in streamlining and enhancing model development. Finally, we will underscore the importance of model debugging in the pursuit of Responsible AI, emphasizing its role in ensuring ethical, transparent, and effective machine learning solutions.

This part has the following chapters:

Chapter 1, Beyond Code Debugging

Chapter 2, Machine Learning Life Cycle

Chapter 3, Debugging toward Responsible AI