Practical Automated Machine Learning Using H2O.ai - Salil Ajgaonkar - E-Book

Description

With the huge amount of data being generated over the internet and the benefits that Machine Learning (ML) predictions bring to businesses, ML implementation has become a goal that every business is striving for. The complex mathematics behind it, however, can be discouraging for a lot of users. This is where H2O comes in: it automates various repetitive steps, and this encapsulation helps developers focus on results rather than on handling complexities.

You’ll begin by understanding how H2O’s AutoML simplifies the implementation of ML by providing a simple, easy-to-use interface to train and use ML models. Next, you’ll see how AutoML automates the entire process of training multiple models, optimizing their hyperparameters, and explaining their performance. As you advance, you’ll find out how to leverage a Plain Old Java Object (POJO) and a Model Object, Optimized (MOJO) to deploy your models to production. Throughout this book, you’ll take a hands-on approach to implementation using H2O that’ll enable you to set up your ML systems in no time.

By the end of this H2O book, you’ll be able to train and use your ML models using H2O AutoML, from experimentation all the way to production, without needing to understand complex statistics or data science.

You can read this e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Pages: 477

Publication year: 2022




Practical Automated Machine Learning Using H2O.ai

Discover the power of automated machine learning, from experimentation through to deployment to production

Salil Ajgaonkar

BIRMINGHAM—MUMBAI

Practical Automated Machine Learning Using H2O.ai
Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Dhruv Jagdish Kataria

Senior Editor: Nathanya Dias

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Subalakshmi Govindhan

Production Designer: Ponraj Dhandapani

Marketing Coordinators: Shifa Ansari, Abeer Riyaz Dawe

First published: September 2022

Production reference: 1140922

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80107-452-0

www.packt.com

Contributors

About the author

Salil Ajgaonkar is a software engineer experienced in building and scaling cloud-based microservices and productizing machine learning models. His background includes work in transaction systems, artificial intelligence, and cyber security. He is passionate about solving complex scaling problems, building machine learning pipelines, and data engineering. Salil earned his degree in IT from Xavier Institute of Engineering, Mumbai, India, in 2015 and later earned his master’s degree in computer science from Trinity College Dublin, Ireland, in 2018, specializing in future networked systems. His work history includes the likes of BookMyShow, Genesys, and Vectra AI.

I would like to thank my lovely wife, Oshin, for supporting me and making sure I gave my best effort to writing this book. Also, thanks to my parents, who taught me to always say “yes” to all opportunities that come my way.

I would also like to thank the Packt team for giving me the opportunity to write this book and play my part in giving back to the programming community. Special thanks to Dhruv for coordinating the publication effort, Kirti for scheduling and keeping things on track over the year, and Nathanya and the editorial team for ensuring that the book is of the highest quality.

Last but not least, special thanks to Dr. Emir Muñoz for his valuable insights into the technical aspects of the book.

About the reviewer

Emir Muñoz is a senior manager of machine learning at Genesys, where he works on projects to enhance customer experience using artificial intelligence, machine learning, and data science. He has experience in academia and industry, which he uses to leverage emerging technologies and algorithms to deliver innovative solutions. Currently, he leads a team that mines contact center data to train machine learning models to optimize contact center routing.

Emir holds a PhD in computer science with a specialization in machine learning. He also received a BEng in informatics and an MSc in computer engineering. He is the author of several papers and patents on the topics of semantic web, machine learning, knowledge graphs, and contact center analytics.

Table of Contents

Preface

Part 1: H2O AutoML Basics

1

Understanding H2O AutoML Basics

Technical requirements

Understanding AutoML and H2O AutoML

AutoML

H2O AutoML

Minimum system requirements to use H2O AutoML

Installing Java

Basic implementation of H2O using Python

Installing Python

Installing H2O using Python

Basic implementation of H2O using R

Installing R

Installing H2O using R

Training your first ML model using H2O AutoML

Understanding the Iris flower dataset

Model training

Summary

2

Working with H2O Flow (H2O’s Web UI)

Technical requirements

Understanding the basics of H2O Flow

Downloading and launching H2O Flow

Exploring H2O Flow

Working with data functions in H2O Flow

Importing the dataset

Parsing the dataset

Observing the dataframe

Splitting a dataframe

Working with model training functions in H2O Flow

Understanding the AutoML parameters in H2O Flow

Training and understanding models using AutoML in H2O Flow

Working with prediction functions in H2O Flow

Making predictions using H2O Flow

Understanding the prediction results

Summary 

Part 2: H2O AutoML Deep Dive

3

Understanding Data Processing

Technical requirements

Reframing your dataframe

Combining columns from two dataframes

Combining rows from two dataframes

Merging two dataframes

Handling missing values in the dataframe

Filling NA values

Replacing values in a frame

Imputation

Manipulating feature columns of the dataframe

Sorting columns

Changing column types

Tokenization of textual data

Encoding data using target encoding

Summary

4

Understanding H2O AutoML Architecture and Training

Observing the high-level architecture of H2O

Observing the client layer

Observing the JVM component layer

Learning about the flow of interaction between the client and the H2O service

Learning about H2O client-server interactions during the ingestion of data

Knowing the sequence of interactions in H2O during model training

Understanding how H2O AutoML performs hyperparameter optimization and training

Understanding hyperparameters

Understanding hyperparameter optimization

Summary

5

Understanding AutoML Algorithms

Understanding the different types of ML algorithms

Understanding the Generalized Linear Model algorithm

Introduction to linear regression

Understanding the assumptions of linear regression

Working with a Generalized Linear Model

Understanding the Distributed Random Forest algorithm

Introduction to decision trees

Introduction to Random Forest

Understanding Extremely Randomized Trees

Understanding the Gradient Boosting Machine algorithm

Building a Gradient Boosting Machine

Understanding what Deep Learning is

Understanding neural networks

Summary

6

Understanding H2O AutoML Leaderboard and Other Performance Metrics

Exploring the H2O AutoML leaderboard performance metrics

Understanding the mean squared error and the root mean squared error

Working with the confusion matrix

Calculating the receiver operating characteristic and its area under the curve (ROC-AUC)

Calculating the precision-recall curve and its area under the curve (AUC-PR)

Working with log loss

Exploring other model performance metrics

Understanding the F1 score performance metric

Calculating the absolute Matthews correlation coefficient

Measuring the R2 performance metric

Summary

7

Working with Model Explainability

Technical requirements

Working with the model explainability interface

Implementing the model explainability interface in Python

Implementing the model explainability interface in R

Exploring the various explainability features

Understanding residual analysis

Understanding variable importance

Understanding feature importance heatmaps

Understanding model correlation heatmaps

Understanding partial dependency plots

Understanding SHAP summary plots

Understanding individual conditional expectation plots

Understanding learning curve plots

Summary

Part 3: H2O AutoML Advanced Implementation and Productization

8

Exploring Optional Parameters for H2O AutoML

Technical requirements

Experimenting with parameters that support imbalanced classes

Understanding undersampling the majority class

Understanding oversampling the minority class

Working with class balancing parameters in H2O AutoML

Experimenting with parameters that support early stopping

Understanding early stopping

Working with early stopping parameters in H2O AutoML

Experimenting with parameters that support cross-validation

Understanding cross-validation

Working with cross-validation parameters in H2O AutoML

Summary

9

Exploring Miscellaneous Features in H2O AutoML

Technical requirements

Understanding H2O AutoML integration in scikit-learn

Building and installing scikit-learn

Experimenting with scikit-learn

Using H2O AutoML in scikit-learn

Understanding H2O AutoML event logging

Summary

10

Working with Plain Old Java Objects (POJOs)

Technical requirements

Introduction to POJOs

Extracting H2O models as POJOs

Downloading H2O models as POJOs in Python

Downloading H2O models as POJOs in R

Downloading H2O models as POJOs in H2O Flow

Using an H2O model as a POJO

Summary

11

Working with Model Object, Optimized (MOJO)

Technical requirements

Understanding what a MOJO is

Extracting H2O models as MOJOs

Extracting H2O models as MOJOs in Python

Extracting H2O models as MOJOs in R

Extracting H2O models as MOJOs in H2O Flow

Viewing model MOJOs

Using H2O AutoML model MOJOs to make predictions

Summary

12

Working with H2O AutoML and Apache Spark

Technical requirements

Exploring Apache Spark

Understanding the components of Apache Spark

Understanding the Apache Spark architecture

Understanding what a Resilient Distributed Dataset is

Exploring H2O Sparkling Water

Downloading and installing H2O Sparkling Water

Implementing Spark and H2O AutoML using H2O Sparkling Water

Summary

13

Using H2O AutoML with Other Technologies

Technical requirements

Using H2O AutoML and Spring Boot

Understanding the problem statement

Designing the architecture

Working on the implementation

Using H2O AutoML and Apache Storm

What is Apache Storm?

Understanding the problem statement

Designing the architecture

Working on the implementation

Summary

Index

Other Books You May Enjoy

Part 1: H2O AutoML Basics

The objective of this part is to help you implement an easy, bare-bones demo of how to install, set up, and use H2O AutoML, opening up further exploration of and experimentation with the technology.

This section comprises the following chapters:

Chapter 1, Understanding H2O AutoML Basics

Chapter 2, Working with H2O Flow (H2O’s Web UI)

2

Working with H2O Flow (H2O’s Web UI)

Machine Learning (ML) is more than just code; it involves many observations made from different perspectives. As powerful as coding is, a lot of information stays hidden behind the Terminal in which you code. Humans have always understood pictures more easily than words, and, as complex as ML is, it can be easy and fun to implement with the help of interactive User Interfaces (UIs). Working with a colorful UI instead of a dull, black-and-white Terminal is always a plus when learning about difficult topics.

H2O Flow is a web-based UI developed by the H2O.ai team. This interface runs on the same backend that we learned about in Chapter 1, Understanding H2O AutoML Basics. It is simply a web UI layered over the main H2O library: it passes your inputs to the backend server, triggers functions there, and displays the results back to you.

In this chapter, we will learn how to work with H2O Flow. We will perform all the typical steps of the ML pipeline, which we learned about in the Understanding AutoML and H2O AutoML section of Chapter 1, Understanding H2O AutoML Basics, from reading datasets to making predictions using the trained models. Also, we will explore a few metrics and model details to help us ease into more advanced topics later. This chapter is hands-on, and we will learn about the various parts of H2O Flow as we create our ML pipeline.

By the end of this chapter, you will be able to navigate and use the various features of H2O Flow. Additionally, you will be able to use H2O Flow to train your ML models and use them for predictions without writing a single line of code.

In this chapter, we are going to cover the following topics:

Understanding the basics of H2O Flow

Working with data functions in H2O Flow

Working with model training functions in H2O Flow

Working with prediction functions in H2O Flow

Technical requirements

You will require the following:

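The latest version of a modern web browser, such as Chrome, Firefox, or Edge

Java, which is needed to run the H2O JAR file (see the Installing Java section of Chapter 1, Understanding H2O AutoML Basics)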

Understanding the basics of H2O Flow

H2O Flow is an open-source web interface that helps users execute code, plot graphs, and display dataframes on a single page called a Flow notebook, or simply a Flow.

Users of Jupyter notebooks will find H2O Flow very similar: you write executable code in cells, and when you execute a cell, its output is displayed below it and the cursor moves on to the next cell. A key advantage of a Flow is that it can easily be saved, exported, and imported by different users. This makes it easy for data scientists to share results with stakeholders, as they only need to save the execution results and share the Flow.

In the following sub-sections, we will gain an understanding of the basics of H2O Flow. Let’s begin our journey with H2O Flow by first downloading it to our system.

Downloading and launching H2O Flow

In order to run H2O Flow, you first need to download H2O, which is distributed as a ZIP file containing the H2O Java Archive (JAR) file, and then run the JAR file once it has been extracted.

You can download and launch H2O Flow using the following steps:

You can download H2O, which includes H2O Flow, from https://h2o-release.s3.amazonaws.com/h2o/master/latest.html.

Once the ZIP file has been downloaded, open a Terminal and run the following command in the folder where you downloaded the ZIP file:

unzip {name_of_the_h2o_zip_file}

To run H2O Flow, run the following command inside the folder created by unzipping the H2O ZIP file:

java -jar h2o.jar

This starts H2O and serves the H2O Flow web UI at http://localhost:54321, which you can open in your web browser.
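The download-and-launch steps above can also be scripted. The following is a minimal Python sketch that uses only the standard library. It assumes that Java is on your PATH and that `h2o.jar` sits somewhere inside the downloaded ZIP file, typically in a version-named folder. The helper names (`extract_h2o_jar`, `launch_command`, `wait_for_port`, `start_h2o`) are hypothetical. The readiness check only confirms that something accepts connections on port 54321. It does not query H2O itself.

```python
import socket
import subprocess
import time
import zipfile
from pathlib import Path


def extract_h2o_jar(zip_path, dest_dir):
    """Unzip the downloaded H2O archive and return the path to h2o.jar."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Assumption: h2o.jar sits inside a version-named folder in the ZIP,
    # so search recursively instead of hard-coding that folder name.
    matches = sorted(Path(dest_dir).rglob("h2o.jar"))
    if not matches:
        raise FileNotFoundError(f"h2o.jar not found in {zip_path}")
    return matches[0]


def launch_command(jar_path):
    """Equivalent of `java -jar h2o.jar`, using an absolute path to the JAR."""
    return ["java", "-jar", str(Path(jar_path).resolve())]


def wait_for_port(host="localhost", port=54321, timeout=60.0):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)


def start_h2o(zip_path, dest_dir):
    """Unzip H2O, start it in the background, and wait for port 54321."""
    jar = extract_h2o_jar(zip_path, dest_dir)
    # Assumes `java` is on the PATH; the process keeps running until terminated.
    process = subprocess.Popen(launch_command(jar))
    if not wait_for_port("localhost", 54321):
        process.terminate()
        raise RuntimeError("H2O did not start listening on port 54321 in time")
    print("H2O Flow should now be available at http://localhost:54321")
    return process
```

For example, `start_h2o("h2o.zip", "h2o_install")` would unzip the archive, start H2O in the background, and return the process handle. You can stop it later with `process.terminate()`. The file and folder names in this example are placeholders. The script splits the work into small functions so that the extraction and readiness steps can be reused or tested without starting Java.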

Now that we have downloaded and launched H2O Flow, let’s briefly explore it to get an understanding of what functionalities it has to offer.

Exploring H2O Flow

H2O Flow is a very feature-intensive