Automated Machine Learning - Adnan Masood - E-Book

Description

Every machine learning engineer deals with systems that have hyperparameters, and the most basic task in automated machine learning (AutoML) is to automatically set these hyperparameters to optimize performance. The latest deep neural networks have a wide range of hyperparameters for their architecture, regularization, and optimization, which can be customized effectively to save time and effort.
This book reviews the underlying techniques of automated feature engineering, model and hyperparameter tuning, gradient-based approaches, and much more. You'll discover different ways of implementing these techniques in open source tools and then learn to use enterprise tools for implementing AutoML in three major cloud service providers: Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform. As you progress, you’ll explore the features of cloud AutoML platforms by building machine learning models using AutoML. The book will also show you how to develop accurate models by automating time-consuming and repetitive tasks in the machine learning development lifecycle.
By the end of this machine learning book, you’ll be able to build and deploy AutoML models that are not only accurate, but also increase productivity, allow interoperability, and minimize feature engineering tasks.

Page count: 212




Automated Machine Learning

Hyperparameter optimization, neural architecture search, and algorithm selection with cloud platforms

Adnan Masood, PhD

BIRMINGHAM—MUMBAI

Automated Machine Learning

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Kunal Parikh

Publishing Product Manager: Ali Abidi

Senior Editor: Mohammed Yusuf Imaratwale

Content Development Editor: Nazia Shaikh

Technical Editor: Sonam Pandey

Copy Editor: Safis Editing

Project Coordinator: Aparna Ravikumar Nair

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Vijay Kamble

First published: February 2021

Production reference: 1180221

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80056-768-9

www.packt.com

Foreword

There are moments in your life that stick with you no matter the circumstances. For me, it was the moment that I first met Dr. Adnan Masood. It was not at a tech conference or at a work function. It was at a Sunday school event which both of our children attended. He introduced himself and asked me what I did. I usually just give a generic canned response as most folks I speak with outside of my field of work don't really get what I do. Instead, his eyes lit up when I told him that I work with data. He kept asking me deeper and deeper questions about some of the most obscure Machine Learning and Deep Learning Algorithms that even I had not heard in a long time. It is a nice realization when you find out that you are not alone in this world and that there are others who have the same passion as you.

It is this passion that I see Dr. Masood bringing to a quickly growing and often misunderstood field of Automated Machine Learning. As a Data Scientist working at Microsoft, I often hear from organizational leads that Automated Machine Learning will lead to the end of the need for data science expertise. This is truly not the case and Automated Machine Learning should not be treated as a "black-box" or a "One-size-fits-all" approach to feature engineering, data pre-processing, model training, and model selection. Rather, Automated Machine Learning can help cut down the time and cost affiliated with work that takes away from the overall beauty of Data Science, Machine Learning, and Artificial Intelligence.

The great news about the current publication you hold in your hand or read on your tablet is that you now have a nuanced understanding of the benefits of applying Automated Machine Learning with every current and future project in your organization. Additionally, you will get hands-on expertise leveraging AutoML with open-source packages as well as cloud solutions offered by Azure, Amazon Web Services, and Google Cloud Platform. Whether you are a seasoned data scientist, a budding data scientist, a data engineer, an ML engineer, a DevOps engineer, or a data analyst, you will find that AutoML can help get you to the next level in your Machine Learning journey.

Ahmed Sherif

Cloud Solution Architect, AI & Analytics – Microsoft Corporation

Contributors

About the author

Adnan Masood, PhD is an artificial intelligence and machine learning researcher, visiting scholar at Stanford AI Lab, software engineer, Microsoft MVP (Most Valuable Professional), and Microsoft's regional director for artificial intelligence. As chief architect of AI and machine learning at UST Global, he collaborates with Stanford AI Lab and MIT CSAIL, and leads a team of data scientists and engineers building artificial intelligence solutions to produce business value and insights that affect a range of businesses, products, and initiatives.

About the reviewer

Jamshaid Sohail is passionate about data science, machine learning, computer vision, and natural language processing and has more than 2 years of experience in the industry. He has worked at a Silicon Valley-based start-up named FunnelBeam, the founders of which are from Stanford University, as a data scientist. Currently, he is working as a data scientist at Systems Limited. He has completed over 66 online courses from different platforms. He authored the book Data Wrangling with Python 3.X for Packt Publishing and has reviewed multiple books and courses. He is also developing a comprehensive course on data science at Educative and is in the process of writing books for multiple publishers.

Table of Contents

Preface

Section 1: Introduction to Automated Machine Learning

Chapter 1: A Lap around Automated Machine Learning

The ML development life cycle

Automated ML

How automated ML works

Hyperparameters

The need for automated ML

Democratization of data science

Debunking automated ML myths

Myth #1 – The end of data scientists

Myth #2 – Automated ML can only solve toy problems

Automated ML ecosystem

Open source platforms and tools

Microsoft NNI

auto-sklearn

Auto-Weka

Auto-Keras

TPOT

Ludwig – a code-free AutoML toolbox

AutoGluon – an AutoML toolkit for deep learning

Featuretools

H2O AutoML

Commercial tools and platforms

DataRobot

Google Cloud AutoML

Amazon SageMaker Autopilot

Azure Automated ML

H2O Driverless AI

The future of automated ML

The automated ML challenges and limitations

A Getting Started guide for enterprises

Summary

Further reading

Chapter 2: Automated Machine Learning, Algorithms, and Techniques

Automated ML – Opening the hood

The taxonomy of automated ML terms

Automated feature engineering

Hyperparameter optimization

Neural architecture search

Summary

Further reading

Chapter 3: Automated Machine Learning with Open Source Tools and Libraries

Technical requirements

The open source ecosystem for AutoML

Introducing TPOT

How does TPOT do this?

Introducing Featuretools

Introducing Microsoft NNI

Introducing auto-sklearn

AutoKeras

Ludwig – a code-free AutoML toolbox

AutoGluon – the AutoML toolkit for deep learning

Summary

Further reading

Section 2: AutoML with Cloud Platforms

Chapter 4: Getting Started with Azure Machine Learning

Getting started with Azure Machine Learning

The Azure Machine Learning stack

Getting started with the Azure Machine Learning service

Modeling with Azure Machine Learning

Deploying and testing models with Azure Machine Learning

Summary

Further reading

Chapter 5: Automated Machine Learning with Microsoft Azure

AutoML in Microsoft Azure

Time series prediction using AutoML

Summary

Further reading

Chapter 6: Machine Learning with AWS

ML in the AWS landscape

Getting started with AWS ML

AWS SageMaker Autopilot

AWS JumpStart

Summary

Further reading

Chapter 7: Doing Automated Machine Learning with Amazon SageMaker Autopilot

Technical requirements

Creating an Amazon SageMaker Autopilot limited experiment

Creating an AutoML experiment

Running the SageMaker Autopilot experiment and deploying the model

Invoking the model

Building and running SageMaker Autopilot experiments from the notebook

Hosting and invoking the model

Summary

Further reading

Chapter 8: Machine Learning with Google Cloud Platform

Getting started with the Google Cloud Platform services

AI and ML with GCP

Google Cloud AI Platform and AI Hub

Getting started with Google Cloud AI Platform

Automated ML with Google Cloud

Summary

Further reading

Chapter 9: Automated Machine Learning with GCP

Getting started with Google Cloud AutoML Tables

Creating an AutoML Tables experiment

Understanding AutoML Tables model deployment

AutoML Tables with BigQuery public datasets

Automated machine learning for price prediction

Summary

Further reading

Section 3: Applied Automated Machine Learning

Chapter 10: AutoML in the Enterprise

Does my organization need automated ML?

Clash of the titans – automated ML versus data scientists

Automated ML – an accelerator for enterprise advanced analytics

The democratization of AI with human-friendly insights

Augmented intelligence

Automated ML challenges and opportunities

Not having enough data

Model performance

Domain expertise and special use cases

Computational costs

Embracing the learning curve

Stakeholder adaption

Establishing trust – model interpretability and transparency in automated ML

Feature importance

Counterfactual analysis

Data science measures for model accuracy

Pre-modeling explainability

During-modeling explainability

Post-modeling explainability

Introducing automated ML in an organization

Brace for impact

Choosing the right automated ML platform

The importance of data

The right messaging for your audience

Call to action – where do I go next?

References and further reading

Why subscribe?

Other Books You May Enjoy

Preface

Every machine learning engineer deals with systems that have hyperparameters, and the most basic task in automated machine learning (AutoML) is to automatically set these hyperparameters to optimize performance. The latest deep neural networks have a wide range of hyperparameters for their architecture, regularization, and optimization, which can be customized effectively to save time and effort.

This book reviews the underlying techniques of automated feature engineering, model and hyperparameter tuning, gradient-based approaches, and more. You'll explore different ways of implementing these techniques in open source tools. Next, you'll focus on enterprise tools, learning about different ways of implementing AutoML in three major cloud service providers: Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP). As you progress, you'll explore the features of cloud AutoML platforms by building machine learning models using AutoML. Later chapters will show you how to develop accurate models by automating time-consuming and repetitive tasks involved in the machine learning development life cycle.

By the end of this book, you'll be able to build and deploy AutoML models that are not only accurate, but that also increase productivity, allow interoperability, and minimize feature engineering tasks.

Who this book is for

Citizen data scientists, machine learning developers, AI enthusiasts, or anyone looking to automatically build machine learning models using the features offered by open source tools, Microsoft Azure Machine Learning, AWS, and Google Cloud Platform will find this book useful.

What this book covers

Chapter 1, A Lap around Automated Machine Learning, presents a detailed overview of AutoML methods by both providing a solid overview for novices and serving as a reference for experienced machine learning practitioners. This chapter starts with the machine learning development life cycle and navigates the problem of hyperparameter optimization that AutoML solves.

Chapter 2, Automated Machine Learning, Algorithms, and Techniques, allows citizen data scientists to build AI solutions without extensive experience. In this chapter, we review the current developments of AutoML in terms of three categories: automated feature engineering (AutoFE), automated model and hyperparameter learning (AutoMHL), and automated deep learning (AutoDL). State-of-the-art techniques adopted in these three categories are presented, including Bayesian optimization, reinforcement learning, evolutionary algorithms, and gradient-based approaches. In this chapter, we'll summarize popular AutoML frameworks and conclude with the current open challenges of AutoML.

Chapter 3, Automated Machine Learning with Open Source Tools and Libraries, teaches you about AutoML open source software (OSS) tools and libraries that automate the entire life cycle of the ideation, conceptualization, development, and deployment of predictive models. From data preparation through model training to validation as well as deployment, these tools do everything with almost zero human intervention. In this chapter, we'll review the major OSS tools, including TPOT, AutoKeras, Auto-Sklearn, Featuretools, H2O AutoML, Auto-PyTorch, Microsoft NNI, and Amazon AutoGluon, and help you to understand the different value propositions and approaches used in each of these libraries.

Chapter 4, Getting Started with Azure Machine Learning, covers Azure Machine Learning, which helps accelerate the end-to-end machine learning life cycle using the power of the Microsoft Azure platform and services. In this chapter, we will review how to get started with an enterprise-grade machine learning service that empowers developers and data scientists to build, train, and deploy machine learning models faster. With examples, we will lay the groundwork to build and deploy AutoML solutions.

Chapter 5, Automated Machine Learning with Microsoft Azure, reviews in detail, with code examples, how we can automate the time-consuming and iterative tasks of model development using the Azure Machine Learning stack and perform operations such as regression, classification, and time series analysis using Azure AutoML. This chapter will enable you to perform hyperparameter tuning to find the optimal parameters and the optimal model with Azure AutoML.

Chapter 6, Machine Learning with Amazon Web Services, covers Amazon SageMaker Studio, Amazon SageMaker Autopilot, Amazon SageMaker Ground Truth, and Amazon SageMaker Neo, along with the other AI services and frameworks offered by AWS. Among the hyperscalers (large cloud providers), AWS offers one of the broadest and deepest sets of machine learning services and supporting cloud infrastructure, putting machine learning in the hands of every developer, data scientist, and expert practitioner. AWS offers machine learning services, AI services, deep learning frameworks, and learning tools to build, train, and deploy machine learning models fast.

Chapter 7, Doing Automated Machine Learning with Amazon SageMaker Autopilot, takes us on a deep dive into Amazon SageMaker Studio, using SageMaker Autopilot to run several candidates to figure out the optimal combination of data preprocessing steps, machine learning algorithms, and hyperparameters. The chapter provides a hands-on, illustrative overview of training an inference pipeline, for easy deployment on a real-time endpoint or batch processing.

Chapter 8, Machine Learning with Google Cloud Platform, reviews Google's AI and machine learning offerings. Google Cloud offers innovative machine learning products and services on a trusted and scalable platform. These services include AI Hub, AI building blocks such as sight, language, conversation, and structured data services, and AI Platform. In this chapter, you will become familiar with these offerings and understand how AI Platform supports Kubeflow, Google's open source platform, which lets developers build portable machine learning pipelines with access to cutting-edge Google AI technology such as TensorFlow, TPUs, and TFX tools to deploy your AI applications to production.

Chapter 9, Automated Machine Learning with GCP Cloud AutoML, shows you how to train custom business-specific machine learning models, with minimum effort and machine learning expertise. With hands-on examples and code walk-throughs, we will explore the Google Cloud AutoML platform to create customized deep learning models in natural language, vision, unstructured data, language translation, and video intelligence, without any knowledge of data science or programming.

Chapter 10, AutoML in the Enterprise, presents AutoML in an enterprise setting as a system to automate data science by generating fully automated reports that include an analysis of the data, as well as predictive models and a comparison of their performance. A unique feature of AutoML is that it provides natural-language descriptions of the results, suitable for non-experts in machine learning. We emphasize the operationalization of an MLOps pipeline with a discussion on approaches that perform well on practical problems and determine the best overall approach. The chapter details ideas and concepts behind real-world challenges and provides a journey map to address these problems.

To get the most out of this book

This book is an introduction to AutoML. Familiarity with data science, machine learning, and deep learning methodologies will be helpful to understand how AutoML improves upon existing methods.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800567689_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Open the autopilot_customer_churn notebook from the amazon-sagemaker-examples/autopilot folder."

A block of code is set as follows:

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "From Amazon SageMaker Studio, start a data science notebook by clicking on the Python 3 button."

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Section 1: Introduction to Automated Machine Learning

This part provides a detailed introduction to the landscape of automated machine learning, its pros and cons, and how it can be applied using open source tools and libraries. In this section, you will come to understand, with the aid of hands-on coding examples, that automated machine learning techniques are diverse, and there are different approaches taken by different libraries to address similar problems.

This section comprises the following chapters:

Chapter 1, A Lap around Automated Machine Learning

Chapter 2, Automated Machine Learning, Algorithms, and Techniques

Chapter 3, Automated Machine Learning with Open Source Tools and Libraries

Chapter 1: A Lap around Automated Machine Learning

"All models are wrong, but some are useful."

– George Edward Pelham Box FRS

"One of the holy grails of machine learning is to automate more and more of the feature engineering process."

– Pedro Domingos, A Few Useful Things to Know about Machine Learning

This chapter will provide an overview of the concepts, tools, and technologies surrounding automated Machine Learning (ML). This introduction aims both to provide a solid overview for novices and to serve as a reference for experienced ML practitioners. We will start by introducing the ML development life cycle while navigating through the product ecosystem and the data science problems it addresses, before looking at feature selection, neural architecture search, and hyperparameter optimization.

It's very plausible that you are reading this book on an e-reader that's connected to a website that recommended this manuscript based on your reading interests. We live in a world today where your digital breadcrumbs give telltale signs of not only your reading interests, but where you like to eat, which friend you like most, where you will shop next, whether you will show up to your next appointment, and who you would vote for. In this age of big data, this raw data becomes information that, in turn, helps build knowledge and insights into so-called wisdom.

Artificial Intelligence (AI) and its underlying implementations of ML and deep learning help us not only find the metaphorical needle in the haystack, but also to see the underlying trends, seasonality, and patterns in these large data streams to make better predictions. In this book, we will cover one of the key emerging technologies in AI and ML; that is, automated ML, or AutoML for short.

In this chapter, we will cover the following topics:

The ML development life cycle

Automated ML

How automated ML works

Democratization of data science

Debunking automated ML myths

Automated ML ecosystem (open source and commercial)

Automated ML challenges and limitations

Let's get started!

The ML development life cycle

Before introducing you to automated ML, we should first define how we operationalize and scale ML experiments into production. To go beyond Hello-World apps and works-on-my-machine-in-my-Jupyter-notebook kinds of projects, enterprises need to adopt a robust, reliable, and repeatable model development and deployment process. Just as in a software development life cycle (SDLC), the ML or data science life cycle is also a multi-stage, iterative process.

The life cycle includes several steps: defining and analyzing the problem, building the hypothesis (unless you are doing exploratory data analysis), selecting business outcome metrics, exploring and preparing data, building and creating ML models, training those ML models, evaluating and deploying them, and maintaining the feedback loop:

Figure 1.1 – Team data science process

A successful data science team has the discipline to prepare the problem statement and hypothesis, preprocess the data, select the appropriate features from the data based on the input of the Subject-Matter Expert (SME) and the right model family, optimize model hyperparameters, review outcomes and the resulting metrics, and finally fine-tune the models. If this sounds like a lot, remember that it is an iterative process where the data scientist also has to ensure that data versioning, model versioning, and drift are being addressed. They must also put guardrails in place to ensure the model's performance is being monitored. Just to make this even more interesting, there are also frequent champion/challenger and A/B experiments happening in production. May the best model win.

In such an intricate and multifaceted environment, data scientists can use all the help they can get. Automated ML extends a helping hand with the promise to take care of the mundane, the repetitive, and the less intellectually demanding tasks so that the data scientists can focus on the important stuff.

Automated ML

"How many members of a certain demographic group does it take to perform a specified task?"

"A finite number: one to perform the task and the remainder to act in a manner stereotypical of the group in question." <insert your light bulb joke here>

This is meta humor – the finest type of humor for ensuing hilarity for those who are quantitatively inclined. Similarly, automated ML is a class of meta learning, also known as learning to learn – the idea that you can apply the automation principles to themselves to make the process of gaining insights even faster and more elegant.

Automated ML is the approach and underlying technology of applying certain automation techniques to accelerate the model development life cycle. Automated ML enables citizen data scientists and domain experts to train ML models, and helps them build optimal solutions to ML problems. It provides a higher level of abstraction for finding the best model, or ensemble of models, for a specific problem. It assists data scientists by automating the mundane and repetitive tasks of feature engineering, architecture search, and hyperparameter optimization. The following diagram represents the ecosystem of automated ML:

Figure 1.2 – Automated ML ecosystem
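To make the idea of automatically finding the best model for a problem more concrete, here is a minimal sketch in plain Python. It is not taken from any AutoML library: the two candidate models, the toy dataset, and the function names (fit_mean, fit_line, select_model) are all assumptions made for illustration. Each candidate is fitted on a training split and scored on a held-out validation split, and the candidate with the lowest validation error is selected.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate (illustrative): always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate (illustrative): least-squares straight line y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b = y_bar - a * x_bar
    return lambda x: a * x + b


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_model(candidates, train, valid):
    """Fit every candidate on train, score it on valid, return the best name."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *valid)
    best = min(scores, key=scores.get)
    return best, scores


# Toy data (assumed for illustration): y = 2x + 1
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
train = (xs[:7], ys[:7])
valid = (xs[7:], ys[7:])

best, scores = select_model({"mean": fit_mean, "line": fit_line}, train, valid)
print(best, scores)
```

Real automated ML tools search over far larger families of models and typically use cross-validation rather than a single split, but the selection loop follows the same pattern.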

These three key areas – feature engineering, architecture search, and hyperparameter optimization – hold the most promise for the democratization of AI and ML. Some automated feature engineering techniques that are finding domain-specific usable features in datasets include expand/reduce, hierarchically organizing transformations, meta learning, and reinforcement learning. For architectural search (also known as neural architecture search), evolutionary algorithms, local search, meta learning, reinforcement learning, transfer learning, network morphism, and continuous optimization are employed.
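As a rough illustration of the expand/reduce idea mentioned above, the following sketch first expands a small set of base columns into derived candidates and then reduces them to the few that relate most strongly to the target. This is only one possible reading of the technique: the choice of transformations (squares and pairwise products), the use of absolute Pearson correlation as the ranking criterion, and the names expand_features and reduce_features are assumptions for this example, not the method of any particular tool.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(va * vb)


def expand_features(features):
    """Expand step: add squares and pairwise products of the base columns."""
    expanded = dict(features)
    for name, col in features.items():
        expanded[f"{name}^2"] = [v * v for v in col]
    for (n1, c1), (n2, c2) in combinations(features.items(), 2):
        expanded[f"{n1}*{n2}"] = [a * b for a, b in zip(c1, c2)]
    return expanded


def reduce_features(features, target, k):
    """Reduce step: keep the k columns most correlated with the target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Toy data (assumed for illustration): the target is width * height.
base = {"width": [1, 2, 3, 4, 5, 6], "height": [2, 1, 4, 3, 6, 5]}
target = [w * h for w, h in zip(base["width"], base["height"])]

selected = reduce_features(expand_features(base), target, k=2)
print(selected)
```

On this toy data, the derived product column ranks first because it reproduces the target exactly, which is the kind of domain-specific feature that automated feature engineering aims to surface.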

Last, but not least, we have hyperparameter optimization, which is the art and science of finding the right values for the hyperparameters, that is, the settings that sit outside the model and are fixed before training rather than learned from the data. A variety of techniques are used here, including Bayesian optimization, evolutionary algorithms, Lipschitz functions, local search, meta learning, particle swarm optimization, random search, and transfer learning, to name a few.
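Of the techniques listed, random search is the simplest to show in a few lines. The sketch below is a minimal illustration using only the Python standard library. The validation_loss function is a hypothetical stand-in for training a model and scoring it on held-out data, and the hyperparameter names and ranges are assumptions chosen for the example.

```python
import random


def validation_loss(learning_rate, regularization):
    """Hypothetical objective standing in for 'train a model, then score it'.

    A simple bowl-shaped function with its minimum at
    learning_rate=0.1 and regularization=0.01 plays that role here.
    """
    return (learning_rate - 0.1) ** 2 + (regularization - 0.01) ** 2


def random_search(objective, search_space, n_trials, seed=0):
    """Sample hyperparameters uniformly at random and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()
        }
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Assumed search ranges for the two illustrative hyperparameters.
search_space = {"learning_rate": (0.0, 1.0), "regularization": (0.0, 0.1)}

best_params, best_loss = random_search(validation_loss, search_space, n_trials=200)
print(best_params, best_loss)
```

Methods such as Bayesian optimization differ mainly in how the next trial is chosen: instead of sampling blindly, they use the results of earlier trials to decide where to look next.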

In the next section, we will provide a detailed overview of these three key areas of automated ML. You will see some examples of them, alongside code, in the upcoming chapters. Now, let's discuss how automated ML really works in detail by covering feature engineering, architecture search, and hyperparameter optimization.

How automated ML works

ML techniques work great when it comes to finding patterns in large datasets. Today, we use these techniques for anomaly detection, customer segmentation, customer churn analysis, demand forecasting, predictive maintenance, and pricing optimization, among hundreds of other use cases.

A typical ML life cycle comprises data collection, data wrangling, pipeline management, model retraining, and model deployment, of which data wrangling is typically the most time-consuming task.

Extracting meaningful features out of data, and then using them to build a model while finding the right algorithm and tuning the parameters, is also a very time-consuming process. Can we automate this process using the very thing we are trying to build here (meta enough?); that is, should we automate ML? Well, that is how this all started – with someone attempting to print a 3D printer using a 3D printer.