Description

A feature store is a storage layer in machine learning (ML) operations where data scientists and ML engineers store transformed and curated features for ML models, making them available for model training, inference (batch and online), and reuse in other ML pipelines. Knowing how to use feature stores to their fullest potential can save you a lot of time and effort, and this book will teach you everything you need to know to get started.
Feature Store for Machine Learning is for data scientists who want to learn how to use feature stores to share and reuse each other's work and expertise. You'll be able to implement practices that eliminate the reprocessing of data, make models reproducible, and reduce duplicated work, thus shortening the time to production for your ML models. While this ML book offers some theoretical groundwork for developers who are just getting to grips with feature stores, there's plenty of practical know-how for those ready to put their knowledge to work. With a hands-on approach to implementation and the associated methodologies, you'll get up and running in no time.
By the end of this book, you'll have understood why feature stores are essential and how to use them in your ML projects, both on your local system and in the cloud.




Feature Store for Machine Learning

Curate, discover, share and serve ML features at scale

Jayanth Kumar M J

BIRMINGHAM—MUMBAI

Feature Store for Machine Learning

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Dhruv Jagdish Kataria

Content Development Editor: Manikandan Kurup

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Rekha Nair

Production Designer: Roshan Kawale

Marketing Coordinators: Shifa Ansari and Abeer Riyaz Dawe

First published: June 2022

Production reference: 2200622

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80323-006-1

www.packt.com

To my mother Gayatri, for her dedication and determination in educating us; to my brother Santhosh and his family, for being supportive; and last but not least, to my wife Deepa, for being kind and supportive during the no-fun weekends of the last six months.

– Jayanth Kumar M J

Contributors

About the author

Jayanth Kumar M J is a lead data engineer at Cimpress USA. He specializes in building platform components for data scientists and data engineers to make MLOps smooth and self-service. He is also a Feast feature store contributor.

I want to thank the whole team who made this possible, all my colleagues and mentors throughout my career from Sapient to Cimpress, and my friends and family, who made life easy and fun when they were around.

About the reviewer

Nilan Saha is the chief technology officer at Juna, where he and his engineering team build a telehealth platform that enables people to take control of their sexual health. He has extensive experience building ML-driven engineering products for companies in the social media, education, and healthcare spaces. He has a master's degree in data science and is also a Kaggle Kernels and Discussions expert.

Table of Contents

Feature Store for Machine Learning

Contributors

About the author

About the reviewer

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Section 1 – Why Do We Need a Feature Store?

Chapter 1: An Overview of the Machine Learning Life Cycle

Technical requirements

The ML life cycle in practice

Problem statement (plan and create)

Data (preparation and cleaning)

Model

Package, release, and monitor

An ideal world versus the real world

Reusability and sharing

Everything in a notebook

The most time-consuming stages of ML

Figuring out the dataset

Data exploration and feature engineering

Modeling to production and monitoring

Summary

Chapter 2: What Problems Do Feature Stores Solve?

Importance of features in production

Ways to bring features to production

Batch model pipeline

Online model pipeline

Common problems with the approaches used for bringing features to production

Re-inventing the wheel

Feature re-calculation

Feature discoverability and sharing

Training versus serving skew

Model reproducibility

Low latency

Feature stores to the rescue

Standardizing ML with a feature store

Feature store avoids reprocessing data

Features are discoverable and sharable with the feature store

Serving features at low latency with feature stores

Philosophy behind feature stores

Summary

Further reading

Section 2 – A Feature Store in Action

Chapter 3: Feature Store Fundamentals, Terminology, and Usage

Technical requirements

Introduction to Feast and installation

Feast terminology and definitions

Feast initialization

Feast usage

Register feature definitions

Browsing the feature store

Adding an entity and FeatureView

Generate training data

Load features to the online store

Feast behind the scenes

Data flow in Feast

Summary

Further reading

Chapter 4: Adding Feature Store to ML Models

Technical requirements

Creating Feast resources in AWS

Amazon S3 for storing data

AWS Redshift for an offline store

Creating an IAM user to access the resources

Feast initialization for AWS

Exploring the ML life cycle with Feast

Problem statement (plan and create)

Data (preparation and cleaning)

Model (feature engineering)

Summary

References

Chapter 5: Model Training and Inference

Prerequisites

Technical requirements

Model training with the feature store

Dee's model training experiments

Ram's model training experiments

Model packaging

Batch model inference with Feast

Online model inference with Feast

Syncing the latest features from the offline to the online store

Packaging the online model as a REST endpoint with Feast code

Handling changes to the feature set during development

Step 1 – Change feature definitions

Step 2 – Add/update schema in the Glue/Lake Formation console

Step 3 – Update notebooks with the changes

Summary

Further reading

Chapter 6: Model to Production and Beyond

Technical requirements

Setting up Airflow for orchestration

S3 bucket for Airflow metadata

Amazon MWAA environment for orchestration

Productionizing the batch model pipeline

Productionizing an online model pipeline

Orchestration of a feature engineering job

Deploying the model as a SageMaker endpoint

Beyond model production

Feature drift monitoring and model retraining

Model reproducibility and prediction issues

A headstart for the next model

Changes to feature definition after production

Summary

Section 3 – Alternatives, Best Practices, and a Use Case

Chapter 7: Feast Alternatives and ML Best Practices

Technical requirements

The available feature stores on the market

The Tecton Feature Store

Databricks Feature Store

Google's Vertex AI Feature Store

The Hopsworks Feature Store

SageMaker Feature Store

Feature management with SageMaker Feature Store

Resources to use SageMaker

Generating features

Defining the feature group

Feature ingestion

Getting records from an online store

Querying historical data with Amazon Athena

Cleaning up a SageMaker feature group

ML best practices

Data validation at source

Breaking down ML pipeline and orchestration

Tracking data lineage and versioning

The feature repository

Experiment tracking, model versioning, and the model repository

Feature and model monitoring

Miscellaneous

Summary

Chapter 8: Use Case – Customer Churn Prediction

Technical requirements

Infrastructure setup

Introduction to the problem and the dataset

Data processing and feature engineering

Feature group definitions and feature ingestion

Model training

Model prediction

Feature monitoring

Model monitoring

Summary

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Preface

Data-driven decision-making has been key to the success of any business, and Machine Learning (ML) plays a central role in achieving it and helping businesses stay ahead of the competition. Though ML helps unlock the true potential of a business, there are many obstacles along the way. According to one study, 90 percent of ML models never make it to production. The disconnect between model development and productionization, as well as bad or mediocre ML practices, are a few of the many reasons for this. This is why there are so many end-to-end ML platforms that offer to make ML development easy. One of the primary goals of these platforms is to encourage data scientists and ML engineers to follow Machine Learning Operations (MLOps) standards that help productionize a model faster. In recent years, feature management has become one of the focus areas of ML platforms, whether they are built in-house or offered as a Platform as a Service (PaaS). A feature store, which provides the ability to create, share, and discover curated ML features, has become an integral part of most of these ML platforms.

The aim of this book is to show the significance of a feature store in ML pipelines. Hence, we will start with an ML problem and try to develop a model without a feature store. We will then discuss which aspects of ML can benefit from a feature store and how a few of its capabilities not only lead to better ML practices but also make model development faster and more cost-effective. As we move from why we should use a feature store to the what and how of it, we will go through feature engineering, model training, inference, and the productionization of batch and online models with practical examples. In the first and second sections of the book, we will use an open source feature store, Feast. In the last section, we will look at the alternatives available on the market and also try out an end-to-end use case with a managed feature store.
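
To give a flavor of what this looks like in code, here is a minimal, hypothetical sketch of the Feast workflow the book builds up to. The feature view name (customer_features), the entity key (customer_id), and the feature names are illustrative placeholders rather than definitions taken from the book's examples, and the snippet assumes a feature repository has already been registered:

from datetime import datetime

import pandas as pd
from feast import FeatureStore

# Point Feast at a feature repository (a directory containing feature_store.yaml
# and the feature definitions registered with `feast apply`).
store = FeatureStore(repo_path=".")

# Offline retrieval: build a point-in-time-correct training set for a set of entities.
entity_df = pd.DataFrame(
    {
        "customer_id": ["C1", "C2"],
        "event_timestamp": [datetime(2022, 6, 1), datetime(2022, 6, 1)],
    }
)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["customer_features:recency", "customer_features:monetary_value"],
).to_df()

# Online serving: sync the latest feature values to the online store, then
# fetch them at low latency for a single entity at prediction time.
store.materialize_incremental(end_date=datetime.utcnow())
feature_vector = store.get_online_features(
    features=["customer_features:recency", "customer_features:monetary_value"],
    entity_rows=[{"customer_id": "C1"}],
).to_dict()

The same feature definitions back both the offline (training) and online (serving) paths, which is the property the later chapters rely on to reduce training-serving skew.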

Who this book is for

This book is for data engineers, ML engineers, platform engineers, data scientists, and data science enthusiasts who want to learn about feature management: how to deploy Feast on the AWS cloud, how to create curated ML features and collaborate with other data scientists during model building, how to use a feature store for batch and online model prediction, and how to move a model from development to production. The book will benefit ML projects ranging from small university projects to enterprise-level ML applications.

What this book covers

Chapter 1, An Overview of the Machine Learning Life Cycle, starts with a small introduction to ML and then dives deep into an ML use case – a customer lifetime value model. The chapter runs through the different stages of ML development, and finally, it discusses the most time-consuming parts of ML and also what an ideal world and the real world look like in ML development.

Chapter 2, What Problems Do Feature Stores Solve?, introduces us to the main focus of the book, which is feature management and feature stores. It discusses the importance of features in production systems, different ways to bring features into production, and common issues with these approaches, followed by how a feature store can overcome these common issues.

Chapter 3, Feature Store Fundamentals, Terminology, and Usage, starts with an introduction to an open source feature store – Feast – followed by installation, different terminology used in the feature store world, and basic API usage. Finally, it briefly introduces different components that work together in Feast.

Chapter 4, Adding Feature Store to ML Models, will help readers install Feast on AWS as it goes through creating the different resources, such as S3 buckets, a Redshift cluster, and the Glue catalog, step by step with screenshots. Finally, it revisits the feature engineering aspect of the customer lifetime value model developed in Chapter 1, An Overview of the Machine Learning Life Cycle, and creates and ingests the curated features into Feast.

Chapter 5, Model Training and Inference, continues from where we left off in Chapter 4, Adding Feature Store to ML Models, and discusses how a feature store can help data scientists and ML engineers collaborate in the development of an ML model. It covers how to use Feast for batch model inference and how to build a REST API for online model inference.

Chapter 6, Model to Production and Beyond, discusses the creation of an orchestration environment using Amazon Managed Workflows for Apache Airflow (MWAA), uses the feature engineering, model training, and inference code/notebooks built in the previous chapters, and deploys the batch and online model pipelines into production. Finally, it discusses aspects beyond production, such as feature monitoring, changes to feature definitions, and also building the next ML model.

Chapter 7, Feast Alternatives and ML Best Practices, introduces other feature stores, such as Tecton, Databricks Feature Store, Google Cloud's Vertex AI Feature Store, Hopsworks Feature Store, and Amazon SageMaker Feature Store. It also introduces the basic usage of the latter so that readers can get a feel for what it is like to use a managed feature store. Finally, it briefly discusses ML best practices.

Chapter 8, Use Case – Customer Churn Prediction, uses a managed feature store offering of Amazon SageMaker and runs through an end-to-end use case to predict customer churn on a telecom dataset. It also covers examples of feature drift monitoring and model performance monitoring.

To get the most out of this book

The book uses AWS services for Feast deployment and pipeline orchestration, along with a couple of SageMaker offerings. If you create a new AWS account, all of the services used fall under the free tier or are featured products, except for the Managed Workflows for Apache Airflow (MWAA) environment. However, we have listed alternative Airflow installations that can be used to run the examples.

All the examples run on Python 3.7 with feast==0.19.3. The appropriate library versions are also mentioned in the notebooks wherever necessary. To run through the examples, all you need is a Jupyter notebook environment (local, Google Colab, SageMaker, or another of your choice) and the AWS resources and permissions mentioned in each chapter or section.
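
If you want to set up a local environment up front, an installation along the following lines should work. This is a minimal sketch that assumes a Python 3.7 virtual environment; the notebooks in each chapter pin the exact versions they need:

$ pip install "feast[aws]==0.19.3"           # Feast with its AWS (Redshift, DynamoDB, S3) extras
$ pip install notebook scikit-learn xgboost  # Jupyter plus the ML libraries used in the examples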

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

To get the most out of the book, you should have Python programming experience, and a basic understanding of notebooks, Python environments, and ML and Python ML libraries, such as XGBoost and scikit-learn.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Feature-Store-for-Machine-Learning. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781803230061_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The preceding code block scales the numerical columns: tenure, MonthlyCharges, and TotalCharges."

A block of code is set as follows:

le = LabelEncoder()
for i in bin_cols:
    churn_data[i] = le.fit_transform(churn_data[i])

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

project: customer_segmentation
registry: data/registry.db
provider: aws
online_store:
  type: dynamodb
  region: us-east-1

Any command-line input or output is written as follows:

$ docker build -t customer-segmentation .

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "On the cluster home page, select the Properties tab and scroll down to Associated IAM roles."

Tips or Important Notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read Feature Store for Machine Learning, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

Section 1 – Why Do We Need a Feature Store?

This section concentrates on the significance of a feature store (why?) in Machine Learning (ML) pipelines. We will start with an ML problem and go through the different stages of ML development, such as data exploration, feature engineering, model training, and inference. We will discuss how the availability of features in production affects model performance and start looking into ways in which features are brought to production and the common problems with them. At the end of the section, we will introduce a feature store in the ML pipeline and look at how it resolves the common problems that other alternatives struggle with.

This section comprises the following chapters:

Chapter 1, An Overview of the Machine Learning Life Cycle
Chapter 2, What Problems Do Feature Stores Solve?