A feature store is one of the storage layers in machine learning (ML) operations, where data scientists and ML engineers can store transformed and curated features for ML models. This makes the features available for model training, inference (batch and online), and reuse in other ML pipelines. Knowing how to use feature stores to their fullest potential can save you a lot of time and effort, and this book will teach you everything you need to know to get started.
Feature Store for Machine Learning is for data scientists who want to learn how to use feature stores to share and reuse each other's work and expertise. You’ll be able to implement practices that help eliminate the reprocessing of data, make models reproducible, and reduce duplication of work, thus shortening the ML model's time to production. While this ML book offers some theoretical groundwork for developers who are just getting to grips with feature stores, there's plenty of practical know-how for those ready to put their knowledge to work. With a hands-on approach to implementation and associated methodologies, you'll get up and running in no time.
By the end of this book, you’ll have understood why feature stores are essential and how to use them in your ML projects, both on your local system and on the cloud.
Page count: 273
Year of publication: 2022
Curate, discover, share and serve ML features at scale
Jayanth Kumar M J
BIRMINGHAM—MUMBAI
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Dhruv Jagdish Kataria
Content Development Editor: Manikandan Kurup
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Roshan Kawale
Marketing Coordinators: Shifa Ansari and Abeer Riyaz Dawe
First published: June 2022
Production reference: 2200622
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80323-006-1
www.packt.com
To my mother, Gayatri, for her dedication and determination in educating us; to my brother, Santhosh, and his family for being supportive; and, last but not least, to my wife, Deepa, for being kind and supportive during the no-fun weekends of the last six months.
– Jayanth Kumar M J
Jayanth Kumar M J is a lead data engineer at Cimpress USA. He specializes in building platform components for data scientists and data engineers to make MLOps smooth and self-service. He is also a Feast feature store contributor.
I want to thank the whole team who made this possible, all my colleagues and mentors throughout my career from Sapient to Cimpress, and my friends and family, who make life easy and fun when they are around.
Nilan Saha is the chief technology officer at Juna where he builds a telehealth platform with his engineering team enabling people to take control of their sexual health. He has extensive experience building ML-driven engineering products for different companies in the social media, education, and healthcare space. He has a master's degree in data science and is also a Kaggle Kernels and discussion expert.
Data-driven decision-making has been the key to the success of any business, and Machine Learning (ML) plays a key role in achieving that and helping businesses stay ahead of the competition. Though ML helps in unlocking the true potential of a business, there are many obstacles along the way. According to a study, 90 percent of ML models never make it to production. The disconnect between model development and productionization as well as bad or mediocre ML practices are a few of the many reasons for this. This is why there are so many end-to-end ML platforms offering to make ML development easy. One of the primary goals of these platforms is to encourage data scientists/ML engineers to follow Machine Learning Operations (MLOps) standards that help in the faster productionization of a model. In recent years, feature management has been one of the aims of the ML platform – whether it is built in-house or offered as a Platform as a Service (PaaS). A feature store that provides the ability to create, share, and discover curated ML features has become an integral part of most of these ML platforms.
The aim of this book is to discuss the significance of a feature store in ML pipelines. Hence, we will start with an ML problem and try to develop a model without a feature store. We will then discuss which aspects of ML can benefit from a feature store and how a few capabilities of feature stores not only help in creating better ML practices but also help in the faster and more cost-effective development of the model. As we move from why we should use a feature store to the what? and how? aspects of it, we will go through feature engineering, model training, inference, and the productionization of batch and online models with practical examples. In the first and second sections of the book, we will use an open source feature store, Feast. In the last section, we will look at the alternatives available on the market and also try out an end-to-end use case with a managed feature store.
This book is for data/ML/platform engineers, data scientists, and data science enthusiasts who want to learn about feature management: how to deploy Feast on the AWS cloud, how to create curated ML features, how to collaborate with other data scientists in model building, how to use a feature store for batch and online model prediction, and how to move a model from development to production. It will be useful for ML projects ranging from small university projects to enterprise-level ML applications.
Chapter 1, An Overview of the Machine Learning Life Cycle, starts with a small introduction to ML and then dives deep into an ML use case – a customer lifetime value model. The chapter runs through the different stages of ML development, and finally, it discusses the most time-consuming parts of ML and also what an ideal world and the real world look like in ML development.
Chapter 2, What Problems Do Feature Stores Solve?, introduces us to the main focus of the book, which is feature management and feature stores. It discusses the importance of features in production systems, different ways to bring features into production, and common issues with these approaches, followed by how a feature store can overcome these common issues.
Chapter 3, Feature Store Fundamentals, Terminology, and Usage, starts with an introduction to an open source feature store – Feast – followed by installation, different terminology used in the feature store world, and basic API usage. Finally, it briefly introduces different components that work together in Feast.
Chapter 4, Adding Feature Store to ML Models, will help readers install Feast on AWS as it goes through the different resource creations, such as S3 buckets, a Redshift cluster, and the Glue catalog, step by step with screenshots. Finally, it revisits the feature engineering aspect of the customer lifetime value model developed in Chapter 1, An Overview of the Machine Learning Life Cycle, and creates and ingests the curated features into Feast.
Chapter 5, Model Training and Inference, continues from where we left off in Chapter 4, Adding Feature Store to ML Models, and discusses how a feature store can help data scientists and ML engineers collaborate in the development of an ML model. It discusses how to use Feast for batch model inference and also how to build a REST API for online model inference.
Chapter 6, Model to Production and Beyond, discusses the creation of an orchestration environment using Amazon Managed Workflows for Apache Airflow (MWAA), uses the feature engineering, model training, and inference code/notebooks built in the previous chapters, and deploys the batch and online model pipelines into production. Finally, it discusses aspects beyond production, such as feature monitoring, changes to feature definitions, and also building the next ML model.
Chapter 7, Feast Alternatives and ML Best Practices, introduces other feature stores, such as Tecton, Databricks Feature Store, Google Cloud's Vertex AI, Hopsworks Feature Store, and Amazon SageMaker Feature Store. It also introduces the basic usage of the latter so that users can get the gist of what it is like to use a managed feature store. Finally, it briefly discusses ML best practices.
Chapter 8, Use Case – Customer Churn Prediction, uses a managed feature store offering of Amazon SageMaker and runs through an end-to-end use case to predict customer churn on a telecom dataset. It also covers examples of feature drift monitoring and model performance monitoring.
The book uses AWS services for Feast deployment, pipeline orchestration, and a couple of SageMaker offerings. If you create a new AWS account, all the services used are under free-tier or featured products, except the Managed Workflows for Apache Airflow (MWAA) environment. However, we have listed alternatives for Airflow installation that can be used for running the examples.
All the examples are run using Python 3.7 and feast==0.19.3. The appropriate library versions are also mentioned in the notebooks wherever necessary. To run through the examples, all you need is a Jupyter notebook environment (local, Google Colab, SageMaker, or another of your choice) and the AWS resources and permissions listed for each chapter or section.
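Before opening the notebooks, it can help to confirm that your environment matches the versions stated above. The following is a minimal sketch, not part of the book's code: the helper names `installed_version` and `check_environment` are hypothetical, and the exact-match comparison is an assumption made for simplicity (a newer patch release may work just as well).

```python
import sys
from typing import List, Optional, Tuple

try:
    # Available in the standard library from Python 3.8 onward.
    from importlib import metadata as importlib_metadata
except ImportError:
    # Python 3.7 (the version used for the examples) lacks importlib.metadata.
    importlib_metadata = None

EXPECTED_PYTHON = (3, 7)      # stated in the text above
EXPECTED_FEAST = "0.19.3"     # stated in the text above


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    if importlib_metadata is not None:
        try:
            return importlib_metadata.version(package)
        except importlib_metadata.PackageNotFoundError:
            return None
    # Fallback for Python 3.7: pkg_resources ships with setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


def check_environment(python_version: Tuple[int, int],
                      feast_version: Optional[str]) -> List[str]:
    """Return a list of human-readable mismatches (empty if none)."""
    problems = []
    if tuple(python_version) != EXPECTED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found; "
            f"the examples were run on {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}."
        )
    if feast_version is None:
        problems.append(f"feast is not installed; expected {EXPECTED_FEAST}.")
    elif feast_version != EXPECTED_FEAST:
        problems.append(
            f"feast {feast_version} found; the examples use {EXPECTED_FEAST}."
        )
    return problems


if __name__ == "__main__":
    found = check_environment(sys.version_info[:2], installed_version("feast"))
    for line in found or ["Environment matches the versions used in the book."]:
        print(line)
```

A mismatch is only a warning: the notebooks state their own library versions where they matter, so treat the output as a starting point for troubleshooting.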
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
To get the most out of the book, you should have Python programming experience, and a basic understanding of notebooks, Python environments, and ML and Python ML libraries, such as XGBoost and scikit-learn.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Feature-Store-for-Machine-Learning. If there's an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781803230061_ColorImages.pdf.
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The preceding code block scales the numerical columns: tenure, MonthlyCharges, and TotalCharges."
A block of code is set as follows:
le = LabelEncoder()
for i in bin_cols:
    churn_data[i] = le.fit_transform(churn_data[i])
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
project: customer_segmentation
registry: data/registry.db
provider: aws
online_store:
  type: dynamodb
  region: us-east-1
Any command-line input or output is written as follows:
$ docker build -t customer-segmentation .
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "On the cluster home page, select the Properties tab and scroll down to Associated IAM roles."
Tips or Important Notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you've read Feature Store for Machine Learning, we'd love to hear your thoughts! Please visit the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
Section 1 – Why Do We Need a Feature Store?
This section concentrates on the significance of a feature store (why?) in Machine Learning (ML) pipelines. We will start with an ML problem and go through the different stages of ML development, such as data exploration, feature engineering, model training, and inference. We will discuss how the availability of features in production affects model performance and start looking into ways in which features are brought to production and the common problems with them. At the end of the section, we will introduce a feature store in the ML pipeline and look at how it resolves the common problems that other alternatives struggle with.
This section comprises the following chapters:
Chapter 1, An Overview of the ML Life Cycle
Chapter 2, What Problems Do Feature Stores Solve?
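To make the idea this section builds toward more concrete, here is a toy, in-memory sketch of what a feature store does: features are registered once with a description, can be discovered by keyword, and are read through the same definitions for both training and online inference. Everything in it is hypothetical and for illustration only: the class `ToyFeatureStore`, its method names, and the customer features are assumptions, and none of it is Feast's API or the code used in the chapters.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyFeatureStore:
    """In-memory stand-in for a feature store (illustration only)."""
    # feature name -> description, which is what makes features discoverable
    catalog: Dict[str, str] = field(default_factory=dict)
    # entity id -> {feature name -> latest value}
    values: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def register(self, name: str, description: str) -> None:
        """Record a feature definition so others can find and reuse it."""
        self.catalog[name] = description

    def ingest(self, entity_id: str, features: Dict[str, Any]) -> None:
        """Store feature values; only registered features are accepted."""
        unknown = set(features) - set(self.catalog)
        if unknown:
            raise KeyError(f"unregistered features: {sorted(unknown)}")
        self.values.setdefault(entity_id, {}).update(features)

    def search(self, keyword: str) -> List[str]:
        """Return names of features whose name or description has the keyword."""
        needle = keyword.lower()
        return sorted(
            name for name, desc in self.catalog.items()
            if needle in f"{name} {desc}".lower()
        )

    def read_online(self, entity_id: str, names: List[str]) -> Dict[str, Any]:
        """Look up one entity's features, as an online model would."""
        row = self.values.get(entity_id, {})
        return {name: row.get(name) for name in names}

    def read_training_rows(self, entity_ids: List[str],
                           names: List[str]) -> List[Dict[str, Any]]:
        """Build training rows from the same stored values used online."""
        return [
            {"entity_id": eid, **self.read_online(eid, names)}
            for eid in entity_ids
        ]


store = ToyFeatureStore()
store.register("total_spend", "Sum of all purchases by the customer")
store.register("order_count", "Number of orders placed by the customer")
store.ingest("c1", {"total_spend": 120.5, "order_count": 3})
store.ingest("c2", {"total_spend": 40.0, "order_count": 1})

feature_names = ["total_spend", "order_count"]
training_rows = store.read_training_rows(["c1", "c2"], feature_names)
online_row = store.read_online("c1", feature_names)
```

Because training and online reads go through one set of stored values, the model sees the same feature computation in both settings. A real feature store adds what this sketch leaves out, such as persistent offline and online storage, timestamps, and access control.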