Streamline data preprocessing and feature engineering in your machine learning project with this third edition of the Python Feature Engineering Cookbook to make your data preparation more efficient.
This guide addresses common challenges, such as imputing missing values and encoding categorical variables, using practical solutions and open source Python libraries.
You’ll learn advanced techniques for transforming numerical variables, discretizing variables, and dealing with outliers. Each chapter offers step-by-step instructions and real-world examples, helping you understand when and how to apply various transformations for well-prepared data.
The book explores feature extraction from complex data types such as dates, times, and text. You’ll see how to create new features through mathematical operations and decision trees, and how to use advanced tools like Featuretools and tsfresh to extract features from relational data and time series.
By the end, you’ll be ready to build reproducible feature engineering pipelines that can be easily deployed into production, optimizing data preprocessing workflows and enhancing machine learning model performance.
Number of pages: 435
Year of publication: 2024
Python Feature Engineering Cookbook
A complete guide to crafting powerful features for your machine learning models
Soledad Galli
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
The author acknowledges the use of cutting-edge AI, such as ChatGPT, with the sole aim of enhancing the language and clarity within the book, thereby ensuring a smooth reading experience for readers. It’s important to note that the content itself has been crafted by the author and edited by a professional publishing team.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Associate Group Product Manager: Niranjan Naikwadi
Publishing Product Manager: Nitin Nainani
Book Project Manager: Hemangi Lotlikar
Senior Editor: Tiksha Abhimanyu Lad
Technical Editor: Sweety Pagaria
Copy Editor: Safis Editing
Proofreader: Tiksha Abhimanyu Lad
Indexer: Manju Arasan
Production Designer: Joshua Misquitta and Alishon Mendonca
Senior DevRel Marketing Executive: Vinishka Kalra
First published: January 2020
Second edition: October 2022
Third edition: August 2024
Production reference: 1260724
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83588-358-7
www.packtpub.com
This book would not have been possible without the dedicated efforts of those who contribute to the Python open source ecosystem for data science and machine learning. We often overlook the fact that these contributors are real people with families, jobs, and hobbies, who generously give their time to develop these essential tools. I am deeply grateful to the developers of scikit-learn and pandas, pivotal libraries for data analysis and processing, as well as to the maintainers of tsfresh and Category Encoders. A special acknowledgment goes to Nathan Parsons, current maintainer of Featuretools, for his invaluable support in crafting Chapter 8 of this book.
I am grateful to my editor, Tiksha Abhimanyu Lad, and her team for their invaluable support in bringing this book to fruition. Special thanks to our technical reviewer, Hector Patiño, for meticulously reviewing the code and recipes, ensuring smooth execution, and providing valuable resources to our readers.
A heartfelt thank you to my friend Chris Samiullah for his invaluable support in my growth as a software developer.
I am also grateful to the users and contributors of Feature-engine for their unwavering support, feedback, and engagement, which have been instrumental in shaping the functionality of the library. Lastly, I owe a debt of gratitude to my students, whose feedback and encouragement have helped me become a better instructor and writer.
Thank you all for your invaluable contributions to this endeavor.
– Soledad Galli
From convolutional neural networks to XGBoost, when it comes to machine learning, it’s easy to focus too much on the algorithms. But as the saying goes, “Garbage in, garbage out.” The quality of the features can be more important than the machine learning algorithm itself. Despite advances in feature learning, such as embeddings in neural networks, feature engineering remains as important as ever. Feature engineering is a critical skill, particularly when dealing with categorical, numerical, and time-series features. With the right features, you can greatly improve model performance and make models more interpretable and robust.
Sole is a remarkable data science and machine learning educator. She has taught tens of thousands of students through her online courses on topics ranging from machine learning interpretability to hyperparameter optimization. It’s fantastic that she has taken on this timeless topic of feature engineering. Her approach is direct, pragmatic, and practical. As the author of the popular Feature-engine, a Python library for feature engineering, and a respected machine learning educator, Sole is uniquely qualified to cover this topic.
The third edition of this book, which you have in your hands now, provides updated guidelines for selecting methods based on the data and the model. It also covers the integration of scikit-learn with pandas through the recently released set_output API. Finally, it covers automating feature creation using decision trees.
Whether you are a beginner or an experienced practitioner, this book will provide you with practical insights, lots of code examples, and various techniques to improve your machine learning models through effective feature engineering.
Christoph Molnar
Author of Interpretable Machine Learning and Modeling Mindsets
Soledad Galli is a bestselling data science instructor, book author, and open source Python developer. As the leading instructor at Train in Data, Sole teaches intermediate and advanced courses in machine learning that have enrolled 64k+ students worldwide and continue to receive positive reviews. Sole is also the developer and maintainer of the Python open source library Feature-engine, which offers an extensive array of methods for feature engineering and selection.
Sole worked as a data scientist in finance and insurance companies, where she developed and put into production machine learning models to assess insurance claims and credit risk and prevent fraud.
Sole has been selected multiple times as a LinkedIn voice in data science. She is passionate about sharing her knowledge and experience, which is why you’ll often hear her speaking at meetups and on podcasts, or find her authoring articles online.
Sole is constantly looking for people like you who can support her in enhancing the functionality of Feature-engine or delivering more and better courses, so if you are interested, contact her through social media or via her Train in Data website.
Hector Patiño Rivera has been involved with machine learning for geosciences since 2015, especially in subjects related to satellite imagery. He has strong knowledge of Python and SQL and is a proficient developer with PostgreSQL, ArcGIS, QGIS, and other GIS-related software. He is also an experienced Django developer. When Hector is not programming, he loves playing tennis and hanging out with his friends.