Data labeling is the invisible hand that guides the power of artificial intelligence and machine learning. In today’s data-driven world, mastering data labeling is not just an advantage, it’s a necessity. Data Labeling in Machine Learning with Python empowers you to unearth value from raw data, create intelligent systems, and influence the course of technological evolution.
With this book, you'll discover the art of employing summary statistics, weak supervision, programmatic rules, and heuristics to assign labels to unlabeled training data programmatically. As you progress, you'll be able to enhance your datasets by mastering the intricacies of semi-supervised learning and data augmentation. Venturing further into the data landscape, you'll immerse yourself in the annotation of image, video, and audio data, harnessing the power of Python libraries such as seaborn, matplotlib, cv2, librosa, openai, and langchain. With hands-on guidance and practical examples, you'll gain proficiency in annotating diverse data types effectively.
By the end of this book, you’ll have the practical expertise to programmatically label diverse data types and enhance datasets, unlocking the full potential of your data.
Page count: 454
Year of publication: 2024
Data Labeling in Machine Learning with Python
Explore modern ways to prepare labeled data for training and fine-tuning ML and generative AI models
Vijaya Kumar Suda
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Niranjan Naikwadi
Publishing Product Manager: Sanjana Gupta
Book Project Manager: Hemangi Lotlikar
Content Development Editor: Shreya Moharir
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Proofreader: Safis Editing
Indexer: Tejal Soni
Production Designer: Joshua Misquitta
DevRel Marketing Coordinator: Vinishka Kalra
First published: January 2024
Production reference: 1300124
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80461-054-1
www.packtpub.com
I extend my heartfelt gratitude to my mother, Rajya Lakshmi Suda, and dedicate this work to the cherished memory of my father, Koteswara Rao Suda. Their sacrifices and unwavering determination have been a profound source of inspiration.
Special thanks to my wife, Radhika, for her enduring support and patience throughout the writing of this book.
To my son, Chandra Suda (Rise Global Winner 2023), and daughter, Akshaya, your talents and creativity have shown me the beautiful evolution of skill.
I am deeply appreciative of my siblings, Rama Devi, Swarna Kumar, and Dr. Sri Kumar, for their continuous support.
A sincere acknowledgment to my mentors and managers, Kevin Fleck and Des Quinta, for their invaluable support and motivation throughout the writing process of this book.
Finally, I want to thank the Packt Publishing team, especially Shreya and Hemangi, for their fantastic support, which made the writing process an absolute pleasure.
Vijaya Kumar Suda is a seasoned data and AI professional, boasting over two decades of expertise collaborating with global clients. Having resided and worked in diverse locations such as Switzerland, Belgium, Mexico, Bahrain, India, Canada, and the USA, Vijaya has successfully assisted customers spanning various industries. Currently serving as a senior data and AI consultant at Microsoft, he is instrumental in guiding industry partners through their digital transformation endeavors using cutting-edge cloud technologies and AI capabilities. His proficiency encompasses architecture, data engineering, machine learning, generative AI, and cloud solutions. Vijaya also shares his insights through engaging videos on the cloud, data, and AI on his YouTube channel, Cloud & Data Science (https://youtu.be/piVqFcuBV2c).
Pritesh Kanani is a full stack developer with experience in data wrangling and supervised machine learning. He helped a major oil and gas company with building a tool to monitor drilling operations and handling thousands of high frequency data streams. He completed a post-graduation course in applied AI and is currently utilizing his full stack data science and cloud computing skills at a leading nuclear and renewable energy organization in Ontario, Canada.
Sourav Roy is a passionate data enthusiast, an experienced machine learning practitioner, and an expert book reviewer with a focus on literature linked to data. He possesses a diverse skill set in data engineering and data analytics, which allows him to combine technical proficiency with a deep passion in his work on data-centric books. Sourav obtained a master’s degree in data science and analytics from Queen’s University. He is presently employed as a data engineer in the banking sector.
Mitesh Mangaonkar is an engineering leader pioneering generative AI to transform data platforms. As a tech lead at Airbnb, he builds cutting-edge data pipelines leveraging big data technologies and modern data stacks to power trust and safety products. Previously, at AWS, Mitesh helped Fortune 500 companies migrate their data warehouses to the cloud and engineered highly scalable, resilient systems. An innovator at heart, he combines deep data engineering expertise with a passion for AI to create the next generation of data products. Mitesh is an influential voice shaping the future of data engineering and governance.
This part of the book guides you through exploring tabular data and labeling it programmatically with Python libraries such as Snorkel, using labeling functions. No prior data science knowledge is required. It also covers data labeling using K-means clustering.
This part comprises the following chapters:
Chapter 1, Exploring Data for Machine Learning
Chapter 2, Labeling Data for Classification
Chapter 3, Labeling Data for Regression
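To give a feel for what labeling data programmatically with labeling functions can look like, here is a minimal sketch in plain Python. It is not Snorkel's API and not code from the book: the column names, thresholds, class names, and the simple majority-vote combiner are all assumptions made for illustration. Each labeling function encodes one heuristic and either votes for a class or abstains; the votes are then combined into a single label per row.

```python
from collections import Counter

# Hypothetical label values; -1 is used here to mean "no opinion".
ABSTAIN = -1
LOW, HIGH = 0, 1


def lf_long_hours(row):
    # Assumed heuristic: long working weeks suggest the HIGH class.
    return HIGH if row["hours_per_week"] >= 45 else ABSTAIN


def lf_advanced_degree(row):
    # Assumed heuristic: an advanced degree suggests the HIGH class.
    return HIGH if row["education"] in {"Masters", "Doctorate"} else ABSTAIN


def lf_young(row):
    # Assumed heuristic: very young people tend to fall in the LOW class.
    return LOW if row["age"] < 25 else ABSTAIN


LABELING_FUNCTIONS = [lf_long_hours, lf_advanced_degree, lf_young]


def majority_vote(row, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes; abstain on no votes or on a tie."""
    votes = [v for v in (lf(row) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


# Made-up rows standing in for unlabeled tabular data.
rows = [
    {"age": 42, "education": "Masters", "hours_per_week": 50},
    {"age": 22, "education": "HS-grad", "hours_per_week": 20},
    {"age": 23, "education": "Doctorate", "hours_per_week": 30},
    {"age": 35, "education": "HS-grad", "hours_per_week": 40},
]

labels = [majority_vote(r) for r in rows]
print(labels)
```

Rows on which every function abstains, or on which the votes tie, stay unlabeled. Majority voting is the simplest possible way to combine votes; a dedicated weak-supervision library can instead weight each function by an estimate of its accuracy.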