This book provides concepts and practical use cases which can be used to quickly build, monitor, and optimize data science projects. Using Comet, you will learn how to manage almost every step of the data science process from data collection through to creating, deploying, and monitoring a machine learning model.
The book starts by explaining the features of Comet, along with exploratory data analysis and model evaluation in Comet. You’ll see how Comet gives you the freedom to choose from a selection of programming languages, depending on which is best suited to your needs. Next, you will focus on workspaces, projects, experiments, and models. You will also learn how to build a narrative from your data, using the features provided by Comet. Later, you will review the basic concepts behind DevOps and how to extend the GitLab DevOps platform with Comet, further enhancing your ability to deploy your data science projects. Finally, you will cover various use cases of Comet in machine learning, NLP, deep learning, and time series analysis, gaining hands-on experience with some of the most interesting and valuable data science techniques available.
By the end of this book, you will be able to confidently build data science pipelines according to bespoke specifications and manage them through Comet.
Page count: 415
Year of publication: 2022
Enhance your ability to manage and optimize the life cycle of your data science project
Angelica Lo Duca
BIRMINGHAM—MUMBAI
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Gebin George
Publishing Product Manager: Dinesh Chaudhary
Senior Editor: David Sugarman
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Sejal Dsilva
Production Designer: Shyam Sundar Korumilli
Marketing Coordinators: Shifa Ansari and Abeer Riyaz Dawe
First published: August 2022
Production reference: 1280722
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80181-443-0
www.packt.com
Jesus Christ: yesterday, today, forever.
- Angelica Lo Duca
We founded Comet in 2017 to provide machine learning developers with tools and empower them to create business value with artificial intelligence. Since then, tens of thousands of community practitioners worldwide have adopted Comet to manage and optimize models across the complete MLOps life cycle.
We work with organizations across industries, including Affirm, Ancestry, Etsy, The RealReal, Uber, and Zappos, who use Comet to increase productivity, accelerate model development, and achieve value with AI.
We were thrilled to have Dr. Angelica Lo Duca join our community when she shared her work on Heartbeat, Comet’s editorially independent publication. As she learned more about our MLOps platform, she shared those learnings with readers through clear, concise language and detailed code snippets and model visualizations.
A data journalist, Dr. Lo Duca guides the reader through the machine learning development journey with Comet, from exploratory data analysis through model deployment, showing the process of creating business value. In reading this book, you will come to understand Dr. Lo Duca’s strategic use of Comet and her perspectives on how our platform plays a key role in the modern MLOps stack.
Writing a book is a huge undertaking, and while Dr. Lo Duca wrote and produced this book independently, we have supported her by reviewing the content. We hope it is useful as you explore what is possible when you use Comet to manage and optimize machine learning and deep learning models across the complete development life cycle.
Sign up for your free Comet account and join our Slack community. Simply visit Comet’s website (www.comet.com) to find us. Also, I personally invite you to share your work with our community! We look forward to learning about the amazing discoveries and business value you create with Comet.
Happy reading,
Gideon Mendels, CEO and Founder, Comet
Angelica Lo Duca is a researcher at the Institute of Informatics and Telematics of the National Research Council, Italy. She is also an external professor of data journalism at the University of Pisa. Her research includes data science, data journalism, and web applications. She used to work on network security, semantic web, linked data, and blockchain. She has published more than 40 scientific papers at national and international conferences and journals and has participated in many international projects and events, including as a member of the Program Committee. She is also part of the editorial team of the HighTech And Innovation Journal. She owns a personal blog, where she publishes articles on her research interests.
First of all, I would like to thank Packt’s fantastic editorial team. Thank you for your support, commitment, competence, and availability throughout this long journey toward the publication of this book.
Secondly, I would like to thank Comet’s wonderful team, who contributed to defining the topics of each chapter of this book. Without their contribution, it would not have been possible to carry out this project. A special thanks to Emilie Lewis and Dhruv Nair from the Comet team for their patience and dedication.
Next, a big thank you to my husband, my father, my sister, Nando, and Silvana, for their patience and for supporting me during the good times and the most difficult ones.
Finally, a big thank you to my children, who pave the way toward a future with hope through their eyes and their smiles.
Devanshu Tayal is a data science enthusiast with experience in travel and banking. With a master’s degree from BITS Pilani, he has also studied mechanical engineering at IK Gujral Punjab Technical University. In his spare time, Devanshu enjoys researching new applications for data science, playing music, and playing badminton. Diversity and inclusion, Python, algorithms, data structures, machine learning, natural language processing, Tableau, Power BI, data visualization, and AI are some of his other interests.
Emil Bogomolov is a machine learning lead at Youpi Inc. He is engaged in creating new ways of collaboration using video. Previously, he was a research engineer in the computer vision group at the Skolkovo Institute of Science and Technology. He is the co-author of papers published at international conferences, such as VISAPP, WACV, and CVPR, and an educational courses author on data analysis at online schools. Emil is also a frequent speaker at technology conferences and author of tech articles on machine learning and AI.
A recent survey of machine learning professionals (https://www.comet.com/site/about-us/news-and-events/press-releases/comet-releases-new-survey-highlighting-ais-latest-challenges-too-much-friction-too-little-ml/) concluded that about 40%–60% of interviewed professionals abandoned their data science projects because they were not able to manage the full life cycle of those projects. I’m a data science researcher, and before encountering Comet, I belonged to that 40%–60%. In fact, during my working experience, I have abandoned many projects without concluding them because of the nature of research, where you test an idea and, if it does not work, you drop it.
Almost a year ago, I discovered Comet, a platform for model tracking and monitoring, and some wonderful people from its team, who opened my mind to the many features provided by Comet. I began to study it, with the hope of keeping my projects organized and moving them from early stages to production. I realized that I was able to conclude all the projects I implemented in Comet because of the simplicity of the platform.
Comet for Data Science is the result of my studies and tests, as well as the countless biweekly meetings with the Comet team. The book aims at helping you to learn how to manage a data science project workflow, from its early stages up to project deployment and reporting. In a single sentence, Comet for Data Science is written to help you to conclude your data science projects successfully.
By picking up this book, you will look at the general concepts of data science from a Comet perspective, with the hope that you will increase your productivity. The book will take you through the journey of building a data science project and integrating it into Comet, including exploratory data analysis, model building and evaluation, report building, and, finally, moving the model to production. Throughout the book, you will implement many practical examples that you can use both to better understand the described concepts and as starting points for your own projects.
I hope that this book will add something to your knowledge, and – why not? – help you to become a better data scientist!
Happy reading!
This book is for data scientists and data analysts who want to learn how to manage and optimize a complete data science project life cycle using Comet and other DevOps platforms. This book is also useful for those who aim at increasing their productivity by means of a practical tool for model tracking and monitoring. Prior programming knowledge of Python is assumed.
Chapter 1, An Overview of Comet, is a general introduction to Comet, an experimentation platform, which allows you to manage and optimize machine learning projects, from their early stages to their final deployment. First, you will learn what Comet is and who its target users are. Then, you will get familiar with the Comet basic concepts, including projects, experiments, workspaces, and panels. Finally, you will build two basic use cases in Comet.
Chapter 2, Exploratory Data Analysis in Comet, shows you how to use Comet to perform Exploratory Data Analysis (EDA). First, you will be introduced to the main steps of EDA, including problem setting, data preparation, preliminary data analysis, and preliminary results. Then, you will review the two main techniques used to perform EDA: visual and non-visual EDA. Finally, you will learn how to use Comet for EDA through a practical example.
Chapter 3, Model Evaluation in Comet, guides you to use Comet to perform model evaluation. First, you will be introduced to the main concepts to evaluate the performance of a model, such as data splitting, how to choose metrics for evaluation, and the basic concepts behind error analysis. Then, you will see the main model evaluation techniques for different data science tasks (classification, regression, and clustering). Finally, you will learn how to use Comet for model evaluation, through a practical example.
Chapter 4, Workspaces, Projects, Experiments, and Models, explores some Comet concepts in more depth. First, you will see some advanced concepts on workspaces, projects, and experiments, as well as how to perform parameter optimization in Comet. Then, you will learn how to implement a Comet experiment using R or Java as the main programming language. Finally, you will extend the basic examples implemented in Chapter 1, An Overview of Comet.
Chapter 5, Building a Narrative in Comet, describes some strategies to build a good report in Comet. First, you will review the basic concepts and techniques to build a narrative from data, including an overview of the DIKW pyramid, and how to turn your data into a story. Then, you will learn how to build a narrative in Comet through two practical examples.
Chapter 6, Integrating Comet into DevOps, provides you with practical concepts and examples on DevOps and MLOps and how to integrate them into Comet. First, you will review the basic concepts and best practices related to DevOps and MLOps. Then, you will learn how to integrate Comet into the DevOps/MLOps paradigm through the concept of the REST API. Next, you will analyze Docker and Kubernetes, two of the most common frameworks for DevOps. Finally, you will learn how to integrate Comet in Docker and Kubernetes through two practical examples.
Chapter 7, Extending the GitLab DevOps Platform with Comet, describes the concepts of Continuous Integration (CI) and Continuous Delivery (CD), how to implement them using GitLab, and how to integrate Comet in a CI/CD workflow. First, you will review the basic concepts of CI/CD, including the CI/CD workflow and the concept of a source control system. Then, you will see the GitLab basic concepts, including an overview of its architecture, how versioning works, and the basic GitLab commands. Next, you will configure GitLab to work with Comet. Finally, you will see a practical example that will help you to get familiar with the described concepts.
Chapter 8, Comet for Machine Learning, provides you with an overview of the Machine Learning (ML) concepts, with a focus on the scikit-learn library, and how to integrate them in Comet. First, you will review the basic ML concepts, including a classification of the main ML systems, the main ML models, and their main challenges. Then, you will review the scikit-learn package, with a focus on preprocessing, dimensionality reduction, model selection, supervised learning, and unsupervised learning. Finally, you will learn how to integrate Comet with scikit-learn through a practical example.
Chapter 9, Comet for Natural Language Processing, illustrates the main concepts behind Natural Language Processing (NLP), with a focus on the Spark NLP library, and how to integrate the main concepts in Comet. First, you will review the basic NLP workflow and also learn how to classify the main NLP systems and what their main challenges are. Then, you will review the Spark NLP library, including the concepts of annotation and pipeline. Finally, you will learn how to integrate Comet with Spark NLP through a practical example.
Chapter 10, Comet for Deep Learning, describes the main concepts behind Deep Learning (DL), with a focus on the TensorFlow library, and how to integrate them in Comet. First, you will review the basic concepts behind neural networks, how they differ from DL networks, and how to classify DL networks. Then, you will review the TensorFlow library, with a focus on how to load a dataset, as well as how to build and train a model. Finally, you will learn how to integrate Comet with TensorFlow through a practical example.
Chapter 11, Comet for Time Series Analysis, reviews the main concepts related to Time Series Analysis (TSA), with a focus on the Prophet library, and how to integrate them into Comet. First, you will review the basic concepts behind TSA, including the concept of stationarity, the time series components, and how to check the presence of breakpoints in a time series. Then, you will be introduced to the Prophet library, with a focus on how to build a prediction model. Finally, you will learn how to integrate Comet with Prophet through a practical example.
You should have a basic knowledge of data science, as well as its general objectives. In addition, you should be familiar with the Python language, and, in particular, with the pandas and matplotlib libraries.
You will need a version of Python installed on your computer – Python 3.8, if possible. Almost all code examples have been tested using macOS Monterey 12.0.1, with the exception of software in Chapter 10, Comet for Deep Learning, which has been tested on Google Colab. However, code examples should work with future version releases too.
Note that the code examples described in Chapter 4, Workspaces, Projects, Experiments, and Models, require a different version of Java from those described in Chapter 9, Comet for Natural Language Processing.
In addition, to make Comet work, you need to sign up to the Comet platform (https://www.comet.com/signup) and create an account.
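Before starting, it may help to confirm that your setup matches the prerequisites listed above. The following is a minimal sketch, not part of the book's code bundle: the function name check_environment is hypothetical, and it assumes that you supply your Comet API key through the COMET_API_KEY environment variable and that the Comet SDK is importable as comet_ml.

```python
import importlib.util
import os
import sys

# Minimum Python version suggested in the text (Python 3.8).
MIN_PYTHON = (3, 8)

# Libraries named as prerequisites, plus the Comet SDK.
# Assumption: the Comet SDK is installed under the import name "comet_ml".
REQUIRED_MODULES = ("pandas", "matplotlib", "comet_ml")


def check_environment(version_info=None, env=None):
    """Return a dict describing how the local setup compares to the prerequisites.

    check_environment is a hypothetical helper written for this sketch.
    """
    version_info = sys.version_info if version_info is None else version_info
    env = os.environ if env is None else env
    return {
        # Compare only (major, minor) against the suggested minimum.
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        # find_spec returns None when a top-level module cannot be located.
        "missing_modules": [
            name for name in REQUIRED_MODULES
            if importlib.util.find_spec(name) is None
        ],
        # Assumption: the API key is provided via the COMET_API_KEY variable.
        "api_key_set": bool(env.get("COMET_API_KEY")),
    }


if __name__ == "__main__":
    print(check_environment())
```

Any module reported under missing_modules can typically be installed with pip. The check only reports whether an API key is present in the environment; it does not contact Comet or validate the key.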
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Comet-for-Data-Science. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/sJpZu.
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “We note that some columns, such as country and reservation_status_date, have a high cardinality.”
A block of code is set as follows:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
Any command-line input or output is written as follows:
pip install pandas-profiling
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Regarding the Variables section, we can distinguish between categorical and numerical data.”
Tips or Important Notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you’ve read Comet for Data Science, we’d love to hear your thoughts! Please visit the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Preface
This section will introduce the basic concepts behind Comet, including the main Comet components: workspaces, projects, experiments, and panels (Chapter 1, An Overview of Comet). In addition, you will learn how to use Comet to perform exploratory data analysis (Chapter 2, Exploratory Data Analysis in Comet) and model evaluation in Comet (Chapter 3, Model Evaluation in Comet). Throughout this section, we do not cover model building because we will deal with this aspect in a later section, with more details and examples.
To get more familiar with the described concepts, you will be guided to implement some basic use cases, through step-by-step and commented examples in Python.
The main focus of this section is to get you ready to work with all the main features provided by Comet.
This section includes the following chapters:
Chapter 1, An Overview of Comet
Chapter 2, Exploratory Data Analysis in Comet
Chapter 3, Model Evaluation in Comet