Comet for Data Science - Angelica Lo Duca - E-Book

Description

This book provides concepts and practical use cases that you can use to quickly build, monitor, and optimize data science projects. Using Comet, you will learn how to manage almost every step of the data science process, from data collection through to creating, deploying, and monitoring a machine learning model.
The book starts by explaining the features of Comet, along with exploratory data analysis and model evaluation in Comet. You’ll see how Comet gives you the freedom to choose from a selection of programming languages, depending on which is best suited to your needs. Next, you will focus on workspaces, projects, experiments, and models. You will also learn how to build a narrative from your data, using the features provided by Comet. Later, you will review the basic concepts behind DevOps and how to extend the GitLab DevOps platform with Comet, further enhancing your ability to deploy your data science projects. Finally, you will cover various use cases of Comet in machine learning, NLP, deep learning, and time series analysis, gaining hands-on experience with some of the most interesting and valuable data science techniques available.
By the end of this book, you will be able to confidently build data science pipelines according to bespoke specifications and manage them through Comet.

The e-book can be read in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 415

Publication year: 2022




Comet for Data Science

Enhance your ability to manage and optimize the life cycle of your data science project

Angelica Lo Duca

BIRMINGHAM—MUMBAI

Comet for Data Science

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Gebin George

Publishing Product Manager: Dinesh Chaudhary

Senior Editor: David Sugarman

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Sejal Dsilva

Production Designer: Shyam Sundar Korumilli

Marketing Coordinators: Shifa Ansari and Abeer Riyaz Dawe

First published: August 2022

Production reference: 1280722

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80181-443-0

www.packt.com

Jesus Christ: yesterday, today, forever.

- Angelica Lo Duca

Foreword

We founded Comet in 2017 to provide machine learning developers with tools and empower them to create business value with artificial intelligence. Since then, tens of thousands of community practitioners worldwide have adopted Comet to manage and optimize models across the complete MLOps life cycle.

We work with organizations across industries, including Affirm, Ancestry, Etsy, The RealReal, Uber, and Zappos, who use Comet to increase productivity, accelerate model development, and achieve value with AI.

We were thrilled to have Dr. Angelica Lo Duca join our community when she shared her work on Heartbeat, Comet’s editorially independent publication. As she learned more about our MLOps platform, she shared those learnings with readers through clear, concise language and detailed code snippets and model visualizations.

A data journalist, Dr. Lo Duca guides the reader through the machine learning development journey with Comet, from exploratory data analysis through model deployment, showing the process of creating business value. In reading this book, you will come to understand Dr. Lo Duca’s strategic use of Comet and her perspectives on how our platform plays a key role in the modern MLOps stack.

Writing a book is a huge undertaking, and while Dr. Lo Duca wrote and produced this book independently, we have supported her by reviewing the content. We hope it is useful as you explore what is possible when you use Comet to manage and optimize machine learning and deep learning models across the complete development life cycle.

Sign up for your free Comet account and join our Slack community. Simply visit Comet’s website (www.comet.com) to find us. Also, I personally invite you to share your work with our community! We look forward to learning about the amazing discoveries and business value you create with Comet.

Happy reading,

Gideon Mendels CEO and Founder Comet

Contributors

About the author

Angelica Lo Duca is a researcher at the Institute of Informatics and Telematics of the National Research Council, Italy. She is also an external professor of data journalism at the University of Pisa. Her research includes data science, data journalism, and web applications. She previously worked on network security, the semantic web, linked data, and blockchain. She has published more than 40 scientific papers in national and international conferences and journals and has participated in many international projects and events, including as a member of the program committee. She is also part of the editorial team of the HighTech and Innovation Journal. She runs a personal blog, where she publishes articles on her research interests.

First of all, I would like to thank Packt’s fantastic editorial team. Thank you for your support, commitment, competence, and availability throughout this long journey toward the publication of this book.

Secondly, I would like to thank Comet’s wonderful team, who contributed to defining the topics of each chapter of this book. Without their contribution, it would not have been possible to carry out this project. A special thanks to Emilie Lewis and Dhruv Nair from the Comet team for their patience and dedication.

Next, a big thank you to my husband, my father, my sister, Nando, and Silvana, for their patience and for supporting me during the good times and the most difficult ones.

Finally, a big thank you to my children, who pave the way toward a future with hope through their eyes and their smiles.

About the reviewers

Devanshu Tayal is a data science enthusiast with experience in travel and banking. With a master’s degree from BITS Pilani, he has also studied mechanical engineering at IK Gujral Punjab Technical University. In his spare time, Devanshu enjoys researching new applications for data science, playing music, and playing badminton. Diversity and inclusion, Python, algorithms, data structures, machine learning, natural language processing, Tableau, Power BI, data visualization, and AI are some of his other interests.

Emil Bogomolov is a machine learning lead at Youpi Inc., where he is engaged in creating new ways of collaborating using video. Previously, he was a research engineer in the computer vision group at the Skolkovo Institute of Science and Technology. He is the co-author of papers published at international conferences such as VISAPP, WACV, and CVPR, and the author of educational courses on data analysis at online schools. Emil is also a frequent speaker at technology conferences and the author of tech articles on machine learning and AI.

Table of Contents

Preface

Section 1 – Getting Started with Comet

Chapter 1: An Overview of Comet

Technical requirements

comet-ml

matplotlib

numpy

pandas

scikit-learn

Motivation, purpose, and first access to the Comet platform

Motivation

Purpose

First access to the Comet platform

Getting started with workspaces, projects, experiments, and panels

Workspaces

Projects

Experiments

Panels

First use case – tracking images in Comet

Downloading the dataset

Dataset cleaning

Building the visualizations

Integrating the graphs in Comet

Building a panel

Second use case – simple linear regression

Initializing the context

Defining, fitting, and evaluating the model

Showing results in Comet

Summary

Further reading

Chapter 2: Exploratory Data Analysis in Comet

Technical requirements

pandas Profiling

seaborn

sweetviz

Introducing EDA

Problem setting

Data preparation

Preliminary data analysis

Preliminary results

Exploring EDA techniques

Loading and preparing the dataset

Non-visual EDA

Visual EDA

Using Comet for EDA

Comet logs

Panels

Comet Report

Summary

Further reading

Chapter 3: Model Evaluation in Comet

Technical requirements

Introducing model evaluation

Data splitting

Choosing metrics

Exploring model evaluation techniques

Loading and preparing the dataset

Regression

Classification

Clustering

Using Comet for model evaluation

Comet Log

Comet Dashboard

Registry

Reports

Summary

Further reading

Section 2 – A Deep Dive into Comet

Chapter 4: Workspaces, Projects, Experiments, and Models

Technical requirements

Python

R

Java

Exploring the Comet UI

Workspaces

Projects

Using experiments and models

Experiments

Models

Exploring other languages supported by Comet

R

Java

First use case – offline and existing experiments

Running an offline experiment

Continuing an existing experiment

Improving an existing experiment offline

Second use case – model optimization

Creating and configuring an Optimizer

Optimizing the model

Showing the results in Comet

Summary

Further reading

Chapter 5: Building a Narrative in Comet

Technical requirements

Discovering the DIKW pyramid

Data

Information

Knowledge

Wisdom

Moving from data to wisdom

Turning data into information

Turning information into knowledge

Turning knowledge into wisdom

Choosing the correct chart type

A line chart

A bar chart

An area chart

A pie chart

Using Comet to build a narrative

Using JavaScript panels

Building advanced reports

Summary

Further reading

Chapter 6: Integrating Comet into DevOps

Technical requirements

Python

Docker

Kubernetes

Exploring DevOps and MLOps principles and best practices

The DevOps life cycle

Moving from DevOps to MLOps

Combining Comet and DevOps/MLOps

Comet in the DevOps life cycle

Setting up the Comet REST API service

Using the Comet REST API

Implementing Docker

Overview of Docker

Running Comet in a Docker container

Implementing Kubernetes

The Kubernetes architecture

Configuring Kubernetes

Deploying a local Kubernetes cluster

Summary

Further reading

Chapter 7: Extending the GitLab DevOps Platform with Comet

Technical requirements

Python

Git client

Introducing the concept of CI/CD

An overview of CI/CD

The concept of an SCS

The CI/CD workflow

Implementing the CI/CD workflow in GitLab

Creating/modifying a GitLab project

Exploring GitLab's internal structure

Exploring GitLab concepts for CI/CD

Building the CI/CD pipeline

Creating a release

Integrating Comet with GitLab

Running Comet in the CI/CD workflow

Using webhooks

Integrating Docker with the CI/CD workflow

Summary

Further reading

Section 3 – Examples and Use Cases

Chapter 8: Comet for Machine Learning

Technical requirements

shap

Introducing machine learning

Exploring the machine learning workflow

Classifying machine learning systems

Exploring machine learning challenges

Explaining machine learning models

Reviewing the main machine learning models

Supervised learning

Unsupervised learning

Reviewing the scikit-learn package

Preprocessing

Dimensionality reduction

Model selection

Supervised and unsupervised learning

Building a machine learning project from setup to report

Reviewing the scenario

Selecting the best model

Calculating the SHAP value

Building the final report

Summary

Further reading

Chapter 9: Comet for Natural Language Processing

Technical requirements

Introducing basic NLP concepts

Exploring the NLP workflow

Classifying NLP systems

Exploring NLP challenges

Reviewing the most popular models’ hubs

Exploring the Spark NLP package

Introducing the Spark NLP package

Integrating Spark NLP with Comet

Setting up the environment for Spark NLP

Installing Java

Installing Scala (optional)

Installing Apache Spark

Installing PySpark and Spark NLP

Using NLP, from project setup to report building

Configuring the environment

Loading the dataset

Implementing a pretrained pipeline

Logging results in Comet

Using a custom pipeline

Building the final report

Summary

Further reading

Chapter 10: Comet for Deep Learning

Technical requirements

gradio

tensorflow

Introducing basic deep learning concepts

Introducing neural networks

Exploring the difference between deep learning and neural networks

Classifying deep learning networks

Exploring the TensorFlow package

Introducing the TensorFlow package

Integrating TensorFlow with Comet

Using deep learning, from project setup to report building

Introducing Gradio

Loading the dataset

Implementing a basic model

Exploring results in Comet

Building a prediction interface

Building the final report

Summary

Further reading

Chapter 11: Comet for Time Series Analysis

Technical requirements

Prophet

statsmodels

Introducing basic concepts related to time series analysis

Loading a time series in Python

Checking whether a time series is stationary

Exploring the time series components

Identifying breakpoints in a time series

Exploring the Prophet package

Introducing the Prophet package

Integrating Prophet with Comet

Using time series analysis, from project setup to report building

Configuring the Deepnote environment

Loading and preparing the dataset

Checking stationarity in data

Building the models

Exploring results in Comet

Building the final report

Summary

Further reading

Why subscribe?

Other Books You May Enjoy

Preface

A recent survey of machine learning professionals (https://www.comet.com/site/about-us/news-and-events/press-releases/comet-releases-new-survey-highlighting-ais-latest-challenges-too-much-friction-too-little-ml/) concluded that about 40%–60% of those interviewed had abandoned data science projects because they were unable to manage the full project life cycle. I’m a data science researcher, and before encountering Comet, I belonged to that 40%–60%. In fact, over the course of my work, I have abandoned many projects without completing them because of the nature of research, where you test an idea and, if it does not work, you drop it.

Almost a year ago, I discovered Comet, a platform for model tracking and monitoring, and some wonderful people from its team, who opened my mind to the many features it provides. I began to study it, hoping to keep my projects organized and move them from their early stages to production. I found that I was able to complete all the projects I implemented in Comet because of the simplicity of the platform.

Comet for Data Science is the result of my studies and tests, as well as countless biweekly meetings with the Comet team. The book aims to help you learn how to manage a data science project workflow, from its early stages up to project deployment and reporting. In a single sentence, Comet for Data Science is written to help you complete your data science projects successfully.

By picking up this book, you will look at the general concepts of data science from a Comet perspective, with the hope that you will increase your productivity. The book will take you through the journey of building a data science project and integrating it into Comet, including exploratory data analysis, model building and evaluation, report building, and, finally, moving the model to production. Throughout the book, you will implement many practical examples that you can use both to better understand the described concepts and as starting points for your own projects.

I hope that this book will add something to your knowledge, and – why not? – help you to become a better data scientist!

Happy reading!

Who this book is for

This book is for data scientists and data analysts who want to learn how to manage and optimize a complete data science project life cycle using Comet and other DevOps platforms. This book is also useful for those who aim to increase their productivity by means of a practical tool for model tracking and monitoring. Prior programming knowledge of Python is assumed.

What this book covers

Chapter 1, An Overview of Comet, is a general introduction to Comet, an experimentation platform that allows you to manage and optimize machine learning projects, from their early stages to their final deployment. First, you will learn what Comet is and who its target users are. Then, you will become familiar with the basic Comet concepts, including projects, experiments, workspaces, and panels. Finally, you will build two basic use cases in Comet.

Chapter 2, Exploratory Data Analysis in Comet, shows you how to use Comet to perform Exploratory Data Analysis (EDA). First, you will be introduced to the main steps of EDA, including problem setting, data preparation, preliminary data analysis, and preliminary results. Then, you will review the two main techniques used to perform EDA: visual and non-visual EDA. Finally, you will learn how to use Comet for EDA through a practical example.
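As a minimal illustration of what non-visual EDA looks like with pandas (this sketch is not taken from the book, and the tiny dataset is made up to stand in for the hotel-bookings data the book uses):

```python
import pandas as pd

# A tiny made-up dataset, standing in for a real bookings dataset
df = pd.DataFrame({
    "country": ["PRT", "GBR", "PRT", "FRA", None],
    "adr": [75.0, 98.5, 110.0, 87.2, 101.3],  # average daily rate
    "is_canceled": [0, 1, 0, 0, 1],
})

# Non-visual EDA: shape, types, missing values, and summary statistics
print(df.shape)                      # (5, 3)
print(df.dtypes)
print(df.isna().sum())               # country has one missing value
print(df["adr"].describe())          # count/mean/std/min/quartiles/max
print(df["country"].value_counts())  # PRT appears twice
```

The same inspection steps scale unchanged to a real dataset loaded with `pd.read_csv`.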

Chapter 3, Model Evaluation in Comet, shows you how to use Comet to perform model evaluation. First, you will be introduced to the main concepts for evaluating the performance of a model, such as data splitting, how to choose evaluation metrics, and the basic concepts behind error analysis. Then, you will see the main model evaluation techniques for different data science tasks (classification, regression, and clustering). Finally, you will learn how to use Comet for model evaluation through a practical example.
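As a minimal sketch of data splitting and metric computation with scikit-learn (illustrative only, not an example from the book; the dataset and model choice are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Split the data, holding out a stratified test set for honest evaluation
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit a simple classifier and score it on the held-out data only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

Comet's role, covered in the chapter, is to log metrics like these per experiment so runs can be compared in a dashboard.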

Chapter 4, Workspaces, Projects, Experiments, and Models, takes a deeper look at several Comet concepts. First, you will see some advanced concepts regarding workspaces, projects, and experiments, as well as how to perform parameter optimization in Comet. Then, you will learn how to implement a Comet experiment using R or Java as the main programming language. Finally, you will extend the basic examples implemented in Chapter 1, An Overview of Comet.

Chapter 5, Building a Narrative in Comet, describes some strategies to build a good report in Comet. First, you will review the basic concepts and techniques to build a narrative from data, including an overview of the DIKW pyramid, and how to turn your data into a story. Then, you will learn how to build a narrative in Comet through two practical examples.

Chapter 6, Integrating Comet into DevOps, provides practical concepts and examples of DevOps and MLOps and shows how to integrate Comet with them. First, you will review the basic concepts and best practices related to DevOps and MLOps. Then, you will learn how to integrate Comet into the DevOps/MLOps paradigm through its REST API. Next, you will look at Docker and Kubernetes, two of the most common frameworks for DevOps. Finally, you will learn how to integrate Comet with Docker and Kubernetes through two practical examples.
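To give a flavor of the Docker side of this workflow, here is a minimal, hypothetical Dockerfile for containerizing a training script; `train.py`, `requirements.txt`, and the run command are assumptions for illustration, not files from the book:

```dockerfile
# Hypothetical image for a training script that logs to Comet
FROM python:3.8-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY train.py .

# The Comet API key is injected at run time, not baked into the image:
#   docker run -e COMET_API_KEY=<your-key> my-training-image
CMD ["python", "train.py"]
```

Keeping credentials out of the image and passing them as environment variables is the usual pattern for this kind of setup.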

Chapter 7, Extending the GitLab DevOps Platform with Comet, describes the concepts of Continuous Integration (CI) and Continuous Delivery (CD), how to implement them using GitLab, and how to integrate Comet into a CI/CD workflow. First, you will review the basic concepts of CI/CD, including the CI/CD workflow and the concept of a source control system. Then, you will see the basic GitLab concepts, including an overview of its architecture, how versioning works, and the basic GitLab commands. Next, you will configure GitLab to work with Comet. Finally, you will see a practical example that will help you get familiar with the described concepts.
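To illustrate the kind of pipeline such a workflow produces, here is a minimal, hypothetical `.gitlab-ci.yml` sketch; the stage names, `train.py`, and the use of a `COMET_API_KEY` CI/CD variable are assumptions for illustration, not the book's actual configuration:

```yaml
# Hypothetical two-stage pipeline: run tests, then train and log to Comet.
# COMET_API_KEY is stored as a masked GitLab CI/CD variable, never in the repo.
stages:
  - test
  - train

test:
  stage: test
  image: python:3.8-slim
  script:
    - pip install -r requirements.txt
    - pytest tests/

train:
  stage: train
  image: python:3.8-slim
  script:
    - pip install -r requirements.txt
    - python train.py   # reads COMET_API_KEY from the CI environment
```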

Chapter 8, Comet for Machine Learning, provides you with an overview of Machine Learning (ML) concepts, with a focus on the scikit-learn library, and shows how to integrate them into Comet. First, you will review the basic ML concepts, including a classification of the main ML systems, the main ML models, and their main challenges. Then, you will review the scikit-learn package, with a focus on preprocessing, dimensionality reduction, model selection, supervised learning, and unsupervised learning. Finally, you will learn how to integrate Comet with scikit-learn through a practical example.

Chapter 9, Comet for Natural Language Processing, illustrates the main concepts behind Natural Language Processing (NLP), with a focus on the Spark NLP library, and shows how to integrate them into Comet. First, you will review the basic NLP workflow, learn how to classify the main NLP systems, and see what their main challenges are. Then, you will review the Spark NLP library, including the concepts of annotation and pipeline. Finally, you will learn how to integrate Comet with Spark NLP through a practical example.

Chapter 10, Comet for Deep Learning, describes the main concepts behind Deep Learning (DL), with a focus on the TensorFlow library, and shows how to integrate them into Comet. First, you will review the basic concepts behind neural networks, how they differ from DL networks, and how to classify DL networks. Then, you will review the TensorFlow library, with a focus on how to load a dataset and how to build and train a model. Finally, you will learn how to integrate Comet with TensorFlow through a practical example.

Chapter 11, Comet for Time Series Analysis, reviews the main concepts related to Time Series Analysis (TSA), with a focus on the Prophet library, and shows how to integrate them into Comet. First, you will review the basic concepts behind TSA, including the concept of stationarity, the components of a time series, and how to check for breakpoints in a time series. Then, you will be introduced to the Prophet library, with a focus on how to build a prediction model. Finally, you will learn how to integrate Comet with Prophet through a practical example.
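As a small, self-contained illustration of the stationarity idea (not an example from the book; the series is synthetic): a random walk is non-stationary, while its first difference is approximately stationary.

```python
import numpy as np
import pandas as pd

# A synthetic random walk: cumulative sum of Gaussian noise
rng = np.random.default_rng(0)
walk = pd.Series(rng.normal(size=500).cumsum())

# A stationary series should have a roughly constant mean over time;
# the raw walk typically does not, while its first difference does.
print(walk[:250].mean(), walk[250:].mean())   # usually far apart
diff = walk.diff().dropna()
print(diff[:250].mean(), diff[250:].mean())   # both close to zero
```

The book uses formal tests for the same question; this sketch only conveys the intuition behind them.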

To get the most out of this book

You should have a basic knowledge of data science, as well as its general objectives. In addition, you should be familiar with the Python language and, in particular, with the pandas and matplotlib libraries.

You will need a version of Python installed on your computer – Python 3.8, if possible. Almost all code examples have been tested using macOS Monterey 12.0.1, with the exception of the software in Chapter 10, Comet for Deep Learning, which has been tested on Google Colab. However, the code examples should work with future releases too.

Note that the code examples described in Chapter 4, Workspaces, Projects, Experiments, and Models, require a different version of Java from those described in Chapter 9, Comet for Natural Language Processing.

In addition, to make Comet work, you need to sign up to the Comet platform (https://www.comet.com/signup) and create an account.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Comet-for-Data-Science. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/sJpZu.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “We note that some columns, such as country and reservation_status_date, have a high cardinality.”

A block of code is set as follows:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

Any command-line input or output is written as follows:

pip install pandas-profiling

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Regarding the Variables section, we can distinguish between categorical and numerical data.”

Tips or Important Notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Comet for Data Science, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.


Section 1 – Getting Started with Comet

This section will introduce the basic concepts behind Comet, including the main Comet components: workspaces, projects, experiments, and panels (Chapter 1, An Overview of Comet). In addition, you will learn how to use Comet to perform exploratory data analysis (Chapter 2, Exploratory Data Analysis in Comet) and model evaluation (Chapter 3, Model Evaluation in Comet). Throughout this section, we do not cover model building because we will deal with that aspect in a later section, with more details and examples.

To become more familiar with the described concepts, you will be guided through implementing some basic use cases, with step-by-step, commented examples in Python.

The main focus of this section is to get you ready to work with all the main features provided by Comet.

This section includes the following chapters:

Chapter 1, An Overview of Comet

Chapter 2, Exploratory Data Analysis in Comet

Chapter 3, Model Evaluation in Comet