Data Science with .NET and Polyglot Notebooks - Matt Eland - E-Book

Description

As the fields of data science, machine learning, and artificial intelligence rapidly evolve, .NET developers are eager to leverage their expertise to dive into these exciting domains but are often unsure of how to do so. Data Science with .NET and Polyglot Notebooks is the practical guide you need to seamlessly bring your .NET skills into the world of analytics and AI.
With Microsoft’s .NET platform now robustly supporting machine learning and AI tasks, the introduction of tools such as .NET Interactive kernels and Polyglot Notebooks has opened up a world of possibilities for .NET developers. This book empowers you to harness the full potential of these cutting-edge technologies, guiding you through hands-on experiments that illustrate key concepts and principles. Through a series of interactive notebooks, you’ll not only master technical processes but also discover how to integrate these new skills into your current role or pivot to exciting opportunities in the data science field.
By the end of the book, you’ll have acquired the necessary knowledge and confidence to apply cutting-edge data science techniques and deliver impactful solutions within the .NET ecosystem.

You can read this e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 512

Publication year: 2024




Data Science with .NET and Polyglot Notebooks

Programmer’s guide to data science using ML.NET, OpenAI, and Semantic Kernel

Matt Eland

Data Science with .NET and Polyglot Notebooks

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Kunal Sawant

Publishing Product Manager: Debadrita Chatterjee

Book Project Manager: Manisha Singh

Senior Editor: Esha Banerjee

Technical Editor: Jubit Pincy

Copy Editor: Safis Editing

Proofreader: Esha Banerjee

Indexer: Subalakshmi Govindhan

Production Designer: Joshua Misquitta

DevRel Marketing Coordinator: Sonia Chauhan

First published: August 2024

Production reference: 1230824

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK

ISBN 978-1-83588-296-2

www.packtpub.com

To Jon, Diego, Brett, Aleksei, Michael, Luis, Cesar, Bruno, and all the others who made the things described in this book possible.

To Sam and Sadukie, who encouraged, supported, and mercilessly heckled me as I began to publicly teach the topics captured in this book.

To Heather, for her endless love and patience, and for telling me to “just go get a master’s degree already!”

Contributors

About the author

Matt Eland is a senior software engineering and data science consultant at Leading EDJE in Columbus, Ohio. He loves sharing his journey by teaching software engineering, AI, and data science concepts in the most engaging ways possible. Matt has used machine learning to settle debates over whether certain movies are Christmas movies, reinforcement learning to create digital attack squirrels, data analytics to suggest improvements to his favorite TV show, and AI agents to play board games and create an AI agent with the personality of a dog. He is the author of multiple books and courses, helps organize a user group and regional conferences, holds a Master of Science in Data Analytics, and is a 2x Microsoft MVP in AI and .NET.

Thank you to my friends and family for their continued support, as well as my coworkers at Leading EDJE and my client sites who smile and nod at my evening and weekend writing endeavors. Special thanks to Heather, Sam, Sadukie, Eddie, Victor, Mike, James, Kanan, and Stephanie for celebrating little milestones with me. Extra special thanks to Sam Gomez and Sam Nasr for being my fellow MVPs and technical reviewers. Finally, thank you to Debadrita, Esha, and Manisha for your work on our second project together at Packt.

About the reviewers

Sam Nasr has been a software developer since 1995, focusing primarily on Microsoft technologies. He is a Senior Software Engineer with NIS Technologies where he consults and teaches clients about the latest .NET technologies. Sam has achieved multiple certifications from Microsoft, such as MCSA, MCAD, MCTS, and MCT, and he has been the leader of the Cleveland C# since 2003. He also holds leadership roles for the .NET Study Group and Azure Cleveland User Group.

When not coding, Sam loves spending time with his family and friends or volunteering at his local church. You can learn more about Sam by visiting https://linktr.ee/samnasr.

Samuel Gomez has worked in software development for 15+ years (mostly Microsoft technologies). He is deeply passionate about the problem-solving aspect of his work. Recently, he has dedicated himself to exploring AI and machine learning technologies and has been working on understanding how these technologies can be applied to different aspects of our lives.

Beyond coding, Sam enjoys spending time with his family. As a soccer enthusiast, he loves to play, watch, and coach the sport.

Table of Contents

Preface

Part 1: Data Analysis in Polyglot Notebooks

1

Data Science, Notebooks, and Kernels

Exploring the field of data science

The rise of big data

Data analytics

Machine learning

Artificial intelligence

Data science notebooks and Project Jupyter

Extending notebooks with kernels

Polyglot Notebooks and .NET Interactive

Summary

Further reading

2

Exploring Polyglot Notebooks

Technical requirements

Installing Polyglot Notebooks

Creating your first notebook

Executing notebook cells

Adding code cells

Working with variables

The Variables view

Markdown cells

Declaring classes and methods

Declaring methods

Declaring classes

Working with other languages

Sharing variables between languages

Exporting variables

Troubleshooting notebook execution

Resolving compiler errors

Problems with notebook execution

Diagnostic output for Polyglot Notebooks errors

Issues and the Polyglot Notebooks repository

Summary

Further reading

3

Getting Data and Code into Your Notebooks

Technical requirements

Importing code and NuGet packages

Importing code files

Importing NuGet packages

Importing project files

Reading CSV data

Understanding CSV data

Reading CSV data into a DataFrame

Specialized CSV loading scenarios

Troubleshooting CSV loading errors

Loading TSV and other delimited file formats

Getting JSON data with PowerShell

Building DataFrames from objects

Connecting to databases with SQL

Connecting to a SQL database

Executing SQL from SQL kernels

Sharing SQL results with other kernels

Alternative ways of connecting to the Database

Querying Kusto clusters with KQL

Summary

Further reading

4

Working with Tabular Data and DataFrames

Technical requirements

Understanding data cleaning and data wrangling

Where unclean data comes from

The impact of unclean data

Data cleaning and data wrangling

Working with DataFrames in C#

Viewing and sampling data

Rows

Getting and setting cell values

Iterating over rows

Working with columns

Columns

Analyzing columns

Removing columns

Renaming columns

Adding a new column

Handling missing values

Sorting, filtering, grouping, and merging data

Sorting DataFrames

Grouping and aggregating DataFrames

Merging DataFrames

Filtering DataFrames

DataFrames in other languages

Summary

Further reading

5

Visualizing Data

Technical requirements

Understanding exploratory data analysis

Data visualization’s role in exploratory data analysis

Descriptive statistics for EDA

Extracting insights with descriptive statistics

Using DataFrame.Description to generate descriptive statistics

Descriptive statistics with MathNet.Numerics

Creating a box plot with ScottPlot

Performing univariate analysis with Plotly.NET

Plotly and Plotly.NET

Box plots in Plotly.NET

Violin plots with Plotly.NET

Histograms with Plotly.NET

Summary

Further reading

6

Variable Correlations

Technical requirements

Performing multivariate analysis with Plotly.NET

Loading data and dependencies

Multivariate analysis with box and violin plots

Plotting multiple values with scatter plots

Adding color to a scatter plot

3D scatter plots with Plotly.NET

Identifying variable correlations

Calculating variable correlations

Building feature correlation matrixes

Summary

Further reading

Part 2: Machine Learning with Polyglot Notebooks and ML.NET

7

Classification Experiments with ML.NET AutoML

Technical requirements

Understanding machine learning

Supervised learning

Classification and regression

Introducing ML.NET and AutoML

Understanding AutoML

AutoML and data pre-processing

Creating training and testing datasets

Training a classification model with ML.NET AutoML

Evaluating binary classification models

Evaluating our model

Calculating feature importance

Predicting values with binary classification models

Summary

Further reading

8

Regression Experiments with ML.NET AutoML

Technical requirements

Understanding regression

Our regression task

Regression as a numerical formula

Our regression dataset

Performing a regression experiment

Understanding cross-validation

Interpreting cross-validation results

Evaluating regression metrics

Predicting values for outliers

Applying PFI to regression models

Applying a regression model

Summary

Further reading

9

Beyond AutoML: Pipelines, Trainers, and Transforms

Technical requirements

Performing regression without AutoML

Features and pipelines

Creating an AutoML pipeline

Controlling AutoML pipelines

Customizing the Featurizer

Customizing the model trainer selector

Customizing hyperparameter tuning

Understanding the search space

Customizing the search space

Customizing the hyperparameter tuner

Scaling numeric columns

Selecting regression algorithms

Selecting binary classification algorithms

Summary

Further reading

10

Deploying Machine Learning Models

Technical requirements

Introducing our multi-class classification model

Training our model

Evaluating multi-class classification models

Generating test predictions

Exporting ML.NET models

Hosting ML.NET models in ASP.NET web applications

Configuring a PredictionEnginePool

Using the PredictionEnginePool

Understanding model performance, data drift, and MLOps

Detecting model drift

MLOps and updating models

Surveying additional ML.NET capabilities

ONNX and TensorFlow models in ML.NET

Summary

Further reading

Part 3: Exploring Generative AI with Polyglot Notebooks

11

Generative AI in Polyglot Notebooks

Technical requirements

Understanding Generative AI

Deploying generative AI models on Azure

Creating an Azure OpenAI Service

Deploying models on Azure OpenAI Service

Getting access credentials for Azure OpenAI

Connecting to an Azure OpenAI Service

Chatting with a deployed model

Customizing model behavior with prompt engineering

Zero-shot, one-shot, and few-shot inferencing

Using text embeddings

Generating images with DALL-E

Summary

Further reading

12

AI Orchestration with Semantic Kernel

Technical requirements

Understanding RAG and AI orchestration

Introducing Semantic Kernel

Chatting with Semantic Kernel functions

Building the Kernel

Creating a prompt function

Adding memory to Semantic Kernel

Defining complex functions

Creating functions from methods

Accepting KernelFunction parameters

Defining a memory function

Calling multiple functions using plugins

Examining FunctionResult objects

Azure OpenAI content filtering

Handling complex requests with planners

Knowing where to go from here

Summary

Further reading

Part 4: Polyglot Notebooks in the Enterprise

13

Enriching Documentation with Mermaid Diagrams

Technical requirements

Introducing Mermaid diagrams

Communicating logic with flowcharts

Communicating structure with class diagrams

Communicating data with Entity Relationship Diagrams

Communicating behavior with state diagrams

Communicating flow with sequence diagrams

Communicating workflow with Git graphs

Summary

Further reading

14

Extending Polyglot Notebooks

Technical requirements

Understanding default formatting behavior

Default object formatting

Default collection formatting

Styling output with custom formatters

Exploring magic commands

Creating a Polyglot Notebook extension

Working with parameters

Invoking code on kernels

Summary

Further reading

15

Adopting and Deploying Polyglot Notebooks

Technical requirements

Integrating Polyglot Notebooks into your day job

Enabling rapid experimentation

Supporting AI and analytics workloads

Assisting testing workloads

Training new team members with Polyglot Notebooks

Sharing Polyglot Notebooks with your team

Integrating Polyglot Notebooks into Jupyter or JupyterLab

Storing Notebooks in source control

Deploying Polyglot Notebooks to GitHub Codespaces

Configuring GitHub codespaces

Creating a codespace on GitHub

Advancing into machine learning and AI

Adding data science to your day job

Getting into data science

Succeeding in data science

Summary

Further reading

Index

Other Books You May Enjoy

Part 1: Data Analysis in Polyglot Notebooks

We’ll start our journey by introducing Polyglot Notebooks and their role in software engineering, data analysis, and machine learning workflows. We’ll cover the Polyglot Notebooks technology and user interface as well as the basic decisions and actions you’ll make working with notebooks.

The remainder of this part focuses on loading data into a notebook and then performing data analysis in Polyglot Notebooks using .NET tooling with C# and F#.

Whether you’re an experienced data analyst or have no prior knowledge, you’ll learn how to load up, analyze, clean, and manipulate data using .NET technologies like the DataFrame.

You’ll also see how you can effectively understand data distributions and create helpful visuals using libraries like Plotly.NET, Microsoft.Data.Analysis, and MathNet.Numerics.

This part has the following chapters:

Chapter 1, Data Science, Notebooks, and Kernels

Chapter 2, Exploring Polyglot Notebooks

Chapter 3, Getting Data and Code into Your Notebooks

Chapter 4, Working with Tabular Data and DataFrames

Chapter 5, Visualizing Data

Chapter 6, Variable Correlations

2

Exploring Polyglot Notebooks

In the last chapter, we covered what notebooks are and how they’re helpful to developers, data analysts, data scientists, and technology managers.

In this chapter, we’ll dive in and see Polyglot Notebooks in action. We’ll start by installing Polyglot Notebooks and then build and execute our first notebook. By the time we’re done with this chapter, you should have a much clearer idea of what the notebook development process looks like so we can drill into more specific uses in future chapters.

In this chapter, we’ll cover the following:

Installing Polyglot Notebooks

Creating your first notebook

Executing notebook cells

Declaring classes and methods

Working with other languages

Sharing variables between kernels

Troubleshooting notebook execution

Technical requirements

In order to work with Polyglot Notebooks, you will need to first install Visual Studio Code (VS Code) and the version of .NET that Polyglot Notebooks currently requires.

VS Code is freely available at https://Code.VisualStudio.com for Windows 10 and above, macOS 10.15 and above, and various distributions of Linux, including Debian, Ubuntu, Red Hat, Fedora, and SUSE.

Polyglot Notebooks requires a specific version of .NET in order to operate. At the point of this book’s release, that was .NET 8, but the Polyglot Notebooks team periodically updates to new versions of .NET as new major versions are released.

The .NET SDK can be downloaded from https://dotnet.microsoft.com/en-us/download/visual-studio-sdks.

VS Code and Polyglot Notebooks can also be run in your browser via GitHub Codespaces. We’ll explore this more in Chapter 15, Adopting and Deploying Polyglot Notebooks. You can always check the required .NET version by looking at the devcontainer.json file in the .devcontainer folder of the official Polyglot Notebooks repository at https://GitHub.com/dotnet/interactive.
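As a rough way to confirm which .NET SDKs are installed on your machine, the following Python sketch parses the output of the dotnet --list-sdks command. The helper names are hypothetical, and the hard-coded major version of 8 is an assumption based on the version named above; substitute whichever major version the repository currently indicates.

```python
import re
import shutil
import subprocess

REQUIRED_MAJOR = 8  # assumption: .NET 8, the version named in this chapter


def installed_sdk_majors(list_sdks_output):
    """Extract major versions from `dotnet --list-sdks` style output.

    Each line is expected to look like: 8.0.100 [/usr/share/dotnet/sdk]
    """
    majors = set()
    for line in list_sdks_output.splitlines():
        match = re.match(r"\s*(\d+)\.\d+\.\d+", line)
        if match:
            majors.add(int(match.group(1)))
    return majors


def has_required_sdk(list_sdks_output, required_major=REQUIRED_MAJOR):
    """Return True when an SDK with the required major version is listed."""
    return required_major in installed_sdk_majors(list_sdks_output)


def query_local_sdks():
    """Run `dotnet --list-sdks`; return an empty string if dotnet is not on PATH."""
    if shutil.which("dotnet") is None:
        return ""
    result = subprocess.run(
        ["dotnet", "--list-sdks"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    output = query_local_sdks()
    if has_required_sdk(output):
        print(f".NET {REQUIRED_MAJOR} SDK found")
    else:
        found = sorted(installed_sdk_majors(output))
        print(f".NET {REQUIRED_MAJOR} SDK not found; installed major versions: {found}")
```

Running dotnet --list-sdks directly in a terminal gives you the same information; the script only adds the comparison against a required major version.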

Installing Polyglot Notebooks

Once you have VS Code and the required version of .NET installed, open VS Code and let’s walk through the one-time process of installing Polyglot Notebooks. First, let’s clear our workspace by opening a folder where you’d like to store your notebooks.

Do this by opening the File menu, choosing Open Folder…, and then selecting a folder on your machine to open. I typically select an empty folder to keep my projects focused. You’ll now need to install Polyglot Notebooks through the Extensions Marketplace in VS Code.

You can get to the extensions view by pressing Ctrl + Shift + X or by clicking on the four-boxes icon on the sidebar, as shown here:

Figure 2.1 – The Extensions view location

The Extensions view allows you to install a wide variety of extensions in VS Code. Extensions can do many things, including the following:

Adding compiler support for a new language

Adding a word count to your status bar

Adding a new view for looking at resources on Azure or AWS

Adding new editor color themes

Responding to key press events by causing digital explosions, fireworks, and screen shaking (yes, this really is a thing – check out the PowerMode extension)

Polyglot Notebooks is packaged and installed as an extension. In order to get started, we’ll need to find it in the Extensions Marketplace.

Type Polyglot Notebooks in the search bar and you should see the list of extensions update to include the correct result, published by Microsoft, as shown in Figure 2.2:

Figure 2.2 – The Polyglot Notebooks extension

Click Install and you should now see the extension download and install on your copy of VS Code.

A warning on private NuGet feeds

I once encountered a puzzling failure where Polyglot Notebooks would not install on my laptop and would not show a failure message. I ultimately discovered that my machine had a private NuGet feed registered from a control vendor, and the install was failing because it was unable to check that particular NuGet feed during installation. See https://github.com/dotnet/interactive/issues/3319 for more details on this particular issue.

It’s possible that during installation or while creating your first notebook, you’ll encounter an error message indicating you do not have the correct version of .NET installed. The error message will look something like Figure 2.3:

Figure 2.3 – A sample failure when lacking the correct .NET SDK version

Thankfully, this message contains details on what dependencies your machine is missing and how to get them.

If you encounter other issues during installation, you may want to skip ahead in this chapter and see the Troubleshooting notebook execution section for more steps on getting information about failures or submitting issues for additional assistance.

Once you have Polyglot Notebooks installed, you’re ready to create your first notebook.

Creating your first notebook

There is no button in the user interface for adding your first notebook. Instead, we’ll need to access this feature by using VS Code’s Command Palette.

The Command Palette is where VS Code keeps its internal commands and settings, along with those registered by extensions you’ve installed, including Polyglot Notebooks. This helps the user interface stay minimal while keeping commands easily accessible.

Press Ctrl + Shift + P to open the Command Palette and list the available commands.

Ctrl + Shift + P versus Ctrl + P

It’s worth noting that VS Code supports both the Ctrl + Shift + P shortcut for the Command Palette and the Ctrl + P shortcut for navigation. The user interface for both of them is very similar, but the navigator lists files in your editing session while Ctrl + Shift + P lists available commands. See this chapter’s Further reading section for additional VS Code user interface resources.

Next, type in Polyglot Notebook to filter down the command list and choose Create new blank notebook, as shown in Figure 2.4:

Figure 2.4 – The Command Palette

Next, VS Code will ask you if you want to create a new .dib or .ipynb file via a dialog, as shown in Figure 2.5:

Figure 2.5 – Specifying the notebook format

If you’ve used Jupyter Notebooks before, you may recognize the .ipynb extension as short for an interactive Python notebook, otherwise known as a Jupyter notebook.

.ipynb files store both code and execution history, including the output of notebook cells. This means that you can share an .ipynb file with others and they can generally follow along with the notebook without needing to run it themselves.

The downside of this is that .ipynb files are larger and have a more complex file format. This makes storing .ipynb files in version control more challenging than storing smaller code files.

.dib files are a new file format introduced by Polyglot Notebooks. .dib files store the notebook structure and code but do not retain the results of the last time cells were executed. This results in .dib files being smaller and easier to manage in version control, but they do require the opener to rerun the notebook upon opening it if they want to see past results.
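To make this trade-off concrete, here is a minimal Python sketch that uses only the standard library. It builds a tiny notebook in the JSON layout used by .ipynb files (Jupyter’s nbformat 4), then strips the stored outputs to approximate what a format without execution results needs to keep. The sample cell contents and helper names are made up for illustration, and this is not how Polyglot Notebooks itself converts between formats.

```python
import copy
import json


def make_sample_notebook():
    """Build a minimal nbformat-4 style notebook with one executed code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {},
                "source": ["# Hypothetical sample notebook"],
            },
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": 1,
                "source": ['Console.WriteLine("Hello, notebooks!");'],
                # Stored result of the last run: this is what .ipynb keeps on disk
                "outputs": [
                    {
                        "output_type": "stream",
                        "name": "stdout",
                        "text": ["Hello, notebooks!\n"],
                    }
                ],
            },
        ],
    }


def strip_outputs(notebook):
    """Return a copy of the notebook with execution results removed."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped["cells"]:
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


def serialized_size(notebook):
    """Size in bytes of the notebook when written as UTF-8 JSON."""
    return len(json.dumps(notebook, indent=1).encode("utf-8"))


if __name__ == "__main__":
    original = make_sample_notebook()
    cleaned = strip_outputs(original)
    print("with outputs:   ", serialized_size(original), "bytes")
    print("without outputs:", serialized_size(cleaned), "bytes")
```

In a real notebook the difference is usually far larger than in this toy example, since stored outputs can include long tables and embedded images, and every rerun changes them even when the code itself has not changed.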

Both approaches are valid, with different advantages and disadvantages. There’s an excellent GitHub discussion in the Polyglot Notebooks repository on the new .dib file format. Those curious can find the link in the Further reading section at the end of the chapter.

For this book, I’ll be sticking with .dib files because they are smaller and their changes are easier to track in version control, which matters since most chapters in this book have associated code in a GitHub repository.