As the fields of data science, machine learning, and artificial intelligence rapidly evolve, .NET developers are eager to bring their expertise into these exciting domains but are often unsure how to do so. Data Science with .NET and Polyglot Notebooks is the practical guide you need to bring your .NET skills into the world of analytics and AI.
With Microsoft’s .NET platform now robustly supporting machine learning and AI tasks, the introduction of tools such as .NET Interactive kernels and Polyglot Notebooks has opened up a world of possibilities for .NET developers. This book empowers you to harness the full potential of these cutting-edge technologies, guiding you through hands-on experiments that illustrate key concepts and principles. Through a series of interactive notebooks, you’ll not only master technical processes but also discover how to integrate these new skills into your current role or pivot to exciting opportunities in the data science field.
By the end of the book, you’ll have acquired the necessary knowledge and confidence to apply cutting-edge data science techniques and deliver impactful solutions within the .NET ecosystem.
Page count: 512
Year of publication: 2024
Data Science with .NET and Polyglot Notebooks
Programmer’s guide to data science using ML.NET, OpenAI, and Semantic Kernel
Matt Eland
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Kunal Sawant
Publishing Product Manager: Debadrita Chatterjee
Book Project Manager: Manisha Singh
Senior Editor: Esha Banerjee
Technical Editor: Jubit Pincy
Copy Editor: Safis Editing
Proofreader: Esha Banerjee
Indexer: Subalakshmi Govindhan
Production Designer: Joshua Misquitta
DevRel Marketing Coordinator: Sonia Chauhan
First published: August 2024
Production reference: 1230824
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK
ISBN 978-1-83588-296-2
www.packtpub.com
To Jon, Diego, Brett, Aleksei, Michael, Luis, Cesar, Bruno, and all the others who made the things described in this book possible.
To Sam and Sadukie, who encouraged, supported, and mercilessly heckled me as I began to publicly teach the topics captured in this book.
To Heather, for her endless love and patience, and for telling me to “just go get a master’s degree already!”
Matt Eland is a senior software engineering and data science consultant at Leading EDJE in Columbus, Ohio. He loves sharing his journey by teaching software engineering, AI, and data science concepts in the most engaging ways possible. Matt has used machine learning to settle debates over whether certain movies are Christmas movies, reinforcement learning to create digital attack squirrels, data analytics to suggest improvements to his favorite TV show, and AI agents to play board games and even to take on the personality of a dog. He is the author of multiple books and courses, helps organize a user group and regional conferences, holds a Master of Science in Data Analytics, and is a 2x Microsoft MVP in AI and .NET.
Thank you to my friends and family for their continued support, as well as my coworkers at Leading EDJE and my client sites who smile and nod at my evening and weekend writing endeavors. Special thanks to Heather, Sam, Sadukie, Eddie, Victor, Mike, James, Kanan, and Stephanie for celebrating little milestones with me. Extra special thanks to Sam Gomez and Sam Nasr for being my fellow MVPs and technical reviewers. Finally, thank you to Debadrita, Esha, and Manisha for your work on our second project together at Packt.
Sam Nasr has been a software developer since 1995, focusing primarily on Microsoft technologies. He is a Senior Software Engineer with NIS Technologies, where he consults and teaches clients about the latest .NET technologies. Sam has earned multiple certifications from Microsoft, such as MCSA, MCAD, MCTS, and MCT, and he has led the Cleveland C# User Group since 2003. He also holds leadership roles for the .NET Study Group and Azure Cleveland User Group.
When not coding, Sam loves spending time with his family and friends or volunteering at his local church. You can learn more about Sam by visiting https://linktr.ee/samnasr.
Samuel Gomez has worked in software development for 15+ years (mostly Microsoft technologies). He is deeply passionate about the problem-solving aspect of his work. Recently, he has dedicated himself to exploring AI and machine learning technologies and has been working on understanding how these technologies can be applied to different aspects of our lives.
Beyond coding, Sam enjoys spending time with his family. As a soccer enthusiast, he loves to play, watch, and coach the sport.
We’ll start our journey by introducing Polyglot Notebooks and their role in software engineering, data analysis, and machine learning workflows. We’ll cover the Polyglot Notebooks technology and user interface as well as the basic decisions and actions you’ll make when working with notebooks.
The remainder of this part focuses on loading data into a notebook and then analyzing it in Polyglot Notebooks with .NET tooling in C# and F#.
Whether you’re an experienced data analyst or have no prior knowledge, you’ll learn how to load up, analyze, clean, and manipulate data using .NET technologies like the DataFrame.
You’ll also see how you can effectively understand data distributions and create helpful visuals using libraries like Plotly.NET, Microsoft.Data.Analysis, and MathNet.Numerics.
This part has the following chapters:
Chapter 1, Data Science, Notebooks, and Kernels
Chapter 2, Exploring Polyglot Notebooks
Chapter 3, Getting Data and Code into Your Notebooks
Chapter 4, Working with Tabular Data and DataFrames
Chapter 5, Visualizing Data
Chapter 6, Variable Correlations
In the last chapter, we covered what notebooks are and how they’re helpful to developers, data analysts, data scientists, and technology managers.
In this chapter, we’ll dive in and see Polyglot Notebooks in action. We’ll start by installing Polyglot Notebooks and then build and execute our first notebook. By the time we’re done with this chapter, you should have a much clearer idea of what the notebook development process looks like so we can drill into more specific uses in future chapters.
In this chapter, we’ll cover the following:
Installing Polyglot Notebooks
Creating your first notebook
Executing notebook cells
Declaring classes and methods
Working with other languages
Sharing variables between kernels
Troubleshooting notebook execution
In order to work with Polyglot Notebooks, you will need to first install Visual Studio Code (VS Code) and the version of .NET that Polyglot Notebooks currently requires.
VS Code is freely available at https://Code.VisualStudio.com for Windows 10 and above, macOS 10.15 and above, and various distributions of Linux, including Debian, Ubuntu, Red Hat, Fedora, and SUSE.
Polyglot Notebooks requires a specific version of .NET in order to operate. At the time of this book’s release, that was .NET 8, but the Polyglot Notebooks team periodically moves to new versions of .NET as new major versions are released.
The .NET SDK can be downloaded from https://dotnet.microsoft.com/en-us/download/visual-studio-sdks.
VS Code and Polyglot Notebooks can also be run in your browser via GitHub Codespaces. We’ll explore this more in Chapter 15, Adopting and Deploying Polyglot Notebooks. In fact, you can always check the required .NET version by looking at the devcontainer.json file in the .devcontainer folder of the official Polyglot Notebooks repository at https://GitHub.com/dotnet/interactive.
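If you’d rather check from a terminal, the .NET CLI command dotnet --list-sdks prints one installed SDK per line, such as 8.0.303 [/usr/share/dotnet/sdk]. The following Python sketch parses that output to see whether a given major version is present. The helper names are hypothetical, and the default of 8 simply reflects the requirement at the time of writing; adjust it to whatever the repository currently requires.

```python
from __future__ import annotations

import re
import shutil
import subprocess


def installed_sdk_majors(list_sdks_output: str) -> set[int]:
    """Collect the major version numbers from `dotnet --list-sdks` output."""
    majors = set()
    for line in list_sdks_output.splitlines():
        # Each line is assumed to start with a version like "8.0.303"
        match = re.match(r"\s*(\d+)\.\d+\.\d+", line)
        if match:
            majors.add(int(match.group(1)))
    return majors


def has_required_sdk(list_sdks_output: str, required_major: int = 8) -> bool:
    """Return True if an SDK with the required major version is installed."""
    return required_major in installed_sdk_majors(list_sdks_output)


if __name__ == "__main__":
    dotnet = shutil.which("dotnet")
    if dotnet is None:
        print("dotnet was not found on your PATH")
    else:
        output = subprocess.run([dotnet, "--list-sdks"], capture_output=True, text=True).stdout
        print("Required SDK installed:", has_required_sdk(output))
```

Only the major version is compared here; if a future Polyglot Notebooks release needs a particular minor or patch version, the comparison would need to be stricter.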
Once you have VS Code and the required version of .NET installed, open VS Code and let’s walk through the one-time process of installing Polyglot Notebooks. First, let’s start with a clean workspace by opening a folder where you’d like to store your notebooks.
Do this by using the File menu followed by Open Folder… and then selecting a folder on your machine to open. I typically select an empty folder to keep my projects focused. You’ll now need to install Polyglot Notebooks through the Extensions Marketplace in VS Code.
You can get to the extensions view by pressing Ctrl + Shift + X or by clicking on the four-boxes icon on the sidebar, as shown here:
Figure 2.1 – The Extensions view location
The Extensions view lets you browse and install extensions for VS Code. Extensions can do many different things, including the following:
Adding compiler support for a new language
Adding a word count to your status bar
Adding a new view for looking at resources on Azure or AWS
Adding new editor color themes
Responding to key press events by causing digital explosions, fireworks, and screen shaking (yes, this really is a thing; check out the PowerMode extension)
Polyglot Notebooks is packaged and installed as an extension. In order to get started, we’ll need to find it in the Extensions Marketplace.
Type Polyglot Notebooks in the search bar, and the list of extensions should update to include the correct result, Polyglot Notebooks by Microsoft, as shown in Figure 2.2:
Figure 2.2 – The Polyglot Notebooks extension
Click Install and you should now see the extension download and install on your copy of VS Code.
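You can also install extensions from a terminal with the VS Code command-line interface: code --list-extensions prints installed extension IDs, and code --install-extension installs one. The sketch below assumes the marketplace ID of Polyglot Notebooks is ms-dotnettools.dotnet-interactive-vscode and that the code command is on your PATH; the helper functions are hypothetical. It only prints the install command rather than running it, so nothing changes on your machine.

```python
from __future__ import annotations

import shutil
import subprocess

# Assumed marketplace identifier of the Polyglot Notebooks extension
POLYGLOT_EXTENSION_ID = "ms-dotnettools.dotnet-interactive-vscode"


def install_command(code_path: str = "code", extension_id: str = POLYGLOT_EXTENSION_ID) -> list[str]:
    """Build the VS Code CLI command that installs the given extension."""
    return [code_path, "--install-extension", extension_id]


def is_installed(list_extensions_output: str, extension_id: str = POLYGLOT_EXTENSION_ID) -> bool:
    """Check `code --list-extensions` output (one extension ID per line), ignoring case."""
    installed = {line.strip().lower() for line in list_extensions_output.splitlines()}
    return extension_id.lower() in installed


if __name__ == "__main__":
    code = shutil.which("code")  # also resolves code.cmd on Windows
    if code is None:
        print("The 'code' command is not on your PATH")
    else:
        listed = subprocess.run([code, "--list-extensions"], capture_output=True, text=True).stdout
        if is_installed(listed):
            print("Polyglot Notebooks is already installed")
        else:
            print("Run:", " ".join(install_command(code)))
```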
A warning on private NuGet feeds
I once encountered a puzzling failure where Polyglot Notebooks would not install on my laptop and displayed no error message. I ultimately discovered that my machine had a private NuGet feed registered by a control vendor, and the installation failed because it was unable to check that feed. See https://github.com/dotnet/interactive/issues/3319 for more details on this particular issue.
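If you suspect a similar problem, dotnet nuget list source shows the package sources your machine knows about, and dotnet nuget disable source temporarily disables one by name. As an offline illustration, the Python sketch below reads the text of a NuGet.config file and reports enabled sources other than nuget.org, which are the first candidates to investigate. The function name is hypothetical, and it deliberately ignores details such as clear elements and credentials.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NUGET_ORG = "https://api.nuget.org/v3/index.json"


def enabled_custom_sources(nuget_config_xml: str) -> dict[str, str]:
    """Return enabled package sources (name -> URL) other than nuget.org."""
    root = ET.fromstring(nuget_config_xml)
    # Every <add> under <packageSources> registers a feed
    sources = {
        add.get("key"): add.get("value")
        for add in root.findall("./packageSources/add")
    }
    # Feeds listed under <disabledPackageSources> with value="true" are skipped
    disabled = {
        add.get("key")
        for add in root.findall("./disabledPackageSources/add")
        if (add.get("value") or "").lower() == "true"
    }
    return {
        name: url
        for name, url in sources.items()
        if name not in disabled and url != NUGET_ORG
    }
```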
It’s possible that during installation or while creating your first notebook, you’ll encounter an error message indicating you do not have the correct version of .NET installed. The error message will look something like Figure 2.3:
Figure 2.3 – A sample failure when lacking the correct .NET SDK version
Thankfully, this message contains details on what dependencies your machine is missing and how to get them.
If you encounter other issues during installation, you may want to skip ahead in this chapter and see the Troubleshooting notebook execution section for more steps on getting information about failures or submitting issues for additional assistance.
Once you have Polyglot Notebooks installed, you’re ready to create your first notebook.
There is no button in the user interface for adding your first notebook. Instead, we’ll need to access this feature by using VS Code’s Command Palette.
The Command Palette is where VS Code keeps its internal commands and settings, along with those registered by extensions you’ve installed, including Polyglot Notebooks. This helps the user interface stay minimal while keeping commands easily accessible.
Press Ctrl + Shift + P to open the Command Palette and list the available commands.
Ctrl + Shift + P versus Ctrl + P
It’s worth noting that VS Code supports both the Ctrl + Shift + P shortcut for the Command Palette and the Ctrl + P shortcut for navigation. The user interface for both of them is very similar, but the navigator lists files in your editing session while Ctrl + Shift + P lists available commands. See this chapter’s Further reading section for additional VS Code user interface resources.
Next, type in Polyglot Notebook to filter down the command list and choose Create new blank notebook, as shown in Figure 2.4:
Figure 2.4 – The Command Palette
Next, VS Code will ask you if you want to create a new .dib or .ipynb file via a dialog, as shown in Figure 2.5:
Figure 2.5 – Specifying the notebook format
If you’ve used Jupyter Notebooks before, you may recognize the .ipynb extension as short for an interactive Python notebook, otherwise known as a Jupyter notebook.
.ipynb files store both code and execution history, including the output of notebook cells. This means that you can share an .ipynb file with others and they can generally follow along with the notebook without needing to run it themselves.
The downside of this is that .ipynb files are larger and have a more complex file format. This makes storing .ipynb files in version control more challenging than storing smaller code files.
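An .ipynb file is a JSON document in which each code cell carries an outputs list and an execution_count alongside its source. To show where the extra size and version-control noise come from, here is a minimal Python sketch that blanks those fields, similar in spirit to tools such as nbstripout. It assumes the nbformat 4 layout, and the function name is hypothetical; it is not part of Polyglot Notebooks.

```python
import json


def strip_outputs(ipynb_text: str) -> str:
    """Return a copy of a notebook with code cell outputs and execution counts removed."""
    notebook = json.loads(ipynb_text)
    for cell in notebook.get("cells", []):
        # Only code cells store outputs and execution counts
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # nbformat conventionally writes notebooks with an indent of 1
    return json.dumps(notebook, indent=1)
```

Everything the sketch removes is exactly what a .ipynb file keeps so that others can see results without rerunning the notebook.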
.dib is a new file format introduced by Polyglot Notebooks. .dib files store the notebook structure and code but do not retain the results of the last time cells were executed. As a result, .dib files are smaller and easier to manage in version control, but anyone who opens one must rerun the notebook to see its results.
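Because a .dib file is plain text, it is easy to read and diff. As a rough illustration, the Python sketch below splits .dib text into cells. It assumes that each cell begins with a line consisting of #! followed by a kernel name, such as #!csharp or #!markdown, and that newer files may open with a #!meta header containing JSON. The KNOWN_KERNELS set is an assumption, not an official list; it exists so that magic commands inside a cell, such as #!time, are not mistaken for cell boundaries. Note that each parsed cell has only a kernel and source, with nowhere to keep outputs.

```python
from __future__ import annotations

# Assumed set of kernel names that start a new cell in a .dib file
KNOWN_KERNELS = {
    "meta", "csharp", "fsharp", "pwsh", "javascript",
    "html", "markdown", "mermaid", "sql", "kql", "value",
}


def split_dib_cells(dib_text: str, kernels: set[str] = KNOWN_KERNELS) -> list[tuple[str, str]]:
    """Split .dib text into (kernel, source) pairs."""
    cells: list[tuple[str, str]] = []
    kernel, lines = None, []
    for line in dib_text.splitlines():
        stripped = line.strip()
        name = stripped[2:] if stripped.startswith("#!") else None
        if name in kernels:
            # A kernel header closes the previous cell and opens a new one
            if kernel is not None:
                cells.append((kernel, "\n".join(lines).strip()))
            kernel, lines = name, []
        elif kernel is not None:
            # Ordinary lines (including magic commands) belong to the current cell
            lines.append(line)
    if kernel is not None:
        cells.append((kernel, "\n".join(lines).strip()))
    return cells
```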
Both approaches are valid, with different advantages and disadvantages. There’s an excellent GitHub discussion in the Polyglot Notebooks repository on the new .dib file format. Those curious can find the link in the Further reading section at the end of the chapter.
For this book, I’ll be sticking with .dib files because they are smaller and their changes are easier to track in version control, which matters because most chapters in this book have code associated with them in a GitHub repository.