Hands-On Machine Learning with ML.NET - Jarred Capellman - E-Book

Description

Create, train, and evaluate various machine learning models such as regression, classification, and clustering using ML.NET, Entity Framework, and ASP.NET Core




Key Features



  • Get well-versed with the ML.NET framework and its components and APIs using practical examples


  • Learn how to build, train, and evaluate popular machine learning algorithms with ML.NET offerings


  • Extend your existing machine learning models by integrating with TensorFlow and other libraries



Book Description



Machine learning (ML) is widely used in many industries, such as science, healthcare, and research, and its popularity is only growing. In March 2018, Microsoft introduced ML.NET to help .NET enthusiasts work with ML. With this book, you'll explore how to build ML.NET applications with the various ML models available using C# code.






The book starts by giving you an overview of ML and the types of ML algorithms used, along with covering what ML.NET is and why you need it to build ML apps. You'll then explore the ML.NET framework, its components, and APIs. The book will serve as a practical guide to helping you build smart apps using the ML.NET library. You'll gradually become well versed in how to implement ML algorithms such as regression, classification, and clustering with real-world examples and datasets. Each chapter will cover the practical implementation, showing you how to implement ML within .NET applications. You'll also learn how to integrate TensorFlow into ML.NET applications. Later, you'll discover how to store the regression model's housing price prediction results in the database and display the real-time predicted results from the database on your web application using ASP.NET Core Blazor and SignalR.






By the end of this book, you'll have learned how to confidently perform basic to advanced-level machine learning tasks in ML.NET.




What you will learn



  • Understand the framework, components, and APIs of ML.NET using C#


  • Develop regression models using ML.NET for employee attrition and file classification


  • Evaluate classification models for sentiment prediction of restaurant reviews


  • Work with clustering models for file type classifications


  • Use anomaly detection to find anomalies in both network traffic and login history


  • Work with ASP.NET Core Blazor to create an ML.NET enabled web application


  • Integrate pre-trained TensorFlow and ONNX models in a WPF ML.NET application for image classification and object detection



Who this book is for



If you are a .NET developer who wants to implement machine learning models using ML.NET, then this book is for you. This book will also be beneficial for data scientists and machine learning developers who are looking for effective tools to implement various machine learning algorithms. A basic understanding of C# or .NET is mandatory to grasp the concepts covered in this book effectively.

You can read this e-book in the Legimi apps or in any app that supports the following format:

EPUB

Page count: 272

Year of publication: 2020




Hands-On Machine Learning with ML.NET

Getting started with Microsoft ML.NET to implement popular machine learning algorithms in C#

Jarred Capellman

BIRMINGHAM - MUMBAI

Hands-On Machine Learning with ML.NET

Copyright © 2020 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

 

Commissioning Editor: Pravin Dhandre
Acquisition Editor: Devika Battike
Content Development Editor: Joseph Sunil
Senior Editor: David Sugarman
Technical Editor: Utkarsha Kadam
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Aparna Bhagat

First published: March 2020

Production reference: 1260320

Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.

ISBN 978-1-78980-178-1

www.packt.com

To my amazing wife, Amy, for completing me.
– Jarred Capellman

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Fully searchable for easy access to vital information

Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

Contributors

About the author

 

Jarred Capellman is a Director of Engineering at SparkCognition, a cutting-edge artificial intelligence company located in Austin, Texas. At SparkCognition, he leads the engineering and data science team on the industry-leading machine learning endpoint protection product, DeepArmor, combining his passion for software engineering, cybersecurity, and data science. In his free time, he enjoys contributing to GitHub daily on his various projects and is working on his DSc in cybersecurity, focusing on applying machine learning to solving network threats. He currently lives just outside of Austin, Texas, with his wife, Amy.

To my wife, Amy, who supported me through the nights and weekends – I devote this book to her.

About the reviewer

Andrew Greenwald holds an MSc in computer science from Drexel University and a BSc in electrical engineering with a minor in mathematics from Villanova University. He started his career designing solid-state circuits to test electronic components. For the past 25 years, he has been developing software for IT infrastructure, financial markets, and defense applications. He is currently applying machine learning to cybersecurity, developing models to detect zero-day malware. Andrew lives in Austin, Texas, with his wife and three sons.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Hands-On Machine Learning with ML.NET

Dedication

About Packt

Why subscribe?

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1: Fundamentals of Machine Learning and ML.NET

Getting Started with Machine Learning and ML.NET

The importance of learning about machine learning today

The model building process

Defining your problem statement

Defining your features

Obtaining a dataset

Feature extraction and pipeline

Model training

Model evaluation

Exploring types of learning

Supervised learning

Unsupervised learning

Exploring various machine learning algorithms

Binary classification

Regression

Anomaly detection

Clustering

Matrix factorization

What is ML.NET?

Technical details of ML.NET

Components of ML.NET

Extensibility of ML.NET

Summary

Setting Up the ML.NET Environment

Setting up your development environment

Installing Visual Studio

Installing .NET Core 3

Creating a process

Creating your first ML.NET application

Creating the project in Visual Studio

Project architecture

Running the code

The RestaurantFeedback class

The RestaurantPrediction class

The Trainer class

The Predictor class

The BaseML class

The Program class

Running the example

Evaluating the model

Summary

Section 2: ML.NET Models

Regression Model

Breaking down regression models

Choosing the type of regression model

Choosing a linear regression trainer

Choosing a logistic regression trainer

Creating the linear regression application

Diving into the trainer

Exploring the project architecture

Diving into the code

The ExtensionMethods class

The EmploymentHistory class

The EmploymentHistoryPrediction class

The Predictor class

The Trainer class

The Program class

Running the application

Creating the logistic regression application

Exploring the project architecture

Diving into the code

The FeatureExtractor class

The FileInput class

The FilePrediction class

The BaseML class

The Predictor class

The Trainer class

The Program class

Running the application

Evaluating a regression model

Loss function

Mean squared error

Mean absolute error

R-squared

Root mean squared error

Summary

Classification Model

Breaking down classification models

Choosing a classification trainer

Creating a binary classification application

Diving into the trainer

Exploring the project architecture

Diving into the code

The CarInventory class

The CarInventoryPrediction class

The Predictor class

The Trainer class

The Program class

Running the application

Creating a multi-class classification application

Diving into the trainer

Exploring the project architecture

Diving into the code

The Email class

The EmailPrediction class

The Predictor class

The Trainer class

Running the application

Evaluating a classification model

Accuracy

Area Under ROC Curve

F1 Score

Area Under Precision-Recall Curve

Micro Accuracy

Macro Accuracy

Log Loss

Log-Loss Reduction

Summary

Clustering Model

Breaking down the k-means algorithm

Use cases for clustering

Diving into the k-means trainer

Creating the clustering application

Exploring the project architecture

Diving into the code

The Constants class

The BaseML class

The FileTypes enumeration

The FileData class

The FileTypePrediction class

The FeatureExtractor class

The Predictor class

The Trainer class

The Program class

Running the application

Evaluating a k-means model

Average distance

The Davies-Bouldin Index

Normalized mutual information

Summary

Anomaly Detection Model

Breaking down anomaly detection

Use cases for anomaly detection

Diving into the randomized PCA trainer

Diving into time series transforms

Creating a time series application

Exploring the project architecture

Diving into the code

The NetworkTrafficHistory class

The NetworkTrafficPrediction class

The Predictor class

The Trainer class

The Program class

Running the application

Creating an anomaly detection application

Exploring the project architecture

Diving into the code

The Constants class

The LoginHistory class

The LoginPrediction class

The Predictor class

The Trainer class

Running the application

Evaluating a randomized PCA model

Area under the ROC curve

Detection rate at false positive count

Summary

Matrix Factorization Model

Breaking down matrix factorizations

Use cases for matrix factorizations

Diving into the matrix factorization trainer

Creating a matrix factorization application

Exploring the project architecture

Diving into the code

The MusicRating class

The MusicPrediction class

The Predictor class

The Trainer class

The Constants class

Running the application

Evaluating a matrix factorization model

Loss function

MSE

MAE

R-squared 

RMSE

Summary

Section 3: Real-World Integrations with ML.NET

Using ML.NET with .NET Core and Forecasting

Breaking down the .NET Core application architecture

.NET Core architecture

.NET Core targets

.NET Core future

Creating the stock price estimator application

Exploring the project architecture

Diving into the code

The ProgramActions enumeration

The CommandLineParser class

The BaseML class

The StockPrediction class

The StockPrices class

The Predictor class

The Trainer class

The ProgramArguments class

The Program class

Running the application

Exploring additional production application enhancements

Logging

Utilizing Reflection further

Utilizing a database

Summary

Using ML.NET with ASP.NET Core

Breaking down ASP.NET Core

Understanding the ASP.NET Core architecture

Controllers

Models

Views

Blazor

Creating the file classification web application

Exploring the project architecture

Diving into the library

The FileClassificationResponseItem class

The FileData class

The FileDataPrediction class

The Converters class

The ExtensionMethods class

The HashingExtensions class

The FileClassificationFeatureExtractor class

The FileClassificationPredictor class

The FileClassificationTrainer class

Diving into the web application

The UploadController class

The Startup class

The Index.razor file

Diving into the trainer application

The ProgramArguments class

The ProgramActions enumeration

The Program class

Running the trainer application

Running the web application

Exploring additional ideas for improvements

Logging

Utilizing a caching layer

Utilizing a database

Summary

Using ML.NET with UWP

Breaking down the UWP architecture

Views

Models

View Models

Creating the web browser classification application

Exploring the project architecture

Diving into the library

The Constants class

The WebPageResponseItem class

The Converters class

The ExtensionMethods class

The WebPageInputItem class

The WebPagePredictionItem class

The WebContentFeatureExtractor class

The WebContentPredictor class

The WebContentTrainer class

Diving into the UWP browser application

The MainPageViewModel class

MainPage.xaml

MainPage.xaml.cs

Diving into the trainer application

The ProgramArguments class

The Program class

Running the trainer application

Running the browser application

Additional ideas for improvements

Single-download optimization

Logging

Utilizing a database

Summary

Section 4: Extending ML.NET

Training and Building Production Models

Investigating feature engineering

PNG image files with embedded executables

Creating a PNG parser

Obtaining training and testing datasets

Creating your model-building pipeline

Discussing attributes to consider in a pipeline platform

Exploring machine learning platforms

Azure Machine Learning

Apache Airflow

Apache Spark

Summary

Using TensorFlow with ML.NET

Breaking down Google's Inception model

Creating the WPF image classification application

Exploring the project architecture

Diving into the WPF image classification application

The MainWindowViewModel class

The MainWindow.xaml class

The MainWindow.xaml.cs file

The BaseML class

The ImageDataInputItem class

The ImageDataPredictionItem class

The ImageClassificationPredictor class

Running the image classification application

Additional ideas for improvements

Self-training based on the end user's input

Logging

Utilizing a database

Summary

Using ONNX with ML.NET

Breaking down ONNX and YOLO

Introducing ONNX

The YOLO ONNX model

Creating the ONNX object detection application

Exploring the project architecture

Diving into the code

The DimensionsBase class

The YoloBoundingBox class

The MainWindow.xaml file

The ImageClassificationPredictor class

The MainWindowViewModel class

Running the application

Exploring additional production application enhancements

Logging

Image scaling

Utilizing the full YOLO model

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Machine learning (ML) is widely used in many industries, such as science, healthcare, and research, and its popularity is only growing. In March 2018, Microsoft introduced ML.NET to help .NET enthusiasts work with ML. With this book, you'll explore how to build ML.NET applications with the various ML models available using C# code.

The book starts by giving you an overview of ML and the types of ML algorithms used, along with covering what ML.NET is and why you need it to build ML apps. You’ll then explore the ML.NET framework, its components, and APIs. The book will serve as a practical guide to helping you build smart apps using the ML.NET library. You’ll gradually become well-versed in how to implement ML algorithms such as regression, classification, and clustering with real-world examples and datasets. Each chapter will cover the practical implementation, showing you how to implement ML within .NET applications. You'll also learn how to integrate TensorFlow into ML.NET applications. Later, you'll discover how to store the regression model housing price prediction results in the database and display the real-time predicted results from the database on your web application using ASP.NET Core Blazor and SignalR.

By the end of this book, you'll have learned how to confidently perform basic to advanced-level machine learning tasks in ML.NET.

Who this book is for

If you are a .NET developer who wants to implement machine learning models using ML.NET, then this book is for you. This book will also be beneficial to data scientists and machine learning developers who are looking for effective tools to implement various machine learning algorithms. A basic understanding of C# and .NET is mandatory to grasp the concepts covered in this book effectively.

What this book covers

Chapter 1, Getting Started with Machine Learning and ML.NET, talks about what machine learning is and how important machine learning is in our society today. It also introduces ML.NET and talks in more detail about getting started with it after learning about the concepts of machine learning and how they relate. 

Chapter 2, Setting Up the ML.NET Environment, talks in more detail about getting started with ML.NET, continuing the overview of machine learning and how ML.NET can assist in both developing and running models in both new and existing applications. You will ensure your development environment is set up and the chapter ends with a simple pre-trained model in a console application to demonstrate that you are ready to proceed with the training.

Chapter 3, Regression Model, talks about using linear regression and logistic regression models in ML.NET, in addition to the math and the problems these models can help to solve. In addition, the chapter provides a step-by-step explanation of how to create and work with both a linear regression model and a logistic regression model in ML.NET. The end of the chapter details a quick console application using the dataset and both models in ML.NET.

Chapter 4, Classification Model, talks about using the classification trainers in ML.NET and the problems a classification model can help to solve. For this chapter, we will create two applications to demonstrate the classification trainer support in ML.NET. The first predicts whether a car is of good value based on several attributes and comparative prices using the FastTree trainer that ML.NET provides. The second application takes email data (Subject, Body, Sender) and uses the SDCA trainer in ML.NET to classify the email as an Order, Spam, or Friend. Through these applications, you will also learn how to evaluate classification models.

Chapter 5, Clustering Model, talks about using the k-means clustering trainer in ML.NET and the problems a clustering model can help to solve. In this chapter, we will use the k-means clustering trainer that ML.NET provides to create an example application that will classify files as executables, documents, or scripts. In addition, you will learn how to evaluate clustering models in ML.NET.

Chapter 6, Anomaly Detection Model, talks about using an anomaly detection model in ML.NET and the problems an anomaly detection model can help to solve. For this chapter, we will create two example applications. The first uses ML.NET with SSA to detect network traffic anomalies, while the second uses ML.NET with PCA to detect anomalies in a series of user logins. With these applications, we will also look at how you can evaluate your anomaly detection model once trained.

Chapter 7, Matrix Factorization Model, talks about using a matrix factorization model in ML.NET, in addition to the math and the problems a matrix factorization model can help to solve. In this chapter, we will create a music recommendation application using the matrix factorization trainer that ML.NET provides. Using several data points, this recommendation engine will recommend music based on the training data provided to the model. In addition, after creating this application, we will learn how to evaluate a matrix factorization model in ML.NET.

Chapter 8, Using ML.NET with .NET Core and Forecasting, covers a real-world application built on .NET Core that uses both a regression model and a time series model to demonstrate forecasting on stock shares.

Chapter 9, Using ML.NET with ASP.NET Core, covers a real-world application utilizing ASP.NET with a frontend to upload a file to determine whether it is malicious or not. This chapter focuses on using a binary classifier and how to integrate it into an ASP.NET application. 

Chapter 10, Using ML.NET with UWP, covers a real-world application utilizing UWP and ML.NET. The application will utilize ML.NET to classify whether the web page content is malicious. The chapter will also cover UWP application design and MVVM briefly to give a true production-ready sample app to build on or adapt to other applications for using UWP with ML.NET.

Chapter 11, Training and Building Production Models, covers training a model at scale with all of the considerations, along with the proper training of a production model using the DMTP project. The lessons learned include obtaining proper training sets (diversity being key), proper features, and the true evaluation of your model. The focus of this chapter is on tips, tricks, and best practices for training production-ready models.

Chapter 12, Using TensorFlow with ML.NET, talks about using a pre-trained TensorFlow model with ML.NET to determine whether a car is in a picture or not with a WPF application.

Chapter 13, Using ONNX with ML.NET, talks about using a pre-trained ONNX model with ML.NET in addition to the value added by taking a pre-existing ONNX format model into ML.NET directly.

To get the most out of this book

You will need Microsoft Visual Studio 2019 and .NET Core 3 installed on your computer; Chapter 2, Setting Up the ML.NET Environment, covers installing both. The requirements are as follows:

Software/hardware covered in the book: Microsoft Visual Studio 2019

OS requirements: A common Windows 10 development environment with 20-50 GB of free space (a quad-core processor and 8 GB of RAM are highly recommended)

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copy/pasting of code.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packt.com.

Select the Support tab.

Click on Code Downloads.

Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Machine-Learning-with-ML.NET. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789801781_ColorImages.pdf.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Section 1: Fundamentals of Machine Learning and ML.NET

This section gives an overview of this book's audience and a short introduction to machine learning and the importance of learning how to utilize machine learning. In addition, this section introduces the reader to ML.NET.  It also talks about the tools and framework needed to build the applications and gives a step-by-step explanation of how to work with ML.NET. 

This section comprises the following chapters:

Chapter 1, Getting Started with Machine Learning and ML.NET

Chapter 2, Setting Up the ML.NET Environment

Getting Started with Machine Learning and ML.NET

By opening this book, you are taking the first step in disrupting your own knowledge by approaching solutions to complex problems with machine learning. You will be achieving this with the use of Microsoft's ML.NET framework. Having spent several years applying machine learning to cybersecurity, I'm confident that the knowledge you garner from this book will not only open career opportunities to you but also open up your thought processes and change the way you approach problems. No longer will you even approach a complex problem without thinking about how machine learning could possibly solve it.

Over the course of this book, you will learn about the following:

How and when to use five different algorithms that ML.NET provides

Real-world end-to-end examples demonstrating ML.NET algorithms

Best practices when training your models, building your training sets, and feature engineering

Using pre-trained models in both TensorFlow and ONNX formats

This book does assume that you have a reasonably solid understanding of C#. If you have other experience with a strongly typed object-oriented programming language such as C++ or Java, the syntax and design patterns are similar enough to not hinder your ability to follow the book. However, if this is your first deep dive into a strongly typed language such as C#, I strongly suggest picking up Learn C# in 7 Days, by Gaurav Aroraa, published by Packt Publishing, to get a quick foundation. In addition, no prior machine learning experience is required or expected, although a cursory understanding will accelerate your learning.

In this chapter, we will cover the following:

The importance of learning about machine learning today

The model-building process

Exploring types of learning

Exploring various machine learning algorithms

Introduction to ML.NET

By the end of the chapter, you should have a fundamental understanding of what it takes to build a model from start to finish, providing the basis for the remainder of the book.

The importance of learning about machine learning today

In recent years, machine learning and artificial intelligence have become an integral part of many of our lives, in use cases as diverse as finding cancer cells in an MRI and facial and object recognition during a professional basketball game. Over the course of just the four years between 2013 and 2017, machine learning patents alone grew at a 34% compound annual rate, while spending on AI and machine learning is estimated to grow to $57.6B by 2021 (https://www.forbes.com/sites/louiscolumbus/2018/02/18/roundup-of-machine-learning-forecasts-and-market-estimates-2018/#794d6f6c2225).

Despite its status as a growing technology, the term machine learning was coined back in 1959 by Arthur Samuel, so what caused the 60-year gap before its adoption? Perhaps the two most significant factors were the availability of technology able to process model predictions fast enough and the amount of data being captured digitally every minute. According to DOMO Inc, a study in 2017 concluded that 2.5 quintillion bytes were generated daily and that, at that time, 90% of the world's data had been created between 2015 and 2017 (https://www.domo.com/learn/data-never-sleeps-5?aid=ogsm072517_1&sf100871281=1). By 2025, it is estimated that 463 exabytes of data will be created daily (https://www.visualcapitalist.com/how-much-data-is-generated-each-day/), much of which will come from cars, videos, pictures, IoT devices, emails, and even devices that have not yet become smart devices.

The growth of data over the last decade has led to questions about how a business or corporation can use such data for better sales forecasting, anticipating a customer's needs, or detecting malicious bytes in a file. Traditional statistical approaches could require far more staff just to keep up with current demands, let alone scale with the data captured. Take, for instance, Google Maps. Since Google's acquisition of Waze in 2013, users of Google Maps have been provided with extremely accurate routing suggestions based on the anonymized GPS data of its users. With this model, the more data points there are (in this case, GPS data from smartphones), the better the predictions Google can make for your travel. As we will discuss later in this chapter, quality datasets are a critical component of machine learning, especially in the case of Google Maps, where, without a proper dataset, the user experience would be subpar.

In addition, the speed of computer hardware, specifically specialized hardware tailored for machine learning, has also played a role. The use of Application-Specific Integrated Circuits (ASICs) has grown rapidly. One of the most popular ASICs on the market is the Google Tensor Processing Unit (TPU). Originally released in 2016, it has since gone through two iterations and provides cloud-based acceleration for machine learning tasks on Google Cloud Platform. Other cloud platforms, such as Amazon's AWS and Microsoft's Azure, offer hardware acceleration through FPGAs.

Additionally, Graphics Processing Units (GPUs) from both AMD and NVIDIA are accelerating both cloud-based and local workloads, with the ROCm platform and CUDA-accelerated libraries respectively. In addition to accelerated workloads, typical professional GPUs offered by AMD and NVIDIA provide a much higher density of processors than the traditional CPU-only approach. For instance, the AMD Radeon Instinct MI60 provides 4,096 stream processors. A stream processor is not a full-fledged x86 core, so this is not a one-to-one comparison, but the peak double-precision floating-point performance is rated at 7.373 TFLOPs, compared to the 2.3 TFLOPs of AMD's extremely powerful EPYC 7742 server CPU, roughly a threefold difference. From a cost and scalability perspective, utilizing GPUs in even a workstation configuration can provide a substantial reduction in training time, provided the algorithms are accelerated to take advantage of the more specialized cores offered by AMD and NVIDIA. Fortunately, ML.NET provides GPU acceleration with little additional effort.

From a software engineering career perspective, with this growth and demand far outpacing the supply, there has never been a better time to develop machine learning skills as a software engineer. Furthermore, software engineers also possess skills that traditional data scientists do not have – for instance, being able to automate tasks such as the model building process rather than relying on manual scripts. Another example of where a software engineer can provide more value is by adding both unit tests and efficacy tests as part of the full pipeline when training a model. In a large production application, having these automated tests is critical to avoid production issues.

Finally, in 2018, for the first time ever, data was considered more valuable than oil. As new industries continue to adopt data gathering and existing industries take advantage of the data they already have, machine learning will be intertwined with that data. Machine learning is to data what refining plants are to oil.

The model building process

Before diving into ML.NET, an understanding of core machine learning concepts is required. These concepts will help create a foundation for you to build on as we start building models and learning the various algorithms ML.NET provides over the course of this book. At a high level, producing a model is a complex process; however, it can be broken down into six main steps:

Defining your problem statement

Defining your features

Obtaining a dataset

Feature extraction and pipeline

Model training

Model evaluation

Over the next few sections, we will go through each of these steps in detail to provide you with a clear understanding of how to perform each step and how it relates to the overall machine learning process.

Defining your problem statement