Infuse an extra layer of intelligence into your Go applications with machine learning and AI
Key Features:
Build simple, maintainable, and easy-to-deploy machine learning applications with popular Go packages
Learn the statistics, algorithms, and techniques needed to implement machine learning
Overcome the common challenges faced while deploying and scaling machine learning workflows
Book Description:
This updated edition of the popular Machine Learning With Go shows you how to overcome the common challenges of integrating analysis and machine learning code within an existing engineering organization.
Machine Learning With Go, Second Edition, will begin by helping you gain an understanding of how to gather, organize, and parse real-world data from a variety of sources. The book also provides detailed coverage of developing machine learning pipelines, including predictive models, data visualizations, and statistical techniques. Next, you will learn how to make thorough use of Go libraries, including golearn, gorgonia, gosl, hector, and mat64. You will discover various TensorFlow capabilities, along with building simple neural networks and integrating them into machine learning models. You will also gain hands-on experience implementing essential machine learning techniques, such as regression, classification, and clustering, with the relevant Go packages. Furthermore, you will dive deep into the various Go tools that help you build deep neural networks. Lastly, you will become well versed in best practices for machine learning model tuning and optimization.
By the end of the book, you will have a solid machine learning mindset and a powerful Go toolkit of techniques, packages, and example implementations.
What you will learn
Become well versed with data processing, parsing, and cleaning using Go packages
Learn to gather data from various sources and in various real-world formats
Perform regression, classification, and image processing with neural networks
Evaluate and detect anomalies in a time series model
Understand common deep learning architectures to learn how each model is built
Learn how to optimize, build, and scale machine learning workflows
Discover the best practices for machine learning model tuning for successful deployments
Who this book is for:
This book is primarily for Go programmers who want to become machine learning engineers and build a solid machine learning mindset, along with a good grasp of Go packages. It is also useful for data analysts, data engineers, and machine learning users who want to run their machine learning experiments using the Go ecosystem. A prior understanding of linear algebra is required to benefit from this book.
Page count: 342
Year of publication: 2019
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Pravin Dhandre
Acquisition Editor: Devika Battike
Content Development Editor: Snehal Kolte
Technical Editor: Naveen Sharma
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Graphics: Jisha Chirayil
Production Coordinator: Aparna Bhagat
First published: September 2017
Second edition: April 2019
Production reference: 1300419
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78961-989-8
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Daniel Whitenack is a trained PhD data scientist with over 10 years' experience working on data-intensive applications in industry and academia. Recently, Daniel has focused his development efforts on open source projects related to running machine learning (ML) and artificial intelligence (AI) in cloud-native infrastructure (Kubernetes, for instance), maintaining reproducibility and provenance for complex data pipelines, and implementing ML/AI methods in new languages such as Go. Daniel co-hosts the Practical AI podcast, teaches data science/engineering at Ardan Labs and Purdue University, and has spoken at conferences around the world (including ODSC, PyCon, DataEngConf, QCon, GopherCon, Spark Summit, and Applied ML Days, among others).
Janani Selvaraj works as a senior research and analytics consultant for a start-up in Trichy, Tamil Nadu. She is a mathematics graduate with a PhD in environmental management. Her current interests include data wrangling and visualization, machine learning, and geospatial modeling. She currently trains students in data science and works as a consultant on several data-driven projects in a variety of domains. She is an R programming expert and the founder of the R-Ladies Trichy group, a group that promotes gender diversity. She has served as a reviewer for the Go Machine Learning Projects book.
Saurabh Chhajed is a machine learning and big data engineer with 9 years' professional experience across the enterprise application development life cycle, using the latest frameworks, tools, and design patterns. He has experience designing and implementing some of the most widely used and scalable customer-facing recommendation systems, with extensive use of the big data ecosystem (batch and real time) and machine learning pipelines. He has also worked for some of the largest investment banks, credit card companies, and manufacturing companies around the world, implementing a range of robust and scalable product suites.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Machine Learning With Go Second Edition
About Packt
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Analysis in Machine Learning Workflows
Gathering and Organizing Data
Handling data – Gopher style
Best practices for gathering and organizing data with Go
CSV files
Reading in CSV data from a file
Handling unexpected fields
Handling unexpected data types
Manipulating CSV data with data frames
Web scraping 
JSON
Parsing JSON
JSON output
SQL-like databases
Connecting to an SQL database
Querying the database
Modifying the database
Caching
Caching data in memory
Caching data locally on disk
Data versioning
Pachyderm jargon
Deploying or installing Pachyderm
Creating data repositories for data versioning
Putting data into data repositories
Getting data out of versioned data repositories
Summary
References
Matrices, Probability, and Statistics
Matrices and vectors
Vectors
Vector operations
Matrices
Matrix operations
Statistics
Distributions
Statistical measures
Measures of central tendency
Measures of spread or dispersion
Visualizing distributions
Histograms
Box plots
Bivariate analysis 
Probability
Random variables
Probability measures
Independent and conditional probability
Hypothesis testing
Test statistics
Calculating p-values
Summary
References
Evaluating and Validating
Evaluating
Continuous metrics
Categorical metrics
Individual evaluation metrics for categorical variables
Confusion matrices, AUC, and ROC
Validating
Training and test sets
Holdout set
Cross-validation
Summary
References
Section 2: Machine Learning Techniques
Regression
Understanding regression model jargon
Linear regression
Overview of linear regression
Linear regression assumptions and pitfalls
Linear regression example
Profiling the data
Choosing our independent variable
Creating our training and test sets
Training our model
Evaluating the trained model
Multiple linear regression
Nonlinear and other types of regression
Summary
References
Classification
Understanding classification model jargon
Logistic regression
Overview of logistic regression
Logistic regression assumptions and pitfalls
Logistic regression example
Cleaning and profiling data
Creating our training and test sets
Training and testing the logistic regression model
k-nearest neighbors 
Overview of kNN
kNN assumptions and pitfalls
kNN example
Decision trees and random forests
Overview of decision trees and random forests
Decision tree and random forest assumptions and pitfalls
Decision tree example
Random forest example
Naive Bayes
Overview of Naive Bayes and its big assumption
Naive Bayes example
Summary
References
Clustering
Understanding clustering model jargon
Measuring distance or similarity
Evaluating clustering techniques
Internal clustering evaluation
External clustering evaluation
k-means clustering
Overview of k-means clustering
k-means assumptions and pitfalls
k-means clustering example
Profiling the data
Generating clusters with k-means
Evaluating the generated clusters
Other clustering techniques
Summary
References
Time Series and Anomaly Detection
Representing time series data in Go
Understanding time series jargon
Statistics related to time series
Autocorrelation
Partial autocorrelation
Auto-regressive models for forecasting
Auto-regressive model overview
Auto-regressive model assumptions and pitfalls
Auto-regressive model example
Transforming into a stationary series
Analyzing the ACF and choosing an AR order
Fitting and evaluating an AR(2) model
Auto-regressive moving averages and other time series models
Anomaly detection
Summary
References
Section 3: Advanced Machine Learning, Deployment, and Scaling
Neural Networks
Understanding neural net jargon
Building a simple neural network
Nodes in the network
Network architecture
Why do we expect this architecture to work?
Training our neural network
Utilizing the simple neural network
Training the neural network on real data
Evaluating the neural network
Summary
References
Deep Learning
Deep learning techniques and jargon
Deep learning with Go
Using the TensorFlow Go bindings
Install TensorFlow for Go
Retrieving and calling a pretrained TensorFlow model
Object detection using TensorFlow from Go
Using TensorFlow models from GoCV
Installing GoCV
Streaming webcam object detection with GoCV
Summary
References
Deploying and Distributing Analyses and Models
Running models reliably on remote machines
A brief introduction to Docker and Docker jargon
Dockerizing a machine learning application
Dockerizing the model training and export
Dockerizing model predictions
Testing the Docker images locally
Running the Docker images on remote machines
Building a scalable and reproducible machine learning pipeline
Setting up a Pachyderm and a Kubernetes cluster
Building a Pachyderm machine learning pipeline
Creating and filling the input repositories
Creating and running the processing stages
Updating pipelines and examining provenance
Scaling pipeline stages
Summary
References
Algorithms/Techniques Related to Machine Learning
Gradient descent
Entropy, information gain, and related methods
Backpropagation
Other Books You May Enjoy
Leave a review - let other readers know what you think
This updated edition of the popular Machine Learning With Go shows readers how to overcome the common challenges of integrating analysis and machine learning code within an existing engineering organization.
Machine Learning With Go, Second Edition, will begin by helping you gain an understanding of how to gather, organize, and parse real-world data from a variety of sources. The book also provides detailed information on developing machine learning pipelines, including predictive models, data visualizations, and statistical techniques. Next, you will learn about the use of Go libraries including golearn, gorgonia, gosl, hector, and mat64, among others. You will discover various TensorFlow capabilities, along with building simple neural networks and integrating them into machine learning models. You will also gain hands-on experience implementing essential machine learning techniques, such as regression, classification, and clustering, with the relevant Go packages. Furthermore, you will dive deep into the various Go tools that can help you build deep neural networks. Lastly, you will become well versed in best practices for machine learning model tuning and optimization.
By the end of the book, you will have a solid machine learning mindset and a powerful toolkit of Go techniques and packages, backed up with example implementations.
This book is primarily for Go programmers who want to become machine learning engineers, build a solid machine learning mindset, and improve their command of Go packages. It is also useful for data analysts, data engineers, and machine learning users who want to run their machine learning experiments using the Go ecosystem.
Chapter 1, Gathering and Organizing Data, covers the gathering, organization, and parsing of data from local and remote sources. Once the reader is done with this chapter, they will understand how to interact with data stored in various places and in various formats, how to parse and clean that data, and how to output that cleaned and parsed data.
Chapter 2, Matrices, Probability, and Statistics, covers statistical measures and operations key to day-to-day data analysis work. Once the reader is done with this chapter, they will understand how to perform solid summary data analysis, describe and visualize distributions, quantify hypotheses, and transform datasets with, for example, dimensionality reductions.
Chapter 3, Evaluating and Validating, covers evaluation and validation, which are key to measuring the performance of machine learning applications and ensuring that they generalize. Once the reader is done with this chapter, they will understand various metrics to gauge the performance of models (that is, to evaluate the model), as well as various techniques to validate the model more generally.
Chapter 4, Regression, covers regression, a widely used technique to model continuous variables, and a basis for other models. Regression produces models that are immediately interpretable. Thus, it can provide an excellent starting point when introducing predictive capabilities in an organization.
Chapter 5, Classification, covers classification, a machine learning technique distinct from regression in that the target variable is typically categorical or labeled. For example, a classification model may classify emails into spam and not-spam categories, or classify network traffic as fraudulent or not fraudulent.
Chapter 6, Clustering, covers clustering, an unsupervised machine learning technique used to form groupings of samples. At the end of this chapter, readers will be able to automatically form groupings of data points to better understand their structure.
Chapter 7, Time Series and Anomaly Detection, introduces techniques utilized to model time series data, such as stock prices and user events. After reading the chapter, the reader will understand how to evaluate various terms in a time series, build up a model of the time series, and detect anomalies in a time series.
Chapter 8, Neural Networks, introduces techniques utilized to perform regression, classification, and image processing with neural networks. After reading the chapter, the reader will understand how and when to apply these more complicated modeling techniques.
Chapter 9, Deep Learning, introduces deep learning techniques, along with the motivation behind them. After reading the chapter, the reader will understand how and when to apply these more complicated modeling techniques, and will understand the Go tooling available for building deep neural networks.
Chapter 10, Deploying and Distributing Analyses and Models, empowers readers to deploy the models developed throughout the book to production environments, and to distribute processing over production-scale data. The chapter will illustrate how both of these things can be done easily, without significant modifications to the code utilized throughout the book.
Appendix, Algorithms/Techniques Related to Machine Learning, can be referenced throughout the book and provides information about algorithms, optimizations, and techniques that are relevant to machine learning workflows.
Prior understanding of linear algebra is required to fully benefit from this book.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-With-Go-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789619898_ColorImages.pdf.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
In this section, we will get a solid understanding of how to parse and organize data within a Go program, with an emphasis on handling that data in a machine learning workflow.
This section will contain the following chapters:
Chapter 1, Gathering and Organizing Data
Chapter 2, Matrices, Probability, and Statistics
Chapter 3, Evaluating and Validating
Machine learning in general involves a series of steps, out of which the process of gathering and cleaning data consumes a lot of time. Polls have shown that 90% or more of a data scientist's time is spent gathering data, organizing it, and cleaning it—not training or tuning their sophisticated machine learning models. Why is this? Isn't the machine learning part the fun part? Why do we need to care so much about the state of our data?
Not all types of data are appropriate when using certain types of models. For example, certain models do not perform well when we have high-dimensional data (for example, text data), and other models assume that variables are normally distributed, which is definitely not always the case. Thus, we must take care to gather data that fits our use case and make sure that we understand how our data and models will interact.
Another reason why gathering and organizing data consumes so much of a data scientist's time is that data is often messy and hard to aggregate. In most organizations, data might be housed in various systems and formats, and have various access control policies.
To form a training or test set, or to supply variables to a model for predictions, we will likely need to deal with various formats of data, such as CSV, JSON, and database tables, and we will likely need to transform individual values. Common transformations include handling missing values, parsing date/times, converting categorical data to numerical data, normalizing values, and applying functions across values. Web scraping has also emerged as an important data source, and many data-driven organizations rely on it to add to their data repositories.
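To make this concrete, here is a minimal sketch, with made-up values, of a few such transformations in Go: parsing a date, encoding a categorical value as a number, and min-max normalizing a slice of values:

package main

import (
	"fmt"
	"time"
)

// normalize performs min-max scaling, mapping vals onto [0, 1].
func normalize(vals []float64) []float64 {
	min, max := vals[0], vals[0]
	for _, v := range vals {
		if v < min {
			min = v
		}
		if v > max {
			max = v
		}
	}
	out := make([]float64, len(vals))
	if max == min {
		return out // All values identical; leave the zeros.
	}
	for i, v := range vals {
		out[i] = (v - min) / (max - min)
	}
	return out
}

func main() {
	// Parse a date string into a time.Time value.
	t, err := time.Parse("2006-01-02", "2019-04-30")
	if err != nil {
		fmt.Println("could not parse date:", err)
		return
	}
	fmt.Println("parsed date:", t)

	// Convert a categorical value to a numerical encoding.
	encoding := map[string]int{"spam": 0, "not-spam": 1}
	fmt.Println("encoded category:", encoding["spam"])

	// Normalize a slice of values to [0, 1].
	fmt.Println("normalized:", normalize([]float64{2.0, 4.0, 6.0, 10.0}))
}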
Even though much of this book will be focused on various modeling techniques, you should always consider data gathering, parsing, and organization as a – or maybe the – key component of a successful data science project. If this part of your project is not carefully developed with a high level of integrity, you are setting yourself up for trouble in the long run.
In this chapter, readers will learn different data handling techniques in Go, with guided code covering the following topics:
Handling varied data forms—CSV, JSON, and SQL databases
Web scraping
Caching
Data versioning
As you can see in the preceding section, Go provides us with an opportunity to maintain high levels of integrity in our data gathering, parsing, and organization. We want to ensure that we leverage Go's unique properties whenever we are preparing our data for machine learning workflows.
Generally, Go data scientists/analysts should observe the following best practices when gathering and organizing data. These best practices are meant to help you maintain integrity in your applications and enable you to reproduce any analysis:

Check for and enforce expected types: This might seem obvious, but it is too often overlooked when using dynamically typed languages. Although it is slightly verbose, explicitly parsing data into expected types and handling related errors can save you big headaches down the road (see the sketch after this list).
Standardize and simplify your data ingress/egress: There are many third-party packages for handling certain types of data or interactions with certain sources of data (some of which we will cover in this book). However, if you standardize the ways you are interacting with data sources, particularly centered around the use of stdlib, you can develop predictable patterns and maintain consistency within your team. A good example of this is choosing to utilize database/sql for database interactions rather than various third-party application program interfaces (APIs) and domain-specific languages (DSLs).
Version your data: Machine learning models produce extremely different results depending on the training data you use, your choice of parameters, and input data. Thus, it is impossible to reproduce results without versioning both your code and data. We will discuss the appropriate techniques in the Data versioning section of this chapter.
CSV files might not be a go-to format for big data, but as a data scientist or developer working in machine learning, you are sure to encounter this format. You might need a mapping of zip codes to latitude/longitude and find this as a CSV file on the internet, or you may be given sales figures from your sales team in a CSV format. In any event, we need to understand how to parse these files.
The main package that we will utilize in parsing CSV files is encoding/csv from Go's standard library. However, we will also discuss a couple of packages that allow us to quickly manipulate or transform CSV data, github.com/go-gota/gota/dataframe and go-hep.org/x/hep/csvutil.
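As a first taste, here is a minimal sketch of reading records with encoding/csv (the data.csv filename is just a placeholder):

package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
)

func main() {
	// Open the CSV file (the filename is a placeholder).
	f, err := os.Open("data.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Create a CSV reader and read in all of the records at once.
	reader := csv.NewReader(f)
	records, err := reader.ReadAll()
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("number of records:", len(records))
}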
Web scraping is a handy tool to have in a data scientist's skill set. It can be useful in a variety of situations for gathering data, such as when a website does not provide an API, or when you need to parse and extract web content programmatically, such as scraping Wikipedia tables. The following packages can be used to scrape data from the web:
The github.com/PuerkitoBio/goquery package: A jQuery-like tool
The net/http package: To scrape information from an HTML web page on the internet
The github.com/anaskhan96/soup package: A Go package similar to the BeautifulSoup Python package
The following code snippet shows an example of scraping an xkcd comic's image and its underlying text using the soup package:
fmt.Println("Enter the xkcd comic number :") var num int fmt.Scanf("%d", &num) url := fmt.Sprintf("https://xkcd.com/%d", num) resp, _ := soup.Get(url) doc := soup.HTMLParse(resp) title := doc.Find("div", "id", "ctitle").Text() fmt.Println("Title of the comic :", title) comicImg := doc.Find("div", "id", "comic").Find("img") fmt.Println("Source of the image :", comicImg.Attrs()["src"]) fmt.Println("Underlying text of the image :", comicImg.Attrs()["title"])
Scraping multiple Wikipedia tables from a single web page can be useful in the absence of authenticated information on certain topics. The following chunks of code explain the various steps for scraping lists of movies from particular years using a scraper package.
The following code snippet shows the function that takes the desired URL as an input parameter and gives a table as an output:
// scrape fetches the page at url, selects the content matching
// selector, and sends the result on the supplied channel.
func scrape(url string, selector string, ch chan []string) {
	s := scraper.NewScraper(url)
	selection := s.Find(selector)
	ch <- selection
}
The next block of code creates a slice of year strings and a channel for each year, and then uses the scrape function concurrently to obtain the desired output:
years := []string{"2009", "2010", "2011", "2012", "2013"}
channels := []chan []string{
	make(chan []string),
	make(chan []string),
	make(chan []string),
	make(chan []string),
	make(chan []string),
}

// Launch one scraping goroutine per year, each with its own channel.
for idx, year := range years {
	ch := channels[idx]
	go scrape("http://en.wikipedia.org/wiki/List_of_Bollywood_films_of_"+year, "table.wikitable i a", ch)
}

// Receive and print the results as each goroutine finishes.
for i := 0; i < 5; i++ {
	select {
	case movies2009 := <-channels[0]:
		printMovies(movies2009)
	case movies2010 := <-channels[1]:
		printMovies(movies2010)
	case movies2011 := <-channels[2]:
		printMovies(movies2011)
	case movies2012 := <-channels[3]:
		printMovies(movies2012)
	case movies2013 := <-channels[4]:
		printMovies(movies2013)
	}
}
This section gives readers an idea of how to scrape data from the web using simple functions. From here, scraping more sophisticated data according to individual needs can be explored further.
In a world in which the majority of data is accessed via the web, and most engineering organizations implement some number of microservices, we are going to encounter data in JSON format fairly frequently. We may only need to deal with it when pulling some random data from an API, or it might actually be the primary data format that drives our analytics and machine learning workflows.
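As a quick preview, here is a minimal sketch of parsing JSON with encoding/json from the standard library (the struct and payload are made up for illustration):

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// observation is a hypothetical record, mirroring the kind of
// payload an API might return.
type observation struct {
	Station string  `json:"station"`
	Value   float64 `json:"value"`
}

func main() {
	// Example JSON, as it might arrive in an API response body.
	data := []byte(`{"station": "A01", "value": 12.5}`)

	// Unmarshal the JSON into an explicitly typed struct.
	var obs observation
	if err := json.Unmarshal(data, &obs); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("station %s reported %.1f\n", obs.Station, obs.Value)
}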
