Machine Learning Fundamentals - Hyatt Saleh - E-Book

Description

As machine learning algorithms become popular, new tools that optimize these algorithms are also developed. Machine Learning Fundamentals explains how to use the syntax of scikit-learn. You'll study the difference between supervised and unsupervised models, as well as the importance of choosing the appropriate algorithm for each dataset. You'll apply unsupervised clustering algorithms to real-world datasets to discover patterns and profiles, and explore the process of solving an unsupervised machine learning problem.

The focus of the book then shifts to supervised learning algorithms. You'll learn to implement different supervised algorithms and develop neural network structures using the scikit-learn package. You'll also learn how to perform coherent result analysis to improve the performance of the algorithm by tuning hyperparameters.

By the end of this book, you will have gained all the skills required to start programming machine learning algorithms.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 254

Year of publication: 2018


Machine Learning Fundamentals

Use Python and scikit-learn to get up and running with the hottest developments in machine learning

Hyatt Saleh

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Author: Hyatt Saleh

Managing Editor: Neha Nair

Acquisitions Editor: Aditya Date

Production Editor: Samita Warang

Editorial Board: David Barnes, Ewan Buckingham, Simon Cox, Manasa Kumar, Alex Mazonowicz, Douglas Paterson, Dominic Pereira, Shiny Poojary, Saman Siddiqui, Erol Staveley, Ankita Thakur, and Mohita Vyas

First Published: November 2018

Production Reference: 1291118

ISBN: 978-1-78980-355-6

Table of Contents

Preface

Introduction to Scikit-Learn

Introduction

Scikit-Learn

Advantages of Scikit-Learn

Disadvantages of Scikit-Learn

Data Representation

Tables of Data

Features and Target Matrices

Exercise 1: Loading a Sample Dataset and Creating the Features and Target Matrices

Activity 1: Selecting a Target Feature and Creating a Target Matrix

Data Preprocessing

Messy Data

Exercise 2: Dealing with Messy Data

Dealing with Categorical Features

Exercise 3: Applying Feature Engineering over Text Data

Rescaling Data

Exercise 4: Normalizing and Standardizing Data

Activity 2: Preprocessing an Entire Dataset

Scikit-Learn API

How Does It Work?

Supervised and Unsupervised Learning

Supervised Learning

Unsupervised Learning

Summary

Unsupervised Learning: Real-Life Applications

Introduction

Clustering

Clustering Types

Applications of Clustering

Exploring a Dataset: Wholesale Customers Dataset

Understanding the Dataset

Data Visualization

Loading the Dataset Using Pandas

Visualization Tools

Exercise 5: Plotting a Histogram of One Feature from the Noisy Circles Dataset

Activity 3: Using Data Visualization to Aid the Preprocessing Process

k-means Algorithm

Understanding the Algorithm

Exercise 6: Importing and Training the k-means Algorithm over a Dataset

Activity 4: Applying the k-means Algorithm to a Dataset

Mean-Shift Algorithm

Understanding the Algorithm

Exercise 7: Importing and Training the Mean-Shift Algorithm over a Dataset

Activity 5: Applying the Mean-Shift Algorithm to a Dataset

DBSCAN Algorithm

Understanding the Algorithm

Exercise 8: Importing and Training the DBSCAN Algorithm over a Dataset

Activity 6: Applying the DBSCAN Algorithm to the Dataset

Evaluating the Performance of Clusters

Available Metrics in Scikit-Learn

Exercise 9: Evaluating the Silhouette Coefficient Score and Calinski–Harabasz Index

Activity 7: Measuring and Comparing the Performance of the Algorithms

Summary

Supervised Learning: Key Steps

Introduction

Model Validation and Testing

Data Partition

Split Ratio

Exercise 10: Performing Data Partition over a Sample Dataset

Cross Validation

Exercise 11: Using Cross-Validation to Partition the Train Set into a Training and a Validation Set

Activity 8: Data Partition over a Handwritten Digit Dataset

Evaluation Metrics

Evaluation Metrics for Classification Tasks

Exercise 12: Calculating Different Evaluation Metrics over a Classification Task

Choosing an Evaluation Metric

Evaluation Metrics for Regression Tasks

Exercise 13: Calculating Evaluation Metrics over a Regression Task

Activity 9: Evaluating the Performance of the Model Trained over a Handwritten Dataset

Error Analysis

Bias, Variance, and Data Mismatch

Exercise 14: Calculating the Error Rate over Different Sets of Data

Activity 10: Performing Error Analysis over a Model Trained to Recognize Handwritten Digits

Summary

Supervised Learning Algorithms: Predict Annual Income

Introduction

Exploring the Dataset

Understanding the Dataset

Naïve Bayes Algorithm

How Does It Work?

Exercise 15: Applying the Naïve Bayes Algorithm

Activity 11: Training a Naïve Bayes Model for Our Census Income Dataset

Decision Tree Algorithm

How Does It Work?

Exercise 16: Applying the Decision Tree Algorithm

Activity 12: Training a Decision Tree Model for Our Census Income Dataset

Support Vector Machine Algorithm

How Does It Work?

Exercise 17: Applying the SVM Algorithm

Activity 13: Training an SVM Model for Our Census Income Dataset

Error Analysis

Accuracy, Precision, and Recall

Summary

Artificial Neural Networks: Predict Annual Income

Introduction

Artificial Neural Networks

How Do They Work?

Understanding the Hyperparameters

Applications

Limitations

Applying an Artificial Neural Network

Scikit-Learn's Multilayer Perceptron

Exercise 18: Applying the Multilayer Perceptron Classifier Class

Activity 14: Training a Multilayer Perceptron for Our Census Income Dataset

Performance Analysis

Error Analysis

Hyperparameter Fine-Tuning

Model Comparison

Activity 15: Comparing Different Models to Choose the Best Fit for the Census Income Data Problem

Summary

Building Your Own Program

Introduction

Program Definition

Building a Program: Key Stages

Understanding the Dataset

Activity 16: Performing the Preparation and Creation Stages for the Bank Marketing Dataset

Saving and Loading a Trained Model

Saving a Model

Exercise 19: Saving a Trained Model

Loading a Model

Exercise 20: Loading a Saved Model

Activity 17: Saving and Loading the Final Model for the Bank Marketing Dataset

Interacting with a Trained Model

Exercise 21: Creating a Class and a Channel to Interact with a Trained Model

Activity 18: Allowing Interaction with the Bank Marketing Dataset Model

Summary

Appendix

Preface

About

This section briefly introduces the author, the coverage of this book, the technical skills you'll need to get started, and the hardware and software required to complete all of the included activities and exercises.

1

Introduction to Scikit-Learn

Learning Objectives

By the end of this chapter, you will be able to:

Describe scikit-learn and its main advantages

Use the scikit-learn API

Perform data preprocessing

Explain the difference between supervised and unsupervised models, as well as the importance of choosing the right algorithm for each dataset

This chapter explains the scikit-learn syntax and features you need to process and visualize data.
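
To give a feel for the syntax this chapter refers to, the following is a minimal sketch rather than an excerpt from the book. It assumes scikit-learn is installed; the choice of the built-in iris dataset, the 80/20 split, and the logistic regression model are illustrative assumptions, not the book's own example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load a sample dataset as a features matrix X (rows = samples,
# columns = features) and a target vector y (one label per sample).
# The iris dataset is an assumption made for illustration.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples for testing (illustrative split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Preprocessing: standardize features using statistics learned from
# the training data only, then apply the same transform to the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# The estimator API: create the model, fit it, then predict.
model = LogisticRegression(max_iter=200)
model.fit(X_train_scaled, y_train)
predictions = model.predict(X_test_scaled)

# score() returns the mean accuracy on the given data for classifiers.
accuracy = model.score(X_test_scaled, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The same pattern of creating an object, calling fit, and then calling transform or predict applies to most scikit-learn preprocessing tools and models, which is why the library's syntax is often described as consistent across algorithms.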