
The Supervised Learning Workshop E-Book

Blaine Bateman


Publisher: Packt Publishing
Category: Science and New Technologies
Language: English

Description

Cut through the noise and get real results with a step-by-step approach to understanding supervised learning algorithms

Key Features

Ideal for those getting started with machine learning for the first time

A step-by-step machine learning tutorial with exercises and activities that help build key skills

Structured to let you progress at your own pace, on your own terms

Use your physical print copy to redeem free access to the online interactive edition

Book Description

You already know you want to understand supervised learning, and a smarter way to do that is to learn by doing. The Supervised Learning Workshop focuses on building up your practical skills so that you can deploy and build solutions that leverage key supervised learning algorithms. You'll learn from real examples that lead to real results.

Throughout The Supervised Learning Workshop, you'll take an engaging step-by-step approach to understanding supervised learning. You won't have to sit through any unnecessary theory. If you're short on time, you can jump into a single exercise each day or spend an entire weekend learning how to predict future values with autoregressive models. It's your choice. Learning on your terms, you'll build up and reinforce key skills in a way that feels rewarding.

Every physical print copy of The Supervised Learning Workshop unlocks access to the interactive edition. With videos detailing all exercises and activities, you'll always have a guided solution. You can also benchmark yourself against assessments, track your progress, and receive content updates. You'll even earn a secure credential that you can share and verify online upon completion. It's a premium learning experience that's included with your print copy. To redeem it, follow the instructions at the start of your book.

Fast-paced and direct, The Supervised Learning Workshop is the ideal companion for those with some Python background who are getting started with machine learning. You'll learn how to apply key algorithms the way a data scientist does, learning as you go. This approach helps your new skills stick and become embedded as best practice, giving you a solid foundation for the years ahead.

What you will learn

Get to grips with the fundamentals of supervised learning algorithms

Discover how to use Python libraries for supervised learning

Learn how to load a dataset in pandas for testing

Use different types of plots to visually represent the data

Distinguish between regression and classification problems

Learn how to perform classification using K-NN and decision trees
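The k-NN item in the list above rests on a simple rule: label a new point by majority vote among the k labeled points closest to it. The following is a minimal, library-free sketch of that rule, not the book's own code (the book works with Python libraries such as scikit-learn). The function name `knn_predict`, the use of Euclidean distance, the default k=3, and the toy points are all illustrative assumptions.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; distances are Euclidean.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Pair each training point's distance to the query with that point's label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up data: two loose clusters labeled "a" and "b"
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.2, 1.4)))  # a
print(knn_predict(points, labels, (6.8, 6.5)))  # b
```

In practice, scikit-learn's `KNeighborsClassifier` implements the same idea with faster neighbor search and more options; the sketch only shows the core vote.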

Who this book is for

Our goal at Packt is to help you be successful in whatever it is you choose to do. The Supervised Learning Workshop is ideal for those with a Python background who are just starting out with machine learning. Pick up a Workshop today, and let Packt help you develop skills that stick with you for life.

Details

You can read this e-book in the Legimi apps or in any app that supports the following format:

EPUB

Pages: 411

Year of publication: 2020


Reading Sample

The Supervised Learning Workshop

Second Edition

A New, Interactive Approach to Understanding Supervised Learning Algorithms

Blaine Bateman, Ashish Ranjan Jha, Benjamin Johnston, and Ishita Mathur


All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Authors: Blaine Bateman, Ashish Ranjan Jha, Benjamin Johnston, and Ishita Mathur

Reviewers: Tiffany Ford, Sukanya Mandal, Ashish Pratik Patil, and Ratan Singh

Managing Editor: Snehal Tambe

Acquisitions Editor: Anindya Sil

Production Editor: Samita Warang

Editorial Board: Shubhopriya Banerjee, Bharat Botle, Ewan Buckingham, Megan Carlisle, Mahesh Dhyani, Manasa Kumar, Alex Mazonowicz, Bridget Neale, Dominic Pereira, Shiny Poojary, Abhishek Rane, Brendan Rodrigues, Mugdha Sawarkar, Erol Staveley, Ankita Thakur, Nitesh Thakur, and Jonathan Wray

First published: April 2019

Second edition: February 2020

Production reference: 1280220

ISBN 978-1-80020-904-6

Published by Packt Publishing Ltd.

Livery Place, 35 Livery Street

Birmingham B3 2PB, UK

Table of Contents

Preface

1. Fundamentals of Supervised Learning Algorithms

Introduction

When to Use Supervised Learning

Python Packages and Modules

Loading Data in Pandas

Exercise 1.01: Loading and Summarizing the Titanic Dataset

Exercise 1.02: Indexing and Selecting Data

Exercise 1.03: Advanced Indexing and Selection

Pandas Methods

Exercise 1.04: Splitting, Applying, and Combining Data Sources

Quantiles

Lambda Functions

Exercise 1.05: Creating Lambda Functions

Data Quality Considerations

Managing Missing Data

Class Imbalance

Low Sample Size

Activity 1.01: Implementing Pandas Functions

Summary

2. Exploratory Data Analysis and Visualization

Introduction

Exploratory Data Analysis (EDA)

Summary Statistics and Central Values

Exercise 2.01: Summarizing the Statistics of Our Dataset

Missing Values

Finding Missing Values

Exercise 2.02: Visualizing Missing Values

Imputation Strategies for Missing Values

Exercise 2.03: Performing Imputation Using Pandas

Exercise 2.04: Performing Imputation Using Scikit-Learn

Exercise 2.05: Performing Imputation Using Inferred Values

Activity 2.01: Summary Statistics and Missing Values

Distribution of Values

Target Variable

Exercise 2.06: Plotting a Bar Chart

Categorical Data

Exercise 2.07: Identifying Data Types for Categorical Variables

Exercise 2.08: Calculating Category Value Counts

Exercise 2.09: Plotting a Pie Chart

Continuous Data

Skewness

Kurtosis

Exercise 2.10: Plotting a Histogram

Exercise 2.11: Computing Skew and Kurtosis

Activity 2.02: Visually Representing the Distribution of Values

Relationships within the Data

Relationship between Two Continuous Variables

Pearson's Coefficient of Correlation

Exercise 2.12: Plotting a Scatter Plot

Exercise 2.13: Plotting a Correlation Heatmap

Using Pairplots

Exercise 2.14: Implementing a Pairplot

Relationship between a Continuous and a Categorical Variable

Exercise 2.15: Plotting a Bar Chart

Exercise 2.16: Visualizing a Box Plot

Relationship Between Two Categorical Variables

Exercise 2.17: Plotting a Stacked Bar Chart

Activity 2.03: Relationships within the Data

Summary

3. Linear Regression

Introduction

Regression and Classification Problems

The Machine Learning Workflow

Business Understanding

Data Understanding

Data Preparation

Modeling

Evaluation

Deployment

Exercise 3.01: Plotting Data with a Moving Average

Activity 3.01: Plotting Data with a Moving Average

Linear Regression

Least Squares Method

The Scikit-Learn Model API

Exercise 3.02: Fitting a Linear Model Using the Least Squares Method

Activity 3.02: Linear Regression Using the Least Squares Method

Linear Regression with Categorical Variables

Exercise 3.03: Introducing Dummy Variables

Activity 3.03: Dummy Variables

Polynomial Models with Linear Regression

Exercise 3.04: Polynomial Models with Linear Regression

Activity 3.04: Feature Engineering with Linear Regression

Generic Model Training

Gradient Descent

Exercise 3.05: Linear Regression with Gradient Descent

Exercise 3.06: Optimizing Gradient Descent

Activity 3.05: Gradient Descent

Multiple Linear Regression

Exercise 3.07: Multiple Linear Regression

Summary

4. Autoregression

Introduction

Autoregression Models

Exercise 4.01: Creating an Autoregression Model

Activity 4.01: Autoregression Model Based on Periodic Data

Summary

5. Classification Techniques

Introduction

Ordinary Least Squares as a Classifier

Exercise 5.01: Ordinary Least Squares as a Classifier

Logistic Regression

Exercise 5.02: Logistic Regression as a Classifier – Binary Classifier

Exercise 5.03: Logistic Regression – Multiclass Classifier

Activity 5.01: Ordinary Least Squares Classifier – Binary Classifier

Select K Best Feature Selection

Exercise 5.04: Breast Cancer Diagnosis Classification Using Logistic Regression

Classification Using K-Nearest Neighbors

Exercise 5.05: KNN Classification

Exercise 5.06: Visualizing KNN Boundaries

Activity 5.02: KNN Multiclass Classifier

Classification Using Decision Trees

Exercise 5.07: ID3 Classification

Classification and Regression Tree

Exercise 5.08: Breast Cancer Diagnosis Classification Using a CART Decision Tree

Activity 5.03: Binary Classification Using a CART Decision Tree

Artificial Neural Networks

Exercise 5.09: Neural Networks – Multiclass Classifier

Activity 5.04: Breast Cancer Diagnosis Classification Using Artificial Neural Networks

Summary

6. Ensemble Modeling

Introduction

One-Hot Encoding

Exercise 6.01: Importing Modules and Preparing the Dataset

Overfitting and Underfitting

Underfitting

Overfitting

Overcoming the Problem of Underfitting and Overfitting

Bagging

Bootstrapping

Exercise 6.02: Using the Bagging Classifier

Random Forest

Exercise 6.03: Building the Ensemble Model Using Random Forest

Boosting

Adaptive Boosting

Exercise 6.04: Implementing Adaptive Boosting

Gradient Boosting

Exercise 6.05: Implementing GradientBoostingClassifier to Build an Ensemble Model

Stacking

Exercise 6.06: Building a Stacked Model

Activity 6.01: Stacking with Standalone and Ensemble Algorithms

Summary

7. Model Evaluation

Introduction

Importing the Modules and Preparing Our Dataset

Evaluation Metrics

Regression Metrics

Exercise 7.01: Calculating Regression Metrics

Classification Metrics

Numerical Metrics

Curve Plots

Exercise 7.02: Calculating Classification Metrics

Splitting a Dataset

Hold-Out Data

K-Fold Cross-Validation

Sampling

Exercise 7.03: Performing K-Fold Cross-Validation with Stratified Sampling

Performance Improvement Tactics

Variation in Train and Test Errors

Learning Curve

Validation Curve

Hyperparameter Tuning

Exercise 7.04: Hyperparameter Tuning with Random Search

Feature Importance

Exercise 7.05: Feature Importance Using Random Forest

Activity 7.01: Final Test Project

Summary

Appendix

Preface

About

This section briefly introduces the book and the software you need to complete all of the included activities and exercises.

About the Book

You already know you want to learn about supervised learning, and a smarter way to do that is to learn by doing. The Supervised Learning Workshop focuses on building up your practical skills so that you can deploy and build solutions that leverage key supervised learning algorithms. You'll learn from real examples that lead to real results.

Throughout The Supervised Learning Workshop, you'll take an engaging step-by-step approach to understanding supervised learning. You won't have to sit through any unnecessary theory. If you're short on time, you can jump into a single exercise each day or spend an entire weekend learning how to predict future values with various regression and autoregression models. It's your choice. Learning on your terms, you'll build up and reinforce key skills in a way that feels rewarding.

Every physical print copy of The Supervised Learning Workshop unlocks access to the interactive edition. With videos detailing all exercises and activities, you'll always have a guided solution. You can also benchmark yourself against assessments, track your progress, and receive content updates. You'll even earn a secure credential that you can share and verify online upon completion. It's a premium learning experience that's included with your print copy. To redeem this, follow the instructions located at the start of the book.

Audience

Our goal at Packt is to help you be successful in whatever it is you choose to do. The Supervised Learning Workshop is ideal for those with a Python background who are just starting out with machine learning. Pick up a copy of The Supervised Learning Workshop today and let Packt help you develop skills that stick with you for life.

About the Chapters

Chapter 1, Fundamentals of Supervised Learning Algorithms, introduces you to supervised learning, Jupyter notebooks, and some of the most common pandas data methods.

Chapter 2, Exploratory Data Analysis and Visualization, teaches you how to perform exploration and analysis on a new dataset.

Chapter 3, Linear Regression, teaches you how to tackle regression problems and analysis, introducing you to linear regression as well as multiple linear regression and gradient descent.

Chapter 4, Autoregression, teaches you how to implement autoregression as a method to forecast values that depend on past values.

Chapter 5, Classification Techniques, introduces classification problems, classification using linear and logistic regression, k-nearest neighbors, and decision trees.

Chapter 6, Ensemble Modeling, examines the different approaches to ensemble modeling, including their benefits and limitations.

Chapter 7, Model Evaluation, demonstrates how you can improve a model's performance by tuning hyperparameters and applying model evaluation metrics.
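The autoregression idea summarized for Chapter 4 can be illustrated with a first-order model, x[t] = c + phi * x[t-1], fitted by ordinary least squares on pairs of consecutive values and then used for a one-step-ahead forecast. This is a minimal sketch under stated assumptions, not the chapter's implementation: the helper names `fit_ar1` and `forecast_next` and the synthetic series are hypothetical, and practical models typically use more than one lag.

```python
def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares.

    Hypothetical helper for illustration; returns (c, phi).
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    # Lagged pairs: previous value (predictor) and current value (target)
    x_prev = series[:-1]
    x_curr = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_curr = sum(x_curr) / n
    # Closed-form simple linear regression: slope = cov / var
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(x_prev, x_curr))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    if var == 0:
        raise ValueError("lagged values are constant; slope is undefined")
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]


# Synthetic series generated exactly by x[t] = 2 + 0.5 * x[t-1], starting at 0
values = [0.0]
for _ in range(5):
    values.append(2 + 0.5 * values[-1])

c, phi = fit_ar1(values)
print(round(c, 6), round(phi, 6))              # 2.0 0.5
print(round(forecast_next(values, c, phi), 6))  # 3.9375
```

Because the synthetic series follows the rule exactly, the fit recovers c = 2 and phi = 0.5; on real, noisy data the estimates would only approximate the underlying process.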

Conventions

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Use the pandas read_csv function to load the CSV file containing the synth_temp.csv dataset, and then display the first five lines of data."

Words that you see on screen, for example, in menus or dialog boxes, also appear in the text like this: "Open the titanic.csv file by clicking on it on the Jupyter notebook home page."

A block of code is set as follows:

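import pandas as pd

# Assumes `data` is a pandas DataFrame loaded earlier.
# Number of rows with a missing damage_millions_dollars value:
print(data[pd.isnull(data.damage_millions_dollars)].shape[0])
# Rows with a missing damage_millions_dollars value whose damage_description is not 'NA':
print(data[pd.isnull(data.damage_millions_dollars) &
           (data.damage_description != 'NA')].shape[0])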

New terms and important words are shown like this: "Supervised means that the labels for the data are provided during training, allowing the model to learn from these labels."

Before You Begin

Each great journey begins with a humble step. Before we can do awesome things with supervised learning, we need to set up a productive environment. In this section, we will see how to do that.

Installation and Setup

Jupyter notebooks are available once you install Anaconda on your system. Anaconda can be installed for Windows systems using the steps available at https://packt.live/2P4XWqI.

For other systems, navigate to the respective installation guide from https://packt.live/32tU7Ro.

By default, these installations are placed on the C: drive of your system; you can choose a different destination.
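Once Anaconda is installed, one way to confirm that the main libraries are available is to query their installed versions from Python. This is a hedged sketch, not part of the book's setup instructions: pandas and scikit-learn are named in the book's contents, while NumPy, Matplotlib, and seaborn are assumed companions, and the helper name `installed_versions` is hypothetical. `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata

# Assumed package list: pandas and scikit-learn appear in the book's contents;
# the others are common companions and may not all be required.
PACKAGES = ["numpy", "pandas", "scikit-learn", "matplotlib", "seaborn"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:15} {version or 'NOT INSTALLED'}")
```

Running `conda list` in an Anaconda Prompt gives a similar overview from the command line.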

Installing the Code Bundle

Download the code files from GitHub at https://packt.live/2TlcKDf. Refer to these code files for the complete code bundle. Make sure to copy the code bundle to the same drive as your Anaconda installation.

If you have any issues or questions about installation, please email us at [email protected].

The high-quality color images used in this book can be found at https://packt.live/2T1BX6M.

1. Fundamentals of Supervised Learning Algorithms
