The Supervised Learning Workshop - Blaine Bateman - E-Book

Blaine Bateman

Description

Cut through the noise and get real results with a step-by-step approach to understanding supervised learning algorithms




Key Features



  • Ideal for those getting started with machine learning for the first time


  • A step-by-step machine learning tutorial with exercises and activities that help build key skills


  • Structured to let you progress at your own pace, on your own terms


  • Use your physical print copy to redeem free access to the online interactive edition



Book Description



You already know you want to understand supervised learning, and a smarter way to do that is to learn by doing. The Supervised Learning Workshop focuses on building up your practical skills so that you can deploy and build solutions that leverage key supervised learning algorithms. You'll learn from real examples that lead to real results.







Throughout The Supervised Learning Workshop, you'll take an engaging step-by-step approach to understanding supervised learning. You won't have to sit through any unnecessary theory. If you're short on time, you can jump into a single exercise each day or spend an entire weekend learning how to predict future values with autoregression models. It's your choice. Learning on your terms, you'll build up and reinforce key skills in a way that feels rewarding.







Every physical print copy of The Supervised Learning Workshop unlocks access to the interactive edition. With videos detailing all exercises and activities, you'll always have a guided solution. You can also benchmark yourself against assessments, track progress, and receive content updates. You'll even earn a secure credential that you can share and verify online upon completion. It's a premium learning experience that's included with your printed copy. To redeem, follow the instructions located at the start of your book.







Fast-paced and direct, The Supervised Learning Workshop is the ideal companion for those with some Python background who are getting started with machine learning. You'll learn how to apply key algorithms like a data scientist, learning along the way. This process means that you'll find that your new skills stick, embedded as best practice, establishing a solid foundation for the years ahead.





What you will learn



  • Get to grips with the fundamentals of supervised learning algorithms


  • Discover how to use Python libraries for supervised learning


  • Learn how to load a dataset in pandas for testing


  • Use different types of plots to visually represent the data


  • Distinguish between regression and classification problems


  • Learn how to perform classification using K-NN and decision trees



Who this book is for



Our goal at Packt is to help you be successful in whatever it is you choose to do. The Supervised Learning Workshop is ideal for those with a Python background who are just starting out with machine learning. Pick up a Workshop today, and let Packt help you develop skills that stick with you for life.

You can read the e-book in Legimi apps or in any app that supports the following format:

EPUB

Number of pages: 411

Year of publication: 2020


The Supervised Learning Workshop

Second Edition

A New, Interactive Approach to Understanding Supervised Learning Algorithms

Blaine Bateman, Ashish Ranjan Jha, Benjamin Johnston, and Ishita Mathur

Copyright © 2020 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Authors: Blaine Bateman, Ashish Ranjan Jha, Benjamin Johnston, and Ishita Mathur

Reviewers: Tiffany Ford, Sukanya Mandal, Ashish Pratik Patil, and Ratan Singh

Managing Editor: Snehal Tambe

Acquisitions Editor: Anindya Sil

Production Editor: Samita Warang

Editorial Board: Shubhopriya Banerjee, Bharat Botle, Ewan Buckingham, Megan Carlisle, Mahesh Dhyani, Manasa Kumar, Alex Mazonowicz, Bridget Neale, Dominic Pereira, Shiny Poojary, Abhishek Rane, Brendan Rodrigues, Mugdha Sawarkar, Erol Staveley, Ankita Thakur, Nitesh Thakur, and Jonathan Wray

First published: April 2019

Second edition: February 2020

Production reference: 1280220

ISBN 978-1-80020-904-6

Published by Packt Publishing Ltd.

Livery Place, 35 Livery Street

Birmingham B3 2PB, UK

Table of Contents

Preface

1. Fundamentals of Supervised Learning Algorithms

Introduction

When to Use Supervised Learning

Python Packages and Modules

Loading Data in Pandas

Exercise 1.01: Loading and Summarizing the Titanic Dataset

Exercise 1.02: Indexing and Selecting Data

Exercise 1.03: Advanced Indexing and Selection

Pandas Methods

Exercise 1.04: Splitting, Applying, and Combining Data Sources

Quantiles

Lambda Functions

Exercise 1.05: Creating Lambda Functions

Data Quality Considerations

Managing Missing Data

Class Imbalance

Low Sample Size

Activity 1.01: Implementing Pandas Functions

Summary

2. Exploratory Data Analysis and Visualization

Introduction

Exploratory Data Analysis (EDA)

Summary Statistics and Central Values

Exercise 2.01: Summarizing the Statistics of Our Dataset

Missing Values

Finding Missing Values

Exercise 2.02: Visualizing Missing Values

Imputation Strategies for Missing Values

Exercise 2.03: Performing Imputation Using Pandas

Exercise 2.04: Performing Imputation Using Scikit-Learn

Exercise 2.05: Performing Imputation Using Inferred Values

Activity 2.01: Summary Statistics and Missing Values

Distribution of Values

Target Variable

Exercise 2.06: Plotting a Bar Chart

Categorical Data

Exercise 2.07: Identifying Data Types for Categorical Variables

Exercise 2.08: Calculating Category Value Counts

Exercise 2.09: Plotting a Pie Chart

Continuous Data

Skewness

Kurtosis

Exercise 2.10: Plotting a Histogram

Exercise 2.11: Computing Skew and Kurtosis

Activity 2.02: Visually Representing the Distribution of Values

Relationships within the Data

Relationship between Two Continuous Variables

Pearson's Coefficient of Correlation

Exercise 2.12: Plotting a Scatter Plot

Exercise 2.13: Plotting a Correlation Heatmap

Using Pairplots

Exercise 2.14: Implementing a Pairplot

Relationship between a Continuous and a Categorical Variable

Exercise 2.15: Plotting a Bar Chart

Exercise 2.16: Visualizing a Box Plot

Relationship Between Two Categorical Variables

Exercise 2.17: Plotting a Stacked Bar Chart

Activity 2.03: Relationships within the Data

Summary

3. Linear Regression

Introduction

Regression and Classification Problems

The Machine Learning Workflow

Business Understanding

Data Understanding

Data Preparation

Modeling

Evaluation

Deployment

Exercise 3.01: Plotting Data with a Moving Average

Activity 3.01: Plotting Data with a Moving Average

Linear Regression

Least Squares Method

The Scikit-Learn Model API

Exercise 3.02: Fitting a Linear Model Using the Least Squares Method

Activity 3.02: Linear Regression Using the Least Squares Method

Linear Regression with Categorical Variables

Exercise 3.03: Introducing Dummy Variables

Activity 3.03: Dummy Variables

Polynomial Models with Linear Regression

Exercise 3.04: Polynomial Models with Linear Regression

Activity 3.04: Feature Engineering with Linear Regression

Generic Model Training

Gradient Descent

Exercise 3.05: Linear Regression with Gradient Descent

Exercise 3.06: Optimizing Gradient Descent

Activity 3.05: Gradient Descent

Multiple Linear Regression

Exercise 3.07: Multiple Linear Regression

Summary

4. Autoregression

Introduction

Autoregression Models

Exercise 4.01: Creating an Autoregression Model

Activity 4.01: Autoregression Model Based on Periodic Data

Summary

5. Classification Techniques

Introduction

Ordinary Least Squares as a Classifier

Exercise 5.01: Ordinary Least Squares as a Classifier

Logistic Regression

Exercise 5.02: Logistic Regression as a Classifier – Binary Classifier

Exercise 5.03: Logistic Regression – Multiclass Classifier

Activity 5.01: Ordinary Least Squares Classifier – Binary Classifier

Select K Best Feature Selection

Exercise 5.04: Breast Cancer Diagnosis Classification Using Logistic Regression

Classification Using K-Nearest Neighbors

Exercise 5.05: KNN Classification

Exercise 5.06: Visualizing KNN Boundaries

Activity 5.02: KNN Multiclass Classifier

Classification Using Decision Trees

Exercise 5.07: ID3 Classification

Classification and Regression Tree

Exercise 5.08: Breast Cancer Diagnosis Classification Using a CART Decision Tree

Activity 5.03: Binary Classification Using a CART Decision Tree

Artificial Neural Networks

Exercise 5.09: Neural Networks – Multiclass Classifier

Activity 5.04: Breast Cancer Diagnosis Classification Using Artificial Neural Networks

Summary

6. Ensemble Modeling

Introduction

One-Hot Encoding

Exercise 6.01: Importing Modules and Preparing the Dataset

Overfitting and Underfitting

Underfitting

Overfitting

Overcoming the Problem of Underfitting and Overfitting

Bagging

Bootstrapping

Exercise 6.02: Using the Bagging Classifier

Random Forest

Exercise 6.03: Building the Ensemble Model Using Random Forest

Boosting

Adaptive Boosting

Exercise 6.04: Implementing Adaptive Boosting

Gradient Boosting

Exercise 6.05: Implementing GradientBoostingClassifier to Build an Ensemble Model

Stacking

Exercise 6.06: Building a Stacked Model

Activity 6.01: Stacking with Standalone and Ensemble Algorithms

Summary

7. Model Evaluation

Introduction

Importing the Modules and Preparing Our Dataset

Evaluation Metrics

Regression Metrics

Exercise 7.01: Calculating Regression Metrics

Classification Metrics

Numerical Metrics

Curve Plots

Exercise 7.02: Calculating Classification Metrics

Splitting a Dataset

Hold-Out Data

K-Fold Cross-Validation

Sampling

Exercise 7.03: Performing K-Fold Cross-Validation with Stratified Sampling

Performance Improvement Tactics

Variation in Train and Test Errors

Learning Curve

Validation Curve

Hyperparameter Tuning

Exercise 7.04: Hyperparameter Tuning with Random Search

Feature Importance

Exercise 7.05: Feature Importance Using Random Forest

Activity 7.01: Final Test Project

Summary

Appendix

Preface

About

This section briefly introduces the book and the software required to complete all of the included activities and exercises.

About the Book

You already know you want to learn about supervised learning, and a smarter way to do that is to learn by doing. The Supervised Learning Workshop focuses on building up your practical skills so that you can deploy and build solutions that leverage key supervised learning algorithms. You'll learn from real examples that lead to real results.

Throughout The Supervised Learning Workshop, you'll take an engaging step-by-step approach to understanding supervised learning. You won't have to sit through any unnecessary theory. If you're short on time, you can jump into a single exercise each day or spend an entire weekend learning how to predict future values with various regression and autoregression models. It's your choice. Learning on your terms, you'll build up and reinforce key skills in a way that feels rewarding.

Every physical print copy of The Supervised Learning Workshop unlocks access to the interactive edition. With videos detailing all exercises and activities, you'll always have a guided solution. You can also benchmark yourself against assessments, track your progress, and receive content updates. You'll even earn a secure credential that you can share and verify online upon completion. It's a premium learning experience that's included with your print copy. To redeem this, follow the instructions located at the start of the book.

Fast-paced and direct, The Supervised Learning Workshop is the ideal companion for those with some Python background who are getting started with machine learning. You'll learn how to apply key algorithms like a data scientist, learning along the way. This process means that you'll find that your new skills stick, embedded as best practice, establishing a solid foundation for the years ahead.

Audience

Our goal at Packt is to help you be successful in whatever it is you choose to do. The Supervised Learning Workshop is ideal for those with a Python background who are just starting out with machine learning. Pick up a copy of The Supervised Learning Workshop today and let Packt help you develop skills that stick with you for life.

About the Chapters

Chapter 1, Fundamentals of Supervised Learning Algorithms, introduces you to supervised learning, Jupyter notebooks, and some of the most common pandas data methods.

Chapter 2, Exploratory Data Analysis and Visualization, teaches you how to perform exploration and analysis on a new dataset.

Chapter 3, Linear Regression, teaches you how to tackle regression problems and analysis, introducing you to linear regression as well as multiple linear regression and gradient descent.

Chapter 4, Autoregression, teaches you how to implement autoregression as a method to forecast values that depend on past values.

Chapter 5, Classification Techniques, introduces classification problems, classification using linear and logistic regression, k-nearest neighbors, and decision trees.

Chapter 6, Ensemble Modeling, examines the different approaches to ensemble modeling, including their benefits and limitations.

Chapter 7, Model Evaluation, demonstrates how you can improve a model's performance by tuning hyperparameters and using model evaluation metrics.

Conventions

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Use the pandas read_csv function to load the CSV file containing the synth_temp.csv dataset, and then display the first five lines of data."

Words that you see on screen, for example, in menus or dialog boxes, also appear in the text like this: "Open the titanic.csv file by clicking on it on the Jupyter notebook home page."

A block of code is set as follows:

print(data[pd.isnull(data.damage_millions_dollars)].shape[0])
print(data[pd.isnull(data.damage_millions_dollars) &
           (data.damage_description != 'NA')].shape[0])

New terms and important words are shown like this: "Supervised means that the labels for the data are provided within the training, allowing the model to learn from these labels."
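The code-block sample in the conventions above refers to a DataFrame named data that is not defined in this section. As a minimal sketch of what those two statements compute, the following builds a small hypothetical DataFrame (the column names are taken from the sample, but the values are invented purely for illustration) and counts the rows where the damage amount is missing, first overall and then only where a damage description is still recorded:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names mirror the sample above; the values are invented.
data = pd.DataFrame({
    "damage_millions_dollars": [1.5, np.nan, np.nan, 20.0, np.nan],
    "damage_description": ["LIMITED", "NA", "MODERATE", "SEVERE", "NA"],
})

# Boolean masks: True where the damage amount is missing / a description exists.
missing_damage = pd.isnull(data.damage_millions_dollars)
has_description = data.damage_description != "NA"

# Number of rows with a missing damage amount.
n_missing = data[missing_damage].shape[0]

# Number of rows with a missing damage amount but a recorded description.
n_missing_with_description = data[missing_damage & has_description].shape[0]

print(n_missing)
print(n_missing_with_description)
```

Naming the two masks separately is a stylistic choice for readability; the combined expression in the sample above behaves the same way.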

Before You Begin

Each great journey begins with a humble step. Before we can do awesome things with supervised learning, we need to be prepared with a productive environment. In this section, we will see how to do that.

Installation and Setup

Jupyter notebooks are available once you install Anaconda on your system. Anaconda can be installed for Windows systems using the steps available at https://packt.live/2P4XWqI.

For other systems, navigate to the respective installation guide from https://packt.live/32tU7Ro.

By default, these installations are placed on the C: drive of your system, but you can choose a different destination.
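Once Anaconda is installed, a quick check along the following lines may help confirm that the environment is usable before you start the exercises. This is only a sketch: the list of module names is an assumption based on the libraries named in this book (pandas and scikit-learn, whose import name is sklearn), and find_missing_modules is a hypothetical helper, not part of the book's code bundle.

```python
import importlib.util
import sys

# Assumed list of modules to check; adjust it to match the exercises you plan to run.
REQUIRED_MODULES = ["pandas", "sklearn"]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
print("Python", sys.version.split()[0])
print("Missing modules:", missing if missing else "none")
```

Running this in a Jupyter notebook cell or from the command line prints the Python version and any modules from the list that could not be found.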

Installing the Code Bundle

Download the code files from GitHub at https://packt.live/2TlcKDf. Refer to these code files for the complete code bundle. Make sure to copy the code bundle to the same drive as your Anaconda installation.

If you have any issues or questions about installation, please email us at [email protected].

The high-quality color images used in this book can be found at https://packt.live/2T1BX6M.

1. Fundamentals of Supervised Learning Algorithms