Practical Machine Learning with R - Brindha Priyadarshini Jeyaraman - E-Book

Description

Understand how machine learning works and get hands-on experience of using R to build algorithms that can solve various real-world problems




Key Features



  • Gain a comprehensive overview of different machine learning techniques


  • Explore various methods for selecting a particular algorithm


  • Implement a machine learning project from problem definition through to the final model



Book Description



With huge amounts of data being generated every moment, businesses need applications that apply complex mathematical calculations to that data repeatedly and at speed. With machine learning techniques and R, you can develop these kinds of applications efficiently.







Practical Machine Learning with R begins by helping you grasp the basics of machine learning methods, while also highlighting how and why they work. You will understand how to get these algorithms to work in practice, rather than focusing on mathematical derivations. As you progress from one chapter to the next, you will gain hands-on experience of building machine learning solutions in R. Using R packages such as rpart, randomForest, and mice (multiple imputation by chained equations), you will learn to implement algorithms including neural network classifiers, decision trees, and linear and non-linear regression. Later in the book, you'll delve into techniques for both supervised and unsupervised learning. In addition, you'll learn how to partition datasets, evaluate the results from each model, and compare the models against one another.
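To make that workflow concrete, here is a minimal, hypothetical R sketch (an illustration under assumed choices, not an excerpt from the book) that partitions a built-in dataset, fits a simple linear regression model, and evaluates it on the held-out portion:

# A minimal sketch of the partition/train/evaluate cycle using base R and
# the built-in mtcars dataset. The 70/30 split and the chosen predictors
# are illustrative assumptions only.
set.seed(42)

train_idx <- sample(seq_len(nrow(mtcars)), size = floor(0.7 * nrow(mtcars)))
train_set <- mtcars[train_idx, ]
test_set  <- mtcars[-train_idx, ]

# Fit a linear regression predicting fuel efficiency (mpg) from weight and horsepower
model <- lm(mpg ~ wt + hp, data = train_set)

# Evaluate on the held-out partition using root mean squared error (RMSE)
predictions <- predict(model, newdata = test_set)
rmse <- sqrt(mean((test_set$mpg - predictions)^2))
rmse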







By the end of this book, you will have gained the expertise to solve your own business problems: forming a good problem statement, selecting the most appropriate model to address it, and ensuring that you do not overtrain that model.




What you will learn



  • Define a problem that can be solved by training a machine learning model


  • Obtain, verify and clean data before transforming it into the correct format for use


  • Perform exploratory analysis and extract features from data


  • Build models for neural net, linear and non-linear regression, classification, and clustering


  • Evaluate the performance of a model with the right metrics


  • Solve a classification problem using the neuralnet package


  • Train an ensemble of decision trees using the randomForest library (see the short sketch below)
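
To illustrate the last two points, the following sketch (an example under assumed settings, not the book's own code) trains a random forest of decision trees on the built-in iris data with the randomForest library and evaluates it with a confusion matrix and overall accuracy on held-out rows:

# Illustrative only: the 80/20 split and 200 trees are assumed values.
library(randomForest)   # install.packages("randomForest") if it is not installed

set.seed(7)
idx       <- sample(seq_len(nrow(iris)), size = floor(0.8 * nrow(iris)))
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# Train an ensemble of decision trees (a random forest) to predict the species
rf_model <- randomForest(Species ~ ., data = train_set, ntree = 200)

# Evaluate on the held-out rows: confusion matrix and overall accuracy
predictions <- predict(rf_model, newdata = test_set)
table(Predicted = predictions, Actual = test_set$Species)
mean(predictions == test_set$Species)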



Who this book is for



If you are a data analyst, data scientist, or business analyst who wants to understand the process of machine learning and apply it to a real dataset using R, this book is just what you need. Data scientists who use Python and want to implement their machine learning solutions in R will also find this book very useful. The book will also enable novice programmers to start their journey in data science. Basic knowledge of any programming language is all you need to get started.

You can read this e-book in Legimi apps or in any app that supports the following format:

EPUB

Page count: 321

Year of publication: 2019




Practical Machine Learning with R

Define, build, and evaluate machine learning models for real-world applications

Brindha Priyadarshini Jeyaraman

Ludvig Renbo Olsen

Monicah Wambugu

Practical Machine Learning with R

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Authors: Brindha Priyadarshini Jeyaraman, Ludvig Renbo Olsen, and Monicah Wambugu

Technical Reviewers: Anil Kumar and Rohan Chikorde

Managing Editors: Steffi Monterio and Snehal Tambe

Acquisitions Editor: Koushik Sen

Production Editor: Samita Warang

Editorial Board: Shubhopriya Banerjee, Mayank Bhardwaj, Ewan Buckingham, Mahesh Dhyani, Taabish Khan, Manasa Kumar, Alex Mazonowicz, Pramod Menon, Bridget Neale, Dominic Pereira, Shiny Poojary, Erol Staveley, Ankita Thakur, Nitesh Thakur, and Jonathan Wray

First Published: August 2019

Production Reference: 1300819

ISBN: 978-1-83855-013-4

Published by Packt Publishing Ltd.

Livery Place, 35 Livery Street

Birmingham B3 2PB, UK

Table of Contents

Preface   i

Chapter 1: An Introduction to Machine Learning   1

Introduction   2

The Machine Learning Process   2

Raw Data   4

Data Pre-Processing   4

The Data Splitting Process   4

The Training Process   5

Evaluation Process   5

Deployment Process   7

Process Flow for Making Predictions    7

Introduction to R   8

Exercise 1: Reading from a CSV File in RStudio   8

Exercise 2: Performing Operations on a Dataframe   9

Exploratory Data Analysis (EDA)   10

View Built-in Datasets in R   10

Exercise 3: Loading Built-in Datasets   12

Exercise 4: Viewing Summaries of Data   18

Visualizing the Data   21

Activity 1: Finding the Distribution of Diabetic Patients in the PimaIndiansDiabetes Dataset   30

Activity 2: Grouping the PimaIndiansDiabetes Data   31

Activity 3: Performing EDA on the PimaIndiansDiabetes Dataset   32

Machine Learning Models   34

Types of Prediction   34

Supervised Learning   35

Unsupervised Learning   37

Applications of Machine Learning   37

Regression   38

Exercise 5: Building a Linear Classifier in R   40

Activity 4: Building Linear Models for the GermanCredit Dataset   42

Activity 5: Using Multiple Variables for a Regression Model for the Boston Housing Dataset   42

Summary   44

Chapter 2: Data Cleaning and Pre-processing   47

Introduction   48

Advanced Operations on Data Frames   49

Exercise 6: Sorting the Data Frame   49

Join Operations   52

Pre-Processing of Data Frames   55

Exercise 7: Centering Variables   55

Exercise 8: Normalizing the Variables   57

Exercise 9: Scaling the Variables   58

Activity 6: Centering and Scaling the Variables   60

Extracting the Principal Components   60

Exercise 10: Extracting the Principal Components   61

Subsetting Data   63

Exercise 11: Subsetting a Data Frame   63

Data Transposes   65

Identifying the Input and Output Variables   66

Identifying the Category of Prediction   67

Handling Missing Values, Duplicates, and Outliers   67

Handling Missing Values   67

Exercise 12: Identifying the Missing Values   67

Techniques for Handling Missing Values   70

Exercise 13: Imputing Using the MICE Package   70

Exercise 14: Performing Predictive Mean Matching   72

Handling Duplicates   74

Exercise 15: Identifying Duplicates   74

Techniques Used to Handle Duplicate Values   76

Handling Outliers   76

Exercise 16: Identifying Outlier Values   76

Techniques Used to Handle Outliers   78

Exercise 17: Predicting Values to Handle Outliers   78

Handling Missing Data   79

Exercise 18: Handling Missing Values   80

Activity 7: Identifying Outliers   81

Pre-Processing Categorical Data   82

Handling Imbalanced Datasets   82

Undersampling   84

Exercise 19: Undersampling a Dataset   84

Oversampling   85

Exercise 20: Oversampling   85

ROSE   85

Exercise 21: Oversampling using ROSE   86

SMOTE   86

Exercise 22: Implementing the SMOTE Technique   87

Activity 8: Oversampling and Undersampling using SMOTE   88

Activity 9: Sampling and Oversampling using ROSE   89

Summary   90

Chapter 3: Feature Engineering   93

Introduction   94

Types of Features   95

Datatype-Based Features   95

Date and Time Features   96

Exercise 23: Creating Date Features   96

Exercise 24: Creating Time Features   98

Time Series Features   99

Exercise 25: Binning   100

Activity 10: Creating Time Series Features – Binning   102

Summary Statistics   104

Exercise 26: Finding Description of Features   104

Standardizing and Rescaling   106

Handling Categorical Variables   107

Skewness   108

Exercise 27: Computing Skewness   108

Activity 11: Identifying Skewness   109

Reducing Skewness Using Log Transform   111

Exercise 28: Using Log Transform   111

Derived Features or Domain-Specific Features   112

Adding Features to a Data Frame   112

Exercise 29: Adding a New Column to an R Data Frame   112

Handling Redundant Features   114

Exercise 30: Identifying Redundant Features   114

Text Features    116

Exercise 31: Automatically Generating Text Features    118

Feature Selection   121

Correlation Analysis   121

Exercise 32: Plotting Correlation between Two Variables   122

P-Value   124

Exercise 33: Calculating the P-Value   124

Recursive Feature Elimination   126

Exercise 34: Implementing Recursive Feature Elimination   126

PCA   129

Exercise 35: Implementing PCA    129

Activity 12: Generating PCA   130

Ranking Features   131

Variable Importance Approach with Learning Vector Quantization   131

Exercise 36: Implementing LVQ   131

Variable Importance Approach Using Random Forests   134

Exercise 37: Finding Variable Importance in the PimaIndiansDiabetes Dataset   134

Activity 13: Implementing the Random Forest Approach   136

Variable Importance Approach Using a Logistic Regression Model   137

Exercise 38: Implementing the Logistic Regression Model   137

Determining Variable Importance Using rpart   138

Exercise 39: Variable Importance Using rpart for the PimaIndiansDiabetes Data   138

Activity 14: Selecting Features Using Variable Importance   140

Summary   143

Chapter 4: Introduction to neuralnet and Evaluation Methods   145

Introduction   146

Classification   146

Binary Classification   147

Exercise 40: Preparing the Dataset   147

Balanced Partitioning Using the groupdata2 Package   148

Exercise 41: Partitioning the Dataset   149

Exercise 42: Creating Balanced Partitions   152

Leakage   154

Exercise 43: Ensuring an Equal Number of Observations Per Class   155

Standardizing   157

Neural Networks with neuralnet   158

Activity 15: Training a Neural Network   160

Model Selection   162

Evaluation Metrics   162

Accuracy   162

Precision   162

Recall   163

Exercise 44: Creating a Confusion Matrix   164

Exercise 45: Creating Baseline Evaluations   166

Over and Underfitting   170

Adding Layers and Nodes in neuralnet   171

Cross-Validation   173

Creating Folds   176

Exercise 46: Writing a Cross-Validation Training Loop   177

Activity 16: Training and Comparing Neural Network Architectures   179

Activity 17: Training and Comparing Neural Network Architectures with Cross-Validation   182

Multiclass Classification Overview   184

Summary   186

Chapter 5: Linear and Logistic Regression Models   189

Introduction   190

Regression   190

Linear Regression   193

Exercise 47: Training Linear Regression Models   196

R2   201

Exercise 48: Plotting Model Predictions   201

Exercise 49: Incrementally Adding Predictors   205

Comparing Linear Regression Models   210

Evaluation Metrics   210

MAE   211

RMSE   211

Differences between MAE and RMSE   212

Exercise 50: Comparing Models with the cvms Package   214

Interactions   217

Exercise 51: Adding Interaction Terms to Our Model   221

Should We Standardize Predictors?   226

Repeated Cross-Validation   228

Exercise 52: Running Repeated Cross-Validation   228

Exercise 53: Validating Models with validate()   232

Activity 18: Implementing Linear Regression   234

Log-Transforming Predictors   236

Exercise 54: Log-Transforming Predictors   237

Logistic Regression   241

Exercise 55: Training Logistic Regression Models   242

Exercise 56: Creating Binomial Baseline Evaluations with cvms   252

Exercise 57: Creating Gaussian Baseline Evaluations with cvms   254

Regression and Classification with Decision Trees   256

Exercise 58: Training Random Forest Models   257

Model Selection by Multiple Disagreeing Metrics   259

Pareto Dominance   259

Exercise 59: Plotting the Pareto Front   259

Activity 19: Classifying Room Types   264

Summary   267

Chapter 6: Unsupervised Learning   269

Introduction   270

Overview of Unsupervised Learning (Clustering)   271

Hard versus Soft Clusters   272

Flat versus Hierarchical Clustering   273

Monothetic versus Polythetic Clustering   276

Exercise 60: Monothetic and Hierarchical Clustering on a Binary Dataset   276

DIANA   279

Exercise 61: Implement Hierarchical Clustering Using DIANA   279

AGNES   283

Exercise 62: Agglomerative Clustering Using AGNES   284

Distance Metrics in Clustering   286

Exercise 63: Calculate Dissimilarity Matrices Using Euclidean and Manhattan Distance   287

Correlation-Based Distance Metrics   290

Exercise 64: Apply Correlation-Based Metrics   292

Applications of Clustering   294

k-means Clustering   295

Exploratory Data Analysis Using Scatter Plots   295

The Elbow Method   296

Exercise 65: Implementation of k-means Clustering in R   297

Activity 20: Perform DIANA, AGNES, and k-means on the Built-In Motor Car Dataset   305

Summary   308

Appendix   311

Preface

About

This section briefly introduces the authors, the coverage of this book, the technical skills you'll need to get started, and the hardware and software requirements for completing all of the included activities and exercises.

Chapter 1

An Introduction to Machine Learning

Learning Objectives

By the end of this chapter, you will be able to:

  • Explain the concept of machine learning

  • Outline the process involved in building models in machine learning

  • Identify the various algorithms available in machine learning

  • Identify the applications of machine learning

  • Use R commands to load R packages

  • Perform exploratory data analysis and visualize the datasets

This chapter explains the concept of machine learning and the series of steps involved in analyzing the data to prepare it for building a machine learning model.
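As a preview of the exercises that follow, here is a short illustrative R sketch (the mlbench package and the plotted columns are assumptions for illustration, not the chapter's exact code) that loads a package, pulls in the PimaIndiansDiabetes dataset used throughout this book, and performs a quick exploratory summary and visualization:

# Load a package that provides the PimaIndiansDiabetes dataset.
# Assumes mlbench is installed: install.packages("mlbench")
library(mlbench)

data(PimaIndiansDiabetes)

# First look: structure and summary statistics of the data
str(PimaIndiansDiabetes)
summary(PimaIndiansDiabetes)

# Simple visualizations: glucose distribution and the class balance
hist(PimaIndiansDiabetes$glucose, main = "Glucose", xlab = "Plasma glucose concentration")
barplot(table(PimaIndiansDiabetes$diabetes), main = "Diabetic vs. non-diabetic patients")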