Practical Machine Learning with R - Brindha Priyadarshini Jeyaraman - E-Book

Description

Understand how machine learning works and get hands-on experience of using R to build algorithms that can solve various real-world problems




Key Features



  • Gain a comprehensive overview of different machine learning techniques


  • Explore various methods for selecting a particular algorithm


  • Implement a machine learning project from problem definition through to the final model



Book Description



With huge amounts of data being generated every moment, businesses need applications that apply complex mathematical calculations to that data repeatedly and at speed. With machine learning techniques and R, you can develop these kinds of applications efficiently.







Practical Machine Learning with R begins by helping you grasp the basics of machine learning methods, while also highlighting how and why they work. You will understand how to get these algorithms to work in practice, rather than focusing on mathematical derivations. As you progress from one chapter to the next, you will gain hands-on experience of building machine learning solutions in R. Using R packages such as rpart, randomForest, and mice (multiple imputation by chained equations), you will learn to implement algorithms including neural network classifiers, decision trees, and linear and non-linear regression. Later in the book, you'll delve into techniques for both supervised and unsupervised learning. In addition, you'll learn how to partition datasets, evaluate the results from each model, and compare the models against one another.
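To make that workflow concrete, here is a minimal, hypothetical R sketch (an illustration under assumed choices, not an excerpt from the book) that partitions a built-in dataset, fits a simple linear regression model, and evaluates it on the held-out portion:

# A minimal sketch of the partition/train/evaluate cycle using base R and
# the built-in mtcars dataset. The 70/30 split and the chosen predictors
# are illustrative assumptions only.
set.seed(42)

train_idx <- sample(seq_len(nrow(mtcars)), size = floor(0.7 * nrow(mtcars)))
train_set <- mtcars[train_idx, ]
test_set  <- mtcars[-train_idx, ]

# Fit a linear regression predicting fuel efficiency (mpg) from weight and horsepower
model <- lm(mpg ~ wt + hp, data = train_set)

# Evaluate on the held-out partition using root mean squared error (RMSE)
predictions <- predict(model, newdata = test_set)
rmse <- sqrt(mean((test_set$mpg - predictions)^2))
rmse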







By the end of this book, you will have gained the expertise to solve your own business problems: forming a good problem statement, selecting the most appropriate model to address it, and ensuring that you do not overtrain that model.




What you will learn



  • Define a problem that can be solved by training a machine learning model


  • Obtain, verify and clean data before transforming it into the correct format for use


  • Perform exploratory analysis and extract features from data


  • Build models for neural net, linear and non-linear regression, classification, and clustering


  • Evaluate the performance of a model with the right metrics


  • Solve a classification problem using the neuralnet package


  • Train an ensemble of decision trees using the randomForest library (see the short sketch below)
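
To illustrate the last two points, the following sketch (an example under assumed settings, not the book's own code) trains a random forest of decision trees on the built-in iris data with the randomForest library and evaluates it with a confusion matrix and overall accuracy on held-out rows:

# Illustrative only: the 80/20 split and 200 trees are assumed values.
library(randomForest)   # install.packages("randomForest") if it is not installed

set.seed(7)
idx       <- sample(seq_len(nrow(iris)), size = floor(0.8 * nrow(iris)))
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# Train an ensemble of decision trees (a random forest) to predict the species
rf_model <- randomForest(Species ~ ., data = train_set, ntree = 200)

# Evaluate on the held-out rows: confusion matrix and overall accuracy
predictions <- predict(rf_model, newdata = test_set)
table(Predicted = predictions, Actual = test_set$Species)
mean(predictions == test_set$Species)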



Who this book is for



If you are a data analyst, data scientist, or business analyst who wants to understand the process of machine learning and apply it to a real dataset using R, this book is just what you need. Data scientists who use Python and want to implement their machine learning solutions in R will also find this book very useful. The book will also enable novice programmers to start their journey in data science. Basic knowledge of any programming language is all you need to get started.

You can read this e-book in Legimi apps or in any app that supports the following format:

EPUB

Page count: 321

Year of publication: 2019




Practical Machine Learning with R

Define, build, and evaluate machine learning models for real-world applications

Brindha Priyadarshini Jeyaraman

Ludvig Renbo Olsen

Monicah Wambugu

Practical Machine Learning with R

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Authors: Brindha Priyadarshini Jeyaraman, Ludvig Renbo Olsen, and Monicah Wambugu

Technical Reviewers: Anil Kumar and Rohan Chikorde

Managing Editors: Steffi Monterio and Snehal Tambe

Acquisitions Editor: Koushik Sen

Production Editor: Samita Warang

Editorial Board: Shubhopriya Banerjee, Mayank Bhardwaj, Ewan Buckingham, Mahesh Dhyani, Taabish Khan, Manasa Kumar, Alex Mazonowicz, Pramod Menon, Bridget Neale, Dominic Pereira, Shiny Poojary, Erol Staveley, Ankita Thakur, Nitesh Thakur, and Jonathan Wray

First Published: August 2019

Production Reference: 1300819

ISBN: 978-1-83855-013-4

Published by Packt Publishing Ltd.

Livery Place, 35 Livery Street

Birmingham B3 2PB, UK

Table of Contents

Preface   i

Chapter 1: An Introduction to Machine Learning   1

Introduction   2

The Machine Learning Process   2

Raw Data   4

Data Pre-Processing   4

The Data Splitting Process   4

The Training Process   5

Evaluation Process   5

Deployment Process   7

Process Flow for Making Predictions    7

Introduction to R   8

Exercise 1: Reading from a CSV File in RStudio   8

Exercise 2: Performing Operations on a Dataframe   9

Exploratory Data Analysis (EDA)   10

View Built-in Datasets in R   10

Exercise 3: Loading Built-in Datasets   12

Exercise 4: Viewing Summaries of Data   18

Visualizing the Data   21

Activity 1: Finding the Distribution of Diabetic Patients in the PimaIndiansDiabetes Dataset   30

Activity 2: Grouping the PimaIndiansDiabetes Data   31

Activity 3: Performing EDA on the PimaIndiansDiabetes Dataset   32

Machine Learning Models   34

Types of Prediction   34

Supervised Learning   35

Unsupervised Learning   37

Applications of Machine Learning   37

Regression   38

Exercise 5: Building a Linear Classifier in R   40

Activity 4: Building Linear Models for the GermanCredit Dataset   42

Activity 5: Using Multiple Variables for a Regression Model for the Boston Housing Dataset   42

Summary   44

Chapter 2: Data Cleaning and Pre-processing   47

Introduction   48

Advanced Operations on Data Frames   49

Exercise 6: Sorting the Data Frame   49

Join Operations   52

Pre-Processing of Data Frames   55

Exercise 7: Centering Variables   55

Exercise 8: Normalizing the Variables   57

Exercise 9: Scaling the Variables   58

Activity 6: Centering and Scaling the Variables   60

Extracting the Principal Components   60

Exercise 10: Extracting the Principal Components   61

Subsetting Data   63

Exercise 11: Subsetting a Data Frame   63

Data Transposes   65

Identifying the Input and Output Variables   66

Identifying the Category of Prediction   67

Handling Missing Values, Duplicates, and Outliers   67

Handling Missing Values   67

Exercise 12: Identifying the Missing Values   67

Techniques for Handling Missing Values   70

Exercise 13: Imputing Using the MICE Package   70

Exercise 14: Performing Predictive Mean Matching   72

Handling Duplicates   74

Exercise 15: Identifying Duplicates   74

Techniques Used to Handle Duplicate Values   76

Handling Outliers   76

Exercise 16: Identifying Outlier Values   76

Techniques Used to Handle Outliers   78

Exercise 17: Predicting Values to Handle Outliers   78

Handling Missing Data   79

Exercise 18: Handling Missing Values   80

Activity 7: Identifying Outliers   81

Pre-Processing Categorical Data   82

Handling Imbalanced Datasets   82

Undersampling   84

Exercise 19: Undersampling a Dataset   84

Oversampling   85

Exercise 20: Oversampling   85

ROSE   85

Exercise 21: Oversampling using ROSE   86

SMOTE   86

Exercise 22: Implementing the SMOTE Technique   87

Activity 8: Oversampling and Undersampling using SMOTE   88

Activity 9: Sampling and Oversampling using ROSE   89

Summary   90

Chapter 3: Feature Engineering   93

Introduction   94

Types of Features   95

Datatype-Based Features   95

Date and Time Features   96

Exercise 23: Creating Date Features   96

Exercise 24: Creating Time Features   98

Time Series Features   99

Exercise 25: Binning   100

Activity 10: Creating Time Series Features – Binning   102

Summary Statistics   104

Exercise 26: Finding Description of Features   104

Standardizing and Rescaling   106

Handling Categorical Variables   107

Skewness   108

Exercise 27: Computing Skewness   108

Activity 11: Identifying Skewness   109

Reducing Skewness Using Log Transform   111

Exercise 28: Using Log Transform   111

Derived Features or Domain-Specific Features   112

Adding Features to a Data Frame   112

Exercise 29: Adding a New Column to an R Data Frame   112

Handling Redundant Features   114

Exercise 30: Identifying Redundant Features   114

Text Features    116

Exercise 31: Automatically Generating Text Features    118

Feature Selection   121

Correlation Analysis   121

Exercise 32: Plotting Correlation between Two Variables   122

P-Value   124

Exercise 33: Calculating the P-Value   124

Recursive Feature Elimination   126

Exercise 34: Implementing Recursive Feature Elimination   126

PCA   129

Exercise 35: Implementing PCA    129

Activity 12: Generating PCA   130

Ranking Features   131

Variable Importance Approach with Learning Vector Quantization   131

Exercise 36: Implementing LVQ   131

Variable Importance Approach Using Random Forests   134

Exercise 37: Finding Variable Importance in the PimaIndiansDiabetes Dataset   134

Activity 13: Implementing the Random Forest Approach   136

Variable Importance Approach Using a Logistic Regression Model   137

Exercise 38: Implementing the Logistic Regression Model   137

Determining Variable Importance Using rpart   138

Exercise 39: Variable Importance Using rpart for the PimaIndiansDiabetes Data   138

Activity 14: Selecting Features Using Variable Importance   140

Summary   143

Chapter 4: Introduction to neuralnet and Evaluation Methods   145

Introduction   146

Classification   146

Binary Classification   147

Exercise 40: Preparing the Dataset   147

Balanced Partitioning Using the groupdata2 Package   148

Exercise 41: Partitioning the Dataset   149

Exercise 42: Creating Balanced Partitions   152

Leakage   154

Exercise 43: Ensuring an Equal Number of Observations Per Class   155

Standardizing   157

Neural Networks with neuralnet   158

Activity 15: Training a Neural Network   160

Model Selection   162

Evaluation Metrics   162

Accuracy   162

Precision   162

Recall   163

Exercise 44: Creating a Confusion Matrix   164

Exercise 45: Creating Baseline Evaluations   166

Over and Underfitting   170

Adding Layers and Nodes in neuralnet   171

Cross-Validation   173

Creating Folds   176

Exercise 46: Writing a Cross-Validation Training Loop   177

Activity 16: Training and Comparing Neural Network Architectures   179

Activity 17: Training and Comparing Neural Network Architectures with Cross-Validation   182

Multiclass Classification Overview   184

Summary   186

Chapter 5: Linear and Logistic Regression Models   189

Introduction   190

Regression   190

Linear Regression   193

Exercise 47: Training Linear Regression Models   196

R2   201

Exercise 48: Plotting Model Predictions   201

Exercise 49: Incrementally Adding Predictors   205

Comparing Linear Regression Models   210

Evaluation Metrics   210

MAE   211

RMSE   211

Differences between MAE and RMSE   212

Exercise 50: Comparing Models with the cvms Package   214

Interactions   217

Exercise 51: Adding Interaction Terms to Our Model   221

Should We Standardize Predictors?   226

Repeated Cross-Validation   228

Exercise 52: Running Repeated Cross-Validation   228

Exercise 53: Validating Models with validate()   232

Activity 18: Implementing Linear Regression   234

Log-Transforming Predictors   236

Exercise 54: Log-Transforming Predictors   237

Logistic Regression   241

Exercise 55: Training Logistic Regression Models   242

Exercise 56: Creating Binomial Baseline Evaluations with cvms   252

Exercise 57: Creating Gaussian Baseline Evaluations with cvms   254

Regression and Classification with Decision Trees   256

Exercise 58: Training Random Forest Models   257

Model Selection by Multiple Disagreeing Metrics   259

Pareto Dominance   259

Exercise 59: Plotting the Pareto Front   259

Activity 19: Classifying Room Types   264

Summary   267

Chapter 6: Unsupervised Learning   269

Introduction   270

Overview of Unsupervised Learning (Clustering)   271

Hard versus Soft Clusters   272

Flat versus Hierarchical Clustering   273

Monothetic versus Polythetic Clustering   276

Exercise 60: Monothetic and Hierarchical Clustering on a Binary Dataset   276

DIANA   279

Exercise 61: Implement Hierarchical Clustering Using DIANA   279

AGNES   283

Exercise 62: Agglomerative Clustering Using AGNES   284

Distance Metrics in Clustering   286

Exercise 63: Calculate Dissimilarity Matrices Using Euclidean and Manhattan Distance   287

Correlation-Based Distance Metrics   290

Exercise 64: Apply Correlation-Based Metrics   292

Applications of Clustering   294

k-means Clustering   295

Exploratory Data Analysis Using Scatter Plots   295

The Elbow Method   296

Exercise 65: Implementation of k-means Clustering in R   297

Activity 20: Perform DIANA, AGNES, and k-means on the Built-In Motor Car Dataset   305

Summary   308

Appendix   311

Preface

About

This section briefly introduces the authors, the coverage of this book, the technical skills you'll need to get started, and the hardware and software requirements for completing all of the included activities and exercises.

Chapter 1

An Introduction to Machine Learning

Learning Objectives

By the end of this chapter, you will be able to:

  • Explain the concept of machine learning

  • Outline the process involved in building models in machine learning

  • Identify the various algorithms available in machine learning

  • Identify the applications of machine learning

  • Use R commands to load R packages

  • Perform exploratory data analysis and visualize the datasets

This chapter explains the concept of machine learning and the series of steps involved in analyzing the data to prepare it for building a machine learning model.
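As a preview of the exercises that follow, here is a short illustrative R sketch (the mlbench package and the plotted columns are assumptions for illustration, not the chapter's exact code) that loads a package, pulls in the PimaIndiansDiabetes dataset used throughout this book, and performs a quick exploratory summary and visualization:

# Load a package that provides the PimaIndiansDiabetes dataset.
# Assumes mlbench is installed: install.packages("mlbench")
library(mlbench)

data(PimaIndiansDiabetes)

# First look: structure and summary statistics of the data
str(PimaIndiansDiabetes)
summary(PimaIndiansDiabetes)

# Simple visualizations: glucose distribution and the class balance
hist(PimaIndiansDiabetes$glucose, main = "Glucose", xlab = "Plasma glucose concentration")
barplot(table(PimaIndiansDiabetes$diabetes), main = "Diabetic vs. non-diabetic patients")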