Ensemble Machine Learning

Ankit Dixit

Description

An effective guide to using ensemble techniques to enhance machine learning models

About This Book

  • Learn how to get the most out of popular machine learning algorithms such as random forests, decision trees, AdaBoost, K-nearest neighbors, and more
  • Take a practical approach to building efficient machine learning models using ensemble techniques, with real-world use cases
  • Implement boosting, bagging, and stacking ensemble methods to improve your models' prediction accuracy

Who This Book Is For

This book is for data scientists, machine learning practitioners, and deep learning enthusiasts who want to implement ensemble techniques and take a deep dive into the world of machine learning algorithms. You are expected to understand Python code and have a basic knowledge of probability theory, statistics, and linear algebra.

What You Will Learn

  • Understand why bagging improves classification and regression performance
  • Get to grips with implementing AdaBoost and different variants of this algorithm
  • Understand the bootstrap method and its application to bagging
  • Perform regression on Boston housing data using scikit-learn and NumPy
  • Know how to use random forest for Iris data classification (see the sketch after this list)
  • Get to grips with the classification of the Sonar dataset using KNN, Perceptron, and Logistic Regression
  • Discover how to improve prediction accuracy by fine-tuning the model parameters
  • Master the analysis of a trained predictive model for overfitting/underfitting cases
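
As a taste of the random forest item above, here is a minimal sketch (my own illustration assuming scikit-learn, not code from the book) of Iris classification:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the Iris measurements and split off a held-out test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A forest of 100 trees; each tree sees a bootstrap sample of the data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))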

In Detail

Ensembling is the technique of combining two or more machine learning algorithms, similar or dissimilar, to create a model that delivers superior predictive power. This book shows you how many weak learners can be combined into a strong predictive model, and it includes Python code for the different algorithms so that you can easily understand and implement them in your own systems.
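
As a minimal sketch of that weak-to-strong idea (my example rather than the book's code, assuming scikit-learn is installed), boosting one-level decision trees, each weak on its own, typically produces a far stronger classifier:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=1)

# A single depth-1 tree (a decision "stump") is a weak learner.
stump = DecisionTreeClassifier(max_depth=1, random_state=1)
print("Single stump:", cross_val_score(stump, X, y, cv=5).mean())

# AdaBoost combines 100 such stumps, re-weighting misclassified
# examples after each round, into a much stronger model.
boosted = AdaBoostClassifier(n_estimators=100, random_state=1)
print("Boosted stumps:", cross_val_score(boosted, X, y, cv=5).mean())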

This book covers machine learning algorithms that are widely used in practice to make predictions and classifications. It addresses the different aspects of a prediction framework, such as data pre-processing, model training, validation of the model, and more. You will learn about ensemble approaches such as bagging (decision trees and random forests), boosting (AdaBoost), and stacking (combining the predictions of several different models with a meta-learner).
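
To make those three families concrete, here is a hedged sketch (mine, not the book's listing; it assumes scikit-learn 0.22 or later for StackingClassifier) of one estimator from each:

from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: a random forest averages many decision trees, each grown
# on a different bootstrap sample of the training data.
bagging = RandomForestClassifier(n_estimators=100)

# Boosting: AdaBoost trains trees sequentially, focusing each new
# tree on the examples the previous ones misclassified.
boosting = AdaBoostClassifier(n_estimators=100)

# Stacking: a meta-learner (here, logistic regression) combines the
# predictions of diverse base models.
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression())

Each of these behaves like any other scikit-learn estimator: call fit(X, y) to train it and predict(X) to use it.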

Then you'll learn how to implement these approaches by building ensemble models with TensorFlow and Python libraries such as scikit-learn and NumPy. As machine learning touches almost every field of the digital world, you'll see how these algorithms can be used in applications such as computer vision, speech recognition, recommendation systems, clustering, document classification, regression on data, and more.

By the end of this book, you'll understand how to combine machine learning algorithms so that they work together behind the scenes, and how to overcome the common challenges of building predictive models.

Style and approach

This comprehensive guide offers the perfect blend of theory, examples, and implementations of real-world use cases.

Page count: 474

Publication year: 2017




Ensemble Machine Learning

A beginner's guide that combines powerful machine learning algorithms to build optimized models

Ankit Dixit

BIRMINGHAM - MUMBAI

Ensemble Machine Learning

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: December 2017

Production reference: 1191217

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-78829-775-2

www.packtpub.com

Credits

Author: Ankit Dixit

Copy Editor: Vikrant Phadkay

Reviewers: Apeksha Jain, Radovan Kavicky

Project Coordinator: Nidhi Joshi

Commissioning Editor: Sunith Shetty

Proofreader: Safis Editing

Acquisition Editor: Viraj Madhav

Indexer: Tejal Daruwale Soni

Content Development Editor: Aishwarya Pandere

Graphics: Tania Dutta

Technical Editor: Suwarna Patil

Production Coordinator: Nilesh Mohite

About the Author

Ankit Dixit is a data scientist and computer vision engineer from Mumbai. He holds a BTech in biomedical engineering and a master's degree with a specialization in computer vision. He has worked in the field of computer vision and machine learning for the past six years, using various software and hardware platforms for the design and development of machine vision algorithms, and has experience with a wide variety of machine learning algorithms. Currently, he focuses on designing computer vision and machine learning algorithms for medical imaging data, using advanced techniques such as ensemble methods and deep learning-based models.

About the Reviewers

Apeksha Jain is a data scientist and computer vision engineer from Mumbai, India. She holds a BTech in biomedical engineering and a master's degree with a specialization in computer vision. She has been working in the field of computer vision and machine learning for more than six years, using various software and hardware platforms for the design and development of machine vision algorithms, and has experience with various machine learning algorithms, including deep learning. Currently, she is designing computer vision and machine learning algorithms for medical imaging data at Aditya Imaging and Information Technologies (part of the Sun Pharmaceutical advanced research center), Mumbai, using advanced techniques such as ensemble methods and deep learning-based models.

Radovan Kavicky is the principal data scientist and president at GapData Institute, based in Bratislava, Slovakia, where he harnesses the power of data and the wisdom of economics for the public good. He is a macroeconomist by education and a consultant and analyst by profession (with 8+ years of experience consulting for clients from the public and private sectors), with strong mathematical and analytical skills and the ability to deliver top-level research and analytical work. From MATLAB, SAS, and Stata, he switched to Python, R, and Tableau.

Radovan is an evangelist of open data and a member of the Slovak Economic Association (SEA), Open Budget Initiative, Open Government Partnership, and the global Tableau #DataLeader network (2017). He is the founder of PyData Bratislava, R <- Slovakia, and the SK/CZ Tableau User Group (skczTUG). He has been a speaker at @TechSummit (Bratislava, 2017) and @PyData (Berlin, 2017).

You can follow him on Twitter at @radovankavicky, @GapDataInst, or @PyDataBA. His full profile and experience are available at https://www.linkedin.com/in/radovankavicky/ and https://github.com/radovankavicky.

GapData Institute: https://www.gapdata.org.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/178829775X.

If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We reward our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Table of Contents

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Introduction to Ensemble Learning

What is ensemble machine learning?

The purpose of ensemble machine learning

How to create an ensemble system

Quantification of performance

Bias and variance errors

Methods to create ensemble systems

Bagging

Boosting

Stacking

Summary

Decision Trees

How do decision trees work?

ID3 algorithm for decision tree building

Root node

Salary

The Sex attribute

Marital status

Parent node

Choosing between the Sex and Marital attributes for the low salary group

Choosing between the Sex and Marital attributes for the Med salary group

Marital status

Case study – car evaluation problem

Summary

Random Forest

Classification and regression trees

Gini index for impurity check

Node selection

Creating a split

Tree building

At depth – 1 (root node)

At depth – 2 (left branch)

At depth – 2 (right branch)

Case study – breast cancer type prediction

Decision tree bagging

From bagging to random forest

Summary

Random Subspace and KNN Bagging

Subspace bagging

Case study – subspace bagging

More information about the dataset

KNN classification 

KNN for spam filtering

Dataset

Dataset information

Attribute information

KNN bagging with random subspaces

Summary

AdaBoost Classifier

Boosting

AdaBoost in a nutshell

Weak classifier

AdaBoost in action

Application of the AdaBoost classifier in face detection

Face detection using Haar cascades

Integral image

Implementation using OpenCV

Summary

Gradient Boosting Machines

Gradient Boosting Machines

What is the difference?

Create split

Node selection

Build tree

Regression tree as a classifier

GBM implementation

Algorithm

Improvements to basic gradient boosting

Tree constraints

Weighted updates

Stochastic gradient boosting

Penalized gradient boosting

Summary

XGBoost – eXtreme Gradient Boosting

XGBoost – supervised learning

Models and parameters

Objective function – training loss + regularization

Why introduce the general principle?

XGBoost features

Model features

System features

Algorithm features

Why use XGBoost?

XGBoost execution speed

Model performance

How to install

Building the shared library

Building on Ubuntu/Debian

Building on Windows

A trick for easy installation on a Windows machine

XGBoost in action

Dataset information

Attribute information

XGBoost parameters

General parameters

Booster parameters

Learning task parameters

Parameter tuning – number and size of decision trees

Problem description – Otto dataset

Tune the number of decision trees in XGBoost

Tuning the size of decision trees in XGBoost

Tuning the number of trees and max depth in XGBoost

Summary

Stacked Generalization

Stacked generalization

Submodel training

KNN classification

Distance calculation (Euclidean)

Estimating the neighbors

Making predictions using voting

Perceptron

Training the perceptron

Gradient descent

Stochastic gradient descent

Implementation of perceptron

Logistic regression

The logistic function

Representation of logistic regression

Modeling probability using logistic regression

Learning the model

Prediction using logistic regression

Implementation of algorithm

Stacked generalization implementation

Practical application – Sonar dataset (Mine and Rock prediction)

More information about the dataset

Summary

Stacked Generalization – Part 2

Feature selection

Why feature selection?

Simplification of models

Dataset information

Predicted attribute

Attribute information

Shorter training time

To avoid the curse of dimensionality

Enhanced generalization by reducing overfitting

Feature selection for machine learning

Univariate selection

Recursive Feature Elimination

Principal Component Analysis

Choosing important features (feature importance)

Understanding the SVM

How does SVM work?

Hyperplane – separation between the data points

Implementation of an SVM

Objective function

Function optimization

Handling a nonlinear dataset

Stacking of nonlinear algorithms

Spam classification with stacking

Dataset information

Attribute information

How to choose classifiers?

Summary

Modern Day Machine Learning

Artificial Neural Networks (feed-forward)

How does ANN work?

Training of ANNs

Learning by backpropagation

ANN implementation using Keras and TensorFlow

TensorFlow for machine learning

Keras for machine learning

Digit classification using Keras and TensorFlow

Deep learning

Convolutional Neural Networks

Local receptive fields

Shared weights and biases

Pooling layers

Combining all the layers

Implementation of CNN in Python

Recurrent Neural Networks

How RNN works (unrolling RNN)

Unrolling the forward pass

Unrolling the backward pass

Backpropagation Through Time 

Backpropagation training algorithm

Backpropagation Through Time

Long Short-Term Memory networks

The idea behind LSTMs

Step-by-step LSTM walkthrough

Text generation using LSTM

Problem description – project Gutenberg

LSTM model

Generating text with an LSTM Network

Summary

Troubleshooting

Full code of the implemented algorithm ID3

Code of the CART algorithm

Code for random forest 

Code for KNN and subspace bagging

KNN subspace bagging code

Code of the AdaBoost classifier

Code of GBMs

Full code of implementation

Full code of LSTM implementation

Preface

Science has given us one of its greatest gifts: the computer. This invention is as significant as fire; it has changed the history of mankind. Name any field of work where computers are not being used; I bet you cannot. Computers are a special kind of species: they eat only electricity and one precious thing the whole world is interested in, information, a.k.a. DATA. Without data, a computer is of no use; it is just a television-like screen and nothing more. So the next question arises: what do we do with this data? Believe me, every chapter of this book will give you a perspective on how to utilize your data and extract useful results from it.

What this book covers

Chapter 1, Introduction to Ensemble Learning, is our introductory chapter to the world of ensembles. We will see how ensembles can be useful for getting high accuracy from classifiers, and how to quantify a classifier's performance by analyzing its variance and bias errors. We will discuss the three important flavors of ensemble algorithms: bagging, boosting, and stacking. We will look at decision tree bagging in this chapter, see how boosting works and how to use it, and, at the end, discuss what stacking is and how to implement stacked generalization.
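
Since decision tree bagging is the chapter's running example, here is a rough sketch of the idea in advance (my own illustration assuming scikit-learn and NumPy, not the chapter's code): train trees on bootstrap samples of the data and combine them by majority vote.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=7)
rng = np.random.default_rng(7)

# Grow 25 trees, each on a bootstrap sample (drawn with replacement).
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# The ensemble's prediction is the majority vote of the trees; the
# vote averages away much of the individual trees' variance.
votes = np.stack([tree.predict(X) for tree in trees])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("Vote accuracy on the training data:", (ensemble_pred == y).mean())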