Combine popular machine learning techniques to create ensemble models using Python
Key Features
Book Description
Ensembling is a technique of combining two or more similar or dissimilar machine learning algorithms to create a model that delivers superior predictive power. This book will demonstrate how you can use a variety of weak algorithms to make a strong predictive model.
With its hands-on approach, you'll not only get up to speed on the basic theory but also the application of various ensemble learning techniques. Using examples and real-world datasets, you'll be able to produce better machine learning models to solve supervised learning problems such as classification and regression. Furthermore, you'll go on to leverage ensemble learning techniques such as clustering to produce unsupervised machine learning models. As you progress, the chapters will cover different machine learning algorithms that are widely used in the practical world to make predictions and classifications. You'll even get to grips with the use of Python libraries such as scikit-learn and Keras for implementing different ensemble models.
By the end of this book, you will be well-versed in ensemble learning, and have the skills you need to understand which ensemble method is required for which problem, and successfully implement them in real-world scenarios.
What you will learn
Who this book is for
This book is for data analysts, data scientists, machine learning engineers and other professionals who are looking to generate advanced models using ensemble techniques. An understanding of Python code and basic knowledge of statistics is required to make the most out of this book.
You can read this e-book in Legimi apps or in any app that supports the following format:
Page count: 273
Year of publication: 2019
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Sunith Shetty
Acquisition Editor: Devika Battike
Content Development Editor: Athikho Sapuni Rishana
Senior Editor: Martin Whittemore
Technical Editor: Utkarsha S. Kadam
Copy Editor: Safis Editing
Project Coordinator: Kirti Pisat
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Alishon Mendonsa
First published: July 2019
Production reference: 1180719
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78961-285-1
www.packtpub.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
George Kyriakides is a Ph.D. researcher, studying distributed neural architecture search. His interests and experience include the automated generation and optimization of predictive models for a wide array of applications, such as image recognition, time series analysis, and financial applications. He holds an M.Sc. in computational methods and applications, and a B.Sc. in applied informatics, both from the University of Macedonia, Thessaloniki, Greece.
Konstantinos G. Margaritis has been a teacher and researcher in computer science for more than 30 years. His research interests include parallel and distributed computing, as well as computational intelligence and machine learning. He holds an M.Eng. in electrical engineering (Aristotle University of Thessaloniki, Greece), as well as an M.Sc. and a Ph.D. in computer science (Loughborough University, UK). He is a professor at the Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece.
Greg Walters has been involved with computers and computer programming since 1972. Currently, he is extremely well versed in Visual Basic, Visual Basic .NET, Python, and SQL (using MySQL, SQLite, Microsoft SQL Server, and Oracle), as well as C++, Delphi, Modula-2, Pascal, C, 80x86 Assembler, COBOL, and Fortran. He is a programming trainer and has trained numerous people on many pieces of computer software, including MySQL, Open Database Connectivity, Quattro Pro, Corel Draw!, Paradox, Microsoft Word, Excel, DOS, Windows 3.11, Windows for Workgroups, Windows 95, Windows NT, Windows 2000, Windows XP, and Linux. He is currently retired and, in his spare time, is a musician and loves to cook, but he is also open to working as a freelancer on various projects.
Bhavesh Bhatt is a technology postgraduate at BITS Pilani with a keen interest in machine learning, data science, and computer vision. He currently works as a data scientist at Fractal Analytics. He has taught data science using the Python programming language to hundreds of students in the classroom. Additionally, Bhavesh hosts a machine learning-based educational YouTube channel with over 4,400 subscribers.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Ensemble Learning with Python
About Packt
Why subscribe?
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Code in action
Conventions used
Get in touch
Reviews
Section 1: Introduction and Required Software Tools
A Machine Learning Refresher
Technical requirements
Learning from data
Popular machine learning datasets
Diabetes
Breast cancer
Handwritten digits
Supervised and unsupervised learning
Supervised learning
Unsupervised learning
Dimensionality reduction
Performance measures
Cost functions
Mean absolute error
Mean squared error
Cross entropy loss
Metrics
Classification accuracy
Confusion matrix
Sensitivity, specificity, and area under the curve
Precision, recall, and the F1 score
Evaluating models
Machine learning algorithms
Python packages
Supervised learning algorithms
Regression
Support vector machines
Neural networks
Decision trees
K-Nearest Neighbors
K-means
Summary
Getting Started with Ensemble Learning
Technical requirements
Bias, variance, and the trade-off
What is bias?
What is variance?
Trade-off
Ensemble learning
Motivation
Identifying bias and variance
Validation curves
Learning curves
Ensemble methods
Difficulties in ensemble learning
Weak or noisy data
Understanding interpretability
Computational cost
Choosing the right models
Summary
Section 2: Non-Generative Methods
Voting
Technical requirements
Hard and soft voting
Hard voting
Soft voting
Python implementation
Custom hard voting implementation
Analyzing our results using Python
Using scikit-learn
Hard voting implementation
Soft voting implementation
Analyzing our results
Summary
Stacking
Technical requirements
Meta-learning
Stacking
Creating metadata
Deciding on an ensemble's composition
Selecting base learners
Selecting the meta-learner
Python implementation
Stacking for regression
Stacking for classification
Creating a stacking regressor class for scikit-learn
Summary
Section 3: Generative Methods
Bagging
Technical requirements
Bootstrapping
Creating bootstrap samples
Bagging
Creating base learners
Strengths and weaknesses
Python implementation
Implementation
Parallelizing the implementation
Using scikit-learn 
Bagging for classification
Bagging for regression
Summary
Boosting
Technical requirements
AdaBoost
Weighted sampling
Creating the ensemble
Implementing AdaBoost in Python
Strengths and weaknesses
Gradient boosting
Creating the ensemble
Further reading
Implementing gradient boosting in Python
Using scikit-learn
Using AdaBoost
Using gradient boosting
XGBoost
Using XGBoost for regression
Using XGBoost for classification
Other boosting libraries
Summary
Random Forests
Technical requirements
Understanding random forest trees
Building trees
Illustrative example
Extra trees
Creating forests
Analyzing forests
Strengths and weaknesses
Using scikit-learn
Random forests for classification
Random forests for regression
Extra trees for classification
Extra trees regression
Summary
Section 4: Clustering
Clustering
Technical requirements
Consensus clustering
Hierarchical clustering
K-means clustering
Strengths and weaknesses
Using scikit-learn
Using voting
Using OpenEnsembles
Using graph closure and co-occurrence linkage
Graph closure
Co-occurrence matrix linkage
Summary
Section 5: Real World Applications
Classifying Fraudulent Transactions
Technical requirements
Getting familiar with the dataset
Exploratory analysis
Evaluation methods
Voting
Testing the base learners
Optimizing the decision tree
Creating the ensemble
Stacking
Bagging
Boosting
XGBoost
Using random forests
Comparative analysis of ensembles
Summary
Predicting Bitcoin Prices
Technical requirements
Time series data
Bitcoin data analysis
Establishing a baseline
The simulator
Voting
Improving voting
Stacking
Improving stacking
Bagging
Improving bagging
Boosting
Improving boosting
Random forests
Improving random forest
Summary
Evaluating Sentiment on Twitter
Technical requirements
Sentiment analysis tools 
Stemming
Getting Twitter data
Creating a model
Classifying tweets in real time
Summary
Recommending Movies with Keras
Technical requirements
Demystifying recommendation systems
Neural recommendation systems
Using Keras for movie recommendations
Creating the dot model
Creating the dense model
Creating a stacking ensemble
Summary
Clustering World Happiness
Technical requirements
Understanding the World Happiness Report
Creating the ensemble
Gaining insights
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Ensembling is a technique for combining two or more similar or dissimilar machine learning algorithms to create a model that delivers superior predictive power. This book will demonstrate how you can use a variety of weak algorithms to make a strong predictive model. With its hands-on approach, you'll not only get up to speed on the basic theory, but also the application of various ensemble learning techniques. Using examples and real-world datasets, you'll be able to produce better machine learning models to solve supervised learning problems such as classification and regression. Later in the book, you'll go on to leverage ensemble learning techniques such as clustering to produce unsupervised machine learning models. As you progress, the chapters will cover different machine learning algorithms that are widely used in the practical world to make predictions and classifications. You'll even get to grips with using Python libraries such as scikit-learn and Keras to implement different ensemble models. By the end of this book, you will be well versed in ensemble learning and have the skills you need to understand which ensemble method is required for which problem, in order to successfully implement them in real-world scenarios.
This book is for data analysts, data scientists, machine learning engineers, and other professionals who are looking to generate advanced models using ensemble techniques.
Chapter 1, A Machine Learning Refresher, presents an overview of machine learning, including basic concepts such as training/test sets, performance measures, supervised and unsupervised learning, machine learning algorithms, and benchmark datasets.
Chapter 2, Getting Started with Ensemble Learning, introduces the concept of ensemble learning, highlighting the problems that it solves as well as the problems that it poses.
Chapter 3, Voting, introduces the most simple ensemble learning technique, voting, while explaining the difference between hard and soft voting. You will learn how to implement a custom classifier, as well as use scikit-learn's implementation of hard/soft voting.
Chapter 4, Stacking, covers meta-learning (stacking), a more advanced ensemble learning method. After reading this chapter, you will be able to implement a stacking classifier in Python and use it with scikit-learn classifiers.
Chapter 5, Bagging, introduces bootstrap resampling and the first generative ensemble learning technique, bagging. Furthermore, this chapter guides you through the process of implementing the technique in Python, as well as how to use the scikit-learn implementation.
Chapter 6, Boosting, touches on more advanced subjects in ensemble learning. This chapter explains how popular boosting algorithms work and are implemented. Furthermore, it presents XGBoost, a highly successful distributed boosting library.
Chapter 7, Random Forests, goes through the process of creating random decision trees by subsampling the instances and features of a dataset. Moreover, this chapter explains how to utilize an ensemble of random trees to create a random forest. Finally, this chapter presents scikit-learn's implementations and how to use them.
Chapter 8, Clustering, introduces the possibility of using ensembles for unsupervised learning tasks, such as clustering. Furthermore, the OpenEnsembles Python library is introduced, along with guidance on using it.
Chapter 9, Classifying Fraudulent Transactions, presents an application for the classification of a real-world dataset, using ensemble learning techniques presented in earlier chapters. The dataset concerns fraudulent credit card transactions.
Chapter 10, Predicting Bitcoin Prices, presents an application for the regression of a real-world dataset, using ensemble learning techniques presented in earlier chapters. The dataset concerns the price of the popular cryptocurrency Bitcoin.
Chapter 11, Evaluating Sentiment on Twitter, presents an application for evaluating the sentiment of various tweets using a real-world dataset.
Chapter 12, Recommending Movies with Keras, presents the process of creating a recommender system using ensembles of neural networks.
Chapter 13, Clustering World Happiness, presents the process of using an ensemble learning approach to cluster data from the World Happiness Report 2018.
This book is aimed at analysts, data scientists, engineers, and other professionals who have an interest in generating advanced models that describe and generalize datasets of interest to them. It is assumed that the reader has basic experience of programming in Python and is familiar with elementary machine learning models. Furthermore, a basic understanding of statistics is assumed, although key points and more advanced concepts are briefly presented. Familiarity with Python's scikit-learn module would be greatly beneficial, although it is not strictly required. A standard Python installation is required. Anaconda Distribution (https://www.anaconda.com/distribution/) greatly simplifies the task of installing and managing the various Python packages, although it is not necessary. Finally, a good Integrated Development Environment (IDE) is extremely useful for managing your code and debugging. In our examples, we usually utilize the Spyder IDE, which can be easily installed through Anaconda.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest versions of the following:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for macOS
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Ensemble-Learning-with-Python. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781789612851_ColorImages.pdf.
Visit the following link to check out videos of the code being run: http://bit.ly/2GfnRrv.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."
A block of code is set as follows:
# --- SECTION 6 ---
# Accuracy of hard voting
print('-'*30)
print('Hard Voting:', accuracy_score(y_test, hard_predictions))
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Thus, the preferred approach is to utilize K-fold cross validation."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
This section is a refresher on basic machine learning concepts and an introduction to ensemble learning. We will have an overview of machine learning and various concepts pertaining to it, such as train and test sets, supervised and unsupervised learning, and more. We will also learn about the concept of ensemble learning.
This section comprises the following chapters:
Chapter 1, A Machine Learning Refresher
Chapter 2, Getting Started with Ensemble Learning
Machine learning is a subfield of artificial intelligence (AI) focused on developing algorithms and techniques that enable computers to learn from massive amounts of data. Given the increasing rate at which data is produced, machine learning has played a critical role in solving difficult problems in recent years. This success has been the main driving force behind the funding and development of many great machine learning libraries that make use of data in order to build predictive models. Furthermore, businesses have started to realize the potential of machine learning, and the demand for data scientists and machine learning engineers who can design better-performing predictive models has reached new heights.
This chapter serves as a refresher on the main concepts and terminology, as well as an introduction to the frameworks that will be used throughout the book, in order to approach ensemble learning with a solid foundation.
The main topics covered in this chapter are the following:
The various machine learning problems and datasets
How to evaluate the performance of a predictive model
Machine learning algorithms
Python environment setup and the required libraries
You will require basic knowledge of machine learning techniques and algorithms. Furthermore, knowledge of Python conventions and syntax is required. Finally, familiarity with the NumPy library will greatly help the reader to understand some of the custom algorithm implementations.
The code files of this chapter can be found on GitHub:
https://github.com/PacktPublishing/Hands-On-Ensemble-Learning-with-Python/tree/master/Chapter01
Check out the following video to see the Code in Action: http://bit.ly/30u8sv8.
Data is the raw ingredient of machine learning. Processing data can produce information; for example, measuring the height of a portion of a school's students (data) and calculating their average (processing) can give us an idea of the whole school's height (information). If we process the data further, for example, by grouping males and females and calculating two averages (one for each group), we will gain more information, as we will have an idea about the average height of the school's males and females. Machine learning strives to produce the most information possible from any given data. In this example, we produced a very basic predictive model. By calculating the two averages, we can predict the average height of any student just by knowing whether the student is male or female.
The set of data that a machine learning algorithm is tasked with processing is called the problem's dataset. In our example, the dataset consists of height measurements (in centimeters) and each student's sex (male/female). In machine learning, input variables are called features and output variables are called targets. In this dataset, the features of our predictive model consist solely of the students' sex, while our target is the students' height in centimeters. The predictive model that is produced and maps features to targets will be referred to as simply the model from now on, unless otherwise specified. Each data point is called an instance. In this problem, each student is an instance of the dataset.
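A rough sketch of this toy model in Python could look as follows (the heights and group sizes here are made up purely for illustration):

import numpy as np

# Toy data: one instance per student; the feature is the student's sex,
# the target is their height in centimeters (values made up for illustration).
heights = np.array([160, 172, 168, 155, 180, 163])
sexes = np.array(['F', 'M', 'M', 'F', 'M', 'F'])

# "Training": compute one average height per group.
model = {sex: heights[sexes == sex].mean() for sex in np.unique(sexes)}

# "Prediction": for a new student, look up the group average.
print(model['F'], model['M'])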
When the target is a continuous variable (a number), it presents a regression problem, as the aim is to regress the target on the features. When the target is a set of categories, it presents a classification problem, as we try to assign each instance to a category or class.
Note that, in classification problems, the target class can be represented by a number; this does not mean that it is a regression problem. The most useful way to determine whether it is a regression problem is to think about whether the instances can be ordered by their targets. In our example, the target is height, so we can order the students from tallest to shortest, as 100 cm is less than 110 cm. As a counterexample, if the target was their favorite color, we could represent each color by a number, but we could not order them. Even if we represented red as one and blue as two, we could not say that red is "before" or "less than" blue. Thus, this counterexample is a classification problem.
Machine learning relies on data in order to produce high-performing models. Without data, it's not even possible to create models. In this section, we'll present some popular machine learning datasets, which we will utilize throughout this book.
The diabetes dataset concerns 442 individual diabetes patients and the progression of the disease one year after a baseline measurement. The dataset consists of 10 features, which are the patient's age, sex, body mass index (bmi), average blood pressure (bp), and six measurements of their blood serum. The dataset target is the progression of the disease one year after the baseline measurement. This is a regression dataset, as the target is a number.
In this book, the dataset features are mean-centered and scaled such that the dataset sum of squares for each feature equals one. The following table depicts a sample of the diabetes dataset:
age   | sex   | bmi   | bp    | s1    | s2    | s3    | s4    | s5    | s6    | target
0.04  | 0.05  | 0.06  | 0.02  | -0.04 | -0.03 | -0.04 | 0.00  | 0.02  | -0.02 | 151
0.00  | -0.04 | -0.05 | -0.03 | -0.01 | -0.02 | 0.07  | -0.04 | -0.07 | -0.09 | 75
0.09  | 0.05  | 0.04  | -0.01 | -0.05 | -0.03 | -0.03 | 0.00  | 0.00  | -0.03 | 141
-0.09 | -0.04 | -0.01 | -0.04 | 0.01  | 0.02  | -0.04 | 0.03  | 0.02  | -0.01 | 206
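A sample like this can be pulled directly from scikit-learn's bundled copy of the dataset; a minimal sketch:

from sklearn.datasets import load_diabetes

diabetes = load_diabetes()          # features come mean-centered and scaled
X, y = diabetes.data, diabetes.target
print(X.shape)                      # (442, 10)
print(diabetes.feature_names)       # ['age', 'sex', 'bmi', 'bp', 's1', ..., 's6']
print(y[:4])                        # [151.  75. 141. 206.]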
The breast cancer dataset concerns 569 biopsies of malignant and benign tumors. The dataset provides 30 features extracted from images of fine-needle aspiration biopsies that describe cell nuclei. The images provide information about the shape, size, and texture of each cell nucleus. Furthermore, for each characteristic, three distinct values are provided: the mean, the standard error, and the worst (largest) value. This ensures that, for each image, the cell population is adequately described.
The dataset target concerns the diagnosis, that is, whether a tumor is malignant or benign. Thus, this is a classification dataset. The available features are listed as follows:
Mean radius
Mean texture
Mean perimeter
Mean area
Mean smoothness
Mean compactness
Mean concavity
Mean concave points
Mean symmetry
Mean fractal dimension
Radius error
Texture error
Perimeter error
Area error
Smoothness error
Compactness error
Concavity error
Concave points error
Symmetry error
Fractal dimension error
Worst radius
Worst texture
Worst perimeter
Worst area
Worst smoothness
Worst compactness
Worst concavity
Worst concave points
Worst symmetry
Worst fractal dimension
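These features, along with the malignant/benign target, can be inspected through scikit-learn's loader; a minimal sketch:

from sklearn.datasets import load_breast_cancer

bc = load_breast_cancer()
print(bc.data.shape)          # (569, 30)
print(bc.target_names)        # ['malignant' 'benign']
print(bc.feature_names[:4])   # the first few of the 30 features listed above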
The handwritten digit dataset used here is a small-scale version of the famous MNIST image recognition problem. It consists of square, 8 x 8 pixel images, each containing a single handwritten digit. Thus, the features of each instance form an 8 x 8 matrix containing each pixel's grayscale intensity. The target consists of 10 classes, one for each digit from 0 to 9. This is a classification dataset. The following figure is a sample from the handwritten digit dataset:
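A figure of this kind can be reproduced with scikit-learn's bundled digits dataset; the sketch below assumes matplotlib is installed:

from sklearn.datasets import load_digits
import matplotlib.pyplot as plt

digits = load_digits()
print(digits.images.shape)              # (1797, 8, 8): one 8 x 8 grayscale matrix per image

# Plot a few sample digits with their labels.
fig, axes = plt.subplots(1, 5, figsize=(8, 2))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.imshow(image, cmap='gray_r')
    ax.set_title(label)
    ax.axis('off')
plt.show()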
Machine learning can be divided into many subcategories; two broad categories are supervised and unsupervised learning. These categories contain some of the most popular and widely used machine learning methods. In this section, we present them, as well as some toy example uses of supervised and unsupervised learning.
In examples such as those in the previous section, the data consisted of some features and a target, regardless of whether the target was quantitative (regression) or categorical (classification). Under these circumstances, we call the dataset a labeled dataset. When we try to produce a model from a labeled dataset in order to make predictions about unseen or future data (for example, to diagnose a new tumor case), we make use of supervised learning. In simple cases, supervised learning models can be visualized as a line. This line's purpose is to either separate the data based on the target (in classification) or to closely follow the data (in regression).
The following figure illustrates a simple regression example. Here, y is the target and x is the dataset feature. Our model consists of the simple equation y=2x-5. As is evident, the line closely follows the data. In order to estimate the y value of a new unseen point, we calculate its value using the preceding formula. The following figure shows a simple regression with y=2x-5 as the predictive model:
In the following figure, a simple classification problem is depicted. Here, the dataset features are x and y, while the target is the instance's color. Again, the dotted line is y=2x-5, but this time we test whether a point lies above or below the line. If the point's y value is lower than the line predicts, we expect it to be orange; if it is higher, we expect it to be blue. The following figure is a simple classification with y=2x-5 as the boundary:
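Both figures use the same line, so the two uses boil down to a few lines of code; a toy sketch, with the colors following the description above:

# The model/boundary from the figures: y = 2x - 5.
def model(x):
    return 2 * x - 5

# Regression: estimate the target for a new, unseen x.
print(model(3.0))            # 1.0

# Classification: label a point by whether it lies above or below the line.
def classify(x, y):
    return 'blue' if y > model(x) else 'orange'

print(classify(3.0, 4.0))    # above the line, so 'blue'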
In both regression and classification, we have a clear understanding of how the data is structured or how it behaves. Our goal is to simply model that structure or behavior. In some cases, we do not know how the data is structured. In those cases, we can utilize unsupervised learning in order to discover the structure, and thus information, within the data. The simplest form of unsupervised learning is clustering. As the name implies, clustering techniques attempt to group (or cluster) data instances. Thus, instances that belong to the same cluster share many similarities in their features, while being dissimilar to instances that belong to separate clusters. A simple example with three clusters is depicted in the following figure. Here, the dataset features are x and y, while there is no target.
The clustering algorithm discovered three distinct groups, centered around the points (0, 0), (1, 1), and (2, 2):
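A clustering of this kind can be reproduced with scikit-learn's KMeans; a minimal sketch, where the spread of each group is an assumption made for illustration:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
centers = np.array([[0, 0], [1, 1], [2, 2]])
# Three groups of points scattered around (0, 0), (1, 1), and (2, 2).
X = np.vstack([c + 0.1 * rng.randn(50, 2) for c in centers])

labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)
print(labels[:10])   # cluster index assigned to the first ten instances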
Another form of unsupervised learning is dimensionality reduction. The number of features present in a dataset equals the dataset's dimensions. Often, many features can be correlated, noisy, or simply not provide much information. Nonetheless, the cost of storing and processing data is correlated with a dataset's dimensionality. Thus, by reducing the dataset's dimensions, we can help the algorithms to better model the data.
Another use of dimensionality reduction is for the visualization of high-dimensional datasets. For example, using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm, we can reduce the breast cancer dataset to two dimensions or components. Although it is not easy to visualize 30 dimensions, it is quite easy to visualize two.
Furthermore, we can visually test whether the information contained within the dataset can be utilized to separate the dataset's classes or not. The next figure depicts the two components on the y and x axis, while the color represents the instance's class. Although we cannot plot all of the dimensions, by plotting the two components, we can conclude that a degree of separability between the classes exists:
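The projection itself can be computed with scikit-learn's t-SNE implementation; a minimal sketch (the plotting choices are assumptions, and matplotlib is assumed to be installed):

from sklearn.datasets import load_breast_cancer
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

bc = load_breast_cancer()
# Reduce the 30 features to 2 components for visualization.
embedded = TSNE(n_components=2, random_state=0).fit_transform(bc.data)

plt.scatter(embedded[:, 0], embedded[:, 1], c=bc.target, s=8)
plt.xlabel('First component')
plt.ylabel('Second component')
plt.show()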
Machine learning is a highly quantitative field. Although we can gauge the performance of a model by plotting how it separates classes and how closely it follows data, more quantitative performance measures are needed in order to evaluate models. In this section, we present cost functions and metrics. Both of them are used in order to assess a model's performance.
A machine learning model's objective is to model our dataset. In order to assess each model's performance, we define an objective function. These functions usually express a cost, or how far from perfect a model is. These cost functions usually utilize a loss function to assess how well the model performed on each individual dataset instance.
Some of the most widely used cost functions are described in the following sections, assuming that the dataset has n instances, the target's true value for instance i is t_i, and the model's output is y_i.
Mean absolute error (MAE) or L1 loss is the mean absolute distance between the target's real values and the model's outputs. It is calculated as follows:
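In LaTeX notation, using the n, t_i, and y_i defined above, the standard form is:

\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| t_i - y_i \right|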
Mean squared error (MSE) or L2 loss is the mean squared distance between the target's real values and the model's output. It is calculated as follows:
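In the same notation:

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left( t_i - y_i \right)^2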
Cross entropy loss is used in models that output probabilities between 0 and 1, usually to express the probability that an instance is a member of a specific class. As the output probability diverges from the actual label, the loss increases. For a simple case where the dataset consists of two classes, it is calculated as follows:
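In its common two-class form, where t_i is the true label (0 or 1) and y_i is the predicted probability of the positive class:

\mathrm{CE} = -\frac{1}{n}\sum_{i=1}^{n}\left[ t_i \log(y_i) + (1 - t_i)\log(1 - y_i) \right]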
Cost functions are useful when we try to numerically optimize our models. But as humans, we need metrics that are useful and intuitive to understand and report. As such, there are a number of metrics available that give insight into a model's performance. The most common metrics are presented in the following sections.
The simplest and easiest to grasp of all, classification accuracy refers to the percentage of correct predictions. In order to calculate accuracy, we divide the number of correct predictions by the total number of instances:
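In symbols, with n instances in total:

\mathrm{Accuracy} = \frac{\text{number of correct predictions}}{n}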
