Use scikit-learn to apply machine learning to real-world problems
This book is intended for software engineers who want to understand how common machine learning algorithms work and develop an intuition for how to use them, and for data scientists who want to learn about the scikit-learn API. Familiarity with machine learning fundamentals and Python is helpful, but not required.
Machine learning brings computer science and statistics together to build smart and efficient models. Using the powerful algorithms and techniques that machine learning offers, you can automate many kinds of analytical modeling.
This book examines a variety of machine learning models, including popular algorithms such as k-nearest neighbors, logistic regression, naive Bayes, k-means, decision trees, and artificial neural networks. It discusses data preprocessing, hyperparameter optimization, and ensemble methods. You will build systems that classify documents, recognize images, detect ads, and more. You will learn to use scikit-learn's API to extract features from categorical variables, text, and images; evaluate model performance; and develop an intuition for how to improve your model's performance.
By the end of this book, you will have mastered the scikit-learn concepts required to build efficient models and carry out advanced tasks with a practical approach.
This book is motivated by the belief that you do not understand something until you can describe it simply. Work through toy problems to develop your understanding of the learning algorithms and models, then apply what you have learned to real-life problems.
Page count: 273
Publication year: 2017
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: October 2014
Second edition: July 2017
Production reference: 1200717
ISBN 978-1-78829-987-9
www.packtpub.com
Author
Gavin Hackeling
Copy Editors
Safis Editing
Vikrant Phadkay
Reviewer
Oleg Okun
Project Coordinator
Nidhi Joshi
Commissioning Editor
Amey Varangaonkar
Proofreader
Safis Editing
Acquisition Editor
Aman Singh
Indexer
Tejal Daruwale Soni
Content Development Editor
Aishwarya Pandere
Graphics
Tania Dutta
Technical Editor
Suwarna Patil
Production Coordinator
Arvindkumar Gupta
Gavin Hackeling is a data scientist and author. He has worked on a variety of machine learning problems, including automatic speech recognition, document classification, object recognition, and semantic segmentation. An alumnus of the University of North Carolina and New York University, he lives in Brooklyn with his wife and cat.
Oleg Okun is a machine learning expert and an author/editor of four books, numerous journal articles, and conference papers. His career spans more than a quarter of a century. He was employed in both academia and industry in his motherland, Belarus, and abroad (Finland, Sweden, and Germany). His work experience includes document image analysis, fingerprint biometrics, bioinformatics, online/offline marketing analytics, credit scoring analytics, and text analytics.
He is interested in all aspects of distributed machine learning and the Internet of Things. Oleg currently lives and works in Hamburg, Germany.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1788299876.
If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
The Fundamentals of Machine Learning
Defining machine learning
Learning from experience
Machine learning tasks
Training data, testing data, and validation data
Bias and variance
An introduction to scikit-learn
Installing scikit-learn
Installing using pip
Installing on Windows
Installing on Ubuntu 16.04
Installing on Mac OS
Installing Anaconda
Verifying the installation
Installing pandas, Pillow, NLTK, and matplotlib
Summary
Simple Linear Regression
Simple linear regression
Evaluating the fitness of the model with a cost function
Solving OLS for simple linear regression
Evaluating the model
Summary
Classification and Regression with k-Nearest Neighbors
K-Nearest Neighbors
Lazy learning and non-parametric models
Classification with KNN
Regression with KNN
Scaling features
Summary
Feature Extraction
Extracting features from categorical variables
Standardizing features
Extracting features from text
The bag-of-words model
Stop word filtering
Stemming and lemmatization
Extending bag-of-words with tf-idf weights
Space-efficient feature vectorizing with the hashing trick
Word embeddings
Extracting features from images
Extracting features from pixel intensities
Using convolutional neural network activations as features
Summary
From Simple Linear Regression to Multiple Linear Regression
Multiple linear regression
Polynomial regression
Regularization
Applying linear regression
Exploring the data
Fitting and evaluating the model
Gradient descent
Summary
From Linear Regression to Logistic Regression
Binary classification with logistic regression
Spam filtering
Binary classification performance metrics
Accuracy
Precision and recall
Calculating the F1 measure
ROC AUC
Tuning models with grid search
Multi-class classification
Multi-class classification performance metrics
Multi-label classification and problem transformation
Multi-label classification performance metrics
Summary
Naive Bayes
Bayes' theorem
Generative and discriminative models
Naive Bayes
Assumptions of Naive Bayes
Naive Bayes with scikit-learn
Summary
Nonlinear Classification and Regression with Decision Trees
Decision trees
Training decision trees
Selecting the questions
Information gain
Gini impurity
Decision trees with scikit-learn
Advantages and disadvantages of decision trees
Summary
From Decision Trees to Random Forests and Other Ensemble Methods
Bagging
Boosting
Stacking
Summary
The Perceptron
The perceptron
Activation functions
The perceptron learning algorithm
Binary classification with the perceptron
Document classification with the perceptron
Limitations of the perceptron
Summary
From the Perceptron to Support Vector Machines
Kernels and the kernel trick
Maximum margin classification and support vectors
Classifying characters in scikit-learn
Classifying handwritten digits
Classifying characters in natural images
Summary
From the Perceptron to Artificial Neural Networks
Nonlinear decision boundaries
Feed-forward and feedback ANNs
Multi-layer perceptrons
Training multi-layer perceptrons
Backpropagation
Training a multi-layer perceptron to approximate XOR
Training a multi-layer perceptron to classify handwritten digits
Summary
K-means
Clustering
K-means
Local optima
Selecting K with the elbow method
Evaluating clusters
Image quantization
Clustering to learn features
Summary
Dimensionality Reduction with Principal Component Analysis
Principal component analysis
Variance, covariance, and covariance matrices
Eigenvectors and eigenvalues
Performing PCA
Visualizing high-dimensional data with PCA
Face recognition with PCA
Summary
In recent years, popular imagination has become fascinated by machine learning. The discipline has found a variety of applications. Some of these applications, such as spam filtering, are ubiquitous and have been rendered mundane by their successes. Many other applications have only recently been conceived, and hint at machine learning's potential.
In this book, we will examine several machine learning models and learning algorithms. We will discuss tasks that machine learning is commonly applied to, and we will learn to measure the performance of machine learning systems. We will work with a popular library for the Python programming language called scikit-learn, which has assembled state-of-the-art implementations of many machine learning algorithms under an intuitive and versatile API.
Chapter 1, The Fundamentals of Machine Learning, defines machine learning as the study and design of programs that improve their performance of a task by learning from experience. This definition guides the other chapters; in each, we will examine a machine learning model, apply it to a task, and measure its performance.
Chapter 2, Simple Linear Regression, discusses a model that relates a single feature to a continuous response variable. We will learn about cost functions and use the normal equation to optimize the model.
Chapter 3, Classification and Regression with K-Nearest Neighbors, introduces a simple, nonlinear model for classification and regression tasks.
Chapter 4, Feature Extraction, describes methods for representing text, images, and categorical variables as features that can be used in machine learning models.
Chapter 5, From Simple Linear Regression to Multiple Linear Regression, discusses a generalization of simple linear regression that regresses a continuous response variable onto multiple features.
Chapter 6, From Linear Regression to Logistic Regression, further generalizes multiple linear regression and introduces a model for binary classification tasks.
Chapter 7, Naive Bayes, discusses Bayes’ theorem and the Naive Bayes family of classifiers, and compares generative and discriminative models.
Chapter 8, Nonlinear Classification and Regression with Decision Trees, introduces the decision tree, a simple, nonlinear model for classification and regression tasks.
Chapter 9, From Decision Trees to Random Forests and other Ensemble Methods, discusses three methods for combining models called bagging, boosting, and stacking.
Chapter 10, The Perceptron, introduces a simple online model for binary classification.
Chapter 11, From the Perceptron to Support Vector Machines, discusses a powerful, discriminative model for classification and regression called the support vector machine, and a technique for efficiently projecting features to higher dimensional spaces.
Chapter 12, From the Perceptron to Artificial Neural Networks, introduces powerful nonlinear models for classification and regression built from graphs of artificial neurons.
Chapter 13, K-means, discusses an algorithm that can be used to find structures in unlabeled data.
Chapter 14, Dimensionality Reduction with Principal Component Analysis, describes a method for reducing the dimensions of data that can mitigate the curse of dimensionality.
The examples in this book require Python >= 2.7 or >= 3.3 and pip, the PyPA-recommended tool for installing Python packages. The examples are intended to be executed in a Jupyter notebook or an IPython interpreter. Chapter 1, The Fundamentals of Machine Learning, shows how to install scikit-learn 0.18.1, its dependencies, and other libraries on Ubuntu, Mac OS, and Windows.
This book is intended for software engineers who want to understand how common machine learning algorithms work and develop an intuition for how to use them. It is also for data scientists who want to learn about the scikit-learn API. Familiarity with machine learning fundamentals and Python is helpful but not required.
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The package is named sklearn because scikit-learn is not a valid Python package name."
# In[1]:
import sklearn
sklearn.__version__
# Out[1]:
'0.18.1'
New terms and important words are shown in bold.
Feedback from our readers is always welcome. Let us know what you thought about this book: what you liked or disliked. Reader feedback is important for us as it helps us to develop titles that you will really get the most out of. To send us general feedback, simply email [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. You can download the code files by following these steps:
Log in or register to our website using your e-mail address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Machine-Learning-with-scikit-learn-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Although we have taken care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us to improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address it.
In this chapter, we will review fundamental concepts in machine learning. We will compare supervised and unsupervised learning; discuss the uses of training, testing, and validation data; and describe applications of machine learning. Finally, we will introduce scikit-learn, and install the tools required in subsequent chapters.
Our imaginations have long been captivated by visions of machines that can learn and imitate human intelligence. While machines capable of general artificial intelligence, like Arthur C. Clarke's HAL and Isaac Asimov's Sonny, have yet to be realized, software programs that can acquire new knowledge and skills through experience are becoming increasingly common. We use such machine learning programs to discover new music that we might enjoy, and to find exactly the shoes we want to purchase online. Machine learning programs allow us to dictate commands to our smart phones, and allow our thermostats to set their own temperatures. Machine learning programs can decipher sloppily-written mailing addresses better than humans, and can guard credit cards from fraud more vigilantly. From investigating new medicines to estimating the page views for versions of a headline, machine learning software is becoming central to many industries. Machine learning has even encroached on activities that have long been considered uniquely human, such as writing the sports column recapping the Duke basketball team's loss to UNC.
Machine learning is the design and study of software artifacts that use past experience to inform future decisions; machine learning is the study of programs that learn from data. The fundamental goal of machine learning is to generalize, or to induce an unknown rule from examples of the rule's application. The canonical example of machine learning is spam filtering. By observing thousands of emails that have been previously labeled as either spam or ham, spam filters learn to classify new messages. Arthur Samuel, a computer scientist who pioneered the study of artificial intelligence, said that machine learning is the "study that gives computers the ability to learn without being explicitly programmed". Throughout the 1950s and 1960s, Samuel developed programs that played checkers. While the rules of checkers are simple, complex strategies are required to defeat skilled opponents. Samuel never explicitly programmed these strategies, but through the experience of playing thousands of games, the program learned complex behaviors that allowed it to beat many human opponents.
A popular quote from computer scientist Tom Mitchell defines machine learning more formally: "A program can be said to learn from experience 'E' with respect to some class of tasks 'T' and performance measure 'P', if its performance at tasks in 'T', as measured by 'P', improves with experience 'E'." For example, assume that you have a collection of pictures. Each picture depicts either a dog or a cat. A task could be sorting the pictures into separate collections of dog and cat photos. A program could learn to perform this task by observing pictures that have already been sorted, and it could evaluate its performance by calculating the percentage of correctly classified pictures.
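To make Mitchell's definition concrete, here is a minimal sketch in Python. The task T is sorting the pictures, the experience E would be the already-sorted pictures, and the performance measure P is the fraction of pictures classified correctly. The labels below are invented purely for illustration.
# A toy illustration of the performance measure P: the percentage of
# correctly classified pictures. These labels are invented for illustration.
true_labels = ['dog', 'cat', 'dog', 'dog', 'cat']
predicted_labels = ['dog', 'cat', 'cat', 'dog', 'cat']
correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
accuracy = correct / float(len(true_labels))
print(accuracy)
# 0.8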
We will use Mitchell's definition of machine learning to organize this chapter. First, we will discuss types of experience, including supervised learning and unsupervised learning. Next, we will discuss common tasks that can be performed by machine learning systems. Finally, we will discuss performance measures that can be used to assess machine learning systems.
Machine learning systems are often described as learning from experience either with or without supervision from humans. In supervised learning problems, a program predicts an output for an input by learning from pairs of labeled inputs and outputs. That is, the program learns from examples of the "right answers". In unsupervised learning, a program does not learn from labeled data. Instead, it attempts to discover patterns in data. For example, assume that you have collected data describing the heights and weights of people. An example of an unsupervised learning problem is dividing the data points into groups. A program might produce groups that correspond to men and women, or children and adults. Now assume that the data is also labeled with each person's sex. An example of a supervised learning problem is to induce a rule for predicting whether a person is male or female based on his or her height and weight. We will discuss algorithms and examples of supervised and unsupervised learning in the following chapters; the sketch below previews the contrast using scikit-learn.
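The following is a minimal sketch of that contrast. The height and weight measurements are invented for illustration, and the estimators used, k-means and logistic regression, are introduced in detail in later chapters.
# A sketch contrasting unsupervised and supervised learning on invented
# height (cm) and weight (kg) measurements. The data is made up for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X = np.array([[152, 48], [156, 50], [160, 54], [179, 80], [183, 85], [188, 90]])

# Unsupervised: discover two groups without being told what they mean.
clusters = KMeans(n_clusters=2, random_state=0).fit_predict(X)
print(clusters)

# Supervised: learn a rule from labeled examples and predict for a new person.
y = ['female', 'female', 'female', 'male', 'male', 'male']
clf = LogisticRegression().fit(X, y)
print(clf.predict([[170, 65]]))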
Supervised learning and unsupervised learning can be thought of as occupying opposite ends of a spectrum. Some types of problem, called semi-supervised learning problems, make use of both supervised and unsupervised data; these problems are located on the spectrum between supervised and unsupervised learning. Reinforcement learning is located near the supervised end of the spectrum. Unlike supervised learning, reinforcement learning programs do not learn from labeled pairs of inputs and outputs. Instead, they receive feedback for their decisions, but errors are not explicitly corrected. For example, a reinforcement learning program that is learning to play a side-scrolling video game like Super Mario Bros may receive a reward when it completes a level or exceeds a certain score, and a punishment when it loses a life. However, this supervised feedback is not associated with specific decisions to run, avoid Goombas, or pick up fire flowers. We will focus primarily on supervised and unsupervised learning, as these categories include most common machine learning problems. In the next sections, we will review supervised and unsupervised learning in more detail.
A supervised learning program learns from labeled examples of the outputs that should be produced for an input. There are many names for the output of a machine learning program. Several disciplines converge in machine learning, and many of those disciplines use their own terminology. In this book, we will refer to the output as the response variable. Other names for response variables include "dependent variables", "regressands", "criterion variables", "measured variables", "responding variables", "explained variables", "outcome variables", "experimental variables", "labels", and "output variables". Similarly, the input variables have several names. In this book, we will refer to inputs as features, and the phenomena they represent as explanatory variables. Other names for explanatory variables include "predictors", "regressors", "controlled variables", and "exposure variables". Response variables and explanatory variables may take real or discrete values.
The collection of examples that comprise supervised experience is called a training set. A collection of examples that is used to assess the performance of a program is called a test set. The response variable can be thought of as the answer to the question posed by the explanatory variables; supervised learning problems learn from a collection of answers to different questions. That is, supervised learning programs are provided with the correct answers and must learn to respond correctly to unseen, but similar, questions.
Two of the most common supervised machine learning tasks are classification and regression. In classification tasks, the program must learn to predict discrete values for one or more response variables from one or more features. That is, the program must predict the most probable category, class, or label for new observations. Applications of classification include predicting whether a stock's price will rise or fall, or deciding whether a news article belongs to the politics or leisure sections. In regression problems, the program must predict the values of one or more continuous response variables from one or more features. Examples of regression problems include predicting the sales revenue for a new product, or predicting the salary for a job based on its description. Like classification, regression problems require supervised learning.
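As a rough sketch of the difference, the following uses the k-nearest neighbors estimators covered in a later chapter on invented data; the classifier predicts a discrete label, while the regressor predicts a continuous value.
# A sketch of the two most common supervised tasks, using invented data.
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Classification: predict a discrete response, e.g. whether a price rises or falls.
X_class = [[1], [2], [3], [10], [11], [12]]
y_class = ['fall', 'fall', 'fall', 'rise', 'rise', 'rise']
print(KNeighborsClassifier(n_neighbors=3).fit(X_class, y_class).predict([[9]]))

# Regression: predict a continuous response, e.g. revenue for a new product.
X_reg = [[1], [2], [3], [4]]
y_reg = [10.0, 19.5, 30.2, 41.1]
print(KNeighborsRegressor(n_neighbors=2).fit(X_reg, y_reg).predict([[5]]))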
A common unsupervised learning task is to discover groups of related observations, called clusters, within the dataset. This task, called clustering or cluster analysis, assigns observations into groups such that observations within a group are more similar to each other, based on some similarity measure, than they are to observations in other groups. Clustering is often used to explore a dataset. For example, given a collection of movie reviews, a clustering algorithm might discover the sets of positive and negative reviews. The system will not be able to label the clusters as positive or negative; without supervision, it will only have knowledge that the grouped observations are similar to each other by some measure. A common application of clustering is discovering segments of customers within a market for a product. By understanding what attributes are common to particular groups of customers, marketers can decide what aspects of their campaigns to emphasize. Clustering is also used by internet radio services; given a collection of songs, a clustering algorithm might be able to group the songs according to their genres. Using different similarity measures, the same clustering algorithm might group the songs by their keys, or by the instruments they contain.
Dimensionality reduction is another task that is commonly accomplished using unsupervised learning. Some problems may contain thousands or millions of features, which can be computationally costly to work with. Additionally, the program's ability to generalize may be reduced if some of the features capture noise or are irrelevant to the underlying relationship. Dimensionality reduction is the process of discovering the features that account for the greatest changes in the response variable. Dimensionality reduction can also be used to visualize data. It is easy to visualize a regression problem such as predicting the price of a home from its size; the size of the home can be plotted on the graph's x axis, and the price of the home can be plotted on the y axis. It is similarly easy to visualize the housing price regression problem when a second feature is added; the number of bathrooms in the house could be plotted on the z axis, for instance. A problem with thousands of features, however, becomes impossible to visualize.
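As a small illustration, the following sketch uses principal component analysis, which is covered in the final chapter, to project scikit-learn's bundled four-dimensional iris dataset onto two dimensions so that it could be plotted on a plane.
# A sketch of dimensionality reduction for visualization: project the bundled
# four-dimensional iris dataset onto its two principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X_reduced = PCA(n_components=2).fit_transform(iris.data)
print(X_reduced.shape)  # (150, 2): each flower is now described by two features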
As mentioned previously, a training set is a collection of observations. These observations comprise the experience that the algorithm uses to learn. In supervised learning problems, each observation consists of an observed response variable and features of one or more observed explanatory variables. The test set is a similar collection of observations. The test set is used to evaluate the performance of the model using some performance metric. It is important that no observations from the training set are included in the test set. If the test set does contain examples from the training set, it will be difficult to assess whether the algorithm has learned to generalize from the training set or has simply memorized it. A program that generalizes well will be able to effectively perform a task with new data. In contrast, a program that memorizes the training data by learning an overly complex model could predict the values of the response variable for the training set accurately, but will fail to predict the value of the response variable for new examples. Memorizing the training set is called overfitting. A program that memorizes its observations may not perform its task well, as it could memorize relations and structure that are coincidental in the training data. Balancing generalization and memorization is a problem common to many machine learning algorithms. In later chapters we will discuss regularization, which can be applied to many models to reduce overfitting.
In addition to the training and test data, a third set of observations, called a validation or hold-out set, is sometimes required. The validation set is used to tune variables called hyperparameters that control how the algorithm learns from the training data. The program is still evaluated on the test set to provide an estimate of its performance in the real world. The validation set should not be used to estimate real-world performance because the program has been tuned to learn from the training data in a way that optimizes its score on the validation data; the program will not have this advantage in the real world.
It is common to partition a single set of supervised observations into training, validation, and test sets. There are no requirements for the sizes of the partitions, and they may vary according to the amount of data available. It is common to allocate between fifty and seventy-five percent of the data to the training set, ten to twenty-five percent of the data to the test set, and the remainder to the validation set.
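One possible way to make such a partition with scikit-learn is to apply train_test_split twice, as in the following sketch; the data and the roughly 60/20/20 proportions here are only illustrative.
# A sketch of partitioning invented data into training, validation, and test sets.
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(50, 2), np.arange(50)
# First hold out 20% of the data for the test set...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# ...then hold out 25% of the remainder (20% of the original) for validation.
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 30 10 10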
Some training sets may contain only a few hundred observations; others may include millions. Inexpensive storage, increased network connectivity, and the ubiquity of sensor-packed smartphones have contributed to the contemporary state of big data, or training sets with millions or billions of examples. While this book will not work with datasets that require parallel processing on tens or hundreds of computers, the predictive power of many machine learning algorithms improves as the amount of training data increases. However, machine learning algorithms also follow the maxim "garbage in, garbage out". A student who studies for a test by reading a large, confusing textbook that contains many errors likely will not score better than a student who reads a short but well-written textbook. Similarly, an algorithm trained on a large collection of noisy, irrelevant, or incorrectly labeled data will not perform better than an algorithm trained on a smaller set of data that is more representative of the problem in the real world.
Many supervised training sets are prepared manually or by semi-automated processes. Creating a large collection of supervised data can be costly in some domains. Fortunately, several datasets are bundled with scikit-learn, allowing developers to focus on experimenting with models instead. During development, and particularly when training data is scarce, a practice called cross-validation can be used to train and validate a model on the same data. In cross-validation, the training data is partitioned. The model is trained using all but one of the partitions, and tested on the remaining partition. The partitions are then rotated several times so that the model is trained and evaluated on all of the data. The mean of the model's scores on each of the partitions is a better estimate of performance in the real world than an evaluation using a single training/testing split. The following diagram depicts cross validation with five partitions, or folds.
The original dataset is partitioned into five subsets of equal size labeled A through E. Initially the model is trained on partitions B through E, and tested on partition A. In the next iteration, the model is trained on partitions A, C, D, and E, and tested on partition B. The partitions are rotated until models have been trained and tested on all of the partitions. Cross-validation provides a more accurate estimate of the model's performance than testing a single partition of the data.
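The following is a minimal sketch of five-fold cross-validation with scikit-learn's cross_val_score, using a dataset bundled with the library and a classifier that is introduced in a later chapter.
# A sketch of five-fold cross-validation: the model is trained and evaluated on
# each of the five folds, and the mean of the scores estimates real-world performance.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
scores = cross_val_score(LogisticRegression(), iris.data, iris.target, cv=5)
print(scores)
print(scores.mean())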
Many metrics can be used to measure whether or not a program is learning to perform its task more effectively. For supervised learning problems, many performance metrics measure the amount of prediction error. There are two fundamental causes of prediction error: a model's bias, and its variance. Assume that you have many training sets that are all unique, but equally representative of the population. A model with high bias will produce similar errors for an input regardless of the training set it used to learn; the model biases its own assumptions about the real relationship over the relationship demonstrated in the training data. A model with high variance, conversely, will produce different errors for an input depending on the training set that it used to learn. A model with high bias is inflexible, but a model with high variance may be so flexible that it models the noise in the training set. That is, a model with high variance over-fits the training data, while a model with high bias under-fits the training data. It can be helpful to visualize bias and variance as darts thrown at a dartboard. Each dart is analogous to a prediction, and is thrown by a model trained on a different dataset every time. A model with high bias but low variance will throw darts that will be tightly clustered, but could be far from the bulls-eye. A model with high bias and high variance will throw darts all over the board; the darts are far from the bulls-eye and from each other. A model with low bias and high variance will throw darts that could be poorly clustered but close to the bulls-eye. Finally, a model with low bias and low variance will throw darts that are tightly clustered around the bulls-eye.
Ideally, a model will have both low bias and variance, but efforts to decrease one will frequently increase the other. This is known as the bias-variance trade-off. We will discuss the biases and variances of many of the models introduced in this book.
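The following sketch illustrates the trade-off on synthetic data by fitting polynomials of increasing degree. The first-degree model under-fits (high bias), while the fifteenth-degree model over-fits (high variance), which appears as a large gap between its training and test scores. The data and degrees are chosen only for illustration.
# A sketch of the bias-variance trade-off: fit polynomials of increasing degree
# to noisy samples from a quadratic function, and compare training and test R^2.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, 30)).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=1.0, size=30)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))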
Unsupervised learning problems do not have an error signal to measure; instead, performance metrics for unsupervised learning problems measure some attribute of the structure discovered in the data, such as the distances within and between clusters.
