Hands-On Neural Networks with TensorFlow 2.0

Paolo Galeone
Description

A comprehensive guide to developing neural network-based solutions using TensorFlow 2.0




Key Features



  • Understand the basics of machine learning and discover the power of neural networks and deep learning


  • Explore the structure of the TensorFlow framework and understand how to transition to TF 2.0


  • Solve any deep learning problem by developing neural network-based solutions using TF 2.0



Book Description



TensorFlow, the most popular and widely used machine learning framework, has made it possible for almost anyone to develop machine learning solutions with ease. With TensorFlow (TF) 2.0, you'll explore a revamped framework structure, offering a wide variety of new features aimed at improving productivity and ease of use for developers.







This book covers machine learning with a focus on developing neural network-based solutions. You'll start by getting familiar with the concepts and techniques required to build solutions to deep learning problems. As you advance, you'll learn how to create classifiers, build object detection and semantic segmentation networks, train generative models, and speed up the development process using TF 2.0 tools such as TensorFlow Datasets and TensorFlow Hub.







By the end of this TensorFlow book, you'll be ready to solve any machine learning problem by developing solutions using TF 2.0 and putting them into production.




What you will learn



  • Grasp machine learning and neural network techniques to solve challenging tasks


  • Apply the new features of TF 2.0 to speed up development


  • Use TensorFlow Datasets (tfds) and the tf.data API to build high-efficiency data input pipelines


  • Perform transfer learning and fine-tuning with TensorFlow Hub


  • Define and train networks to solve object detection and semantic segmentation problems


  • Train Generative Adversarial Networks (GANs) to generate images and data distributions


  • Use the SavedModel file format to put a model, or a generic computational graph, into production



Who this book is for



If you're a developer who wants to get started with machine learning and TensorFlow, or a data scientist interested in developing neural network solutions in TF 2.0, this book is for you. Experienced machine learning engineers who want to master the new features of the TensorFlow framework will also find this book useful.






Basic knowledge of calculus and a strong understanding of Python programming will help you grasp the topics covered in this book.

The e-book can be read in Legimi apps or in any app that supports the following format:

EPUB

Page count: 432

Publication year: 2019




Hands-On Neural Networks with TensorFlow 2.0


Understand TensorFlow, from static graph to eager execution, and design neural networks


Paolo Galeone


BIRMINGHAM - MUMBAI

Hands-On Neural Networks with TensorFlow 2.0

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

 

Commissioning Editor: Sunith Shetty
Acquisition Editor: Yogesh Deokar
Content Development Editor: Athikho Sapuni Rishana
Senior Editor: Sofi Rogers
Technical Editor: Utkarsha S. Kadam
Copy Editor: Safis Editing
Project Coordinator: Kirti Pisat
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Shraddha Falebhai

First published: September 2019

Production reference: 1170919

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78961-555-5

www.packt.com

 

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Fully searchable for easy access to vital information

Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

Contributors

About the author

Paolo Galeone is a computer engineer with strong practical experience. After getting his MSc degree, he joined the Computer Vision Laboratory at the University of Bologna, Italy, as a research fellow, where he improved his computer vision and machine learning knowledge working on a broad range of research topics. Currently, he leads the Computer Vision and Machine Learning laboratory at ZURU Tech, Italy. 

In 2019, Google recognized his expertise by awarding him the title of Google Developer Expert (GDE) in Machine Learning. As a GDE, he shares his passion for machine learning and the TensorFlow framework by blogging, speaking at conferences, contributing to open-source projects, and answering questions on Stack Overflow.

 

About the reviewer

Luca Massaron is a data scientist, with over 15 years of experience in analytical roles, who interprets big data and transforms it into smart data by means of both the simplest and the most effective data mining and machine learning techniques. He is the author of 10 books on machine learning, deep learning, algorithms, and AI, and is a Google Developer Expert (GDE) in machine learning.

My sincerest thanks to my family, Yukiko and Amelia, for their support and loving patience.


Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Hands-On Neural Networks with TensorFlow 2.0

About Packt

Why subscribe?

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1: Neural Network Fundamentals

What is Machine Learning?

The importance of the dataset

n-dimensional spaces

The curse of dimensionality

Supervised learning

Distances and similarities – the k-NN algorithm

Parametric models

Measuring model performance – metrics

Using accuracy

Using the confusion matrix

Precision

Recall

Classifier regime

F1 score

Using the area under the ROC curve

Mean absolute error

Mean squared error

Unsupervised learning

Semi-supervised learning

Summary

Exercises

Neural Networks and Deep Learning

Neural networks

Biological neurons

Artificial neurons

Fully connected layers

Activation functions

Loss function

Parameter initialization

Optimization

Gradient descent

Stochastic gradient descent

Mini-batch gradient descent

Gradient descent optimization algorithms

Vanilla

Momentum

ADAM

Backpropagation and automatic differentiation

Convolutional neural networks

The convolution operator

2D convolution

2D convolutions among volumes

1 x 1 x D convolutions

Regularization

Dropout

How dropout works

Inverted dropout

Dropout and L2 regularization

Data augmentation

Early stopping

Batch normalization

Summary

Exercises

Section 2: TensorFlow Fundamentals

TensorFlow Graph Architecture

Environment setup

TensorFlow 1.x environment

TensorFlow 2.0 environment

Dataflow graphs

The main structure – tf.Graph

Graph definition – from tf.Operation to tf.Tensor

Graph placement – tf.device

Graph execution – tf.Session

Variables in static graphs

tf.Variable

tf.get_variable

Model definition and training

Defining models with tf.layers

Automatic differentiation – losses and optimizers

Interacting with the graph using Python

Feeding placeholders

Writing summaries

Saving model parameters and model selection

Summary

Exercises

TensorFlow 2.0 Architecture

Relearning the framework

The Keras framework and its models

The Sequential API

The Functional API

The subclassing method

Eager execution and new features

Baseline example

Functions, not sessions

No more globals

Control flow

GradientTape

Custom training loop

Saving and restoring the model's status

Summaries and metrics

AutoGraph

Codebase migration

Summary

Exercises

Efficient Data Input Pipelines and Estimator API

Efficient data input pipelines

Input pipeline structure

The tf.data.Dataset object

Performance optimizations

Prefetching

Cache elements

Using TFRecords

Building your dataset

Data augmentation

TensorFlow Datasets – tfds

Installation

Usage

Keras integration

Eager integration

Estimator API

Data input pipeline

Custom estimators

Premade estimators

Using a Keras model

Using a canned estimator

Summary

Exercises

Section 3: The Application of Neural Networks

Image Classification Using TensorFlow Hub

Getting the data

Transfer learning

TensorFlow Hub

Using Inception v3 as a feature extractor

Adapting data to the model

Building the model – hub.KerasLayer

Training and evaluating

Training speed

Fine-tuning

When to fine-tune

TensorFlow Hub integration

Train and evaluate

Training speed

Summary

Exercises

Introduction to Object Detection

Getting the data

Object localization 

Localization as a regression problem

Intersection over Union

Average precision

Mean Average Precision

Improving the training script

Classification and localization

Multitask learning

Double-headed network

Anchor-based detectors

Anchor-boxes

Summary

Exercises

Semantic Segmentation and Custom Dataset Builder

Semantic segmentation

Challenges

Deconvolution – transposed convolution

The U-Net architecture

Create a TensorFlow DatasetBuilder

Hierarchical organization

The dataset class and DatasetInfo

Creating the dataset splits

Generating the example

Use the builder

Model training and evaluation

Data preparation

Training loop and Keras callbacks

Evaluation and inference

Summary

Exercises

Generative Adversarial Networks

Understanding GANs and their applications

Value function

Non-saturating value function

Model definition and training phase

Applications of GANs

Unconditional GANs

Preparing the data

Defining the Generator

Defining the Discriminator

Defining the loss functions

Adversarial training process in unconditional GANs

Conditional GANs

Getting the data for a conditional GAN

Defining the Generator in a conditional GAN

Defining the Discriminator in a conditional GAN

Adversarial training process 

Summary

Exercises

Bringing a Model to Production

The SavedModel serialization format

Features

Creating a SavedModel from a Keras model

Converting a SavedModel from a generic function

Python deployment

Generic computational graph

Keras models

Flat graphs

Supported deployment platforms

TensorFlow.js

Converting a SavedModel into model.json format

Converting a Keras Model into model.json format

Go Bindings and tfgo

Setup

Go bindings

Working with tfgo

Summary

Exercises

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Technology leaders are adopting neural networks to enhance their products, making them smarter or, in marketing words, AI-powered. This book is a handy guide to TensorFlow, its inner structure, the new features of version 2.0, and how to use them to create neural network-based applications. By the end of this book, you will be well-versed in the TensorFlow architecture and its new features, and you will be able to solve machine learning problems easily, using the power of neural networks.

This book starts with a theoretical overview of machine learning and neural networks, followed by a description of the TensorFlow library, in both its 1.x and 2.0 versions. Reading this book, you will become well-versed in the required theory for understanding how neural networks work, using easy-to-follow examples. Next, you will learn how to master optimization techniques and algorithms to build a wide range of neural network architectures using the new modules offered by TensorFlow 2.0. Furthermore, after having analyzed the TensorFlow structure, you will learn how to implement more complex neural network architectures such as CNNs for classification, semantic segmentation networks, generative adversarial networks, and others in your research work and projects.

By the end of this book, you will master the TensorFlow structure and will be able to leverage the power of this machine learning framework to train and use neural networks of varying complexities without much effort.

Who this book is for

This book is meant for data scientists, machine learning developers, deep learning researchers, and developers with a basic statistical background who want to work with neural networks and discover the TensorFlow structure and its new features. A working knowledge of the Python programming language is required to get the most out of the book.

What this book covers

Chapter 1, What is Machine Learning?, covers the fundamentals of machine learning: what supervised, unsupervised, and semi-supervised learning is and why these distinctions are important. Moreover, you will start to understand how to create a data pipeline, how to measure the performance of an algorithm, and how to validate your results.

Chapter 2, Neural Networks and Deep Learning, focuses on neural networks. You will learn about the strengths of machine learning models, how it is possible to make a network learn, and how, in practice, the model parameter update is performed. By the end of this chapter, you will understand the intuition behind backpropagation and network parameter updates. Moreover, you'll learn why deep neural network architectures are required to solve challenging tasks.

Chapter 3, TensorFlow Graph Architecture, covers the structure of TensorFlow – the structure that's shared between the 1.x and 2.x versions.

Chapter 4, TensorFlow 2.0 Architecture, demonstrates the difference between TensorFlow 1.x and TensorFlow 2.x. You'll start to develop some simple machine learning models using both these versions. You will also gain an understanding of all the common features of the two versions.

Chapter 5, Efficient Data Input Pipelines and Estimator API, shows how to define a complete data input pipeline using the tf.data API together with the use of the tf.estimator API to define experiments. By the end of this chapter, you'll be able to create complex and highly efficient input pipelines leveraging all the power of the tf.data and tf.io.gfile APIs.

Chapter 6, Image Classification Using TensorFlow Hub, covers how to use TensorFlow Hub to do transfer learning and fine-tuning easily by leveraging its tight integration with the Keras API.

Chapter 7, Introduction to Object Detection, shows how to extend your classifier, making it an object detector that regresses the coordinates of a bounding box, and also gives you an introduction to more complex object detection architectures.

Chapter 8, Semantic Segmentation and Custom Dataset Builder, covers how to implement a semantic segmentation network, how to prepare a dataset for this kind of task, and how to train and measure the performance of a model. You will solve a semantic segmentation problem using U-Net.

Chapter 9, Generative Adversarial Networks, covers GANs from a theoretical and practical point of view. You will gain an understanding of the structure of generative models and how the adversarial training can be easily implemented using TensorFlow 2.0.

Chapter 10, Bringing a Model to Production, shows how to go from a trained model to a complete application. This chapter also covers how to export a trained model to an indicated representation (SavedModel) and use it in a complete application. By the end of this chapter, you will be able to export a trained model and use it inside Python, TensorFlow.js, and also in Go using the tfgo library.

To get the most out of this book

You need to have a basic understanding of neural networks, but this is not mandatory since the topics will be covered from both a theoretical and a practical point of view. Working knowledge of basic machine learning algorithms is a plus. You need a good working knowledge of Python 3.

You should already know how to install packages using pip, how to set up your working environment to work with TensorFlow, and how to enable (if available) GPU acceleration. Moreover, a good background knowledge of programming concepts, such as imperative language versus descriptive language and object-oriented programming, is required.

The environment setup will be covered in Chapter 3, TensorFlow Graph Architecture, after the first two chapters on machine learning and neural network theory.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packt.com.

2. Select the Support tab.

3. Click on Code Downloads.

4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Neural-Networks-with-TensorFlow-2.0. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781789615555_ColorImages.pdf.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Section 1: Neural Network Fundamentals

This section provides a basic introduction to machine learning and the important concepts of neural networks and deep learning.

This section comprises the following chapters:

Chapter 1, What is Machine Learning?

Chapter 2, Neural Networks and Deep Learning

What is Machine Learning?

Machine learning (ML) is a branch of artificial intelligence in which we define algorithms that aim to learn a model that describes and extracts meaningful information from data.

Exciting applications of ML can be found in fields such as predictive maintenance in industrial environments, image analysis for medical applications, time series forecasting for finance and many other sectors, face detection and identification for security purposes, autonomous driving, text comprehension, speech recognition, and recommendation systems. The applications of ML are countless, and we probably use them daily without even knowing it!

Just think about the camera application on your smartphone: when you open the app and point the camera toward a person, you see a square around the person's face. How is this possible? For a computer, an image is just a set of three stacked matrices. How can an algorithm detect that a specific subset of those pixels represents a face?

There's a high chance that the algorithm (also called a model) used by the camera application has been trained to detect that pattern. This task is known as face detection. Face detection can be solved using an ML algorithm that belongs to the broad category of supervised learning.

ML tasks are usually classified into three broad categories, all of which we are going to analyze in the following sections:

Supervised learning

Unsupervised learning

Semi-supervised learning

Every group has its peculiarities and set of algorithms, but all of them share the same goal: learning from data. Learning from data is the goal of every ML algorithm and, in particular, learning about an unknown function that maps data to the (expected) response.

The dataset is probably the most critical part of the entire ML pipeline; its quality, structure, and size are key to the success of deep learning algorithms, as we will see in upcoming chapters. 

For instance, the aforementioned face detection task can be solved by training a model, making it look at thousands and thousands of labeled examples so that the algorithm learns that a specific input corresponds with what we call a face. 

The same algorithm can achieve a different performance if it's trained on a different dataset of faces, and the more high-quality data we have, the better the algorithm's performance will be.

In this chapter, we will cover the following topics:

The importance of the dataset

Supervised learning

Unsupervised learning

Semi-supervised learning

The importance of the dataset

Since the concept of the dataset is essential in ML, let's look at it in detail, with a focus on how to create the required splits for building a complete and correct ML pipeline. 

A dataset is nothing more than a collection of data. Formally, we can describe a dataset as a set of pairs, (x_i, y_i), where x_i is the i-th example and y_i is its label, with a finite cardinality, |D| = k:

D = {(x_i, y_i)}, i = 1, ..., k

A dataset has a finite number of elements, and our ML algorithm will loop over this dataset several times, trying to understand the data structure, until it solves the task it is asked to address. As shown in Chapter 2, Neural Networks and Deep Learning, some algorithms will consider all the data at once, while other algorithms will iteratively look at a small subset of the data at each training iteration.

A typical supervised learning task is the classification of the dataset. We train a model on the data, making it learn that a specific set of features extracted from the example x_i (or the example, x_i, itself) corresponds to a label, y_i.

It is worth familiarizing yourself with the concept of datasets, dataset splits, and epochs from the beginning of your journey into the ML world so that you are already familiar with these concepts when we talk about them in the chapters that follow.

Right now, you already know, at a very high level, what a dataset is. But let's dig into the basic concepts of a dataset split. A dataset contains all the data that's at your disposal. As we mentioned previously, the ML algorithm needs to loop over the dataset several times and look at the data in order to learn how to solve a task (for example, the classification task).

If we use the same dataset to train and test the performance of our algorithm, how can we guarantee that our algorithm performs well, even on unseen data? Well, we can't.

The most common practice is to split the dataset into three parts:

Training set: The subset to use to train the model.

Validation set: The subset to use to measure the model's performance during training and to perform hyperparameter tuning/searches.

Test set: The subset to never touch during the training or validation phases. This is used only to run the final performance evaluation.

All three parts are disjoint subsets of the dataset, as shown in the following Venn diagram:

Venn diagram representing how a dataset should be divided: the training, validation, and test sets must not overlap

The training set is usually the biggest subset, since it must be a meaningful representation of the whole dataset. The validation and test sets are smaller and generally the same size. Of course, this is just a general guideline; there are no constraints on the splits' cardinalities. In fact, the only thing that matters is that each split is big enough to be representative and, in the case of the training set, big enough to train the algorithm on.

We will make our model learn from the training set, evaluate its performance during the training process using the validation set, and run the final performance evaluation on the test set: this allows us to correctly define and train supervised learning algorithms that could generalize well, and therefore work well even on unseen data.

An epoch is the processing of the entire training set that's done by the learning algorithm. Hence, if our training set has 60,000 examples, once the ML algorithm uses all of them to learn, then an epoch is passed.
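The three splits and the notion of an epoch can be sketched in plain Python. This is a toy illustration with hypothetical data and an 80/10/10 split, not the book's pipeline:

```python
import random

# Toy dataset: 100 (example, label) pairs with hypothetical values.
dataset = [(i, i % 2) for i in range(100)]

# Shuffle once, then carve out three disjoint subsets (80/10/10).
random.seed(0)
random.shuffle(dataset)
train_set = dataset[:80]         # used to train the model
validation_set = dataset[80:90]  # used for model selection and hyperparameter tuning
test_set = dataset[90:]          # touched only for the final evaluation

# One epoch = one full pass of the learning algorithm over the training set.
epochs = 3
steps = 0
for epoch in range(epochs):
    for example, label in train_set:
        steps += 1  # a real algorithm would update its parameters here
    # After each epoch, the model would be evaluated on validation_set.

print(len(train_set), len(validation_set), len(test_set), steps)
```

After three epochs, the algorithm has seen each of the 80 training examples three times (240 steps), while the test set has never been touched.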

One of the most well-known datasets in the ML domain is the MNIST dataset. MNIST is a dataset of labeled pairs, where every example is a 28 x 28 binary image of a handwritten digit, and the label is the digit represented in the image.

However, we are not going to use the MNIST dataset in this book, for several reasons:

MNIST is too easy. Both traditional and recent ML algorithms can classify every digit of the dataset almost perfectly (> 97% accuracy).

MNIST is overused. We're not going to make the same applications with the same datasets as everyone else.

MNIST cannot represent modern computer vision tasks.

The preceding reasons come from the description of a new dataset, called fashion-MNIST, which was released in 2017 by the researchers at Zalando Research. This is one of the datasets we are going to use throughout this book.

Fashion-MNIST is a drop-in replacement for the MNIST dataset, which means that they both have the same structure. For this reason, any source code that uses MNIST can switch to fashion-MNIST simply by changing the dataset path.

It consists of a training set of 60,000 examples and a test set of 10,000 examples, just like the original MNIST dataset; even the image format (28 x 28) is the same. The main difference is in the subjects: instead of binary images of handwritten digits, this time there are grayscale images of clothing. Since they are grayscale and not binary, their complexity is higher (binary means only 0 for the background and 255 for the foreground, while grayscale covers the whole range [0, 255]):

Images sampled from the fashion-MNIST dataset on the left and from the MNIST dataset on the right. It's worth noting how the MNIST dataset is simpler, since it consists of binary images, while the fashion-MNIST dataset is more complex because of its grayscale palette and the inherent complexity of its subjects.

A dataset such as fashion-MNIST is a perfect candidate to be used in supervised learning algorithms since they need annotated examples to be trained on.
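The drop-in-replacement property can be made concrete with a toy comparison of the two datasets' metadata, using the numbers stated above. This is not library code; in TF 2.0, the real data is one call away via `tf.keras.datasets.fashion_mnist.load_data()`:

```python
# Metadata for the two datasets, as described in the text.
# Real loading in TF 2.0:
#   (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
mnist = {"train": 60_000, "test": 10_000, "image_shape": (28, 28),
         "pixel_values": {0, 255}}                 # binary: background/foreground only
fashion_mnist = {"train": 60_000, "test": 10_000, "image_shape": (28, 28),
                 "pixel_values": set(range(256))}  # full grayscale range [0, 255]

# Drop-in replacement: the splits and the image format are identical.
for key in ("train", "test", "image_shape"):
    assert mnist[key] == fashion_mnist[key]

# But each pixel can take 128x more values, hence the higher complexity.
ratio = len(fashion_mnist["pixel_values"]) // len(mnist["pixel_values"])
print(ratio)
```

Identical shapes mean that a model built for one dataset accepts the other without changes; only the per-pixel value range (and therefore the difficulty) differs.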

Before describing the different types of ML algorithms, it is worth becoming familiar with the concept of n-dimensional spaces, which are the daily bread of every ML practitioner.

The curse of dimensionality

Let's take a unitary hypercube [0, 1]^D with its center at c = (0.5, 0.5, ..., 0.5) in a D-dimensional space.

Let's also take a D-dimensional hypersphere of radius 1, centered on the origin of the space, O = (0, 0, ..., 0). Intuitively, the center of the hypercube, c, is inside the sphere. Is this true for every value of D?

We can verify this by measuring the Euclidean distance between the hypercube center and the origin:

d(c, O) = sqrt((0.5 - 0)^2 + ... + (0.5 - 0)^2) = sqrt(0.25 D) = 0.5 sqrt(D)

Since the radius of the sphere is 1 in any dimension, we can conclude that, for a value of D greater than 4, the hypercube center is outside the hypersphere.
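The distance computation is easy to check numerically; a quick sketch of the formula above, not code from the book:

```python
import math

def center_distance(d):
    """Euclidean distance between the hypercube center (0.5, ..., 0.5)
    and the origin of a d-dimensional space; equals 0.5 * sqrt(d)."""
    return math.sqrt(sum(0.5 ** 2 for _ in range(d)))

for d in (2, 3, 4, 5, 16):
    side = "inside" if center_distance(d) <= 1.0 else "outside"  # hypersphere radius is 1
    print(d, round(center_distance(d), 3), side)
```

For D = 4 the distance is exactly 1 (the center sits on the sphere's surface), and for any D > 4 it exceeds the radius, confirming the counterintuitive result.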

With the curse of dimensionality, we refer to the various phenomena that arise only when we're working with data in high-dimensional spaces that do not occur in low-dimensional settings such as the 2D or 3D space.

In practice, as the number of dimensions increases, some counterintuitive things start happening; this is the curse of dimensionality.

Now, it should be clearer that working within high-dimensional spaces is not easy and not intuitive at all. One of the greatest strengths of deep neural networks—which is also one of the reasons for their widespread use—is that they make problems in high-dimensional spaces tractable, reducing dimensionality layer by layer.

The first class of ML algorithms we are going to describe is the supervised learning family. These kinds of algorithms are the right tools to use when we aim to find a function that's able to separate elements of different classes in an n-dimensional space.

Supervised learning

Supervised learning algorithms work by extracting knowledge from a knowledge base (KB), that is, the dataset that contains labeled instances of the concept we need to learn about.

Supervised learning algorithms are two-phase algorithms. Given a supervised learning problem—let's say, a classification problem—the algorithm tries to solve it during the first phase, called the training phase, and its performance is measured in the second phase, called the testing phase.

The three dataset splits (train, validation, and test), as defined in the previous section, and the two-phase algorithm should sound an alarm: why do we have a two-phase algorithm and three dataset splits?

Because the first phase, in a well-made pipeline at least, uses two of those splits. In fact, we can define the stages as follows:

Training and validation: The algorithm analyzes the dataset to generate a theory that is valid for the data it has been trained on, but also for items it has never seen. The algorithm, therefore, tries to discover and generalize a concept that bonds the examples that share the same label. Intuitively, if you have a labeled dataset of cats and dogs, you want your algorithm to distinguish between them while being robust to the variations that examples with the same label can have (cats with different colors, positions, backgrounds, and so on). At the end of every training epoch, the performance should be evaluated with a metric on the validation set, both to select the model that reached the best validation performance and to tune the algorithm's hyperparameters.

Testing: The learned theory is applied to labeled examples that were never seen during the training and validation phases. This allows us to test how the algorithm performs on data that has never been used to train or to select the model hyperparameters, which is a real-life scenario.

Supervised learning algorithms are a broad category, and all of them share the need for having a labeled dataset. Don't be fooled by the concept of a label: it is not mandatory for the label to be a discrete value (cat, dog, house, horse); in fact, it can also be a continuous value. What matters is the existence of the association (example, value) in the dataset. More formally, the example is a predictor variable, while the value is the dependent variable, outcome, or target variable.

Depending on the type of the desired outcome, supervised learning algorithms can be classified into two different families:

The supervised learning family—the target variable defines the problem to solve

Classification: The label is discrete, and the aim is to classify the example by predicting its label. A classification algorithm's aim is to learn classification boundaries: functions that divide the space where the examples live into regions.

Regression: The target variable is continuous, and the aim is to learn to regress a continuous value given an example. A regression problem we will see in the upcoming chapters is the regression of the bounding box corner coordinates around a face: the face can be anywhere in the input image, and the algorithm learns to regress the eight coordinates (four corners, two coordinates each) of the bounding box.

Parametric and non-parametric algorithms are used to solve classification and regression problems; the most common non-parametric algorithm is the k-NN algorithm. This is used to introduce the fundamental concepts of distances and similarities: concepts that are at the basis of every ML application. We will cover the k-NN algorithm in the next section.

Distances and similarities – the k-NN algorithm

The k-NN algorithm's goal is to find elements similar to a given one, rank them using a similarity score, and return the top-k similar elements (the first k elements, sorted by similarity) found. 

To do this, you need a way to measure similarity: a function that assigns a numerical score to a pair of points. The higher the score, the more similar the elements should be.

Since we are modeling our dataset as a set of points in an n-dimensional space, we can use any L_p norm, or any other scoring function (even one that is not a metric), to measure the distance between two points, considering elements that are close together similar and elements that are far away dissimilar. The choice of the norm/distance function is entirely arbitrary, and it should depend on the topology of the n-dimensional space (this is why we usually reduce the dimensionality of the input data and measure distances in the lower-dimensional space, so that the curse of dimensionality gives us less trouble).

Thus, if we want to measure the similarity of elements in a dataset with dimensionality D, given a point, p, we have to measure and collect the distance from p to every other point, q:

d(p, q) = ( Σ_{i=1}^{D} |p_i − q_i|^p )^{1/p}

The preceding formula expresses the generic p-norm of the distance vector that connects p and q. In practice, setting p=1 gives us the Manhattan distance, while setting p=2 gives us the Euclidean distance. No matter which distance is chosen, the algorithm works by computing the distance function and sorting by closeness as a measure of similarity.

When k-NN is applied to a classification problem, the point, p, is classified by the vote of its k neighbors, where the vote is their class. Thus, the class assigned to an object depends on the classes of the elements that surround it.

When k-NN is applied to regression problems, the output of the algorithm is the average of the values of the k nearest neighbors.
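The classification procedure described above can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation; the function name `knn_predict` and the toy dataset are assumptions for the example, not code from the book:

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3, p=2):
    """Classify `query` by the majority vote of its k nearest neighbors,
    using the generic L_p distance (p=1: Manhattan, p=2: Euclidean)."""
    # Distance from the query to every point in the training set
    distances = np.sum(np.abs(train_x - query) ** p, axis=1) ** (1.0 / p)
    nearest = np.argsort(distances)[:k]       # indices of the k closest points
    votes = train_y[nearest]                  # their labels
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]          # majority class wins

# Toy 2D dataset: class 0 clustered near the origin, class 1 near (5, 5)
train_x = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.], [6., 5.], [5., 6.]])
train_y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(train_x, train_y, np.array([0.5, 0.5])))  # -> 0
```

For regression, the final step would simply average the neighbors' target values (`votes.mean()`) instead of taking a majority vote.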

k-NN is only one of the various non-parametric models that have been developed over the years; however, parametric models usually show better performance. We'll look at these in the next section.

Parametric models

The ML models we are going to describe in this book are all parametric models: this means that a model can be described using a function whose input and output are known (in the supervised learning case, both are given by the dataset), and the aim is to change the model parameters so that, given a particular input, the model produces the expected output.

Given an input sample, x, and the desired outcome, y, an ML model is a parametric function, f_θ(x), where θ is the set of model parameters to change during training in order to fit the data (or, in other words, to generate a hypothesis).

The most intuitive and straightforward example we can give to clarify the concept of model parameters is linear regression.

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data.

Linear regression models have the following equation:

y = mx + b

Here, x is the independent variable and y is the dependent one. The parameter m is the scale factor, coefficient, or slope, and b is the bias coefficient or intercept.

Hence, the model parameters that must change during the training phase are m and b.

We're talking about a single example in the training set, but the line should be the one that fits all the points of the training set the best. Of course, we are making a strong assumption about the dataset: we are using a model that, due to its nature, models a line. Due to this, before attempting to fit a linear model to the data, we should first determine whether or not there is a linear relationship between the dependent and independent variables (using a scatter plot is usually useful).

The most common method for fitting a regression line is the method of least squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies on the fitted line exactly, then its vertical deviation is 0). This relationship between observed and predicted data is what we call the loss function, as we will see in Chapter 2, Neural Networks and Deep Learning.
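The least squares fit of y = mx + b can be computed in closed form. The following is a minimal sketch with NumPy, where the toy data points (roughly following y = 2x + 1 plus noise) are assumptions for the example:

```python
import numpy as np

# Toy observations (assumed data for the example)
x = np.array([0., 1., 2., 3., 4.])
y = np.array([1.1, 2.9, 5.2, 7.0, 8.8])

# Least squares solution for y = m*x + b:
# stack [x, 1] as the design matrix and solve for [m, b],
# minimizing the sum of squared vertical deviations.
A = np.stack([x, np.ones_like(x)], axis=1)
(m, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"slope m={m:.2f}, intercept b={b:.2f}")
```

The solution minimizes exactly the quantity described above: the sum of the squared vertical deviations from each data point to the fitted line.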

The goal of the supervised learning algorithm is, therefore, to iterate over the data, adjusting the θ parameters so that f_θ(x) correctly models the observed phenomenon.

However, when using more complex models (with a considerable number of adjustable parameters, as in the case of neural networks), adjusting the parameters can lead to undesired results.

If our model is composed of just two parameters and we are trying to model a linear phenomenon, there are no problems. But if we are trying to classify the Iris dataset, we can't use a simple linear model, since it is easy to see that the function we have to learn to separate the different classes is not a simple line.

In cases like that, we can use models with a higher number of trainable parameters that can adjust their variables to almost perfectly fit the dataset. This may sound perfect, but in practice, it is not something that's desirable. In fact, the model is adapting its parameters only to fit the training data, almost memorizing the dataset and thus losing every generalization capability.

This pathological phenomenon is called overfitting, and it happens when we are using a model that's too complex to model a simple event. There's also an opposite scenario, called underfitting, that occurs when our model is too simple for the dataset and therefore is not able to capture all the complexity of the data.

Every ML model aims to learn and adapt its parameters so that it is robust to noise and able to generalize, which means finding a suitable approximate function that represents the relationship between the predictors and the response:

The dashed line represents the model's prediction. Underfitting, on the left, is a model with very poor generalization performance and therefore is unable to learn a good dataset approximation. The center image represents a good model that can generalize well in contrast to the model on the right that memorized the training set, overfitting it.

Several supervised learning algorithms have been developed over the years. This book, however, will focus on the ML model that has proven to be the most versatile and that can be used to solve almost any supervised, unsupervised, and semi-supervised learning task: neural networks.

During the explanation of the training and validation phases, we talked about two concepts we haven't introduced yet—hyperparameters and metrics:

Hyperparameters: We talk about hyperparameters when, in order for our algorithm to be fully defined, values need to be assigned to a set of parameters. We call the parameters that define the algorithm itself hyperparameters. For example, the number of neurons in a neural network is a hyperparameter.

Metrics: Functions that take the model prediction and the expected output and produce a numerical score that measures the goodness of the model.

Metrics are crucial components in every ML pipeline; they are so useful and powerful that they deserve their own section.

Measuring model performance – metrics

Evaluating a supervised learning algorithm during the evaluation and testing phases is an essential part of any well-made ML pipeline.

Before we describe the various metrics that are available, there's one last thing that's worth noting: measuring the performance of a model is something that we can always do on every dataset split. During the training phase, usually at the end of every training epoch, we can measure the performance of the algorithm on the training set itself, as well as the validation set. Plotting how the curves change during the training and analyzing the relationships between the validation and training curve allow us to quickly identify the previously described pathological conditions of an ML model—overfitting and underfitting. 

Supervised learning algorithms have the significant advantage of having the expected outcome of the algorithm inside the dataset, and all the metrics presented here use the label information to evaluate how well the model performs.

There are metrics to measure the performance of classifiers and metrics to measure the performance of regressors; it is clear that it wouldn't make any sense to treat a classifier in the same way as a regressor, even if both are members of the supervised learning algorithm family.

The first metric and the most used metric for evaluating a supervised learning algorithm's performance is accuracy.

Using accuracy

Accuracy is the ratio of the number of correct predictions made to the number of all predictions made.

Accuracy is used to measure classification performance on multiclass classification problems.

Given y_i as the label and ŷ_i as the prediction, we can define the accuracy of the i-th example as follows:

acc(y_i, ŷ_i) = 1 if ŷ_i = y_i, and 0 otherwise

Therefore, for a whole dataset with N elements, the mean accuracy over all the samples is as follows:

Acc(y, ŷ) = (1/N) · Σ_{i=1}^{N} acc(y_i, ŷ_i)

We have to pay attention to the structure of the dataset, D, when using this metric: in fact, it works well only when there is an equal number of samples belonging to each class (we need to be using a balanced dataset).

In the case of an unbalanced dataset, or when the cost of misclassifying one class is higher or lower than that of misclassifying another, accuracy is not the best metric to use. To understand why, think about the case of a dataset with two classes only, where 80% of samples are of class 1, and 20% of samples are of class 2.

If the classifier predicts only class 1, the accuracy that's measured in this dataset is 0.8, but of course, this is not a good measure of the performance of the classifier, since it always predicts the same class, no matter what the input is. If the same model is tested on a test set with 40% of samples from class 1 and the remaining ones of class 2, the measurement will drop down to 0.4.
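The scenario above can be verified with a few lines of plain Python; the dataset sizes here mirror the 80/20 and 40/60 splits described in the text:

```python
# Accuracy of a degenerate classifier that always predicts class 1,
# measured on an unbalanced dataset (80% class 1, 20% class 2).
labels = [1] * 80 + [2] * 20
predictions = [1] * 100  # the model always outputs class 1

correct = sum(y == p for y, p in zip(labels, predictions))
accuracy = correct / len(labels)
print(accuracy)  # -> 0.8

# On a test set with 40% class 1 and 60% class 2,
# the very same model scores much worse.
test_labels = [1] * 40 + [2] * 60
test_accuracy = sum(y == 1 for y in test_labels) / len(test_labels)
print(test_accuracy)  # -> 0.4
```

The model never changed, yet its measured accuracy halved: the first number reflected the class distribution, not the model's quality.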

Remembering that metrics can be used during the training phase to measure the model's performance, we can monitor how the training is going by looking at the validation accuracy and the training accuracy to detect if our model is overfitting or underfitting the training data.

If the model can capture the relationships present in the data, the training accuracy increases; if it doesn't, the model is too simple and we are underfitting the data. In this case, we have to use a more complex model with a higher learning capacity (that is, with a more significant number of trainable parameters).

If our training accuracy increases, we can start looking at the validation accuracy (always at the end of every training epoch): if the validation accuracy stops growing or even starts decreasing, the model is overfitting the training data and we should stop the training (this is called early stopping and is a regularization technique).

Using the confusion matrix

The confusion matrix is a tabular way of representing a classifier's performance. It can be used to summarize how the classifier behaved on the test set, and it applies not only to binary problems but also to multiclass classification problems. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class. For example, in a binary classification problem, we can have the following:

| Samples: 320 | Actual: YES | Actual: NO |
| --- | --- | --- |
| Predicted: YES | 98 | 120 |
| Predicted: NO | 150 | 128 |

It is worth noting that the confusion matrix is not a metric; in fact, the matrix alone does not measure the model's performance, but is the basis for computing several useful metrics, all of them based on the concepts of true positives, true negatives, false positives, and false negatives.

These terms all refer to a single class; this means you have to consider a multiclass classification problem as a binary classification problem when computing these terms. Given a multiclass classification problem, whose classes are A, B, ..., Z, we have, for example, the following:

True positives of A (TP): All A instances that are classified as A

True negatives of A (TN): All non-A instances that are not classified as A

False positives of A (FP): All non-A instances that are classified as A

False negatives of A (FN): All A instances that are not classified as A

This, of course, can be applied to every class in the dataset so that we get these four values for every class.
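As an illustrative sketch, the four values can be computed for any class with a one-vs-rest comparison; the helper name `binary_counts` and the toy labels below are assumptions for the example:

```python
def binary_counts(labels, predictions, positive_class):
    """Compute TP, TN, FP, FN for one class of a multiclass problem,
    treating it as a one-vs-rest binary problem."""
    pairs = list(zip(labels, predictions))
    tp = sum(y == positive_class and p == positive_class for y, p in pairs)
    tn = sum(y != positive_class and p != positive_class for y, p in pairs)
    fp = sum(y != positive_class and p == positive_class for y, p in pairs)
    fn = sum(y == positive_class and p != positive_class for y, p in pairs)
    return tp, tn, fp, fn

# Hypothetical three-class example (classes "A", "B", "C")
labels      = ["A", "A", "B", "C", "B", "A"]
predictions = ["A", "B", "B", "C", "A", "A"]
print(binary_counts(labels, predictions, "A"))  # -> (2, 2, 1, 1)
```

Calling the same helper once per class yields the four values for every class in the dataset, which is all that is needed to compute the metrics discussed next.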

The most important metrics we can compute from the TP, TN, FP, and FN values are precision, recall, and the F1 score.

Precision

Precision is the number of correct positive results divided by the number of positive results predicted:

Precision = TP / (TP + FP)