Dive deeper into neural networks and get your models trained and optimized with this quick reference guide
Key Features
A quick reference to all important deep learning concepts and their implementations
Essential tips, tricks, and hacks to train a variety of deep learning models such as CNNs, RNNs, LSTMs, and more
Supplemented with essential mathematics and theory, every chapter provides best practices and safe choices for training and fine-tuning your models in Keras and TensorFlow
Book Description
Deep learning has become essential for entering the world of artificial intelligence. This book makes deep learning techniques more accessible, practical, and relevant to practicing data scientists, moving deep learning from academia to the real world through practical examples.
You will learn how TensorBoard is used to monitor the training of deep neural networks and how to solve binary classification problems using deep learning. Readers will then learn to optimize the hyperparameters of their deep learning models. The book then walks readers through the practical implementation of training CNNs, RNNs, and LSTMs with word embeddings and seq2seq models from scratch. Later, the book explores advanced topics such as Deep Q Networks for solving an autonomous agent problem, and how to use two adversarial networks to generate artificial images that appear real. For implementation purposes, we look at popular Python-based deep learning frameworks such as Keras and TensorFlow. Each chapter provides best practices and safe choices to help readers make the right decisions while training deep neural networks.
By the end of this book, you will be able to solve real-world problems quickly with deep neural networks.
What you will learn
Solve regression and classification challenges with TensorFlow and Keras
Learn to use TensorBoard for monitoring neural networks and their training
Optimize hyperparameters using safe choices and best practices
Build CNNs, RNNs, and LSTMs, and use word embeddings from scratch
Build and train seq2seq models for machine translation and chat applications
Understand Deep Q Networks and how to use one to solve an autonomous agent problem
Who this book is for
If you are a data scientist or a machine learning expert, then this book is a very useful read for training your advanced machine learning and deep learning models. You can also refer to this book if you are stuck partway through building a neural network model and need immediate assistance in accomplishing the task smoothly. Some prior knowledge of Python and a firm grasp of the basics of machine learning are required.
You can read this e-book in Legimi apps or in any app that supports the following format:
Page count: 318
Year of publication: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Viraj Madhav
Content Development Editor: Varun Sony
Technical Editor: Dharmendra Yadav
Copy Editors: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Tania Dutta
Production Coordinator: Deepika Naik
First published: March 2018
Production reference: 1070318
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78883-799-6
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
I first met Mike Bernico when we were two of the founding members of a new data science team at a Fortune 50 company. It was a heady time; there was no such thing as formal data science education, so we were all self-taught. We were a collection of adventurous people from diverse backgrounds who identified and learned data science techniques because we needed them to solve the problems that we were interested in. We built a team with an optimistic hacker approach: the belief that we could find and apply techniques "from the wild" to build interesting, useful things.
It is in this practical, scrappy spirit that Mike wrote Deep Learning Quick Reference. Deep learning is frequently made out to be mysterious and difficult; however, in this guide, Mike breaks down major deep learning techniques, making them approachable and applicable. With this book, you (yes, you!) can quickly get started with using deep learning for your own projects in a variety of different modalities.
Mike has been practising data science since before the discipline was named, and he has been specifically teaching the topic to university students for 3 years. Prior to this, he spent many years as a working computer scientist with a specialization in networks and security, and he also has a knack for engaging with people and communicating with nonspecialists. He is currently the Lead Data Scientist for a large financial services company, where he designs systems for data science, builds machine learning models with direct applications and for research publications, mentors junior data scientists, and teaches stakeholders about data science. He knows his stuff!
With Deep Learning Quick Reference, you'll benefit from Mike's deep experience, humor, and down-to-earth manner as you build example networks alongside him. After you complete Mike's book, you'll have the confidence and knowledge to understand and apply deep learning to problems of your own devising, for both fun and function.
Bon voyage, and good hacking!
- J. Malia Andrus, Ph.D.
Data Scientist, Seattle, Washington
Mike Bernico is a Lead Data Scientist at State Farm Mutual Insurance Companies. He also works as an adjunct for the University of Illinois at Springfield, where he teaches Essentials of Data Science, and Advanced Neural Networks and Deep Learning. Mike earned his MSCS from the University of Illinois at Springfield. He's an advocate for open source software and the good it can bring to the world. As a lifelong learner with umpteen hobbies, Mike also enjoys cycling, travel photography, and wine making.
Vitor Bianchi Lanzetta has a master's degree in Applied Economics from the University of São Paulo, one of the most reputable universities in Latin America. He has done a lot of research in economics using neural networks. He has also authored R Data Visualization Recipes, published by Packt Publishing. Vitor is very passionate about data science in general, and he walks the earth with the personal belief that he is just as cool as he is geeky. He thinks that you will learn a lot from this book, and that TensorFlow may be the greatest deep learning tool currently available.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Deep Learning Quick Reference
Dedication
Packt Upsell
Why subscribe?
PacktPub.com
Foreword
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
The Building Blocks of Deep Learning
The deep neural network architectures
Neurons
The neuron linear function
Neuron activation functions
The loss and cost functions in deep learning
The forward propagation process
The back propagation function
Stochastic and minibatch gradient descents
Optimization algorithms for deep learning
Using momentum with gradient descent
The RMSProp algorithm
The Adam optimizer
Deep learning frameworks
What is TensorFlow?
What is Keras?
Popular alternatives to TensorFlow
GPU requirements for TensorFlow and Keras
Installing Nvidia CUDA Toolkit and cuDNN
Installing Python
Installing TensorFlow and Keras
Building datasets for deep learning
Bias and variance errors in deep learning
The train, val, and test datasets
Managing bias and variance in deep neural networks
K-Fold cross-validation
Summary
Using Deep Learning to Solve Regression Problems
Regression analysis and deep neural networks
Benefits of using a neural network for regression
Drawbacks to consider when using a neural network for regression
Using deep neural networks for regression
How to plan a machine learning problem
Defining our example problem
Loading the dataset
Defining our cost function
Building an MLP in Keras
Input layer shape
Hidden layer shape
Output layer shape
Neural network architecture
Training the Keras model
Measuring the performance of our model
Building a deep neural network in Keras
Measuring the deep neural network performance
Tuning the model hyperparameters
Saving and loading a trained Keras model
Summary
Monitoring Network Training Using TensorBoard
A brief overview of TensorBoard
Setting up TensorBoard
Installing TensorBoard
How TensorBoard talks to Keras/TensorFlow
Running TensorBoard
Connecting Keras to TensorBoard
Introducing Keras callbacks
Creating a TensorBoard callback
Using TensorBoard
Visualizing training
Visualizing network graphs
Visualizing a broken network
Summary
Using Deep Learning to Solve Binary Classification Problems
Binary classification and deep neural networks
Benefits of deep neural networks
Drawbacks of deep neural networks
Case study – epileptic seizure recognition
Defining our dataset
Loading data
Model inputs and outputs
The cost function
Using metrics to assess the performance
Building a binary classifier in Keras
The input layer
The hidden layers
What happens if we use too many neurons?
What happens if we use too few neurons?
Choosing a hidden layer architecture
Coding the hidden layers for our example
The output layer
Putting it all together
Training our model
Using the checkpoint callback in Keras
Measuring ROC AUC in a custom callback
Measuring precision, recall, and f1-score
Summary
Using Keras to Solve Multiclass Classification Problems
Multiclass classification and deep neural networks
Benefits
Drawbacks
Case study - handwritten digit classification
Problem definition
Model inputs and outputs
Flattening inputs
Categorical outputs
Cost function
Metrics
Building a multiclass classifier in Keras
Loading MNIST
Input layer
Hidden layers
Output layer
Softmax activation
Putting it all together
Training
Using scikit-learn metrics with multiclass models
Controlling variance with dropout
Controlling variance with regularization
Summary
Hyperparameter Optimization
Should network architecture be considered a hyperparameter?
Finding a giant and then standing on his shoulders
Adding until you overfit, then regularizing
Practical advice
Which hyperparameters should we optimize?
Hyperparameter optimization strategies
Common strategies
Using random search with scikit-learn
Hyperband
Summary
Training a CNN from Scratch
Introducing convolutions
How do convolutional layers work?
Convolutions in three dimensions
A layer of convolutions
Benefits of convolutional layers
Parameter sharing
Local connectivity
Pooling layers
Batch normalization
Training a convolutional neural network in Keras
Input
Output
Cost function and metrics
Convolutional layers
Fully connected layers
Multi-GPU models in Keras
Training
Using data augmentation
The Keras ImageDataGenerator
Training with a generator
Summary
Transfer Learning with Pretrained CNNs
Overview of transfer learning
When transfer learning should be used
Limited data
Common problem domains
The impact of source/target volume and similarity
More data is always beneficial
Source/target domain similarity
Transfer learning in Keras
Target domain overview
Source domain overview
Source network architecture
Transfer network architecture
Data preparation
Data input
Training (feature extraction)
Training (fine-tuning)
Summary
Training an RNN from Scratch
Introducing recurrent neural networks
What makes a neuron recurrent?
Long Short Term Memory Networks
Backpropagation through time
A refresher on time series problems
Stock and flow
ARIMA and ARIMAX forecasting
Using an LSTM for time series prediction
Data preparation
Loading the dataset
Slicing train and test by date
Differencing a time series
Scaling a time series
Creating a lagged training set
Input shape
Data preparation glue
Network output
Network architecture
Stateful versus stateless LSTMs
Training
Measuring performance
Summary
Training LSTMs with Word Embeddings from Scratch
An introduction to natural language processing
Semantic analysis
Document classification
Vectorizing text
NLP terminology
Bag of Words models
Stemming, lemmatization, and stopwords
Count and TF-IDF vectorization
Word embedding
A quick example
Learning word embeddings with prediction
Learning word embeddings with counting
Getting from words to documents
Keras embedding layer
1D CNNs for natural language processing
Case studies for document classifications
Sentiment analysis with Keras embedding layers and LSTMs
Preparing the data
Input and embedding layer architecture
LSTM layer
Output layer
Putting it all together
Training the network
Performance
Document classification with and without GloVe
Preparing the data
Loading pretrained word vectors
Input and embedding layer architecture
Without GloVe vectors
With GloVe vectors
Convolution layers
Output layer
Putting it all together
Training
Performance
Summary
Training Seq2Seq Models
Sequence-to-sequence models
Sequence-to-sequence model applications
Sequence-to-sequence model architecture
Encoders and decoders
Characters versus words
Teacher forcing
Attention
Translation metrics
Machine translation
Understanding the data
Loading data
One hot encoding
Training network architecture
Network architecture (for inference)
Putting it all together
Training
Inference
Loading data
Creating reverse indices
Loading models
Translating a sequence
Decoding a sequence
Example translations
Summary
Using Deep Reinforcement Learning
Reinforcement learning overview
Markov Decision Processes
Q Learning
Infinite state space
Deep Q networks
Online learning
Memory and experience replay
Exploitation versus exploration 
DeepMind
The Keras reinforcement learning framework
Installing Keras-RL
Installing OpenAI gym
Using OpenAI gym
Building a reinforcement learning agent in Keras
CartPole
CartPole neural network architecture
Memory
Policy
Agent
Training
Results
Lunar Lander 
Lunar Lander network architecture
Memory and policy
Agent
Training
Results
Summary
Generative Adversarial Networks
An overview of the GAN
Deep Convolutional GAN architecture
Adversarial training architecture
Generator architecture
Discriminator architecture
Stacked training
Step 1 – train the discriminator
Step 2 – train the stack 
How GANs can fail
Stability
Mode collapse
Safe choices for GAN
Generating MNIST images using a Keras GAN
Loading the dataset
Building the generator
Building the discriminator
Building the stacked model
The training loop
Model evaluation
Generating CIFAR-10 images using a Keras GAN
Loading CIFAR-10
Building the generator
Building the discriminator
The training loop
Model evaluation
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Deep Learning Quick Reference demonstrates a fast and practical approach to using deep learning. It's focused on real-life problems, and it provides just enough theory and math to reinforce the reader's understanding of the topic. Deep learning is an exciting, fast-paced branch of machine learning, but it's also a field that can be broken into. It's a field where a flood of detailed, complicated research is created every day, and this can be overwhelming. In this book, I focus on teaching you the skills to apply deep learning to a variety of practical problems. My greatest hope for this book is that it will provide you with the tools you need to use deep learning techniques to solve your machine learning problems.
I'm a practicing data scientist, and I'm writing this book with other practicing data scientists and machine learning engineers in mind. If you're a software engineer applying deep learning, this book is also for you.
If you're a deep learning researcher, then this book isn't really for you; however, you should still pick up a copy so that you can criticize the lack of proofs and mathematical rigor in this book.
If you're an academic or educator, then this book is definitely for you. I've taught a survey course in data science at the University of Illinois at Springfield (go Prairie Stars!) for the past 3 years, and in doing so, I've had the opportunity to inspire a number of future machine learning practitioners. This experience has inspired me to create this book. I think a book like this is a great way to help students build interest in a very complex topic.
Chapter 1, The Building Blocks of Deep Learning, reviews some basics around the operation of neural networks, touches on optimization algorithms, talks about model validation, and goes over setting up a development environment suitable for building deep neural networks.
Chapter 2, Using Deep Learning to Solve Regression Problems, enables you to build very simple neural networks to solve regression problems and explore the impact of deeper, more complex models on those problems.
Chapter 3, Monitoring Network Training Using TensorBoard, lets you get started right away with TensorBoard, which is a wonderful application for monitoring and debugging your future models.
Chapter 4, Using Deep Learning to Solve Binary Classification Problems, helps you solve binary classification problems using deep learning.
Chapter 5, Using Keras to Solve Multiclass Classification Problems, moves on to multiclass classification and explores the differences. It also talks about managing overfitting and the safest choices for doing so.
Chapter 6, Hyperparameter Optimization, shows two separate methods for model tuning: one well-known and battle-tested, and the other a state-of-the-art method.
Chapter 7, Training a CNN From Scratch, teaches you how to use convolutional networks to do classification with images.
Chapter 8, Transfer Learning with Pretrained CNNs, describes how to apply transfer learning to get amazing performance from an image classifier, even with very little data.
Chapter 9, Training an RNN from Scratch, discusses RNNs and LSTMs, and how to use them for time series forecasting problems.
Chapter 10, Training LSTMs with Word Embeddings From Scratch, continues our conversation on LSTMs, this time talking about natural language classification tasks.
Chapter 11, Training Seq2Seq Models, helps us use sequence-to-sequence models to do machine translation.
Chapter 12, Using Deep Reinforcement Learning, introduces deep reinforcement learning and builds a deep Q network that can power autonomous agents.
Chapter 13, Generative Adversarial Networks, explains how to use generative adversarial networks to generate convincing images.
I assume that you're already experienced with more traditional data science and predictive modeling techniques such as Linear/Logistic Regression and Random Forest. If this is your first experience with machine learning, this may be a little difficult for you.
I also assume that you have at least some experience in programming with Python, or at least another programming language such as Java or C++.
Deep learning is computationally intensive, and some of the models we build here require an NVIDIA GPU to run in a reasonable amount of time. If you don't own a fast GPU, you may wish to use a GPU-based cloud instance on either Amazon Web Services or Google Cloud Platform.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, make sure that you unzip or extract the folder using the latest version of any of these:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for macOS
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Deep-Learning-Quick-Reference. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions; also, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Welcome to Deep Learning Quick Reference! In this book, I am going to attempt to make deep learning techniques more accessible, practical, and consumable to data scientists, machine learning engineers, and software engineers who need to solve problems with deep learning. If you want to train your own deep neural network and you're stuck somewhere, there is a good chance this guide will help.
This book is hands on and is intended to be a practical guide that can help you solve your problems fast. It is primarily intended for experienced machine learning engineers and data scientists who need to use deep learning to solve a problem. Aside from this chapter, which provides some of the terminology, frameworks, and background that we will need to get started, it's not meant to be read in order. Each chapter contains a practical example, complete with code and a few best practices and safe choices. We expect you to flip to the chapter you need and get started.
This book won't go deeply into the theory of deep learning and neural networks. There are many wonderful books that can provide that background, and I highly recommend that you read at least one of them. We hope to provide just enough theory and mathematical intuition to get you started.
We will cover the following topics in this chapter:
Deep neural network architectures
Optimization algorithms for deep learning
Deep learning frameworks
Building datasets for deep learning
Deep neural network architectures can vary greatly in structure depending on the network's application, but they all have some basic components. In this section, we will talk briefly about those components.
In this book, I'll define a deep neural network as a network with more than a single hidden layer. Beyond that, we won't attempt to limit membership in the Deep Learning Club. As such, our networks might have fewer than 100 neurons, or possibly millions. We might use special layers of neurons, including convolutions and recurrent layers, but we will refer to all of these as neurons nonetheless.
A neuron is the atomic unit of a neural network. Its design is sometimes said to be inspired by biology; however, that's a topic for a different book. Neurons are typically arranged into layers. In this book, if I'm referring to a specific neuron, I'll use a notation that identifies it by l, the layer the neuron is in, and k, the neuron number. As we will be using programming languages that use 0-based indexing, my notation will also be 0-based.
At their core, most neurons are composed of two functions that work together: a linear function and an activation function. Let us take a high-level look at those two components.
The first component of the neuron is a linear function whose output is the sum of the inputs, each multiplied by a coefficient. This function is really more or less a linear regression. These coefficients are typically referred to as weights in neural network speak. For example, given some neuron with the input features x1, x2, and x3, and output z, this linear component, or the neuron's linear function, would simply be:

$z = w_1 x_1 + w_2 x_2 + w_3 x_3 + b$

Where $w_1$, $w_2$, and $w_3$ are weights or coefficients that we will need to learn given the data, and $b$ is a bias term.
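As a quick illustration, here is a minimal NumPy sketch of that linear function; the feature values, weights, and bias below are made up purely for demonstration:

```python
import numpy as np

# A hypothetical neuron with three inputs: z = w1*x1 + w2*x2 + w3*x3 + b
x = np.array([0.5, -1.2, 3.0])   # input features x1, x2, x3
w = np.array([0.1, 0.4, -0.2])   # weights (coefficients) that would be learned from data
b = 0.05                         # bias term

z = np.dot(w, x) + b             # the neuron's linear output
print(z)
```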
The second function of the neuron is the activation function, which is tasked with introducing a nonlinearity between neurons. A commonly used activation is the sigmoid activation, which you may be familiar with from logistic regression. It squeezes the output of the neuron into an output space where very large values of z are driven to 1 and very small values of z are driven to 0.
The sigmoid function looks like this:

$\sigma(z) = \dfrac{1}{1 + e^{-z}}$
It turns out that the activation function is very important for intermediate neurons. Without it, one could prove that a stack of neurons with linear activations (which is really no activation, or more formally an activation function where f(z) = z) is really just a single linear function.
A single linear function is undesirable in this case because there are many scenarios where our network may be underspecified for the problem at hand. That is to say, the network can't model the data well because of the nonlinear relationships present between the input features and the target variable (what we're predicting).
The canonical example of a function that cannot be modeled with a linear function is the exclusive OR (XOR) function, whose positive and negative classes cannot be separated by a single straight line.
Other common activation functions are the tanh function and ReLU, the Rectified Linear Unit.
The hyperbolic tangent, or tanh, function looks like this:

$\tanh(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
The tanh usually works better than sigmoid for intermediate layers. As you can probably see, the output of tanh will be between [-1, 1], whereas the output of sigmoid is between [0, 1]. This additional width provides some resilience against a phenomenon known as the vanishing/exploding gradient problem, which we will cover in more detail later. For now, it's enough to know that the vanishing gradient problem can cause networks to converge very slowly in the early layers, if at all. Because of that, networks using tanh will tend to converge somewhat faster than networks that use sigmoid activation. That said, they are still not as fast as ReLU.
ReLU, or the Rectified Linear Unit, is defined simply as:

$f(z) = \max(0, z)$
It's a safe bet, and we will use it most of the time throughout this book. Not only is ReLU easy to compute and differentiate, it's also resilient against the vanishing gradient problem. The only drawback to ReLU is that its first derivative is undefined at exactly 0. Variants, including leaky ReLU, are computationally harder but more robust against this issue.
For completeness, the graph of ReLU is somewhat obvious: it is flat at 0 for negative inputs and the identity line for positive inputs.
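To make the three activations concrete, here is a minimal NumPy sketch of sigmoid, tanh, and ReLU; the input values are arbitrary and chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    # Squeezes z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squeezes z into (-1, 1)
    return np.tanh(z)

def relu(z):
    # Returns z for positive inputs, 0 otherwise
    return np.maximum(0, z)

z = np.linspace(-5, 5, 11)
print(sigmoid(z), tanh(z), relu(z), sep="\n")
```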
Every machine learning model really starts with a cost function. Simply put, a cost function allows you to measure how well your model is fitting the training data. In this book, we will define the loss function as the correctness of fit for a single observation within the training set. The cost function will then most often be an average of the loss across the training set. We will revisit loss functions later when we introduce each type of neural network; however, quickly consider the cost function for linear regression as an example:

$J = \dfrac{1}{2m}\sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)^{2}$

In this case, the loss function would be $(\hat{y} - y)^{2}$, which is really the squared error. So then J, our cost function, is really just the mean squared error, or an average of the squared error across the entire dataset. The term 1/2 is added by convention to make some of the calculus cleaner.
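As a small illustration, here is a NumPy sketch of the squared-error loss and the corresponding mean squared error cost; the predictions and targets are invented for the example, and the 1/2 factor is kept only to mirror the convention above:

```python
import numpy as np

def squared_error_loss(y_hat, y):
    # Loss for a single observation: the squared error
    return (y_hat - y) ** 2

def mse_cost(y_hat, y):
    # Cost: the average loss across the dataset, with the 1/2 convention
    return np.mean((y_hat - y) ** 2) / 2.0

y = np.array([1.0, 2.0, 3.0])       # actual target values
y_hat = np.array([0.9, 2.2, 2.7])   # predictions
print(mse_cost(y_hat, y))
```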
Forward propagation is the process by which we attempt to predict our target variable using the features present in a single observation. Imagine we had a two-layer neural network. In the forward propagation process, we would start with the features present within that observation and then multiply those features by their associated coefficients within layer 1 and add a bias term for each neuron. After that, we would send that output to the activation for the neuron. Following that, the output would be sent to the next layer, and so on, until we reach the end of the network, where we are left with our network's prediction.
Once forward propagation is complete, we have the network's prediction for each data point. We also know that data point's actual value. Typically, the prediction is defined as $\hat{y}$, while the actual value of the target variable is defined as $y$.
Once both $y$ and $\hat{y}$ are known, the network's error can be computed using the cost function. Recall that the cost function is the average of the loss function.
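Putting the pieces together, here is a NumPy sketch of forward propagation through a hypothetical two-layer network; the layer sizes, the random weights, and the choice of ReLU for the hidden layer and sigmoid for the output are assumptions made only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

# Shapes are made up: 3 input features, 4 hidden neurons, 1 output neuron
rng = np.random.RandomState(0)
W1, b1 = rng.randn(4, 3), np.zeros(4)   # layer 1 weights and biases
W2, b2 = rng.randn(1, 4), np.zeros(1)   # layer 2 weights and biases

def forward(x):
    z1 = W1 @ x + b1          # linear part of layer 1
    a1 = relu(z1)             # activation of layer 1
    z2 = W2 @ a1 + b2         # linear part of the output layer
    y_hat = sigmoid(z2)       # the network's prediction
    return y_hat

print(forward(np.array([0.5, -1.2, 3.0])))
```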
In order for learning to occur within the network, the network's error signal must be propagated backwards through the network layers from the last layer to the first. Our goal in back propagation is to propagate this error signal backwards through the network while using it to update the network weights as the signal travels. Mathematically, to do so we need to minimize the cost function by nudging the weights towards values that make the cost function the smallest. This process is called gradient descent.
The gradient is the partial derivative of the error function with respect to each weight within the network. The gradient of each weight can be calculated, layer by layer, using the chain rule and the gradients of the layers above.
Once the gradients of each layer are known, we can use the gradient descent algorithm to minimize the cost function.
Gradient descent will repeat this update until the network's error is minimized and the process has converged:

$w := w - \alpha \dfrac{\partial J}{\partial w}$

The gradient descent algorithm multiplies the gradient by a learning rate called alpha ($\alpha$) and subtracts that value from the current value of each weight. The learning rate is a hyperparameter.
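For illustration, here is a minimal NumPy sketch of a single gradient descent update; the helper name, weights, gradients, and learning rate are invented values, not code from this book's examples:

```python
import numpy as np

def gradient_descent_step(w, grad, alpha=0.01):
    # w := w - alpha * dJ/dw, applied element-wise to every weight
    return w - alpha * grad

w = np.array([0.1, 0.4, -0.2])
grad = np.array([0.03, -0.5, 0.12])   # hypothetical gradients from backpropagation
w = gradient_descent_step(w, grad, alpha=0.1)
print(w)
```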
The algorithm described in the previous section assumes a forward and corresponding backward pass over the entire dataset, and as such it's called batch gradient descent.
Another possible way to do gradient descent would be to use a single data point at a time, updating the network weights as we go. This method might help speed up convergence around saddle points where the network might stop converging. Of course, the error estimation of only a single point may not be a very good approximation of the error of the entire dataset.
The best solution to this problem is minibatch gradient descent, in which we take some random subset of the data, called a minibatch, to compute our error and update our network weights. This is almost always the best option. It has the additional benefit of naturally splitting a very large dataset into chunks that are more easily managed in the memory of a machine, or even across machines.
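The following NumPy sketch shows one way a dataset might be split into random minibatches; the helper name, batch size, and random data are assumptions for illustration only:

```python
import numpy as np

def minibatches(X, y, batch_size=32, seed=42):
    # Shuffle the dataset, then yield it one random minibatch at a time
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X = np.random.randn(1000, 3)
y = np.random.randn(1000)
for X_batch, y_batch in minibatches(X, y, batch_size=32):
    pass  # compute the loss on this batch, backpropagate, and update the weights here
```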
The gradient descent algorithm is not the only optimization algorithm available to optimize our network weights; however, it's the basis for most of the other algorithms. While understanding every optimization algorithm out there is likely a PhD's worth of material, we will devote a few sentences to some of the most practical.
Using momentum with gradient descent speeds up learning by increasing the step size in directions where the gradient has remained consistent, while slowing learning in directions where the gradient fluctuates. It allows the velocity of gradient descent to increase.
Momentum works by introducing a velocity term and using a weighted moving average of that term in the update rule, as follows:

$v_{t} = \beta v_{t-1} + (1 - \beta)\, \nabla_{w} J(w)$

$w := w - \alpha v_{t}$

Most typically, $\beta$ is set to 0.9 in the case of momentum, and usually this is not a hyperparameter that needs to be changed.
RMSProp is another algorithm that can speed up gradient descent by accelerating learning in some directions and dampening oscillations in others, across the multidimensional space that the network weights represent:

$s_{t} = \beta s_{t-1} + (1 - \beta)\, \left(\nabla_{w} J(w)\right)^{2}$

$w := w - \alpha \dfrac{\nabla_{w} J(w)}{\sqrt{s_{t}} + \epsilon}$

This has the effect of reducing oscillations more in directions where $s_{t}$ is large.
Adam is one of the best-performing optimizers currently known, and it's my first choice. It works well across a wide variety of problems. It combines the best parts of both momentum and RMSProp into a single update rule:

$v_{t} = \beta_{1} v_{t-1} + (1 - \beta_{1})\, \nabla_{w} J(w)$

$s_{t} = \beta_{2} s_{t-1} + (1 - \beta_{2})\, \left(\nabla_{w} J(w)\right)^{2}$

$w := w - \alpha \dfrac{v_{t}}{\sqrt{s_{t}} + \epsilon}$

Where $\epsilon$ is some very small number to prevent division by 0.
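Looking slightly ahead to Keras, which we introduce in the next section, these optimizers can simply be selected when compiling a model. The sketch below is illustrative only; the tiny one-layer model and the learning rates shown are arbitrary choices, not recommendations from this chapter:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD, RMSprop, Adam

# A trivial one-neuron model, just to have something to compile
model = Sequential([Dense(1, input_dim=3)])

# Any of these optimizers can be passed to compile(); learning rates are illustrative
sgd_momentum = SGD(lr=0.01, momentum=0.9)   # gradient descent with momentum
rmsprop = RMSprop(lr=0.001)                 # RMSProp
adam = Adam(lr=0.001)                       # Adam, usually a safe first choice

model.compile(optimizer=adam, loss='mean_squared_error')
```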
While it's most certainly possible to build and train deep neural networks from scratch using just Python's numpy, that would take a great deal of time and code. It's far more practical, in almost every case, to use a deep learning framework.
Throughout this book we will be using TensorFlow and Keras to make developing deep neural networks much easier and faster.
TensorFlow is a library that can be used to quickly build deep neural networks. In TensorFlow, the mathematical operations that we've covered thus far are expressed as nodes. The edges between these nodes are tensors, or multidimensional data arrays. TensorFlow can, given a neural network defined as a graph and a loss function, automatically compute gradients for the network and optimize the graph to minimize the loss function.
TensorFlow was released as an open source project by Google in 2015. Since then, it has gained a very large following and enjoys a large user community. While TensorFlow provides APIs in Java, C++, Go, and Python, we will only be covering the Python API. We use the Python API in this book because it's both the most widely adopted API and the one most commonly used for developing new models.
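As a small taste of the Python API, here is a minimal TensorFlow 1.x-style sketch that defines a one-weight graph and lets TensorFlow compute the gradients and minimize the loss; the toy data and learning rate are invented for illustration:

```python
import tensorflow as tf

# A tiny graph: fit y = w*x with gradient descent, letting TensorFlow compute the gradients
x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0)

y_hat = w * x
loss = tf.reduce_mean(tf.square(y_hat - y))
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op, feed_dict={x: [1.0, 2.0, 3.0], y: [2.0, 4.0, 6.0]})
    print(sess.run(w))   # approaches 2.0, the true slope of the toy data
```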
TensorFlow can greatly accelerate computation by performing those calculations on one or more Graphics Processing Units. The acceleration that GPU computation provides has become a necessity in modern deep learning.
