Dive deeper into neural networks and get your models trained and optimized with this quick reference guide
Key Features
A quick reference to all important deep learning concepts and their implementations
Essential tips, tricks, and hacks to train a variety of deep learning models such as CNNs, RNNs, LSTMs, and more
Supplemented with essential mathematics and theory, every chapter provides best practices and safe choices for training and fine-tuning your models in Keras and TensorFlow
Book Description
Deep learning has become essential for entering the world of artificial intelligence. This book makes deep learning techniques more accessible, practical, and relevant to practicing data scientists, moving deep learning from academia to the real world through practical examples.
You will learn how TensorBoard is used to monitor the training of deep neural networks and how to solve binary classification problems using deep learning. Readers will then learn to optimize the hyperparameters of their deep learning models. The book then walks readers through the practical implementation of training CNNs, RNNs, and LSTMs with word embeddings and seq2seq models from scratch. Later, the book explores advanced topics such as Deep Q Networks for solving an autonomous agent problem, and how to use two adversarial networks to generate artificial images that appear real. For implementation purposes, we look at popular Python-based deep learning frameworks such as Keras and TensorFlow. Each chapter provides best practices and safe choices to help readers make the right decisions while training deep neural networks.
By the end of this book, you will be able to solve real-world problems quickly with deep neural networks.
What you will learn
Solve regression and classification challenges with TensorFlow and Keras
Learn to use TensorBoard for monitoring neural networks and their training
Optimize hyperparameters using safe choices and best practices
Build CNNs, RNNs, and LSTMs, and use word embeddings from scratch
Build and train seq2seq models for machine translation and chat applications
Understand Deep Q Networks and how to use one to solve an autonomous agent problem
Who this book is for
If you are a data scientist or a machine learning expert, then this book is a very useful read for training your advanced machine learning and deep learning models. You can also refer to this book if you are stuck partway through building a neural network model and need immediate assistance in accomplishing the task smoothly. Some prior knowledge of Python and a firm grasp of the basics of machine learning are required.
You can read this e-book in Legimi apps or in any app that supports the following format:
Page count: 318
Year of publication: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Viraj Madhav
Content Development Editor: Varun Sony
Technical Editor: Dharmendra Yadav
Copy Editors: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Tania Dutta
Production Coordinator: Deepika Naik
First published: March 2018
Production reference: 1070318
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78883-799-6
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
I first met Mike Bernico when we were two of the founding members of a new data science team at a Fortune 50 company. It was a heady time; there was no such thing as formal data science education, so we were all self-taught. We were a collection of adventurous people from diverse backgrounds who identified and learned data science techniques because we needed them to solve the problems that we were interested in. We built a team with an optimistic hacker approach: the belief that we could find and apply techniques "from the wild" to build interesting, useful things.
It is in this practical, scrappy spirit that Mike wrote Deep Learning Quick Reference. Deep learning is frequently made out to be mysterious and difficult; however, in this guide, Mike breaks down major deep learning techniques, making them approachable and applicable. With this book, you (yes, you!) can quickly get started with using deep learning for your own projects in a variety of different modalities.
Mike has been practising data science since before the discipline was named, and he has been specifically teaching the topic to university students for 3 years. Prior to this, he spent many years as a working computer scientist with a specialization in networks and security, and he also has a knack for engaging with people and communicating with nonspecialists. He is currently the Lead Data Scientist for a large financial services company, where he designs systems for data science, builds machine learning models with direct applications and for research publications, mentors junior data scientists, and teaches stakeholders about data science. He knows his stuff!
With Deep Learning Quick Reference, you'll benefit from Mike's deep experience, humor, and down-to-earth manner as you build example networks alongside him. After you complete Mike's book, you'll have the confidence and knowledge to understand and apply deep learning to problems of your own devising, for both fun and function.
Bon voyage, and good hacking!
- J. Malia Andrus, Ph.D.
Data Scientist, Seattle, Washington
Mike Bernico is a Lead Data Scientist at State Farm Mutual Insurance Companies. He also works as an adjunct for the University of Illinois at Springfield, where he teaches Essentials of Data Science, and Advanced Neural Networks and Deep Learning. Mike earned his MSCS from the University of Illinois at Springfield. He's an advocate for open source software and the good it can bring to the world. As a lifelong learner with umpteen hobbies, Mike also enjoys cycling, travel photography, and wine making.
Vitor Bianchi Lanzetta has a master's degree in Applied Economics from the University of São Paulo, one of the most reputable universities in Latin America. He has done a lot of research in economics using neural networks. He has also authored R Data Visualization Recipes, published by Packt Publishing. Vitor is very passionate about data science in general, and he walks the earth with the personal belief that he is just as cool as he is geeky. He thinks that you will learn a lot from this book, and that TensorFlow may be the greatest deep learning tool currently available.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Deep Learning Quick Reference
Dedication
Packt Upsell
Why subscribe?
PacktPub.com
Foreword
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
The Building Blocks of Deep Learning
The deep neural network architectures
Neurons
The neuron linear function
Neuron activation functions
The loss and cost functions in deep learning
The forward propagation process
The back propagation function
Stochastic and minibatch gradient descents
Optimization algorithms for deep learning
Using momentum with gradient descent
The RMSProp algorithm
The Adam optimizer
Deep learning frameworks
What is TensorFlow?
What is Keras?
Popular alternatives to TensorFlow
GPU requirements for TensorFlow and Keras
Installing Nvidia CUDA Toolkit and cuDNN
Installing Python
Installing TensorFlow and Keras
Building datasets for deep learning
Bias and variance errors in deep learning
The train, val, and test datasets
Managing bias and variance in deep neural networks
K-Fold cross-validation
Summary
Using Deep Learning to Solve Regression Problems
Regression analysis and deep neural networks
Benefits of using a neural network for regression
Drawbacks to consider when using a neural network for regression
Using deep neural networks for regression
How to plan a machine learning problem
Defining our example problem
Loading the dataset
Defining our cost function
Building an MLP in Keras
Input layer shape
Hidden layer shape
Output layer shape
Neural network architecture
Training the Keras model
Measuring the performance of our model
Building a deep neural network in Keras
Measuring the deep neural network performance
Tuning the model hyperparameters
Saving and loading a trained Keras model
Summary
Monitoring Network Training Using TensorBoard
A brief overview of TensorBoard
Setting up TensorBoard
Installing TensorBoard
How TensorBoard talks to Keras/TensorFlow
Running TensorBoard
Connecting Keras to TensorBoard
Introducing Keras callbacks
Creating a TensorBoard callback
Using TensorBoard
Visualizing training
Visualizing network graphs
Visualizing a broken network
Summary
Using Deep Learning to Solve Binary Classification Problems
Binary classification and deep neural networks
Benefits of deep neural networks
Drawbacks of deep neural networks
Case study – epileptic seizure recognition
Defining our dataset
Loading data
Model inputs and outputs
The cost function
Using metrics to assess the performance
Building a binary classifier in Keras
The input layer
The hidden layers
What happens if we use too many neurons?
What happens if we use too few neurons?
Choosing a hidden layer architecture
Coding the hidden layers for our example
The output layer
Putting it all together
Training our model
Using the checkpoint callback in Keras
Measuring ROC AUC in a custom callback
Measuring precision, recall, and f1-score
Summary
Using Keras to Solve Multiclass Classification Problems
Multiclass classification and deep neural networks
Benefits
Drawbacks
Case study - handwritten digit classification
Problem definition
Model inputs and outputs
Flattening inputs
Categorical outputs
Cost function
Metrics
Building a multiclass classifier in Keras
Loading MNIST
Input layer
Hidden layers
Output layer
Softmax activation
Putting it all together
Training
Using scikit-learn metrics with multiclass models
Controlling variance with dropout
Controlling variance with regularization
Summary
Hyperparameter Optimization
Should network architecture be considered a hyperparameter?
Finding a giant and then standing on his shoulders
Adding until you overfit, then regularizing
Practical advice
Which hyperparameters should we optimize?
Hyperparameter optimization strategies
Common strategies
Using random search with scikit-learn
Hyperband
Summary
Training a CNN from Scratch
Introducing convolutions
How do convolutional layers work?
Convolutions in three dimensions
A layer of convolutions
Benefits of convolutional layers
Parameter sharing
Local connectivity
Pooling layers
Batch normalization
Training a convolutional neural network in Keras
Input
Output
Cost function and metrics
Convolutional layers
Fully connected layers
Multi-GPU models in Keras
Training
Using data augmentation
The Keras ImageDataGenerator
Training with a generator
Summary
Transfer Learning with Pretrained CNNs
Overview of transfer learning
When transfer learning should be used
Limited data
Common problem domains
The impact of source/target volume and similarity
More data is always beneficial
Source/target domain similarity
Transfer learning in Keras
Target domain overview
Source domain overview
Source network architecture
Transfer network architecture
Data preparation
Data input
Training (feature extraction)
Training (fine-tuning)
Summary
Training an RNN from Scratch
Introducing recurrent neural networks
What makes a neuron recurrent?
Long Short Term Memory Networks
Backpropagation through time
A refresher on time series problems
Stock and flow
ARIMA and ARIMAX forecasting
Using an LSTM for time series prediction
Data preparation
Loading the dataset
Slicing train and test by date
Differencing a time series
Scaling a time series
Creating a lagged training set
Input shape
Data preparation glue
Network output
Network architecture
Stateful versus stateless LSTMs
Training
Measuring performance
Summary
Training LSTMs with Word Embeddings from Scratch
An introduction to natural language processing
Semantic analysis
Document classification
Vectorizing text
NLP terminology
Bag of Words models
Stemming, lemmatization, and stopwords
Count and TF-IDF vectorization
Word embedding
A quick example
Learning word embeddings with prediction
Learning word embeddings with counting
Getting from words to documents
Keras embedding layer
1D CNNs for natural language processing
Case studies for document classifications
Sentiment analysis with Keras embedding layers and LSTMs
Preparing the data
Input and embedding layer architecture
LSTM layer
Output layer
Putting it all together
Training the network
Performance
Document classification with and without GloVe
Preparing the data
Loading pretrained word vectors
Input and embedding layer architecture
Without GloVe vectors
With GloVe vectors
Convolution layers
Output layer
Putting it all together
Training
Performance
Summary
Training Seq2Seq Models
Sequence-to-sequence models
Sequence-to-sequence model applications
Sequence-to-sequence model architecture
Encoders and decoders
Characters versus words
Teacher forcing
Attention
Translation metrics
Machine translation
Understanding the data
Loading data
One hot encoding
Training network architecture
Network architecture (for inference)
Putting it all together
Training
Inference
Loading data
Creating reverse indices
Loading models
Translating a sequence
Decoding a sequence
Example translations
Summary
Using Deep Reinforcement Learning
Reinforcement learning overview
Markov Decision Processes
Q Learning
Infinite state space
Deep Q networks
Online learning
Memory and experience replay
Exploitation versus exploration 
DeepMind
The Keras reinforcement learning framework
Installing Keras-RL
Installing OpenAI gym
Using OpenAI gym
Building a reinforcement learning agent in Keras
CartPole
CartPole neural network architecture
Memory
Policy
Agent
Training
Results
Lunar Lander 
Lunar Lander network architecture
Memory and policy
Agent
Training
Results
Summary
Generative Adversarial Networks
An overview of the GAN
Deep Convolutional GAN architecture
Adversarial training architecture
Generator architecture
Discriminator architecture
Stacked training
Step 1 – train the discriminator
Step 2 – train the stack 
How GANs can fail
Stability
Mode collapse
Safe choices for GAN
Generating MNIST images using a Keras GAN
Loading the dataset
Building the generator
Building the discriminator
Building the stacked model
The training loop
Model evaluation
Generating CIFAR-10 images using a Keras GAN
Loading CIFAR-10
Building the generator
Building the discriminator
The training loop
Model evaluation
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Deep Learning Quick Reference demonstrates a fast and practical approach to using deep learning. It's focused on real-life problems, and it provides just enough theory and math to reinforce the reader's understanding of the topic. Deep learning is an exciting, fast-paced branch of machine learning, but it's also a field that can be broken into. It's a field where a flood of detailed, complicated research is created every day, and this can be overwhelming. In this book, I focus on teaching you the skills to apply deep learning to a variety of practical problems. My greatest hope for this book is that it will provide you with the tools you need to use deep learning techniques to solve your machine learning problems.
I'm a practicing data scientist, and I'm writing this book with other practicing data scientists and machine learning engineers in mind. If you're a software engineer applying deep learning, this book is also for you.
If you're a deep learning researcher, then this book isn't really for you; however, you should still pick up a copy so that you can criticize the lack of proofs and mathematical rigor in this book.
If you're an academic or educator, then this book is definitely for you. I've taught a survey course in data science at the University of Illinois at Springfield (go Prairie Stars!) for the past 3 years, and in doing so, I've had the opportunity to inspire a number of future machine learning practitioners. This experience has inspired me to create this book. I think a book like this is a great way to help students build interest in a very complex topic.
Chapter 1, The Building Blocks of Deep Learning, reviews some basics around the operation of neural networks, touches on optimization algorithms, talks about model validation, and goes over setting up a development environment suitable for building deep neural networks.
Chapter 2, Using Deep Learning to Solve Regression Problems, enables you to build very simple neural networks to solve regression problems and explore the impact of deeper, more complex models on those problems.
Chapter 3, Monitoring Network Training Using TensorBoard, lets you get started right away with TensorBoard, which is a wonderful application for monitoring and debugging your future models.
Chapter 4, Using Deep Learning to Solve Binary Classification Problems, helps you solve binary classification problems using deep learning.
Chapter 5, Using Keras to Solve Multiclass Classification Problems, moves on to multiclass classification and explores the differences. It also talks about managing overfitting and the safest choices for doing so.
Chapter 6, Hyperparameter Optimization, shows two separate methods for model tuning: one well-known and battle-tested, and the other a state-of-the-art method.
Chapter 7, Training a CNN From Scratch, teaches you how to use convolutional networks to do classification with images.
Chapter 8, Transfer Learning with Pretrained CNNs, describes how to apply transfer learning to get amazing performance from an image classifier, even with very little data.
Chapter 9, Training an RNN from Scratch, discusses RNNs and LSTMs, and how to use them for time series forecasting problems.
Chapter 10, Training LSTMs with Word Embeddings From Scratch, continues our conversation on LSTMs, this time talking about natural language classification tasks.
Chapter 11, Training Seq2Seq Models, helps us use sequence-to-sequence models to do machine translation.
Chapter 12, Using Deep Reinforcement Learning, introduces deep reinforcement learning and builds a deep Q network that can power autonomous agents.
Chapter 13, Generative Adversarial Networks, explains how to use generative adversarial networks to generate convincing images.
I assume that you're already experienced with more traditional data science and predictive modeling techniques such as Linear/Logistic Regression and Random Forest. If this is your first experience with machine learning, this may be a little difficult for you.
I also assume that you have at least some experience in programming with Python, or at least another programming language such as Java or C++.
Deep learning is computationally intensive, and some of the models we build here require an NVIDIA GPU to run in a reasonable amount of time. If you don't own a fast GPU, you may wish to use a GPU-based cloud instance on either Amazon Web Services or Google Cloud Platform.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, make sure that you unzip or extract the folder using the latest version of any of these:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for macOS
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Deep-Learning-Quick-Reference. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions; also, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Welcome to Deep Learning Quick Reference! In this book, I am going to attempt to make deep learning techniques more accessible, practical, and consumable to data scientists, machine learning engineers, and software engineers who need to solve problems with deep learning. If you want to train your own deep neural network and you're stuck somewhere, there is a good chance this guide will help.
This book is hands on and is intended to be a practical guide that can help you solve your problems fast. It is primarily intended for experienced machine learning engineers and data scientists who need to use deep learning to solve a problem. Aside from this chapter, which provides some of the terminology, frameworks, and background that we will need to get started, it's not meant to be read in order. Each chapter contains a practical example, complete with code and a few best practices and safe choices. We expect you to flip to the chapter you need and get started.
This book won't go deeply into the theory of deep learning and neural networks. There are many wonderful books that can provide that background, and I highly recommend that you read at least one of them. We hope to provide just enough theory and mathematical intuition to get you started.
We will cover the following topics in this chapter:
Deep neural network architectures
Optimization algorithms for deep learning
Deep learning frameworks
Building datasets for deep learning
Deep neural network architectures can vary greatly in structure depending on the network's application, but they all have some basic components. In this section, we will talk briefly about those components.
In this book, I'll define a deep neural network as a network with more than a single hidden layer. Beyond that, we won't attempt to limit membership in the Deep Learning Club. As such, our networks might have fewer than 100 neurons, or possibly millions. We might use special layers of neurons, including convolutions and recurrent layers, but we will refer to all of these as neurons nonetheless.
A neuron is the atomic unit of a neural network. Its design is sometimes said to be inspired by biology; however, that's a topic for a different book. Neurons are typically arranged into layers. In this book, if I'm referring to a specific neuron, I'll use a notation that identifies it by l, the layer the neuron is in, and k, the neuron number. As we will be using programming languages that use 0-based indexing, my notation will also be 0-based.
At their core, most neurons are composed of two functions that work together: a linear function and an activation function. Let us take a high-level look at those two components.
The first component of the neuron is a linear function whose output is the sum of the inputs, each multiplied by a coefficient. This function is really more or less a linear regression. These coefficients are typically referred to as weights in neural network speak. For example, given some neuron with the input features x1, x2, and x3, and output z, this linear component, or the neuron's linear function, would simply be:

$z = w_1 x_1 + w_2 x_2 + w_3 x_3 + b$

Where $w_1$, $w_2$, and $w_3$ are weights or coefficients that we will need to learn given the data, and $b$ is a bias term.
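As a quick illustration, here is a minimal NumPy sketch of that linear function; the feature values, weights, and bias below are made up purely for demonstration:

```python
import numpy as np

# A hypothetical neuron with three inputs: z = w1*x1 + w2*x2 + w3*x3 + b
x = np.array([0.5, -1.2, 3.0])   # input features x1, x2, x3
w = np.array([0.1, 0.4, -0.2])   # weights (coefficients) that would be learned from data
b = 0.05                         # bias term

z = np.dot(w, x) + b             # the neuron's linear output
print(z)
```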
The second function of the neuron is the activation function, which is tasked with introducing a nonlinearity between neurons. A commonly used activation is the sigmoid activation, which you may be familiar with from logistic regression. It squeezes the output of the neuron into an output space where very large values of z are driven to 1 and very small values of z are driven to 0.
The sigmoid function looks like this:

$\sigma(z) = \dfrac{1}{1 + e^{-z}}$
It turns out that the activation function is very important for intermediate neurons. Without it, one could prove that a stack of neurons with linear activations (which is really no activation, or more formally an activation function where f(z) = z) is really just a single linear function.
A single linear function is undesirable in this case because there are many scenarios where our network may be underspecified for the problem at hand. That is to say, the network can't model the data well because of the nonlinear relationships present between the input features and the target variable (what we're predicting).
The canonical example of a function that cannot be modeled with a linear function is the exclusive OR (XOR) function, whose positive and negative classes cannot be separated by a single straight line.
Other common activation functions are the tanh function and ReLU, the Rectified Linear Unit.
The hyperbolic tangent, or tanh, function looks like this:

$\tanh(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
The tanh usually works better than sigmoid for intermediate layers. As you can probably see, the output of tanh will be between [-1, 1], whereas the output of sigmoid is between [0, 1]. This additional width provides some resilience against a phenomenon known as the vanishing/exploding gradient problem, which we will cover in more detail later. For now, it's enough to know that the vanishing gradient problem can cause networks to converge very slowly in the early layers, if at all. Because of that, networks using tanh will tend to converge somewhat faster than networks that use sigmoid activation. That said, they are still not as fast as ReLU.
ReLU, or the Rectified Linear Unit, is defined simply as:

$f(z) = \max(0, z)$
It's a safe bet, and we will use it most of the time throughout this book. Not only is ReLU easy to compute and differentiate, it's also resilient against the vanishing gradient problem. The only drawback to ReLU is that its first derivative is undefined at exactly 0. Variants, including leaky ReLU, are computationally harder but more robust against this issue.
For completeness, the graph of ReLU is somewhat obvious: it is flat at 0 for negative inputs and the identity line for positive inputs.
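To make the three activations concrete, here is a minimal NumPy sketch of sigmoid, tanh, and ReLU; the input values are arbitrary and chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    # Squeezes z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squeezes z into (-1, 1)
    return np.tanh(z)

def relu(z):
    # Returns z for positive inputs, 0 otherwise
    return np.maximum(0, z)

z = np.linspace(-5, 5, 11)
print(sigmoid(z), tanh(z), relu(z), sep="\n")
```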
Every machine learning model really starts with a cost function. Simply put, a cost function allows you to measure how well your model is fitting the training data. In this book, we will define the loss function as the correctness of fit for a single observation within the training set. The cost function will then most often be an average of the loss across the training set. We will revisit loss functions later when we introduce each type of neural network; however, quickly consider the cost function for linear regression as an example:

$J = \dfrac{1}{2m}\sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)^{2}$

In this case, the loss function would be $(\hat{y} - y)^{2}$, which is really the squared error. So then J, our cost function, is really just the mean squared error, or an average of the squared error across the entire dataset. The term 1/2 is added by convention to make some of the calculus cleaner.
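As a small illustration, here is a NumPy sketch of the squared-error loss and the corresponding mean squared error cost; the predictions and targets are invented for the example, and the 1/2 factor is kept only to mirror the convention above:

```python
import numpy as np

def squared_error_loss(y_hat, y):
    # Loss for a single observation: the squared error
    return (y_hat - y) ** 2

def mse_cost(y_hat, y):
    # Cost: the average loss across the dataset, with the 1/2 convention
    return np.mean((y_hat - y) ** 2) / 2.0

y = np.array([1.0, 2.0, 3.0])       # actual target values
y_hat = np.array([0.9, 2.2, 2.7])   # predictions
print(mse_cost(y_hat, y))
```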
Forward propagation is the process by which we attempt to predict our target variable using the features present in a single observation. Imagine we had a two-layer neural network. In the forward propagation process, we would start with the features present within that observation and then multiply those features by their associated coefficients within layer 1 and add a bias term for each neuron. After that, we would send that output to the activation for the neuron. Following that, the output would be sent to the next layer, and so on, until we reach the end of the network, where we are left with our network's prediction.
Once forward propagation is complete, we have the network's prediction for each data point. We also know that data point's actual value. Typically, the prediction is defined as $\hat{y}$, while the actual value of the target variable is defined as $y$.
Once both $y$ and $\hat{y}$ are known, the network's error can be computed using the cost function. Recall that the cost function is the average of the loss function.
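Putting the pieces together, here is a NumPy sketch of forward propagation through a hypothetical two-layer network; the layer sizes, the random weights, and the choice of ReLU for the hidden layer and sigmoid for the output are assumptions made only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

# Shapes are made up: 3 input features, 4 hidden neurons, 1 output neuron
rng = np.random.RandomState(0)
W1, b1 = rng.randn(4, 3), np.zeros(4)   # layer 1 weights and biases
W2, b2 = rng.randn(1, 4), np.zeros(1)   # layer 2 weights and biases

def forward(x):
    z1 = W1 @ x + b1          # linear part of layer 1
    a1 = relu(z1)             # activation of layer 1
    z2 = W2 @ a1 + b2         # linear part of the output layer
    y_hat = sigmoid(z2)       # the network's prediction
    return y_hat

print(forward(np.array([0.5, -1.2, 3.0])))
```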
In order for learning to occur within the network, the network's error signal must be propagated backwards through the network layers from the last layer to the first. Our goal in back propagation is to propagate this error signal backwards through the network while using it to update the network weights as the signal travels. Mathematically, to do so we need to minimize the cost function by nudging the weights towards values that make the cost function the smallest. This process is called gradient descent.
The gradient is the partial derivative of the error function with respect to each weight within the network. The gradient of each weight can be calculated, layer by layer, using the chain rule and the gradients of the layers above.
Once the gradients of each layer are known, we can use the gradient descent algorithm to minimize the cost function.
Gradient descent will repeat this update until the network's error is minimized and the process has converged:

$w := w - \alpha \dfrac{\partial J}{\partial w}$

The gradient descent algorithm multiplies the gradient by a learning rate called alpha ($\alpha$) and subtracts that value from the current value of each weight. The learning rate is a hyperparameter.
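For illustration, here is a minimal NumPy sketch of a single gradient descent update; the helper name, weights, gradients, and learning rate are invented values, not code from this book's examples:

```python
import numpy as np

def gradient_descent_step(w, grad, alpha=0.01):
    # w := w - alpha * dJ/dw, applied element-wise to every weight
    return w - alpha * grad

w = np.array([0.1, 0.4, -0.2])
grad = np.array([0.03, -0.5, 0.12])   # hypothetical gradients from backpropagation
w = gradient_descent_step(w, grad, alpha=0.1)
print(w)
```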
The algorithm described in the previous section assumes a forward and corresponding backward pass over the entire dataset, and as such it's called batch gradient descent.
Another possible way to do gradient descent would be to use a single data point at a time, updating the network weights as we go. This method might help speed up convergence around saddle points where the network might stop converging. Of course, the error estimation of only a single point may not be a very good approximation of the error of the entire dataset.
The best solution to this problem is minibatch gradient descent, in which we take some random subset of the data, called a minibatch, to compute our error and update our network weights. This is almost always the best option. It has the additional benefit of naturally splitting a very large dataset into chunks that are more easily managed in the memory of a machine, or even across machines.
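The following NumPy sketch shows one way a dataset might be split into random minibatches; the helper name, batch size, and random data are assumptions for illustration only:

```python
import numpy as np

def minibatches(X, y, batch_size=32, seed=42):
    # Shuffle the dataset, then yield it one random minibatch at a time
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X = np.random.randn(1000, 3)
y = np.random.randn(1000)
for X_batch, y_batch in minibatches(X, y, batch_size=32):
    pass  # compute the loss on this batch, backpropagate, and update the weights here
```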
The gradient descent algorithm is not the only optimization algorithm available to optimize our network weights; however, it's the basis for most of the other algorithms. While understanding every optimization algorithm out there is likely a PhD's worth of material, we will devote a few sentences to some of the most practical.
Using momentum with gradient descent speeds up learning by increasing the step size in directions where the gradient has remained consistent, while slowing learning in directions where the gradient fluctuates. It allows the velocity of gradient descent to increase.
Momentum works by introducing a velocity term and using a weighted moving average of that term in the update rule, as follows:

$v_{t} = \beta v_{t-1} + (1 - \beta)\, \nabla_{w} J(w)$

$w := w - \alpha v_{t}$

Most typically, $\beta$ is set to 0.9 in the case of momentum, and usually this is not a hyperparameter that needs to be changed.
RMSProp is another algorithm that can speed up gradient descent by accelerating learning in some directions and dampening oscillations in others, across the multidimensional space that the network weights represent:

$s_{t} = \beta s_{t-1} + (1 - \beta)\, \left(\nabla_{w} J(w)\right)^{2}$

$w := w - \alpha \dfrac{\nabla_{w} J(w)}{\sqrt{s_{t}} + \epsilon}$

This has the effect of reducing oscillations more in directions where $s_{t}$ is large.
Adam is one of the best-performing optimizers currently known, and it's my first choice. It works well across a wide variety of problems. It combines the best parts of both momentum and RMSProp into a single update rule:

$v_{t} = \beta_{1} v_{t-1} + (1 - \beta_{1})\, \nabla_{w} J(w)$

$s_{t} = \beta_{2} s_{t-1} + (1 - \beta_{2})\, \left(\nabla_{w} J(w)\right)^{2}$

$w := w - \alpha \dfrac{v_{t}}{\sqrt{s_{t}} + \epsilon}$

Where $\epsilon$ is some very small number to prevent division by 0.
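Looking slightly ahead to Keras, which we introduce in the next section, these optimizers can simply be selected when compiling a model. The sketch below is illustrative only; the tiny one-layer model and the learning rates shown are arbitrary choices, not recommendations from this chapter:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD, RMSprop, Adam

# A trivial one-neuron model, just to have something to compile
model = Sequential([Dense(1, input_dim=3)])

# Any of these optimizers can be passed to compile(); learning rates are illustrative
sgd_momentum = SGD(lr=0.01, momentum=0.9)   # gradient descent with momentum
rmsprop = RMSprop(lr=0.001)                 # RMSProp
adam = Adam(lr=0.001)                       # Adam, usually a safe first choice

model.compile(optimizer=adam, loss='mean_squared_error')
```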
While it's most certainly possible to build and train deep neural networks from scratch using just Python's numpy, that would take a great deal of time and code. It's far more practical, in almost every case, to use a deep learning framework.
Throughout this book we will be using TensorFlow and Keras to make developing deep neural networks much easier and faster.
TensorFlow is a library that can be used to quickly build deep neural networks. In TensorFlow, the mathematical operations that we've covered thus far are expressed as nodes. The edges between these nodes are tensors, or multidimensional data arrays. TensorFlow can, given a neural network defined as a graph and a loss function, automatically compute gradients for the network and optimize the graph to minimize the loss function.
TensorFlow was released as an open source project by Google in 2015. Since then, it has gained a very large following and enjoys a large user community. While TensorFlow provides APIs in Java, C++, Go, and Python, we will only be covering the Python API. We use the Python API in this book because it's both the most widely adopted API and the one most commonly used for developing new models.
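As a small taste of the Python API, here is a minimal TensorFlow 1.x-style sketch that defines a one-weight graph and lets TensorFlow compute the gradients and minimize the loss; the toy data and learning rate are invented for illustration:

```python
import tensorflow as tf

# A tiny graph: fit y = w*x with gradient descent, letting TensorFlow compute the gradients
x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0)

y_hat = w * x
loss = tf.reduce_mean(tf.square(y_hat - y))
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op, feed_dict={x: [1.0, 2.0, 3.0], y: [2.0, 4.0, 6.0]})
    print(sess.run(w))   # approaches 2.0, the true slope of the toy data
```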
TensorFlow can greatly accelerate computation by performing those calculations on one or more Graphics Processing Units. The acceleration that GPU computation provides has become a necessity in modern deep learning.
