E-Book
39,59 €

Reinforcement Learning with TensorFlow E-Book

Sayon Dutta

Publisher: Packt Publishing
Category: Science and New Technologies
Language: English

Description

Leverage the power of reinforcement learning techniques to develop self-learning systems using TensorFlow

Key Features

Learn reinforcement learning concepts and their implementation using TensorFlow

Discover different problem-solving methods for Reinforcement Learning

Apply reinforcement learning for autonomous driving cars, robobrokers, and more

Book Description

Reinforcement Learning (RL) allows you to develop smart, quick, and self-learning systems in your business surroundings. It is an effective method to train your learning agents and solve a variety of problems in Artificial Intelligence, from games, self-driving cars, and robots to enterprise applications that range from data center energy saving (cooling data centers) to smart warehousing solutions.

The book covers the major advancements and successes achieved in deep reinforcement learning by synergizing deep neural network architectures with reinforcement learning. The book also introduces readers to the concept of Reinforcement Learning, its advantages, and why it's gaining so much popularity. It discusses MDPs, Monte Carlo tree searches, dynamic programming methods such as policy and value iteration, and temporal difference learning methods such as Q-learning and SARSA. You will use TensorFlow and OpenAI Gym to build simple neural network models that learn from their own actions. You will also see how reinforcement learning algorithms play a role in games, image processing, and NLP.

By the end of this book, you will have a firm understanding of what reinforcement learning is and how to put your knowledge to practical use by leveraging the power of TensorFlow and OpenAI Gym.

What you will learn

Implement state-of-the-art Reinforcement Learning algorithms from the basics

Discover various techniques of Reinforcement Learning such as MDP, Q Learning and more

Learn the applications of Reinforcement Learning in advertisement, image processing, and NLP

Teach a Reinforcement Learning model to play a game using TensorFlow and the OpenAI gym

Understand how Reinforcement Learning Applications are used in robotics

Who this book is for

If you want to get started with reinforcement learning using TensorFlow in the most practical way, this book will be a useful resource. The book assumes prior knowledge of machine learning and neural network programming concepts, as well as some understanding of the TensorFlow framework. No previous experience with Reinforcement Learning is required.

Sayon Dutta is an Artificial Intelligence researcher and developer. A graduate from IIT Kharagpur, he owns the software copyright for Mobile Irrigation Scheduler. At present, he is an AI engineer at Wissen Technology. He co-founded an AI startup Marax AI Inc., focused on AI-powered customer churn prediction. With over 2.5 years of experience in AI, he invests most of his time implementing AI research papers for industrial use cases, and weightlifting.

Details

Page count: 359

Year of publication: 2018

Sample

Reinforcement Learning with TensorFlow

A beginner's guide to designing self-learning systems with TensorFlow and OpenAI Gym

Sayon Dutta

BIRMINGHAM - MUMBAI

Reinforcement Learning with TensorFlow

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Viraj Madhav
Content Development Editor: Aaryaman Singh, Varun Sony
Technical Editor: Dharmendra Yadav
Copy Editors: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Tania Dutta
Production Coordinator: Shantanu Zagade

First published: April 2018

Production reference: 1200418

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78883-572-5

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

I would like to extend my gratitude to Maa and Baba for everything, especially for teaching me that life is all about hustle and the key to enjoyment is getting used to it; and to my brothers Arnav, Kedia, Rawat, Abhishek Singh, and Garg for helping me in my lowest times. Thanks to the Packt team, especially Viraj for reaching out, and Aaryaman and Varun for guiding me throughout. Thanks to the AI community and my readers.

About the reviewer

Narotam Singh has been with the Indian Meteorological Department, Ministry of Earth Sciences, India, since 1996. He has been actively involved with various technical programs and the training of GoI officers in IT and communication. He completed his PG in electronics in 1996, and his Diploma and PG diploma in computer engineering in 1994 and 1997 respectively. He works in the enigmatic field of neural networks, deep learning, and machine learning app development in iOS with Core ML.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Title Page

Reinforcement Learning with TensorFlow

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Deep Learning – Architectures and Frameworks

Deep learning

Activation functions for deep learning

The sigmoid function

The tanh function

The softmax function

The rectified linear unit function

How to choose the right activation function

Logistic regression as a neural network

Notation

Objective

The cost function

The gradient descent algorithm

The computational graph

Steps to solve logistic regression using gradient descent

What is xavier initialization?

Why do we use xavier initialization?

The neural network model

Recurrent neural networks

Long Short Term Memory Networks

Convolutional neural networks

The LeNet-5 convolutional neural network

The AlexNet model

The VGG-Net model

The Inception model

Limitations of deep learning

The vanishing gradient problem

The exploding gradient problem

Overcoming the limitations of deep learning

Reinforcement learning

Basic terminologies and conventions

Optimality criteria

The value function for optimality

The policy model for optimality

The Q-learning approach to reinforcement learning

Asynchronous advantage actor-critic

Introduction to TensorFlow and OpenAI Gym

Basic computations in TensorFlow

An introduction to OpenAI Gym

The pioneers and breakthroughs in reinforcement learning

David Silver

Pieter Abbeel

Google DeepMind

The AlphaGo program

Libratus

Summary

Training Reinforcement Learning Agents Using OpenAI Gym

The OpenAI Gym

Understanding an OpenAI Gym environment

Programming an agent using an OpenAI Gym environment

Q-Learning

The Epsilon-Greedy approach

Using the Q-Network for real-world applications

Summary

Markov Decision Process

Markov decision processes

The Markov property

The S state set

Actions

Transition model

Rewards

Policy

The sequence of rewards - assumptions

The infinite horizons

Utility of sequences

The Bellman equations

Solving the Bellman equation to find policies

An example of value iteration using the Bellman equation

Policy iteration

Partially observable Markov decision processes

State estimation

Value iteration in POMDPs

Training the FrozenLake-v0 environment using MDP

Summary

Policy Gradients

The policy optimization method

Why policy optimization methods?

Why stochastic policy?

Example 1 - rock, paper, scissors

Example 2 - state aliased grid-world

Policy objective functions

Policy Gradient Theorem

Temporal difference rule

TD(1) rule

TD(0) rule

TD(λ) rule

Policy gradients

The Monte Carlo policy gradient

Actor-critic algorithms

Using a baseline to reduce variance

Vanilla policy gradient

Agent learning pong using policy gradients

Summary

Q-Learning and Deep Q-Networks

Why reinforcement learning?

Model based learning and model free learning

Monte Carlo learning

Temporal difference learning

On-policy and off-policy learning

Q-learning

The exploration exploitation dilemma

Q-learning for the mountain car problem in OpenAI gym

Deep Q-networks

Using a convolution neural network instead of a single layer neural network

Use of experience replay

Separate target network to compute the target Q-values

Advancements in deep Q-networks and beyond

Double DQN

Dueling DQN

Deep Q-network for mountain car problem in OpenAI gym

Deep Q-network for Cartpole problem in OpenAI gym

Deep Q-network for Atari Breakout in OpenAI gym

The Monte Carlo tree search algorithm

Minimax and game trees

The Monte Carlo Tree Search

The SARSA algorithm

SARSA algorithm for mountain car problem in OpenAI gym

Summary

Asynchronous Methods

Why asynchronous methods?

Asynchronous one-step Q-learning

Asynchronous one-step SARSA

Asynchronous n-step Q-learning

Asynchronous advantage actor critic

A3C for Pong-v0 in OpenAI gym

Summary

Robo Everything – Real Strategy Gaming

Real-time strategy games

Reinforcement learning and other approaches

Online case-based planning

Drawbacks to real-time strategy games

Why reinforcement learning?

Reinforcement learning in RTS gaming

Deep autoencoder

How is reinforcement learning better?

Summary

AlphaGo – Reinforcement Learning at Its Best

What is Go?

Go versus chess

How did DeepBlue defeat Gary Kasparov?

Why is the game tree approach no good for Go?

AlphaGo – mastering Go

Monte Carlo Tree Search

Architecture and properties of AlphaGo 

Energy consumption analysis – Lee Sedol versus AlphaGo

AlphaGo Zero

Architecture and properties of AlphaGo Zero

Training process in AlphaGo Zero 

Summary

Reinforcement Learning in Autonomous Driving

Machine learning for autonomous driving

Reinforcement learning for autonomous driving

Creating autonomous driving agents

Why reinforcement learning?

Proposed frameworks for autonomous driving

Spatial aggregation

Sensor fusion

Spatial features

Recurrent temporal aggregation

Planning

DeepTraffic – MIT simulator for autonomous driving 

Summary

Financial Portfolio Management

Introduction

Problem definition

Data preparation

Reinforcement learning

Further improvements

Summary

Reinforcement Learning in Robotics

Reinforcement learning in robotics

Evolution of reinforcement learning

Challenges in robot reinforcement learning

High dimensionality problem

Real-world challenges

Issues due to model uncertainty

What's the final objective a robot wants to achieve?

Open questions and practical challenges

Open questions

Practical challenges for robotic reinforcement learning

Key takeaways

Summary

Deep Reinforcement Learning in Ad Tech

Computational advertising challenges and bidding strategies

Business models used in advertising

Preface

Reinforcement learning (RL) allows you to develop smart, quick, and self-learning systems in your business surroundings. It is an effective method to train your learning agents and solve a variety of problems in artificial intelligence—from games, self-driving cars, and robots to enterprise applications that range from data center energy saving (cooling data centers) to smart warehousing solutions.

The book covers the major advancements and successes achieved in deep reinforcement learning by synergizing deep neural network architectures with reinforcement learning. The book also introduces readers to the concept of Reinforcement Learning, its advantages, and why it's gaining so much popularity. It discusses MDPs, Monte Carlo tree searches, policy and value iteration, and temporal difference learning methods such as Q-learning and SARSA. You will use TensorFlow and OpenAI Gym to build simple neural network models that learn from their own actions. You will also see how reinforcement learning algorithms play a role in games, image processing, and NLP.

By the end of this book, you will have a firm understanding of what reinforcement learning is and how to put your knowledge to practical use by leveraging the power of TensorFlow and OpenAI Gym.

Who this book is for

If you want to get started with reinforcement learning using TensorFlow in the most practical way, this book will be a useful resource. The book assumes prior knowledge of traditional machine learning and linear algebra, as well as some understanding of the TensorFlow framework. No previous experience of reinforcement learning and deep neural networks is required.

What this book covers

Chapter 1, Deep Learning – Architectures and Frameworks, covers the relevant and common deep learning architectures, the basics of logistic regression, neural networks, RNNs, LSTMs, and CNNs. We also give an overview of reinforcement learning, the various technologies, frameworks, tools, and techniques, along with what has been achieved so far, the future, and various interesting applications.

Chapter 2, Training Reinforcement Learning Agents Using OpenAI Gym, explains that OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games such as Pong or Breakout. In this chapter, we learn how to use the OpenAI Gym framework to program interesting RL applications.

Chapter 3, Markov Decision Process, discusses the fundamental concepts behind reinforcement learning, such as MDPs, Bellman value functions, POMDPs, value iteration, and sequences of rewards, and trains a reinforcement learning agent using value iteration in an MDP environment from OpenAI Gym.

Chapter 4, Policy Gradients, shows a way of implementing reinforcement learning systems by directly deriving the policies. Policy gradients are faster and can work in continuous state-action spaces. We cover the basics of policy gradient such as policy objective functions, temporal difference rule, policy gradients, and actor-critic algorithms. We learn to apply a policy gradient algorithm to train an agent to play the game of Pong.

Chapter 5, Q-Learning and Deep Q-Networks, explains that algorithms such as State-Action-Reward-State-Action (SARSA), MCTS, and DQN have enabled a new era of RL, including AlphaGo. In this chapter, we take a look at the building blocks of Q-Learning and applying deep neural networks (such as CNNs) to create DQN. We also implement SARSA, Q-learning, and DQN to create agents to play the games of Mountain Car, Cartpole, and Atari Breakout.

Chapter 6, Asynchronous Methods, teaches asynchronous methods: asynchronous one-step Q-learning, asynchronous one-step SARSA, asynchronous n-step Q-learning, and asynchronous advantage actor-critic (A3C). A3C is a state-of-the-art deep reinforcement learning framework. We also implement A3C to create a reinforcement learning agent.

Chapter 7, Robo Everything – Real Strategy Gaming, brings together the RL foundations, technologies, and frameworks to develop RL pipelines and systems. We will also discuss the system-level strategies to make reinforcement learning problems easier to solve (shaping, curriculum learning, apprenticeship learning, building blocks, and multiconcepts).

Chapter 8, AlphaGo – Reinforcement Learning at Its Best, covers one of the most successful stories: the success of AI in playing and winning the game of Go against the world champion. In this chapter, we look at the algorithms, architectures, pipelines, hardware, training methodologies, and game strategies employed by AlphaGo.

Chapter 9, Reinforcement Learning in Autonomous Driving, illustrates one of the most interesting applications of RL, that is, autonomous driving. There are many use cases such as multi-lane merging and driving policies for negotiating roundabouts. We cover the challenges in autonomous driving and discuss proposed research-based solutions. We also introduce the famous MIT DeepTraffic simulator to test our reinforcement learning framework.

Chapter 10, Financial Portfolio Management, covers the application of RL techniques in the financial world. Many predict that AI will be the norm in asset management, trading desks, and portfolio management.

Chapter 11, Reinforcement Learning in Robotics, shows another interesting domain in which RL has found a lot of applications: robotics. The challenges of implementing RL in robotics and the probable solutions are covered.

Chapter 12, Deep Reinforcement Learning in Ad Tech, covers topics such as computational advertising challenges, bidding strategies, and real-time bidding by reinforcement learning in display advertising.

Chapter 13, Reinforcement Learning in Image Processing, is about the most famous domain in computer vision—object detection—and how reinforcement learning is trying to solve it.

Chapter 14, Deep Reinforcement Learning in NLP, illustrates the use of reinforcement learning in text summarization and question answering, which will give you a basic idea of how researchers are reaping the benefits of reinforcement learning in these domains.

Appendix A, Further topics in Reinforcement Learning, has an introductory overview of some of the topics that were out of the scope of this book. But we mention them in brief and end these topics with external links for you to explore them further.

To get the most out of this book

The following are the requirements to get the most out of this book:

Python and TensorFlow

Linear algebra as a prerequisite for neural networks

Installation bundle: Python, TensorFlow, and OpenAI gym (shown in Chapter 1, Deep Learning – Architectures and Frameworks, and Chapter 2, Training Reinforcement Learning Agents Using OpenAI Gym)

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Go to www.packtpub.com.

Select the SUPPORT tab.

Click on Code Downloads & Errata.

Enter the name of the book in the search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Reinforcement-Learning-with-TensorFlow. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/ReinforcementLearningwithTensorFlow_ColorImages.pdf.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Deep Learning – Architectures and Frameworks

Artificial neural networks are computational systems that provide us with important tools to solve challenging machine learning tasks, ranging from image recognition to speech translation. Recent breakthroughs, such as Google DeepMind's AlphaGo defeating the best Go players or Carnegie Mellon University's Libratus defeating the world's best professional poker players, have demonstrated how far the algorithms have advanced; these algorithms learn a narrow intelligence as a human would and achieve superhuman-level performance. In plain speech, artificial neural networks are a loose representation of the human brain that we can program in a computer; to be precise, they are an approach inspired by our knowledge of the functions of the human brain. A key concept of neural networks is to create a representation space of the input data and then solve the problem in that space; that is, the data is warped from its current form into a different representation in which the problem at hand (say, classification or regression) can be solved. Deep learning means multiple hidden representations, that is, a neural network with many layers that create more effective representations of the data. Each layer refines the information received from the previous one.

Reinforcement learning, on the other hand, is another branch of machine learning: a technique for learning any kind of activity that follows a sequence of actions. A reinforcement learning agent gathers information from the environment and creates a representation of the states; it then performs an action that results in a new state and a reward (that is, quantifiable feedback from the environment telling us whether the action was good or bad). This cycle continues until the agent is able to improve its performance beyond a certain threshold, that is, until it maximizes the expected value of the rewards. At each step, the action can be chosen randomly, can be fixed, or can be predicted by a neural network. Using a deep neural network to predict actions opens up a new domain, called deep reinforcement learning. This forms the basis of AlphaGo, Libratus, and much other breakthrough research in the field of artificial intelligence.
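The agent-environment cycle described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: `CorridorEnv`, its reward values, and the two policies are hypothetical names invented here, not part of OpenAI Gym or of the book's code. The `reset`/`step` interface loosely mirrors the kind of loop that Gym-style environments expose.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell of a corridor."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode in the leftmost cell.
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, any other action moves left (clamped to the corridor).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward scheme: small penalty per step, bonus on reaching the goal.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the sum of rewards collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # the agent chooses an action
        state, reward, done = env.step(action)   # the environment answers with state and reward
        total_reward += reward
        if done:
            break
    return total_reward


def fixed_policy(state):
    return 1  # always move right


def random_policy(state):
    return random.choice([0, 1])


random.seed(0)
print("fixed policy return:", run_episode(CorridorEnv(), fixed_policy))
print("random policy return:", run_episode(CorridorEnv(), random_policy))
```

The fixed and random policies correspond to two of the action-selection options mentioned in the text; replacing `policy` with a function backed by a neural network would give the third.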

We will cover the following topics in this chapter:

Deep learning

Reinforcement learning

Introduction to TensorFlow and OpenAI Gym

The influential researchers and projects in reinforcement learning

Deep learning

Deep learning refers to training large neural networks. Let's first discuss some basic use cases of neural networks and why deep learning is creating such a furore even though these neural networks have been here for decades.

The following are examples of supervised learning with neural networks:

Inputs (x) | Output (y) | Application domain | Suggested neural network approach
House features | Price of the house | Real estate | Standard neural network with rectified linear unit in the output layer
Ad and user info | Click on ad? Yes (1) or No (0) | Online advertising | Standard neural network with binary classification
Image object | Classifying from 100 different objects, that is, (1, 2, ..., 100) | Photo tagging | Convolutional neural network (since an image is spatial data)
Audio | Text transcript | Speech recognition | Recurrent neural network (since both input and output are sequential data)
English | Chinese | Machine translation | Recurrent neural network (since the input is sequential data)
Image, radar information | Position of other cars | Autonomous driving | Customized hybrid/complex neural network

We will go into the details of the previously-mentioned neural networks in the coming sections of this chapter, but first we must understand that different types of neural networks are used based on the objective of the problem statement.

Supervised learning is an approach in machine learning where an agent is trained using pairs of input features and their corresponding output/target values (also called labels).

Traditional machine learning algorithms worked very well for structured data, where most of the input features were very well defined. This is not the case with unstructured data, such as audio, image, and text, where the data is a signal, pixels, and letters, respectively. It's harder for computers to make sense of unstructured data than structured data. The ability of neural networks to make predictions from this unstructured data is the key reason behind their popularity and the economic value they generate.

What is driving the progress in deep learning at the present moment is scale, that is, the scale of data and computational power, together with new algorithms. The internet has been around for over four decades, resulting in an enormous and growing accumulation of digital footprints. During that period, research and technological development helped to expand the storage and processing ability of computational systems. Currently, owing to these powerful computational systems and massive amounts of data, we are able to verify discoveries made in the field of artificial intelligence over the past three decades.

Now, what do we need to implement deep learning?

First, we need a large amount of data.

Second, we need to train a reasonably large neural network.

So, why not train a large neural network on small amounts of data?

Think back to your data structure lessons, where the point of a structure is to handle a particular type of value adequately. For example, you would not store a scalar value in a variable that has the tensor data type. Similarly, these large neural networks create distinct representations and develop an understanding of patterns only when given a high volume of data, as shown in the following graph:

Please refer to the preceding graphical representation of data versus performance of different machine learning algorithms for the following inferences:

We see that the performance of traditional machine learning algorithms plateaus beyond a certain point, as they are not able to absorb distinct representations once the data volume exceeds a threshold.

Check the bottom left part of the graph, near the origin. This is the region where the relative ordering of the algorithms is not well defined. Due to the small data size, the inner representations are not that distinct. As a result, the performance metrics of all the algorithms coincide. At this level, performance depends mainly on the quality of feature engineering. But these hand-engineered features fail as the data size increases. That's where deep neural networks come in, as they are able to capture better representations from large amounts of data.

Therefore, we can conclude that one shouldn't apply a deep learning architecture to every dataset encountered. The volume and variety of the data obtained indicate which algorithm to apply. Sometimes small data works better with traditional machine learning algorithms than with deep neural networks.

Deep learning problem statements and algorithms can be further segregated into four different segments based on their area of research and application:

General deep learning: Densely-connected layers or fully-connected networks

Sequence models: Recurrent neural networks, Long Short Term Memory Networks, Gated Recurrent Units, and so on

Spatial data models (images, for example): Convolutional neural networks, Generative Adversarial Networks

Others: Unsupervised learning, reinforcement learning, sparse encoding, and so on

Presently, the industry is mostly driven by the first three segments, but the future of Artificial Intelligence rests on the advancements in the fourth segment. Looking back at the advancements in machine learning, we can see that until now, these learning models were giving real numbers as output, for example, movie reviews (sentiment score) and image classification (class object). But now, other types of output are being generated as well, for example, image captioning (input: image, output: text), machine translation (input: text, output: text), and speech recognition (input: audio, output: text).

Human-level performance is a necessary and commonly used benchmark in deep learning. Accuracy stops improving after some time, converging to the highest attainable point. The error rate at this point is called the Optimal Error Rate (also known as the Bayes Error Rate, that is, the lowest possible error rate for any classifier of a random outcome).

The reason behind this is that a lot of problems have a theoretical limit in performance owing to the noise in the data. Therefore, comparing against human-level accuracy is a good way to improve your models through error analysis. This is done by comparing human-level error, training set error, and validation set error to estimate bias and variance effects, that is, the underfitting and overfitting conditions.
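As a rough illustration of this kind of error analysis, the sketch below compares the three error rates. It assumes a common convention that the text does not spell out: human-level error is used as a stand-in for the optimal error rate, the gap between training and human-level error is read as bias (underfitting), and the gap between validation and training error is read as variance (overfitting). The function name `diagnose` and the example error rates are hypothetical.

```python
def diagnose(human_error, train_error, val_error):
    """Estimate bias and variance effects from three error rates (fractions in [0, 1])."""
    # Gap to human-level error: how much the model underfits the training data.
    bias = train_error - human_error
    # Gap between validation and training error: how much the model overfits.
    variance = val_error - train_error
    focus = "bias (underfitting)" if bias > variance else "variance (overfitting)"
    return bias, variance, focus


# Hypothetical error rates, chosen only for illustration.
for errors in [(0.01, 0.08, 0.10), (0.01, 0.02, 0.10)]:
    bias, variance, focus = diagnose(*errors)
    print(f"errors={errors} bias={bias:.2f} variance={variance:.2f} -> work on {focus}")
```

When the bias gap dominates, a larger network or longer training is the usual first step; when the variance gap dominates, more data or regularization tends to help more.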

The scale of data, type of algorithm, and performance metrics together help us to benchmark the level of improvement across different machine learning algorithms. They thereby govern the crucial decision of whether to invest in deep learning or go with traditional machine learning approaches.

A basic perceptron with some input features (three, here in the following diagram) looks as follows:

The preceding diagram shows the basic structure of a neural network that has the input in the first layer and the output in the next. Let's try to interpret it a bit. Here:

X1, X2, and X3 are input feature variables, that is, the dimension of input here is 3 (considering there's no bias variable).

W1, W2, and W3 are the corresponding weights associated with the feature variables. When we talk about training a neural network, we mean training its weights. Thus, these form the parameters of our small neural network.

The function in the output layer is an activation function applied over the aggregation of the information received from the previous layer. This function creates a representation state that corresponds to the actual output. The series of processes from the input layer to the output layer, resulting in a predicted output, is called forward propagation.

The error value between the output from the activation function and actual output is minimized through multiple iterations.

Minimization of the error only happens if we change the value of the weights (going from the output layer toward the input layer) in the direction that can minimize our error function. This process is termed backpropagation, as we are moving in the opposite direction.
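The points above can be sketched as a tiny three-input perceptron in plain Python. The source does not say which activation or error function the diagram uses, so the sigmoid activation, the squared-error function, the learning rate, and all numeric values here are assumptions made purely for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, inputs):
    # Forward propagation: aggregate the weighted inputs, then apply the activation.
    z = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)


def backward_step(weights, inputs, target, learning_rate=0.5):
    # Backpropagation for the assumed error E = 0.5 * (y_hat - target)^2.
    # Chain rule: dE/dw_i = (y_hat - target) * y_hat * (1 - y_hat) * x_i.
    y_hat = forward(weights, inputs)
    delta = (y_hat - target) * y_hat * (1.0 - y_hat)
    # Move each weight in the direction that reduces the error.
    return [w - learning_rate * delta * x for w, x in zip(weights, inputs)]


inputs = [1.0, 2.0, 3.0]       # X1, X2, X3 (illustrative values)
weights = [0.1, -0.2, 0.05]    # W1, W2, W3 (illustrative starting values)
target = 1.0                   # actual output (illustrative)

error_before = 0.5 * (forward(weights, inputs) - target) ** 2
for _ in range(100):           # multiple iterations of the weight update
    weights = backward_step(weights, inputs, target)
error_after = 0.5 * (forward(weights, inputs) - target) ** 2
print(f"error before: {error_before:.4f}, error after: {error_after:.4f}")
```

Only the weights change between iterations; the inputs and the target stay fixed, which is what "training the weights" means in practice.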

Now, keeping these basics in mind, let's demystify neural networks further by treating logistic regression as a neural network, and then try to create a neural network with one hidden layer.

Activation functions for deep learning

Activation functions are integral units of artificial neural networks. They decide whether a particular neuron is activated or not, that is, whether the information received by the neuron is relevant or not. The activation function performs a nonlinear transformation on the received signal (data).

We will discuss some of the popular activation functions in the following sections.

The sigmoid function

Sigmoid is a smooth and continuously differentiable function that produces a nonlinear output. The sigmoid function is represented here:

$\sigma(x) = \frac{1}{1 + e^{-x}}$

Consider the following graph of the sigmoid function. The function ranges from 0 to 1. Observing the curve, we see that the gradient is steep when x lies between -3 and 3, but the curve becomes flat beyond that. Thus, small changes in x within this range bring large changes in the value of the sigmoid function, so the function tends to push its output towards the extremes.

This is why it is used in classification problems:

Looking at the gradient of the sigmoid function in the following graph, we observe a smooth curve dependent on x. Since the gradient curve is continuous, it's easy to backpropagate the error and update the parameters, that is, W and b:

Sigmoids are widely used, but their disadvantage is that the function goes flat beyond +3 and -3. Thus, whenever the input falls in that region, the gradient tends to approach zero and the learning of our neural network comes to a halt.

Since the sigmoid function outputs values from 0 to 1, its output is not symmetric around the origin and all output signals have the same (positive) sign. To tackle this, the sigmoid function has been scaled to the tanh function, which we will study next. Moreover, since the gradient becomes very small in the flat regions, it's susceptible to the vanishing gradient problem (which we will discuss later in this chapter).
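To make the saturation behaviour concrete, here is a minimal sketch in plain Python that evaluates the sigmoid and its gradient at a few points. The helper names `sigmoid` and `sigmoid_grad` and the sample points are chosen for this illustration only; the gradient uses the standard identity that the derivative of the sigmoid is sigmoid(x) * (1 - sigmoid(x)).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), largest at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


# The gradient peaks at 0.25 for x = 0 and shrinks quickly beyond +3 and -3.
for x in (-6.0, -3.0, 0.0, 3.0, 6.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  gradient={sigmoid_grad(x):.4f}")
```

Every printed sigmoid value is positive, and the gradient far from the origin is a small fraction of its value at zero, which is the flat region where learning slows down.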

The tanh function

Tanh is a continuous function symmetric around the origin; it ranges from -1 to 1. The tanh function is represented as follows:

$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

Thus, the output signals will be both positive and negative, which helps separate the signals around the origin. As mentioned earlier, it is continuous, nonlinear, and differentiable at all points. We can observe these properties in the graph of the tanh function in the following diagram. Though symmetrical, it becomes flat beyond -2 and 2:

Now, looking at the gradient curve of the tanh function in the following graph, we observe that it is steeper than that of the sigmoid function. The tanh function also has the vanishing gradient problem:
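The relationship between tanh and sigmoid can be checked numerically. The sketch below, in plain Python, assumes the standard identity tanh(x) = 2 * sigmoid(2x) - 1 as the "scaling" mentioned earlier, and uses the standard derivative 1 - tanh(x)^2; the helper names and sample points are chosen for this illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, largest at x = 0.
    return 1.0 - math.tanh(x) ** 2


for x in (-4.0, -2.0, 0.0, 2.0, 4.0):
    rescaled = 2.0 * sigmoid(2.0 * x) - 1.0  # sigmoid stretched to the range (-1, 1)
    print(
        f"x={x:+.1f}  tanh={math.tanh(x):+.4f}  rescaled sigmoid={rescaled:+.4f}  "
        f"tanh grad={tanh_grad(x):.4f}  sigmoid grad={sigmoid_grad(x):.4f}"
    )
```

At the origin the tanh gradient is 1.0 against 0.25 for the sigmoid, which is the steeper curve described above, yet both gradients shrink towards zero as the input moves away from the origin.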

The softmax function

The softmax function is mainly used to handle classification problems and is preferably used in the output layer, where it outputs the probabilities of the output classes. As seen earlier, while solving binary logistic regression, the sigmoid function was able to handle only two classes. In order to handle multiple classes, we need a function that can generate values for all the classes such that those values follow the rules of probability. This objective is fulfilled by the softmax function, which exponentiates the output for each class and divides it by the sum of the exponentiated outputs for all the classes, so that every value lies between 0 and 1 and the values sum to 1:

$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$

For example, , where x refers to four classes.

Then, the softmax function gives the following results (rounded to three decimal places):

Thus, we see the probabilities of all the classes. Since a classifier is expected to output probabilistic values for all the classes, the softmax function becomes the best candidate for the output layer activation function of the classifier.
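A softmax over four classes can be sketched in plain Python as follows. The input scores `[1.0, 2.0, 3.0, 4.0]` are hypothetical values picked for this illustration and are not the book's example. Subtracting the maximum score before exponentiating is a common numerical-stability measure that does not change the result.

```python
import math


def softmax(scores):
    # Shift by the maximum so the largest exponent is exp(0) = 1 (avoids overflow).
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    # Each class gets its share of the total, so the outputs sum to 1.
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0, 4.0])   # hypothetical scores for four classes
print([round(p, 3) for p in probs])     # [0.032, 0.087, 0.237, 0.644]
print(sum(probs))                       # approximately 1.0
```

The class with the largest score receives the largest probability, and the remaining probability mass is shared among the other classes in proportion to their exponentiated scores.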

How to choose the right activation function

The activation function is chosen depending on the objective of the problem statement and the properties required. Some guidelines are as follows:

Sigmoid functions work very well in the case of shallow networks and binary classifiers. Deeper networks may lead to vanishing gradients.

The ReLU function is the most widely used; try Leaky ReLU to avoid the case of dead neurons. Thus, start with ReLU, then move to another activation function if ReLU doesn't provide good results.

Use softmax in the output layer for multi-class classification.

Avoid using ReLU in the output layer.
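The "dead neuron" guideline above can be illustrated with a minimal sketch in plain Python. The definitions used are the standard ones (ReLU returns max(0, x); Leaky ReLU keeps a small slope for negative inputs); the slope `alpha=0.01` is a commonly used default assumed here, not a value taken from the text.

```python
def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Zero gradient for negative inputs: the neuron stops learning there.
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, alpha=0.01):
    # alpha is the assumed small slope applied to negative inputs.
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.01):
    # A small but nonzero gradient keeps the weights updating.
    return 1.0 if x > 0 else alpha


for x in (-2.0, 3.0):
    print(
        f"x={x:+.1f}  relu={relu(x):.2f} (grad {relu_grad(x):.2f})  "
        f"leaky_relu={leaky_relu(x):.2f} (grad {leaky_relu_grad(x):.2f})"
    )
```

For the negative input, ReLU passes back a gradient of exactly zero, whereas Leaky ReLU still passes back a small gradient, which is why it is suggested as the fallback.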

Objective

The objective of any supervised classification learning algorithm is to predict the correct class with the highest probability. Therefore, for each given input $x$, we have to calculate the predicted output $\hat{y}$, that is, the probability $\hat{y} = P(y = 1 \mid x)$. Therefore, $0 \le \hat{y} \le 1$.

Referring to binary logistic regression in the preceding diagram:

Predicted output, that is, $\hat{y} = \sigma(W^T x + b)$. Here, the sigmoid function shrinks the value of $W^T x + b$ between 0 and 1.

This means, when $W^T x + b \to \infty$, the sigmoid function of this, that is, $\sigma(W^T x + b) \to 1$.

When $W^T x + b \to -\infty$, the sigmoid function of this, that is, $\sigma(W^T x + b) \to 0$.

Once we have calculated $\hat{y}$, that is, the predicted output, we are done with our forward propagation task. Now, we will calculate the error value using the cost function and try to backpropagate to minimize our error value by changing the values of our parameters, W and b, through gradient descent.
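The loop of forward propagation, cost calculation, and gradient descent on W and b can be sketched in plain Python. The text above does not state which cost function is used, so the average binary cross-entropy here is an assumption, as are the toy dataset (a logical AND), the learning rate, the number of iterations, and all function names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(W, b, x):
    # Forward propagation: weighted sum plus bias, passed through the sigmoid.
    return sigmoid(sum(w * xi for w, xi in zip(W, x)) + b)


def cost(W, b, X, Y):
    # Assumed cost: average binary cross-entropy over all examples.
    total = 0.0
    for x, y in zip(X, Y):
        y_hat = predict(W, b, x)
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return total / len(X)


def gradient_descent_step(W, b, X, Y, lr):
    m = len(X)
    dW = [0.0] * len(W)
    db = 0.0
    for x, y in zip(X, Y):
        # For sigmoid with cross-entropy, the gradient w.r.t. the weighted sum is y_hat - y.
        err = predict(W, b, x) - y
        for j, xj in enumerate(x):
            dW[j] += err * xj / m
        db += err / m
    # Update the parameters against the gradient.
    new_W = [w - lr * g for w, g in zip(W, dW)]
    return new_W, b - lr * db


# Toy data (logical AND), used only for illustration.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [0, 0, 0, 1]

W, b = [0.0, 0.0], 0.0
cost_before = cost(W, b, X, Y)
for _ in range(1000):
    W, b = gradient_descent_step(W, b, X, Y, lr=0.5)
cost_after = cost(W, b, X, Y)
print(f"cost before: {cost_before:.4f}, cost after: {cost_after:.4f}")
print([round(predict(W, b, x), 3) for x in X])
```

Starting from zero parameters, every prediction is 0.5; after the updates, the predicted probability rises for the positive example and falls for the negative ones, and the cost drops accordingly.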