Hands-On Deep Learning Algorithms with Python

Sudharsan Ravichandiran
Description

Understand basic to advanced deep learning algorithms, the mathematical principles behind them, and their practical applications.




Key Features



  • Get up to speed with building your own neural networks from scratch


  • Gain insights into the mathematical principles behind deep learning algorithms


  • Implement popular deep learning algorithms such as CNNs, RNNs, and more using TensorFlow



Book Description



Deep learning is one of the most popular domains in the AI space, allowing you to develop multi-layered models of varying complexities.







This book introduces you to popular deep learning algorithms—from basic to advanced—and shows you how to implement them from scratch using TensorFlow. Throughout the book, you will gain insights into each algorithm, the mathematical principles behind it, and how to implement it in the best possible manner. The book starts by explaining how you can build your own neural networks, followed by introducing you to TensorFlow, the powerful Python-based library for machine learning and deep learning. Moving on, you will get up to speed with gradient descent variants, such as NAG, AMSGrad, AdaDelta, Adam, and Nadam. The book will then provide you with insights into RNNs and LSTM and how to generate song lyrics with an RNN. Next, you will master the math for convolutional and capsule networks, widely used for image recognition tasks. Then, you will learn how machines understand the semantics of words and documents using CBOW, skip-gram, and PV-DM. Afterward, you will explore various GANs, including InfoGAN and LSGAN, and autoencoders, such as contractive autoencoders and VAE.







By the end of this book, you will be equipped with all the skills you need to implement deep learning in your own projects.




What you will learn



  • Implement basic-to-advanced deep learning algorithms


  • Master the mathematics behind deep learning algorithms


  • Become familiar with gradient descent and its variants, such as AMSGrad, AdaDelta, Adam, and Nadam


  • Implement recurrent networks, such as RNN, LSTM, GRU, and seq2seq models


  • Understand how machines interpret images using CNN and capsule networks


  • Implement different types of generative adversarial networks, such as CGAN, CycleGAN, and StackGAN


  • Explore various types of autoencoders, such as sparse autoencoders, DAE, CAE, and VAE



Who this book is for



If you are a machine learning engineer, data scientist, AI developer, or anyone who wants to focus on neural networks and deep learning, this book is for you. Those who are completely new to deep learning, but have some experience in machine learning and Python programming, will also find the book very helpful.




Hands-On Deep Learning Algorithms with Python

Master deep learning algorithms with extensive math by implementing them using TensorFlow

Sudharsan Ravichandiran

BIRMINGHAM - MUMBAI

Hands-On Deep Learning Algorithms with Python

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Pravin Dhandre
Acquisition Editor: Devika Battike
Content Development Editor: Unnati Guha
Senior Editor: Martin Whittemore
Technical Editor: Naveen Sharma
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Manju Arasan
Graphics: Jisha Chirayil
Production Designer: Shraddha Falebhai

First published: July 2019

Production reference: 1240719

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78934-415-8

www.packtpub.com

To my adorable mom, Kasthuri, and to my beloved dad, Ravichandiran.
- Sudharsan Ravichandiran

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Fully searchable for easy access to vital information

Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

Contributors

About the author

Sudharsan Ravichandiran is a data scientist, researcher, artificial intelligence enthusiast, and YouTuber (search for Sudharsan reinforcement learning). He completed his bachelor's in information technology at Anna University. His area of research focuses on practical implementations of deep learning and reinforcement learning, which includes natural language processing and computer vision. He is an open source contributor and loves answering questions on Stack Overflow. He also authored a best-seller, Hands-On Reinforcement Learning with Python, published by Packt Publishing.

I would like to thank my most amazing parents and my brother, Karthikeyan, for inspiring and motivating me. I am forever grateful to my sister, who always has my back. I can't thank my editors, Unnati and Naveen, enough for their hard work and dedication. Without their support, it would have been impossible to complete this book.

About the reviewers

Sujit S Ahirrao is a computer vision and machine learning researcher and software developer, mostly experienced in image processing and deep learning. He graduated in electronics and telecommunication from the University of Pune. He made his way into the field of artificial intelligence through start-ups and has been a part of in-house R&D teams at well-established firms. He pursues his interest in contributing to the education, healthcare, and scientific research communities with his growing skills and experience.

Bharath Kumar Varma currently works as a lead data scientist at an Indian tech start-up called MTW Labs, with clients in India and North America. His primary areas of interest are deep learning, NLP, and computer vision. He is a seasoned architect focusing on machine learning projects and vision and text analytics solutions, and is an active member of the start-up ecosystem. He holds an M.Tech degree from IIT Hyderabad with a specialization in data science, along with various other technology and banking-related certifications. Aside from his work, he actively participates in teaching and mentoring data science enthusiasts, and contributes to the community by networking and working with fellow enthusiasts in groups.

Doug Ortiz is an experienced enterprise cloud, big data, data analytics, and solutions architect who has architected, designed, developed, re-engineered, and integrated enterprise solutions. His other areas of expertise include Amazon Web Services, Azure, Google Cloud, business intelligence, Hadoop, Spark, NoSQL databases, and SharePoint. He is also the founder of Illustris, LLC.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Hands-On Deep Learning Algorithms with Python

Dedication

About Packt

Why subscribe?

Contributors

About the author

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1: Getting Started with Deep Learning

Introduction to Deep Learning

What is deep learning?

Biological and artificial neurons

ANN and its layers

Input layer

Hidden layer

Output layer

Exploring activation functions

The sigmoid function

The tanh function

The Rectified Linear Unit function

The leaky ReLU function

The Exponential linear unit function

The Swish function

The softmax function

Forward propagation in ANN

How does ANN learn?

Debugging gradient descent with gradient checking

Putting it all together

Building a neural network from scratch

Summary

Questions

Further reading

Getting to Know TensorFlow

What is TensorFlow?

Understanding computational graphs and sessions

Sessions

Variables, constants, and placeholders

Variables

Constants

Placeholders and feed dictionaries

Introducing TensorBoard

Creating a name scope

Handwritten digit classification using TensorFlow

Importing the required libraries

Loading the dataset

Defining the number of neurons in each layer

Defining placeholders

Forward propagation

Computing loss and backpropagation

Computing accuracy

Creating summary

Training the model

Visualizing graphs in TensorBoard

Introducing eager execution

Math operations in TensorFlow

TensorFlow 2.0 and Keras

Bonjour Keras

Defining the model

Defining a sequential model

Defining a functional model

Compiling the model

Training the model

Evaluating the model

MNIST digit classification using TensorFlow 2.0

Should we use Keras or TensorFlow?

Summary

Questions

Further reading

Section 2: Fundamental Deep Learning Algorithms

Gradient Descent and Its Variants

Demystifying gradient descent

Performing gradient descent in regression

Importing the libraries

Preparing the dataset

Defining the loss function

Computing the gradients of the loss function

Updating the model parameters

Gradient descent versus stochastic gradient descent

Momentum-based gradient descent

Gradient descent with momentum

Nesterov accelerated gradient

Adaptive methods of gradient descent

Setting a learning rate adaptively using Adagrad

Doing away with the learning rate using Adadelta

Overcoming the limitations of Adagrad using RMSProp

Adaptive moment estimation

Adamax – Adam based on infinity-norm

Adaptive moment estimation with AMSGrad

Nadam – adding NAG to ADAM

Summary

Questions

Further reading

Generating Song Lyrics Using RNN

Introducing RNNs

The difference between feedforward networks and RNNs

Forward propagation in RNNs

Backpropagating through time

Gradients with respect to the hidden to output weight, V

Gradients with respect to hidden to hidden layer weights, W

Gradients with respect to input to the hidden layer weight, U

Vanishing and exploding gradients problem

Gradient clipping

Generating song lyrics using RNNs

Implementing in TensorFlow

Data preparation

Defining the network parameters

Defining placeholders

Defining forward propagation

Defining BPTT

Start generating songs

Different types of RNN architectures

One-to-one architecture

One-to-many architecture

Many-to-one architecture

Many-to-many architecture

Summary

Questions

Further reading

Improvements to the RNN

LSTM to the rescue

Understanding the LSTM cell

Forget gate

Input gate

Output gate

Updating the cell state

Updating hidden state

Forward propagation in LSTM

Backpropagation in LSTM

Gradients with respect to gates

Gradients with respect to weights

Gradients with respect to V

Gradients with respect to W

Gradients with respect to U

Predicting Bitcoin prices using LSTM model

Data preparation

Defining the parameters

Define the LSTM cell

Defining forward propagation

Defining backpropagation

Training the LSTM model

Making predictions using the LSTM model

Gated recurrent units

Understanding the GRU cell

Update gate

Reset gate

Updating hidden state

Forward propagation in a GRU cell

Backpropagation in a GRU cell

Gradient with respect to gates

Gradients with respect to weights

Gradients with respect to V

Gradients with respect to W

Gradients with respect to U

Implementing a GRU cell in TensorFlow

Defining the weights

Defining forward propagation

Bidirectional RNN

Going deep with deep RNN

Language translation using the seq2seq model

Encoder

Decoder

Attention is all we need

Summary

Questions

Further reading

Demystifying Convolutional Networks

What are CNNs?

Convolutional layers

Strides

Padding

Pooling layers

Fully connected layers

The architecture of CNNs

The math behind CNNs

Forward propagation

Backward propagation

Implementing a CNN in TensorFlow

Defining helper functions

Defining the convolutional network

Computing loss

Starting the training

Visualizing extracted features

CNN architectures

LeNet architecture

Understanding AlexNet

Architecture of VGGNet

GoogleNet

Inception v1

Inception v2 and v3

Capsule networks

Understanding Capsule networks

Computing prediction vectors

Coupling coefficients

Squashing function

Dynamic routing algorithm

Architecture of the Capsule network

The loss function

Margin loss

Reconstruction loss

Building Capsule networks in TensorFlow

Defining the squash function

Defining a dynamic routing algorithm

Computing primary and digit capsules

Masking the digit capsule

Defining the decoder

Computing the accuracy of the model

Calculating loss

Margin loss

Reconstruction loss

Total loss

Training the Capsule network

Summary

Questions

Further reading

Learning Text Representations

Understanding the word2vec model

Understanding the CBOW model

CBOW with a single context word

Forward propagation

Backward propagation

CBOW with multiple context words

Understanding skip-gram model

Forward propagation in skip-gram

Backward propagation

Various training strategies

Hierarchical softmax

Negative sampling

Subsampling frequent words

Building the word2vec model using gensim

Loading the dataset

Preprocessing and preparing the dataset

Building the model

Evaluating the embeddings

Visualizing word embeddings in TensorBoard

Doc2vec

Paragraph Vector – Distributed Memory model

Paragraph Vector – Distributed Bag of Words model

Finding similar documents using doc2vec

Understanding skip-thoughts algorithm

Quick-thoughts for sentence embeddings

Summary

Questions

Further reading

Section 3: Advanced Deep Learning Algorithms

Generating Images Using GANs

Differences between discriminative and generative models

Say hello to GANs!

Breaking down the generator

Breaking down the discriminator

How do they learn though?

Architecture of a GAN

Demystifying the loss function

Discriminator loss

First term

Second term

Final term

Generator loss

Total loss

Heuristic loss

Generating images using GANs in TensorFlow

Reading the dataset

Defining the generator

Defining the discriminator

Defining the input placeholders

Starting the GAN!

Computing the loss function

Discriminator loss

Generator loss

Optimizing the loss

Starting the training

Generating handwritten digits

DCGAN – Adding convolution to a GAN

Deconvolutional generator

Convolutional discriminator

Implementing DCGAN to generate CIFAR images

Exploring the dataset

Defining the discriminator

Defining the generator

Defining the inputs

Starting the DCGAN

Computing the loss function

Discriminator loss

Generator loss

Optimizing the loss

Train the DCGAN

Least squares GAN

Loss function

LSGAN in TensorFlow

Discriminator loss

Generator loss

GANs with Wasserstein distance

Are we minimizing JS divergence in GANs?

What is the Wasserstein distance?

Demystifying the k-Lipschitz function

The loss function of WGAN

WGAN in TensorFlow

Summary

Questions

Further reading

Learning More about GANs

Conditional GANs

Loss function of CGAN

Generating specific handwritten digits using CGAN

Defining the generator

Defining discriminator

Start the GAN!

Computing the loss function

Discriminator loss

Generator loss

Optimizing the loss

Start training the CGAN

Generate the handwritten digit, 7

Understanding InfoGAN

Mutual information

Architecture of the InfoGAN

Constructing an InfoGAN in TensorFlow

Defining generator

Defining the discriminator

Define the input placeholders

Start the GAN

Computing loss function

Discriminator loss

Generator loss

Mutual information

Optimizing the loss

Beginning training

Generating handwritten digits

Translating images using a CycleGAN

Role of generators

Role of discriminators

Loss function

Cycle consistency loss

Converting photos to paintings using a CycleGAN

StackGAN

The architecture of StackGANs

Conditioning augmentation

Stage I

Generator

Discriminator

Stage II

Generator

Discriminator

Summary

Questions

Further reading

Reconstructing Inputs Using Autoencoders

What is an autoencoder?

Understanding the architecture of autoencoders

Reconstructing the MNIST images using an autoencoder

Preparing the dataset

Defining the encoder

Defining the decoder

Building the model

Reconstructing images

Plotting reconstructed images

Autoencoders with convolutions

Building a convolutional autoencoder

Defining the encoder

Defining the decoder

Building the model

Reconstructing the images

Exploring denoising autoencoders

Denoising images using DAE

Understanding sparse autoencoders

Building the sparse autoencoder

Defining the sparse regularizer

Learning to use contractive autoencoders

Implementing the contractive autoencoder

Defining the contractive loss

Dissecting variational autoencoders

Variational inference

The loss function

Reparameterization trick

Generating images using VAE

Preparing the dataset

Defining the encoder

Defining the sampling operation

Defining the decoder

Building the model

Defining the generator

Plotting generated images

Summary

Questions

Further reading

Exploring Few-Shot Learning Algorithms

What is few-shot learning?

Siamese networks

Architecture of siamese networks

Prototypical networks

Relation networks

Matching networks

Support set embedding function

Query set embedding function

The architecture of matching networks

Summary

Questions

Further reading

Assessments

Chapter 1 - Introduction to Deep Learning

Chapter 2 - Getting to Know TensorFlow

Chapter 3 - Gradient Descent and Its Variants

Chapter 4 - Generating Song Lyrics Using an RNN 

Chapter 5 - Improvements to the RNN

Chapter 6 - Demystifying Convolutional Networks

Chapter 7 - Learning Text Representations

Chapter 8 - Generating Images Using GANs

Chapter 9 - Learning More about GANs

Chapter 10 - Reconstructing Inputs Using Autoencoders

Chapter 11 - Exploring Few-Shot Learning Algorithms

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Deep learning is one of the most popular domains in the artificial intelligence (AI) space, which allows you to develop multi-layered models of varying complexities. This book introduces you to popular deep learning algorithms—from basic to advanced—and shows you how to implement them from scratch using TensorFlow. Throughout the book, you'll gain insights into each algorithm, the mathematical principles behind it, and how to implement it in the best possible manner.

The book starts by explaining how you can build your own neural network, followed by introducing you to TensorFlow, the powerful Python-based library for machine learning and deep learning. Next, you will get up to speed with gradient descent variants, such as NAG, AMSGrad, AdaDelta, Adam, Nadam, and more. The book will then provide you with insights into the workings of Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) and how to generate song lyrics with an RNN. Next, you will master the math for convolutional and Capsule networks, widely used for image recognition tasks. Towards the concluding chapters, you will learn how machines understand the semantics of words and documents using CBOW, skip-gram, and PV-DM. Then you will explore various GANs, such as InfoGAN and LSGAN, and autoencoders, such as contractive autoencoders, VAE, and so on.

By the end of this book, you will be equipped with the skills needed to implement deep learning in your own projects.

Who this book is for

If you are a machine learning engineer, data scientist, AI developer, or anyone who wants to focus on neural networks and deep learning, this book is for you. Those who are completely new to deep learning, but have some experience in machine learning and Python programming, will also find this book helpful.

What this book covers

Chapter 1, Introduction to Deep Learning, explains the fundamentals of deep learning and helps us to understand what artificial neural networks are and how they learn. We will also learn to build our first artificial neural network from scratch.

Chapter 2, Getting to Know TensorFlow, helps us to understand one of the most powerful and popular deep learning libraries, called TensorFlow. You will understand several important functionalities of TensorFlow and how to build neural networks using TensorFlow to perform handwritten digit classification.

Chapter 3, Gradient Descent and Its Variants, provides an in-depth understanding of the gradient descent algorithm. We will explore several variants of the gradient descent algorithm, such as SGD, Adagrad, ADAM, Adadelta, Nadam, and many more, and learn how to implement them from scratch.

Chapter 4, Generating Song Lyrics Using RNN, describes how an RNN is used to model sequential datasets and how it remembers the previous input. We will begin by getting a basic understanding of RNNs, and then we will dive deep into their math. Next, we will learn how to implement an RNN in TensorFlow for generating song lyrics.

Chapter 5, Improvements to the RNN, begins by exploring LSTM and how exactly LSTM overcomes the shortcomings of RNN. Later, we will learn about the GRU cell and how bidirectional RNNs and deep RNNs work. At the end of the chapter, we will learn how to perform language translation using the seq2seq model.

Chapter 6, Demystifying Convolutional Networks, helps us to master how convolutional neural networks work. We will explore how forward propagation and backpropagation in CNNs work mathematically. We will also learn about various architectures of CNNs and Capsule networks and implement them in TensorFlow.

Chapter 7, Learning Text Representations, covers the state-of-the-art text representation learning algorithm known as word2vec. We will explore how different types of word2vec models, such as CBOW and skip-gram, work mathematically. We will also learn how to visualize the word embeddings using TensorBoard. Later, we will learn about the doc2vec, skip-thoughts, and quick-thoughts models for learning sentence representations.

Chapter 8, Generating Images Using GANs, helps us to understand one of the most popular generative algorithms, called the GAN. We will learn how to implement a GAN in TensorFlow to generate images. We will also explore different types of GANs, such as LSGAN and WGAN.

Chapter 9, Learning More about GANs, uncovers various interesting types of GANs. First, we will learn about CGAN, which conditions the generator and discriminator. Then, we will see how to implement InfoGAN in TensorFlow. Moving on, we will learn to convert photos to paintings using CycleGAN and how to convert text descriptions to photos using StackGANs.

Chapter 10, Reconstructing Inputs Using Autoencoders, describes how autoencoders learn to reconstruct the input. We will explore and learn to implement different types of autoencoders such as convolutional autoencoders, sparse autoencoders, contractive autoencoders, variational autoencoders, and more in TensorFlow. 

Chapter 11, Exploring Few-Shot Learning Algorithms, describes how to build models that learn from just a few data points. We will learn what few-shot learning is and explore popular few-shot learning algorithms, such as siamese, prototypical, relation, and matching networks.

To get the most out of this book

Those who are completely new to deep learning, but who have some experience in machine learning and Python programming, will find this book helpful.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packt.com.

Select the SUPPORT tab.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Deep-Learning-Algorithms-with-Python. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789344158_ColorImages.pdf.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Section 1: Getting Started with Deep Learning

In this section, we will familiarize ourselves with deep learning and understand the fundamental deep learning concepts. We will also learn about a powerful deep learning framework called TensorFlow, and set TensorFlow up for all of our future deep learning tasks.

The following chapters are included in this section:

Chapter 1, Introduction to Deep Learning

Chapter 2, Getting to Know TensorFlow

Introduction to Deep Learning

Deep learning is a subset of machine learning inspired by the neural networks in the human brain. It has been around for a decade, but the reason it is so popular right now is the computational advancements and the availability of huge volumes of data. With a huge volume of data, deep learning algorithms outperform classic machine learning. Deep learning is already transforming, and is extensively used in, several interdisciplinary scientific fields such as computer vision, natural language processing (NLP), speech recognition, and many others.

In this chapter, we will learn about the following topics:

Fundamental concepts of deep learning

Biological and artificial neurons

Artificial neural network and its layers

Activation functions

Forward and backward propagation in ANN

Gradient checking algorithm

Building an artificial neural network from scratch

What is deep learning?

Deep learning is just a modern name for artificial neural networks with many layers. What is deep in deep learning, though? It is basically due to the structure of the artificial neural network (ANN). An ANN consists of some number of layers, n, to perform any computation. We can build an ANN with several layers, where each layer is responsible for learning the intricate patterns in the data. Due to computational advancements, we can build networks that are even hundreds or thousands of layers deep. Since the ANN uses deep layers to perform learning, we call this approach deep learning, and an ANN that uses deep layers to learn is called a deep network. We have learned that deep learning is a subset of machine learning. How does deep learning differ from machine learning? What makes deep learning so special and popular?

The success of machine learning lies in the right set of features. Feature engineering plays a crucial role in machine learning. If we handcraft the right set of features to predict a certain outcome, then the machine learning algorithms can perform well, but finding and engineering the right set of features is not an easy task.

With deep learning, we don't have to handcraft such features. Since a deep ANN employs several layers, it learns the complex intrinsic features and multi-level abstract representations of the data by itself. Let's explore this a bit with an analogy.

Let's suppose we want to perform an image classification task. Say we are learning to recognize whether an image contains a dog or not. With machine learning, we need to handcraft features that help the model to understand whether the image contains a dog. We send these handcrafted features as inputs to machine learning algorithms, which then learn a mapping between the features and the label (dog). But extracting features from an image is a tedious task. With deep learning, we just need to feed a bunch of images to the deep neural network, and it will automatically act as a feature extractor by learning the right set of features. As we have learned, an ANN uses multiple layers; in the first layer, it will learn the basic features of the image that characterize the dog, say, the body structure of the dog, and, in the succeeding layers, it will learn the complex features. Once it learns the right set of features, it will look for the presence of such features in the image. If those features are present, then it says that the given image contains a dog. Thus, unlike machine learning, with deep learning, we don't have to manually engineer the features; instead, the network itself learns the correct set of features required for the task.

Due to this interesting aspect of deep learning, it is substantially used with unstructured datasets where extracting features is difficult, such as in speech recognition, text classification, and many more. When we have huge datasets, deep learning algorithms are good at extracting features and mapping the extracted features to their labels. Having said that, deep learning is not just about throwing a bunch of data points at a deep network and getting results. It's not that simple. We would have numerous hyperparameters that act as tuning knobs to obtain better results, which we will explore in the upcoming sections.

Although deep learning performs better than conventional machine learning models, it is not recommended to use DL for smaller datasets. When we don't have enough data points or the data is very simple, then the deep learning algorithms can easily overfit to the training dataset and fail to generalize well on the unseen dataset. Thus, we should apply deep learning only when we have a significant amount of data points.

The applications of deep learning are numerous and almost everywhere. Some of the interesting applications include automatically generating captions for images, adding sound to silent movies, converting black-and-white images to colored images, generating text, and many more. Google Translate, the recommendation engines of Netflix, Amazon, and Spotify, and self-driving cars are some of the applications powered by deep learning. There is no doubt that deep learning is a disruptive technology and has achieved tremendous technological advancement in the past few years.

In this book, we will learn about deep learning algorithms, from the basics to the state of the art, by building some of the interesting applications of deep learning from scratch, including image recognition, generating song lyrics, predicting Bitcoin prices, generating realistic artificial images, converting photographs to paintings, and many more. Excited already? Let's get started!

Biological and artificial neurons

Before going ahead, first, we will explore what neurons are and how neurons in our brain actually work, and then we will learn about artificial neurons.

A neuron can be defined as the basic computational unit of the human brain. Neurons are the fundamental units of our brain and nervous system. Our brain encompasses approximately 100 billion neurons. Each and every neuron is connected to one another through a structure called a synapse, which is accountable for receiving input from the external environment and from sensory organs, for sending motor instructions to our muscles, and for performing other activities.

A neuron can also receive inputs from the other neurons through a branchlike structure called a dendrite. These inputs are strengthened or weakened; that is, they are weighted according to their importance, and then they are summed together in the cell body called the soma. From the cell body, these summed inputs are processed and move through the axon to be sent to the other neurons.

The basic single biological neuron is shown in the following diagram:

Now, let's see how artificial neurons work. Let's suppose we have three inputs, $x_1$, $x_2$, and $x_3$, to predict output $y$. These inputs are multiplied by weights, $w_1$, $w_2$, and $w_3$, and are summed together as follows:

$z = x_1 w_1 + x_2 w_2 + x_3 w_3$

But why are we multiplying these inputs by weights? Because all of the inputs are not equally important in calculating the output, $y$. Let's say that $x_2$ is more important in calculating the output compared to the other two inputs. Then, we assign a higher value to $w_2$ than to the other two weights. So, upon multiplying weights with inputs, $x_2 w_2$ will have a higher value than the other two terms. In simple terms, weights are used for strengthening the inputs. After multiplying the inputs with the weights, we sum them together and add a value called the bias, $b$:

$z = (x_1 w_1 + x_2 w_2 + x_3 w_3) + b$

If you look at the preceding equation closely, it may look familiar. Doesn't it look like the equation of linear regression? Isn't it just the equation of a straight line? We know that the equation of a straight line is given as:

$y = mx + b$

Here, $m$ is the weight (coefficient), $x$ is the input, and $b$ is the bias (intercept).

Well, yes. Then, what is the difference between neurons and linear regression? In neurons, we introduce non-linearity to the result, $z$, by applying a function, $f(\cdot)$, called the activation or transfer function. Thus, our output becomes:

$y = f(z)$

A single artificial neuron is shown in the following diagram:

So, a neuron takes the input, $x$, multiplies it by the weights, $w$, adds the bias, $b$, to form $z = xw + b$, and then applies the activation function on $z$ to get the output, $y$.
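
To make this concrete, here is a minimal sketch of a single artificial neuron in Python. The input values, weights, and the choice of sigmoid as the activation function are illustrative assumptions, not values from the text:

import numpy as np

def sigmoid(z):
    # a common choice of activation function, covered later in this chapter
    return 1 / (1 + np.exp(-z))

def neuron(x, w, b):
    # weighted sum of the inputs plus the bias: z = x1*w1 + x2*w2 + x3*w3 + b
    z = np.dot(x, w) + b
    # apply the activation function to introduce non-linearity
    return sigmoid(z)

x = np.array([0.5, 0.2, 0.9])   # three inputs
w = np.array([0.4, 0.7, 0.1])   # three weights
b = 0.1                         # bias
y = neuron(x, w, b)             # output of the neuron, y = f(xw + b)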

ANN and its layers

While neurons are really cool, we cannot just use a single neuron to perform complex tasks. This is the reason our brain has billions of neurons, stacked in layers, forming a network. Similarly, artificial neurons are arranged in layers. Each and every layer will be connected in such a way that information is passed from one layer to another.

A typical ANN consists of the following layers:

Input layer

Hidden layer

Output layer

Each layer has a collection of neurons, and the neurons in one layer interact with all the neurons in the adjacent layers. However, neurons in the same layer will not interact with one another, simply because neurons from adjacent layers have connections, or edges, between them, while neurons in the same layer do not have any connections. We use the terms nodes or units to represent the neurons in the artificial neural network.

A typical ANN is shown in the following diagram:

Input layer

The input layer is where we feed input to the network. The number of neurons in the input layer is the number of inputs we feed to the network. Each input will have some influence on predicting the output. However, no computation is performed in the input layer; it is just used for passing information from the outside world to the network.

Hidden layer

Any layer between the input layer and the output layer is called a hidden layer. It processes the input received from the input layer. The hidden layer is responsible for deriving complex relationships between input and output. That is, the hidden layer identifies the pattern in the dataset. It is majorly responsible for learning the data representation and for extracting the features.

There can be any number of hidden layers; however, we have to choose a number of hidden layers according to our use case. For a very simple problem, we can just use one hidden layer, but while performing complex tasks such as image recognition, we use many hidden layers, where each layer is responsible for extracting important features. The network is called a deep neural network when we have many hidden layers.

Output layer

After processing the input, the hidden layer sends its result to the output layer. As the name suggests, the output layer emits the output. The number of neurons in the output layer is based on the type of problem we want our network to solve.

If it is a binary classification problem, then the number of neurons in the output layer is one, and this neuron tells us which class the input belongs to. If it is a multi-class classification problem, say with five classes, and we want to get the probability of each class as an output, then the number of neurons in the output layer is five, each emitting the probability of one class. If it is a regression problem, then we have one neuron in the output layer.
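
To see how these layer sizes fit together in code, here is a minimal sketch of a forward pass through a tiny network with one hidden layer; the sizes (four inputs, three hidden neurons, and five output neurons for a five-class problem) and the random values are purely illustrative:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.random.randn(4)       # input layer: one value per input feature
W1 = np.random.randn(4, 3)   # edges from the input layer to the hidden layer
b1 = np.zeros(3)
W2 = np.random.randn(3, 5)   # edges from the hidden layer to the output layer
b2 = np.zeros(5)

h = sigmoid(np.dot(x, W1) + b1)   # hidden layer activations: 3 neurons
o = np.dot(h, W2) + b2            # output layer: 5 neurons, one per class
print(o.shape)                    # (5,)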

The sigmoid function

The sigmoid function is one of the most commonly used activation functions. It scales the value between 0 and 1. The sigmoid function can be defined as follows:

$\sigma(x) = \dfrac{1}{1 + e^{-x}}$

It is an S-shaped curve shown as follows:

It is differentiable, meaning that we can find the slope of the curve at any point. It is monotonic, which implies it is either entirely non-increasing or entirely non-decreasing. The sigmoid function is also known as the logistic function. As we know, probability lies between 0 and 1, and, since the sigmoid function squashes the value between 0 and 1, it is used for predicting the probability of an output.

The sigmoid function can be defined in Python as follows:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
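
Since the sigmoid is differentiable, it is worth noting that its derivative has the convenient closed form $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, which comes in handy when implementing backpropagation by hand. A small sketch, reusing the sigmoid function defined above:

def sigmoid_derivative(x):
    # derivative of the sigmoid: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)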

The Rectified Linear Unit function

The Rectified Linear Unit (ReLU) function is another of the most commonly used activation functions. It outputs a value from 0 to infinity. It is basically a piecewise function and can be expressed as follows:

$f(x) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases}$

That is, $f(x)$ returns zero when the value of $x$ is less than zero and returns $x$ when the value of $x$ is greater than or equal to zero. It can also be expressed as follows:

$f(x) = \max(0, x)$

The ReLU function is shown in the following figure:

As we can see in the preceding diagram, when we feed any negative input to the ReLU function, it converts it to zero. The snag of outputting zero for all negative values is a problem called dying ReLU: a neuron is said to be dead if it always outputs zero. A ReLU function can be implemented as follows:

def ReLU(x):
    if x < 0:
        return 0
    else:
        return x
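
The scalar version above handles one value at a time; for whole arrays, a vectorized form is more idiomatic NumPy (a small sketch, assuming NumPy is imported as np as in the earlier snippets):

def relu(x):
    # element-wise max(0, x); works on scalars and NumPy arrays alike
    return np.maximum(0, x)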

The leaky ReLU function

Leaky ReLU is a variant of the ReLU function that solves the dying ReLU problem. Instead of converting every negative input to zero, it has a small slope for negative values, as shown:

Leaky ReLU can be expressed as follows:

$f(x) = \begin{cases} \alpha x, & x < 0 \\ x, & x \ge 0 \end{cases}$

The value of $\alpha$ is typically set to 0.01. The leaky ReLU function is implemented as follows:

def leakyReLU(x, alpha=0.01):
    if x < 0:
        return alpha * x
    else:
        return x

Instead of setting a default value for $\alpha$, we can send it as a parameter to the neural network and make the network learn the optimal value of $\alpha$. Such an activation function is termed a Parametric ReLU function. We can also set the value of $\alpha$ to some random value, in which case it is called a Randomized ReLU function.

The Exponential linear unit function

The Exponential Linear Unit (ELU), like leaky ReLU, has a small slope for negative values. But instead of a straight line, it has a smooth curve based on the exponential function, as shown in the following diagram:

It can be expressed as follows:

$f(x) = \begin{cases} \alpha(e^x - 1), & x < 0 \\ x, & x \ge 0 \end{cases}$

The ELU function is implemented in Python as follows:

def ELU(x, alpha=0.01):
    if x < 0:
        return alpha * (np.exp(x) - 1)
    else:
        return x

The Swish function

The Swish function is a recently introduced activation function from Google. Unlike other activation functions, which are monotonic, Swish is a non-monotonic function, which means it is neither always non-increasing nor always non-decreasing. It provides better performance than ReLU. It is simple and can be expressed as follows:

$f(x) = x \, \sigma(x)$

Here, $\sigma(x)$ is the sigmoid function. The Swish function is shown in the following diagram:

We can also reparameterize the Swish function and express it as follows:

$f(x) = 2x \, \sigma(\beta x)$

When the value of $\beta$ is 0, we get the identity function, $f(x) = x$, so the Swish function becomes linear. When the value of $\beta$ tends to infinity, $f(x)$ becomes $2\max(0, x)$, which is basically the ReLU function multiplied by a constant value. So, the value of $\beta$ acts as a good interpolation between a linear and a nonlinear function. The Swish function can be implemented as shown:

def swish(x, beta):
    return 2 * x * sigmoid(beta * x)

The softmax function

The softmax function is basically a generalization of the sigmoid function. It is usually applied to the final layer of the network when performing multi-class classification tasks. It gives the probability of each class being the output and, thus, the sum of the softmax values will always equal 1.

It can be represented as follows:

$\text{softmax}(x_i) = \dfrac{e^{x_i}}{\sum_j e^{x_j}}$

As shown in the following diagram, the softmax function converts its inputs to probabilities:

The softmax function can be implemented in Python as follows:

def softmax(x):
    return np.exp(x) / np.exp(x).sum(axis=0)
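
As a quick sanity check with some illustrative values, the outputs do sum to 1. Note that np.exp can overflow for large inputs; a common, numerically stable variant (a sketch, not from the book) subtracts the maximum before exponentiating, which leaves the result unchanged:

x = np.array([1.0, 2.0, 3.0])
probs = softmax(x)
print(probs)         # [0.09003057 0.24472847 0.66524096]
print(probs.sum())   # 1.0

def stable_softmax(x):
    # subtracting the max does not change the softmax output,
    # but it prevents overflow in np.exp for large inputs
    e = np.exp(x - x.max(axis=0))
    return e / e.sum(axis=0)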