Deep learning has shown its power in several application areas of Artificial Intelligence, especially in Computer Vision. Computer Vision is the science of understanding and manipulating images, and finds enormous applications in the areas of robotics, automation, and so on. This book will also show you, with practical examples, how to develop Computer Vision applications by leveraging the power of deep learning.
In this book, you will learn different techniques related to object classification, object detection, image segmentation, captioning, image generation, face analysis, and more. You will also explore their applications using popular Python libraries such as TensorFlow and Keras. This book will help you master state-of-the-art, deep learning algorithms and their implementation.
Page count: 258
Year of publication: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Aman Singh
Content Development Editor: Varun Sony
Technical Editor: Dharmendra Yadav
Copy Editors: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Tania Dutta
Production Coordinator: Shantanu Zagade
First published: January 2018
Production reference: 1220118
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78829-562-8
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Deep learning is revolutionizing AI, and over the next several decades, it will change the world radically. AI powered by deep learning will be on a par in scale with the industrial revolution. This, understandably, has created both excitement and fear about the future. But the reality is that, just like the industrial revolution and machinery, deep learning will improve industrial capacity and raise standards of living dramatically for humankind. Rather than replacing jobs, it will create many more jobs of a higher standard. This is why this book is so important and timely. Readers of this book will be introduced to deep learning for computer vision, its power, and its many applications. This book will give readers a grounding in the fundamentals of an emerging industry that will grow exponentially over the next decade.
Rajalingappaa Shanmugamani is a great researcher whom I have worked with previously on several projects in computer vision. He was the lead engineer in designing and delivering a complex computer vision and deep learning system for fashion search that was deployed in the real world with great success. Among his strengths is his ability to take up state-of-the-art research on complex problems and apply it to real-world situations. He can also break down complex ideas and explain them in simple terms, as is demonstrated in this book. Raja is a very ambitious person with a great work ethic, and in this book, he has given a great overview of the current state of computer vision using deep learning, a task not many in today's industry can do. This book is a great achievement by Raja, and I'm sure readers will enjoy and benefit from it for many years to come.
Dr. Stephen Moore
Chief Technology Officer, EmotionReader, Singapore
Rajalingappaa Shanmugamani is currently working as a Deep Learning Lead at SAP, Singapore. Previously, he worked at and consulted for various startups, developing computer vision products. He holds a Master's degree from the Indian Institute of Technology, Madras, where his thesis was based on applications of computer vision in the manufacturing industry. He has published articles in peer-reviewed journals and conferences and applied for a few patents in the area of machine learning. In his spare time, he teaches programming and machine learning to school students and engineers.
Nishanth Koganti received his B.Tech in Electrical Engineering from the Indian Institute of Technology Jodhpur, India, in 2012, and his M.E. and PhD in Information Science from the Nara Institute of Science and Technology, Japan, in 2014 and 2017, respectively. He is currently a postdoctoral researcher at the University of Tokyo, Japan. His research interests are in assistive robotics, motor-skills learning, and machine learning. His graduate research was on the development of a clothing-assistance robot that helps elderly people to wear clothes.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
Getting Started
Understanding deep learning
Perceptron
Activation functions
Sigmoid
The hyperbolic tangent function
The Rectified Linear Unit (ReLU)
Artificial neural network (ANN)
One-hot encoding
Softmax
Cross-entropy
Dropout
Batch normalization
L1 and L2 regularization
Training neural networks
Backpropagation
Gradient descent
Stochastic gradient descent
Playing with TensorFlow playground
Convolutional neural network
Kernel
Max pooling
Recurrent neural networks (RNN)
Long short-term memory (LSTM)
Deep learning for computer vision
Classification
Detection or localization and segmentation
Similarity learning
Image captioning
Generative models
Video analysis
Development environment setup
Hardware and Operating Systems - OS
General Purpose - Graphics Processing Unit (GP-GPU)
Compute Unified Device Architecture - CUDA
CUDA Deep Neural Network - CUDNN
Installing software packages
Python
Open Computer Vision - OpenCV
The TensorFlow library
Installing TensorFlow
TensorFlow example to print Hello, TensorFlow
TensorFlow example for adding two numbers
TensorBoard
The TensorFlow Serving tool
The Keras library
Summary
Image Classification
Training the MNIST model in TensorFlow
The MNIST datasets
Loading the MNIST data
Building a perceptron
Defining placeholders for input data and targets
Defining the variables for a fully connected layer
Training the model with data
Building a multilayer convolutional network
Utilizing TensorBoard in deep learning
Training the MNIST model in Keras
Preparing the dataset
Building the model
Other popular image testing datasets
The CIFAR dataset
The Fashion-MNIST dataset
The ImageNet dataset and competition
The bigger deep learning models
The AlexNet model
The VGG-16 model
The Google Inception-V3 model
The Microsoft ResNet-50 model
The SqueezeNet model
Spatial transformer networks
The DenseNet model
Training a model for cats versus dogs
Preparing the data
Benchmarking with simple CNN
Augmenting the dataset
Augmentation techniques
Transfer learning or fine-tuning of a model
Training on bottleneck features
Fine-tuning several layers in deep learning
Developing real-world applications
Choosing the right model
Tackling the underfitting and overfitting scenarios
Gender and age detection from face
Fine-tuning apparel models
Brand safety
Summary
Image Retrieval
Understanding visual features
Visualizing activation of deep learning models
Embedding visualization
Guided backpropagation
The DeepDream
Adversarial examples
Model inference
Exporting a model
Serving the trained model
Content-based image retrieval
Building the retrieval pipeline
Extracting bottleneck features for an image
Computing similarity between query image and target database
Efficient retrieval
Matching faster using approximate nearest neighbour
Advantages of ANNOY
Autoencoders of raw images
Denoising using autoencoders
Summary
Object Detection
Detecting objects in an image
Exploring the datasets
ImageNet dataset
PASCAL VOC challenge
COCO object detection challenge
Evaluating datasets using metrics
Intersection over Union
The mean average precision
Localizing algorithms
Localizing objects using sliding windows
The scale-space concept
Training a fully connected layer as a convolution layer
Convolution implementation of sliding window
Thinking about localization as a regression problem
Applying regression to other problems
Combining regression with the sliding window
Detecting objects
Regions of the convolutional neural network (R-CNN)
Fast R-CNN
Faster R-CNN
Single shot multi-box detector
Object detection API
Installation and setup
Pre-trained models
Re-training object detection models
Data preparation for the Pet dataset
Object detection training pipeline
Training the model
Monitoring loss and accuracy using TensorBoard
Training a pedestrian detection for a self-driving car
The YOLO object detection algorithm
Summary
Semantic Segmentation
Predicting pixels
Diagnosing medical images
Understanding the earth from satellite imagery
Enabling robots to see
Datasets
Algorithms for semantic segmentation
The Fully Convolutional Network
The SegNet architecture
Upsampling the layers by pooling
Sampling the layers by convolution
Skipping connections for better training
Dilated convolutions
DeepLab
RefineNet
PSPnet
Large kernel matters
DeepLab v3
Ultra-nerve segmentation
Segmenting satellite images
Modeling FCN for segmentation
Segmenting instances
Summary
Similarity Learning
Algorithms for similarity learning
Siamese networks
Contrastive loss
FaceNet
Triplet loss
The DeepNet model
DeepRank
Visual recommendation systems
Human face analysis
Face detection
Face landmarks and attributes
The Multi-Task Facial Landmark (MTFL) dataset
The Kaggle keypoint dataset
The Multi-Attribute Facial Landmark (MAFL) dataset
Learning the facial key points
Face recognition
The labeled faces in the wild (LFW) dataset
The YouTube faces dataset
The CelebFaces Attributes dataset (CelebA)
CASIA web face database
The VGGFace2 dataset
Computing the similarity between faces
Finding the optimum threshold
Face clustering
Summary
Image Captioning
Understanding the problem and datasets
Understanding natural language processing for image captioning
Expressing words in vector form
Converting words to vectors
Training an embedding
Approaches for image captioning and related problems
Using a condition random field for linking image and text
Using RNN on CNN features to generate captions
Creating captions using image ranking
Retrieving captions from images and images from captions
Dense captioning
Using RNN for captioning
Using multimodal metric space
Using attention network for captioning
Knowing when to look
Implementing attention-based image captioning
Summary
Generative Models
Applications of generative models
Artistic style transfer
Predicting the next frame in a video
Super-resolution of images
Interactive image generation
Image to image translation
Text to image generation
Inpainting
Blending
Transforming attributes
Creating training data
Creating new animation characters
3D models from photos
Neural artistic style transfer
Content loss
Style loss using the Gram matrix
Style transfer
Generative Adversarial Networks
Vanilla GAN
Conditional GAN
Adversarial loss
Image translation
InfoGAN
Drawbacks of GAN
Visual dialogue model
Algorithm for VDM
Generator
Discriminator
Summary
Video Classification
Understanding and classifying videos
Exploring video classification datasets
UCF101
YouTube-8M
Other datasets
Splitting videos into frames
Approaches for classifying videos
Fusing parallel CNN for video classification
Classifying videos over long periods
Streaming two CNNs for action recognition
Using 3D convolution for temporal learning
Using trajectory for classification
Multi-modal fusion
Attending regions for classification
Extending image-based approaches to videos
Regressing the human pose
Tracking facial landmarks
Segmenting videos
Captioning videos
Generating videos
Summary
Deployment
Performance of models
Quantizing the models
MobileNets
Deployment in the cloud
AWS
Google Cloud Platform
Deployment of models in devices
Jetson TX2
Android
iPhone
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Deep Learning for Computer Vision is a book intended for readers who want to learn deep-learning-based computer vision techniques for various applications. This book will give the reader tools and techniques to develop computer-vision-based products. There are plenty of practical examples covered in the book to follow the theory.
This book is for readers who want to know how to apply deep learning to computer vision problems such as classification, detection, retrieval, segmentation, generation, captioning, and video classification. It is also for readers who want to understand how to achieve good accuracy under various constraints, such as limited data, imbalanced classes, and noise, and who want to know how to deploy trained models on various platforms (AWS, Google Cloud, Raspberry Pi, and mobile phones). After completing this book, the reader should be able to develop code for problems such as person detection, face recognition, product search, medical image segmentation, image generation, image captioning, and video classification.
Chapter 1, Getting Started, introduces the basics of deep learning and makes the readers familiar with the vocabulary. The readers will install the software packages necessary to follow the rest of the chapters.
Chapter 2, Image Classification, talks about the image classification problem, which is labeling an image as a whole. The readers will learn about image classification techniques and train a deep learning model for pet classification. They will also learn methods to improve accuracy and dive deep into various advanced architectures.
Chapter 3, Image Retrieval, covers deep features and image retrieval. The reader will learn about various methods of obtaining model visualization, visual features, inference using TensorFlow, and serving and using visual features for product retrieval.
Chapter 4, Object Detection, talks about detecting objects in images. The reader will learn about various techniques of object detection and apply them for pedestrian detection. The TensorFlow API for object detection will be utilized in this chapter.
Chapter 5, Semantic Segmentation, covers segmenting images pixel-wise. The readers will learn about segmentation techniques and train a model for segmentation of medical images.
Chapter 6, Similarity Learning, talks about similarity learning. The readers will learn about similarity matching and how to train models for face recognition. A model for learning facial landmarks is also illustrated.
Chapter 7, Image Captioning, is about generating or selecting captions for images. The readers will learn natural language processing techniques and how to generate captions for images using those techniques.
Chapter 8, Generative Models, talks about generating synthetic images for various purposes. The readers will learn what generative models are and use them for image generation applications, such as style transfer, training data, and so on.
Chapter 9, Video Classification, covers computer vision techniques for video data. The readers will understand the key differences between solving video versus image problems and implement video classification techniques.
Chapter 10, Deployment, talks about the deployment steps for deep learning models. The reader will learn how to deploy trained models and optimize them for speed on various platforms.
The examples covered in this book can be run with Windows, Ubuntu, or Mac. All the installation instructions are covered. Basic knowledge of Python and machine learning is required. It's preferable that the reader has GPU hardware but it's not necessary.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Deep-Learning-for-Computer-Vision. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Computer vision is the science of understanding or manipulating images and videos. It has many applications, including autonomous driving, industrial inspection, and augmented reality. The use of deep learning for computer vision falls into multiple categories: classification, detection, segmentation, and generation, for both images and videos. In this book, you will learn how to train deep learning models for computer vision applications and deploy them on multiple platforms. Throughout this book, we will use TensorFlow, a popular Python library for deep learning, for the examples. In this chapter, we will cover the following topics:
The basics and vocabulary of deep learning
How deep learning meets computer vision
Setting up the development environment that will be used for the examples covered in this book
Getting a feel for TensorFlow, along with its powerful tools, such as TensorBoard and TensorFlow Serving
Computer vision as a field has a long history. With the emergence of deep learning, computer vision has proven to be useful for various applications. Deep learning is a collection of techniques based on artificial neural networks (ANNs), which are a branch of machine learning. ANNs are modelled on the human brain: nodes are linked to each other and pass information between them. In the following sections, we will discuss how deep learning works in detail by understanding the commonly used basic terms.
An artificial neuron, or perceptron, takes several inputs and performs a weighted summation to produce an output. The weights of the perceptron are determined during the training process and are based on the training data. The following is a diagram of the perceptron:
The inputs are weighted and summed, as shown in the preceding image. The sum is then passed through a unit step function, in this case for a binary classification problem. A perceptron can only learn simple functions by learning the weights from examples. The process of learning the weights is called training. Training a perceptron can be done through gradient-based methods, which are explained in a later section. The output of the perceptron can be passed through an activation function, or transfer function, which will be explained in the next section.
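As a minimal sketch of this idea (the AND-gate weights and bias below are hand-picked for illustration, not taken from the book), a perceptron can be written in a few lines of Python:

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Weighted summation of the inputs, passed through a unit step function."""
    summation = np.dot(inputs, weights) + bias
    return 1 if summation > 0 else 0

# Example: a perceptron acting as a logical AND gate.
and_weights = np.array([1.0, 1.0])
and_bias = -1.5
print(perceptron(np.array([1, 1]), and_weights, and_bias))  # fires: 1
print(perceptron(np.array([1, 0]), and_weights, and_bias))  # does not fire: 0
```

In training, the weights and bias would be learned from examples rather than chosen by hand, as described in the gradient descent section later.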
Activation functions make neural nets nonlinear. An activation function decides whether a perceptron should fire or not. During training, activation functions play an important role in adjusting the gradients. An activation function such as sigmoid, shown in the next section, attenuates values with higher magnitudes. This nonlinear behaviour of the activation function enables deep nets to learn complex functions. Most activation functions are continuous and differentiable, except the rectified linear unit at 0. A continuous function has small changes in output for every small change in input. A differentiable function has a derivative existing at every point in its domain.
In order to train a neural network, the function has to be differentiable. Following are a few activation functions.
Sigmoid can be considered a smoothed step function and is hence differentiable. Sigmoid is useful for converting any value to probabilities and can be used for binary classification. Sigmoid maps its input to a value in the range of 0 to 1, as shown in the following graph:
For inputs of large magnitude, the change in the Y value with respect to X is small, and hence the gradients vanish. After some learning, the change may be small. Another activation function called tanh, explained in the next section, is a scaled version of sigmoid and lessens the vanishing-gradient problem.
The hyperbolic tangent function, or tanh, is a scaled version of sigmoid. Like sigmoid, it is smooth and differentiable. tanh maps its input to a value in the range of -1 to 1, as shown in the following graph:
The gradients of tanh are more stable than those of sigmoid, and hence it has fewer vanishing-gradient problems. Both sigmoid and tanh fire all the time, making the ANN computationally heavy. The Rectified Linear Unit (ReLU) activation function, explained in the next section, avoids this pitfall by not firing for some inputs.
ReLU lets big positive numbers pass through while clamping negative inputs to zero. This makes some neurons stale so that they don't fire, which increases sparsity, and hence is good. ReLU maps input x to max(0, x); that is, negative inputs are mapped to 0, and positive inputs are output without any change, as shown in the following graph:
Because ReLU doesn't fire all the time, it can be trained faster. Since the function is simple, it is computationally the least expensive. The choice of activation function is very dependent on the application. Nevertheless, ReLU works well for a large range of problems. In the next section, you will learn how to stack several perceptrons together so that they can learn more complex functions than a single perceptron can.
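The three activation functions discussed above can be sketched in NumPy as follows (a minimal illustration, not the book's code):

```python
import numpy as np

def sigmoid(x):
    # Maps any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Scaled version of sigmoid, mapping into (-1, 1).
    return np.tanh(x)

def relu(x):
    # max(0, x): negative inputs become 0, positive inputs pass unchanged.
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # values between 0 and 1, with sigmoid(0) = 0.5
print(tanh(x))     # values between -1 and 1
print(relu(x))     # negative inputs clamped to 0
```

Note that tanh(x) equals 2 * sigmoid(2x) - 1, which is the precise sense in which it is a scaled version of sigmoid.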
An ANN is a collection of perceptrons and activation functions. The perceptrons are connected to form hidden layers or units. The hidden units form a nonlinear basis that maps the input layers to output layers in a lower-dimensional space; such a structure is what we call an artificial neural network. An ANN is a map from input to output. The map is computed by weighted addition of the inputs with biases. The weight and bias values, along with the architecture, are called the model.
The training process determines the values of these weights and biases. The model values are initialized with random values at the beginning of training. The error is computed using a loss function by comparing the prediction with the ground truth. Based on the loss computed, the weights are tuned at every step. Training is stopped when the error cannot be reduced further. The training process learns features along the way; these features are a better representation than the raw images. The following is a diagram of an artificial neural network, or multi-layer perceptron:
Several inputs x are passed through a hidden layer of perceptrons and summed to the output. The universal approximation theorem suggests that such a neural network can approximate any function. The hidden layer can also be called a dense layer. Every layer can have one of the activation functions described in the previous section. The number of hidden layers and perceptrons can be chosen based on the problem. There are a few more things that make this multilayer perceptron work for multi-class classification problems. A multi-class classification problem tries to discriminate between more than two categories. We will explore those terms in the following sections.
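The forward pass of such a multi-layer perceptron can be sketched in NumPy as follows (the layer sizes and random weights here are illustrative choices, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def mlp_forward(x, w1, b1, w2, b2):
    """Forward pass of a network with one hidden (dense) layer."""
    hidden = relu(x @ w1 + b1)   # weighted sum plus bias, then activation
    return hidden @ w2 + b2      # output layer: raw class scores

# Illustrative sizes: 4 inputs, 8 hidden units, 3 output classes.
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 3)), np.zeros(3)
scores = mlp_forward(rng.normal(size=(2, 4)), w1, b1, w2, b2)
print(scores.shape)  # (2, 3): one score vector per input example
```

In later chapters, the same structure is built with TensorFlow layers rather than raw matrix operations.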
One-hot encoding is a way to represent the target variables or classes of a classification problem. The target variables can be converted from string labels to one-hot encoded vectors. A one-hot vector is filled with 1 at the index of the target class and 0 everywhere else. For example, if the target classes are cat and dog, they can be represented by [1, 0] and [0, 1], respectively. For 1,000 classes, a one-hot vector will be of size 1,000, with all zeros except a single 1. One-hot encoding makes no assumptions about the similarity of the target variables. With the combination of one-hot encoding and softmax, explained in the following section, multi-class classification becomes possible in an ANN.
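The cat-and-dog example above can be sketched as follows (a minimal NumPy illustration):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot encoded vectors."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1
    return encoded

# cat = 0, dog = 1, as in the text.
print(one_hot([0, 1, 0], num_classes=2))
# [[1. 0.]
#  [0. 1.]
#  [1. 0.]]
```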
Softmax is a way of forcing the outputs of a neural network to sum to 1, so that the output values of the softmax function can be considered part of a probability distribution. This is useful in multi-class classification problems. Softmax is a kind of activation function with the speciality that its outputs sum to 1. It converts the outputs to probabilities by exponentiating each output and dividing by the sum of all the exponentiated outputs. The Euclidean distance between the softmax probabilities and the one-hot encoding could be computed for optimization, but the cross-entropy explained in the next section is a better cost function to optimize.
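A minimal NumPy sketch of softmax (the logit values below are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Subtracting the max is for numerical stability; the result is
    # unchanged because softmax is invariant to shifting all inputs
    # by the same constant.
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # probabilities, largest for the largest logit
print(probs.sum())  # sums to 1
```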
Cross-entropy compares the distance between the outputs of softmax and one-hot encoding. Cross-entropy is a loss function for which the error has to be minimized. Neural networks estimate the probability of the given data belonging to every class. The probability of the correct target label has to be maximized. Cross-entropy is the summation of negative logarithmic probabilities. The logarithm is used for numerical stability. Maximizing a function is equivalent to minimizing the negative of the same function. In the next section, we will see the following regularization methods for avoiding the overfitting of an ANN:
Dropout
Batch normalization
L1 and L2 regularization
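The cross-entropy loss described above can be sketched as follows (the probability vectors are made-up examples):

```python
import numpy as np

def cross_entropy(one_hot_target, predicted_probs):
    """Summation of negative log probabilities, masked by the one-hot target."""
    # A tiny epsilon guards the logarithm against exactly-zero probabilities.
    eps = 1e-12
    return -np.sum(one_hot_target * np.log(predicted_probs + eps))

target = np.array([0.0, 1.0, 0.0])        # the true class is index 1
confident = np.array([0.05, 0.90, 0.05])  # high probability on the target
unsure = np.array([0.40, 0.30, 0.30])     # low probability on the target
print(cross_entropy(target, confident))   # small loss
print(cross_entropy(target, unsure))      # larger loss
```

The loss shrinks as the probability assigned to the correct label grows, which is exactly the maximization described in the text, expressed as a minimization.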
Dropout is an effective way of regularizing neural networks to avoid overfitting. During training, the dropout layer cripples the neural network by stochastically removing hidden units, as shown in the following image:
Note how the neurons are randomly selected during training. Dropout is also an efficient way of combining several neural networks: for each training case, a few hidden units are randomly selected, so that a different architecture is effectively trained for each case. This is an extreme case of bagging and model averaging. The dropout layer should not be used during inference, as it is not necessary there.
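The stochastic removal of hidden units can be sketched as follows (a minimal "inverted dropout" illustration, where the keep probability is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob):
    """Stochastically zero out hidden units during training.

    Scaling the survivors by 1/keep_prob keeps the expected activation
    unchanged, which is why the layer can be skipped at inference time.
    """
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.ones((4, 10))               # pretend hidden-layer activations
dropped = dropout(h, keep_prob=0.5)
print(dropped)  # roughly half the units zeroed, survivors scaled up
```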
Batch normalization, or batch-norm, increases the stability and performance of neural network training. It normalizes the output of a layer to zero mean and a standard deviation of 1. This reduces overfitting and makes the network train faster. It is very useful for training complex neural networks.
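The normalization step can be sketched as follows (the learnable scale and shift parameters of a real batch-norm layer are omitted for brevity):

```python
import numpy as np

def batch_norm(batch, epsilon=1e-5):
    """Normalize a batch of layer outputs to zero mean and unit std per feature.

    epsilon guards against division by zero for features with no variance.
    """
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mean) / np.sqrt(var + epsilon)

batch = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
normalized = batch_norm(batch)
print(normalized.mean(axis=0))  # close to [0. 0.]
print(normalized.std(axis=0))   # close to [1. 1.]
```

Note how both features end up on the same scale even though the raw second feature is a hundred times larger than the first.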
L1 regularization penalizes the absolute value of the weights and tends to make the weights zero. L2 regularization penalizes the squared value of the weights and tends to make the weights smaller during training. Both regularizers assume that models with smaller weights are better.
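The two penalties, which are added to the loss during training, can be sketched as follows (the weight values and the regularization strength lam are made-up examples):

```python
import numpy as np

def l1_penalty(weights, lam):
    # Penalizes absolute values; its gradient pushes weights towards exactly 0.
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    # Penalizes squared values; its gradient shrinks weights proportionally.
    return lam * np.sum(weights ** 2)

w = np.array([0.5, -1.0, 2.0])
print(l1_penalty(w, lam=0.01))  # lam * (0.5 + 1.0 + 2.0)
print(l2_penalty(w, lam=0.01))  # lam * (0.25 + 1.0 + 4.0)
```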
Training an ANN is tricky, as it contains several parameters to optimize. The procedure for updating the weights is called backpropagation. The procedure for minimizing the error is called optimization. We will cover both in detail in the next sections.
The backpropagation algorithm is commonly used for training artificial neural networks. The weights are updated backwards through the network, based on the error calculated, as shown in the following image:
After calculating the error, gradient descent can be used to update the weights, as explained in the next section.
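The weight update itself can be sketched as follows (a toy example minimizing the loss w squared, whose gradient is 2w; the learning rate is an arbitrary choice):

```python
import numpy as np

def gradient_descent_step(weights, gradients, learning_rate):
    """One weight update: move against the gradient of the loss."""
    return weights - learning_rate * gradients

# Minimizing loss = w^2 from an arbitrary starting point.
w = np.array([4.0])
for _ in range(50):
    w = gradient_descent_step(w, gradients=2 * w, learning_rate=0.1)
print(w)  # close to the minimum at 0
```

In a real network, backpropagation supplies the gradients for every weight, and the same update rule is applied to all of them.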
