Advanced Deep Learning with Python

Ivan Vasilev

Description

Gain expertise in advanced deep learning domains such as neural networks, meta-learning, graph neural networks, and memory-augmented neural networks using the Python ecosystem




Key Features



  • Get to grips with building faster and more robust deep learning architectures


  • Investigate and train convolutional neural network (CNN) models with GPU-accelerated libraries such as TensorFlow and PyTorch


  • Apply deep neural networks (DNNs) to computer vision and natural language processing (NLP) problems, and build generative models with GANs



Book Description



In order to build robust deep learning systems, you'll need to understand everything from how neural networks work to training CNN models. In this book, you'll discover newly developed deep learning models, methodologies used in the domain, and their implementation based on areas of application.







You'll start by understanding the building blocks and the math behind neural networks, and then move on to CNNs and their advanced applications in computer vision. You'll also learn to apply the most popular CNN architectures in object detection and image segmentation. Further on, you'll focus on variational autoencoders and GANs. You'll then use neural networks to extract sophisticated vector representations of words, before going on to cover various types of recurrent networks, such as LSTM and GRU. You'll even explore the attention mechanism to process sequential data without the help of recurrent neural networks (RNNs). Later, you'll use graph neural networks for processing structured data, along with covering meta-learning, which allows you to train neural networks with fewer training samples. Finally, you'll understand how to apply deep learning to autonomous vehicles.







By the end of this book, you'll have mastered key deep learning concepts and the different applications of deep learning models in the real world.




What you will learn



  • Cover advanced and state-of-the-art neural network architectures


  • Understand the theory and math behind neural networks


  • Train DNNs and apply them to modern deep learning problems


  • Use CNNs for object detection and image segmentation


  • Implement generative adversarial networks (GANs) and variational autoencoders to generate new images


  • Solve natural language processing (NLP) tasks, such as machine translation, using sequence-to-sequence models


  • Understand DL techniques, such as meta-learning and graph neural networks



Who this book is for



This book is for data scientists, deep learning engineers and researchers, and AI developers who want to further their knowledge of deep learning and build innovative and unique deep learning projects. Anyone looking to get to grips with advanced use cases and methodologies adopted in the deep learning domain using real-world examples will also find this book useful. A basic understanding of deep learning concepts and a working knowledge of the Python programming language are assumed.

You can read this e-book in Legimi apps or in any app that supports the following format:

EPUB

Page count: 582

Publication year: 2019




Advanced Deep Learning with Python

Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch

Ivan Vasilev

BIRMINGHAM - MUMBAI

Advanced Deep Learning with Python

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

 

Commissioning Editor: Pravin Dhandre
Acquisition Editor: Devika Battike
Content Development Editor: Nathanya Dias
Senior Editor: Ayaan Hoda
Technical Editor: Manikandan Kurup
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Nilesh Mohite

First published: December 2019

Production reference: 1111219

Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.

ISBN 978-1-78995-617-7

www.packt.com

 

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

  • Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals

  • Improve your learning with Skill Plans built especially for you

  • Get a free eBook or video every month

  • Fully searchable for easy access to vital information

  • Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

Contributors

About the author

Ivan Vasilev started working on the first open source Java deep learning library with GPU support in 2013. The library was acquired by a German company, where he continued to develop it. He has also worked as a machine learning engineer and researcher in the area of medical image classification and segmentation with deep neural networks. Since 2017, he has been focusing on financial machine learning. He is working on a Python-based platform that provides the infrastructure to rapidly experiment with different machine learning algorithms for algorithmic trading. Ivan holds an MSc degree in artificial intelligence from the University of Sofia, St. Kliment Ohridski.


About the reviewer

Saibal Dutta has been working as an analytical consultant in SAS Research and Development. He is also pursuing a PhD in data mining and machine learning at IIT Kharagpur. He holds an M.Tech in electronics and communication from the National Institute of Technology, Rourkela. He has worked at Tata Communications, Pune, and HCL Technologies Limited, Noida, as a consultant. In his 7 years of consulting experience, he has been associated with global players including IKEA (in Sweden) and Pearson (in the US). His passion for entrepreneurship led him to create his own start-up in the field of data analytics. His areas of expertise include data mining, artificial intelligence, machine learning, image processing, and business consulting.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Advanced Deep Learning with Python

About Packt

Why subscribe?

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1: Core Concepts

The Nuts and Bolts of Neural Networks

The mathematical apparatus of NNs

Linear algebra

Vector and matrix operations

Introduction to probability

Probability and sets

Conditional probability and the Bayes rule

Random variables and probability distributions

Probability distributions

Information theory

Differential calculus

A short introduction to NNs

Neurons

Layers as operations

NNs

Activation functions

The universal approximation theorem

Training NNs

Gradient descent

Cost functions

Backpropagation

Weight initialization

SGD improvements

Summary

Section 2: Computer Vision

Understanding Convolutional Networks

Understanding CNNs

Types of convolutions

Transposed convolutions

1×1 convolutions

Depth-wise separable convolutions

Dilated convolutions

Improving the efficiency of CNNs

Convolution as matrix multiplication

Winograd convolutions

Visualizing CNNs

Guided backpropagation

Gradient-weighted class activation mapping

CNN regularization

Introducing transfer learning

Implementing transfer learning with PyTorch

Transfer learning with TensorFlow 2.0

Summary

Advanced Convolutional Networks

Introducing AlexNet

An introduction to Visual Geometry Group 

VGG with PyTorch and TensorFlow

Understanding residual networks

Implementing residual blocks

Understanding Inception networks

Inception v1

Inception v2 and v3

Inception v4 and Inception-ResNet

Introducing Xception

Introducing MobileNet

An introduction to DenseNets

The workings of neural architecture search

Introducing capsule networks

The limitations of convolutional networks

Capsules

Dynamic routing

The structure of the capsule network

Summary

Object Detection and Image Segmentation

Introduction to object detection

Approaches to object detection

Object detection with YOLOv3

A code example of YOLOv3 with OpenCV

Object detection with Faster R-CNN

Region proposal network

Detection network

Implementing Faster R-CNN with PyTorch

Introducing image segmentation

Semantic segmentation with U-Net

Instance segmentation with Mask R-CNN

Implementing Mask R-CNN with PyTorch

Summary

Generative Models

Intuition and justification of generative models

Introduction to VAEs

Generating new MNIST digits with VAE

Introduction to GANs

Training GANs

Training the discriminator

Training the generator

Putting it all together

Problems with training GANs

Types of GAN

Deep Convolutional GAN

Implementing DCGAN

Conditional GAN

Implementing CGAN

Wasserstein GAN

Implementing WGAN

Image-to-image translation with CycleGAN

Implementing CycleGAN

Building the generator and discriminator

Putting it all together

Introducing artistic style transfer

Summary

Section 3: Natural Language and Sequence Processing

Language Modeling

Understanding n-grams

Introducing neural language models

Neural probabilistic language model

Word2Vec

CBOW

Skip-gram

fastText

Global Vectors for Word Representation model

Implementing language models

Training the embedding model

Visualizing embedding vectors

Summary

Understanding Recurrent Networks

Introduction to RNNs

RNN implementation and training

Backpropagation through time

Vanishing and exploding gradients

Introducing long short-term memory

Implementing LSTM

Introducing gated recurrent units

Implementing GRUs

Implementing text classification

Summary

Sequence-to-Sequence Models and Attention

Introducing seq2seq models

Seq2seq with attention

Bahdanau attention

Luong attention

General attention

Implementing seq2seq with attention

Implementing the encoder

Implementing the decoder

Implementing the decoder with attention

Training and evaluation

Understanding transformers

The transformer attention

The transformer model

Implementing transformers

Multihead attention

Encoder

Decoder

Putting it all together

Transformer language models

Bidirectional encoder representations from transformers

Input data representation

Pretraining

Fine-tuning

Transformer-XL

Segment-level recurrence with state reuse

Relative positional encodings

XLNet

Generating text with a transformer language model

Summary

Section 4: A Look to the Future

Emerging Neural Network Designs

Introducing Graph NNs

Recurrent GNNs

Convolutional Graph Networks

Spectral-based convolutions

Spatial-based convolutions with attention

Graph autoencoders

Neural graph learning

Implementing graph regularization

Introducing memory-augmented NNs

Neural Turing machines

MANN

Summary

Meta Learning

Introduction to meta learning

Zero-shot learning

One-shot learning

Meta-training and meta-testing

Metric-based meta learning

Matching networks for one-shot learning

Siamese networks

Implementing Siamese networks

Prototypical networks

Optimization-based learning

Summary

Deep Learning for Autonomous Vehicles

Introduction to AVs

Brief history of AV research

Levels of automation

Components of an AV system 

Environment perception

Sensing

Localization

Moving object detection and tracking

Path planning

Introduction to 3D data processing

Imitation driving policy

Behavioral cloning with PyTorch

Generating the training dataset

Implementing the agent neural network

Training

Letting the agent drive

Putting it all together

Driving policy with ChauffeurNet

Input and output representations

Model architecture

Training

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

This book is a collection of newly evolved deep learning models, methodologies, and implementations based on the areas of their application. In the first section of the book, you will learn about the building blocks of deep learning and the math behind neural networks (NNs). In the second section, you'll focus on convolutional neural networks (CNNs) and their advanced applications in computer vision (CV). You'll learn to apply the most popular CNN architectures in object detection and image segmentation. Finally, you'll discuss variational autoencoders and generative adversarial networks.

In the third section, you'll focus on natural language and sequence processing. You'll use NNs to extract sophisticated vector representations of words. You'll discuss various types of recurrent networks, such as long short-term memory (LSTM) and gated recurrent units (GRUs). Finally, you'll cover the attention mechanism to process sequential data without the help of recurrent networks.

In the final section, you'll learn how to use graph NNs to process structured data. You'll cover meta-learning, which allows you to train an NN with fewer training samples. Finally, you'll learn how to apply deep learning in autonomous vehicles.

By the end of this book, you'll have gained mastery of the key deep learning concepts and the different ways deep learning models are applied in the real world.

Who this book is for

This book is for data scientists, deep learning engineers and researchers, and AI developers who want to master deep learning and build innovative and unique deep learning projects of their own. This book will also appeal to those who are looking to get well-versed with advanced use cases and the methodologies adopted in the deep learning domain using real-world examples. A basic conceptual understanding of deep learning and a working knowledge of Python are assumed.

What this book covers

Chapter 1, The Nuts and Bolts of Neural Networks, will briefly introduce what deep learning is and then discuss the mathematical underpinnings of NNs. This chapter will discuss NNs as mathematical models. More specifically, we'll focus on vectors, matrices, and differential calculus. We'll also discuss some gradient descent variations, such as Momentum, Adam, and Adadelta, in depth. We will also discuss how to deal with imbalanced datasets.

Chapter 2, Understanding Convolutional Networks, will provide a short description of CNNs. We'll discuss CNNs and their applications in CV.

Chapter 3, Advanced Convolutional Networks, will discuss some advanced and widely used NN architectures, including VGG, ResNet, MobileNets, GoogLeNet, Inception, Xception, and DenseNets. We'll also implement ResNet and Xception/MobileNets using PyTorch.

Chapter 4, Object Detection and Image Segmentation, will discuss two important vision tasks: object detection and image segmentation. We'll provide implementations for both of them. 

Chapter 5, Generative Models, will begin the discussion about generative models. In particular, we'll talk about variational autoencoders, generative adversarial networks, and neural style transfer; style transfer will be implemented later in the chapter.

Chapter 6, Language Modeling, will introduce word-level and character-level language models. We'll also talk about word vectors (word2vec, GloVe, and fastText) and we'll use Gensim to implement them. We'll also walk through the highly technical and complex process of preparing text data for machine learning applications, such as topic modeling and sentiment modeling, with the help of the Natural Language Toolkit's (NLTK) text processing techniques.

Chapter 7, Understanding Recurrent Networks, will discuss the basic recurrent networks, LSTM, and GRU cells. We'll provide a detailed explanation and pure Python implementations for all of the networks.

Chapter 8, Sequence-to-Sequence Models and Attention, will discuss sequence models and the attention mechanism, including bidirectional LSTMs, and a new architecture called transformer with encoders and decoders. 

Chapter 9, Emerging Neural Network Designs, will discuss graph NNs and NNs with memory, such as Neural Turing Machines (NTM), differentiable neural computers, and MANN.

Chapter 10, Meta Learning, will discuss meta learning—the way to teach algorithms how to learn. We'll also try to improve upon deep learning algorithms by giving them the ability to learn more information using less training samples.

Chapter 11, Deep Learning for Autonomous Vehicles, will explore the applications of deep learning in autonomous vehicles. We'll discuss how to use deep networks to help the vehicle make sense of its surrounding environment.

To get the most out of this book

To get the most out of this book, you should be familiar with Python and have some knowledge of machine learning. The book includes short introductions to the major types of NNs, but it will help if you are already familiar with the basics of NNs.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows

  • Zipeg/iZip/UnRarX for Mac

  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Advanced-Deep-Learning-with-Python. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789956177_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Build the full GAN model by including the generator, discriminator, and the combined network."

A block of code is set as follows:

import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Lambda, Input, Dense

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "The collection of all possible outcomes (events) of an experiment is called the sample space."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Section 1: Core Concepts

This section will discuss some core Deep Learning (DL) concepts: what exactly DL is, the mathematical underpinnings of DL algorithms, and the libraries and tools that make it possible to develop DL algorithms rapidly.

This section contains the following chapter:

Chapter 1, The Nuts and Bolts of Neural Networks

The Nuts and Bolts of Neural Networks

In this chapter, we'll discuss some of the intricacies of neural networks (NNs)—the cornerstone of deep learning (DL). We'll talk about their mathematical apparatus, structure, and training. Our main goal is to provide you with a systematic understanding of NNs. Often, we approach them from a computer science perspective—as a machine learning (ML) algorithm (or even a special entity) composed of a number of different steps/components. We gain our intuition by thinking in terms of neurons, layers, and so on (at least I did this when I first learned about this field). This is a perfectly valid way to do things and we can still do impressive things at this level of understanding. Perhaps this is not the correct approach, though.

NNs have solid mathematical foundations, and if we approach them from this point of view, we'll be able to define and understand them in a more fundamental and elegant way. Therefore, in this chapter, we'll try to draw the connection between the mathematical and the computer science views of NNs. If you are already familiar with these topics, you can skip this chapter. Still, I hope that you'll find some interesting bits you didn't know about already (we'll do our best to keep this chapter interesting!).

In this chapter, we will cover the following topics:

  • The mathematical apparatus of NNs

  • A short introduction to NNs

  • Training NNs

The mathematical apparatus of NNs

In the next few sections, we'll discuss the mathematical branches related to NNs. Once we've done this, we'll connect them to NNs themselves.

Linear algebra

Linear algebra deals with linear equations, such as \(a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b\), and linear transformations (or linear functions) and their representations, such as matrices and vectors.

Linear algebra identifies the following mathematical objects:

  • Scalars: A single number.

  • Vectors: A one-dimensional array of numbers (or components). Each component of the array has an index. In the literature, we will see vectors denoted either with a superscript arrow (\(\vec{x}\)) or in bold (x). The following is an example of a vector:

\[\mathbf{x} = [x_1, x_2, \dots, x_n]\]

Throughout this book, we'll mostly use the bold notation (x). But in some instances, we'll use formulas from different sources and we'll try to retain their original notation.

We can visually represent an n-dimensional vector as the coordinates of a point in an n-dimensional Euclidean space, \(\mathbb{R}^n\) (equivalent to a coordinate system). In this case, the vector is referred to as Euclidean, and each vector component represents the coordinate along the corresponding axis, as shown in the following diagram:

Vector representation in \(\mathbb{R}^n\) space

However, the Euclidean vector is more than just a point and we can also represent it with the following two properties:

  • Magnitude (or length) is a generalization of the Pythagorean theorem for an n-dimensional space (see the short sketch after this list):

\[|\mathbf{x}| = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}\]

  • Direction is the angle of the vector along each axis of the vector space.
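For instance, the magnitude formula is exactly what NumPy's np.linalg.norm computes; a quick sketch with an arbitrary vector:

import numpy as np

x = np.array([3.0, 4.0])
print(np.linalg.norm(x))        # 5.0
print(np.sqrt(np.sum(x ** 2)))  # the same result, written out explicitly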

  • Matrices: A two-dimensional array of numbers. Each element is identified by two indices (row and column). A matrix is usually denoted with a bold capital letter; for example, A. Each matrix element is denoted with the lowercase letter of the matrix and a subscript index; for example, a_ij. Let's look at an example of the matrix notation in the following formula:

\[\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\]

We can represent a vector as a single-column n×1 matrix (referred to as a column matrix) or a single-row 1×n matrix (referred to as a row matrix).

  • Tensors: Before we explain them, we have to start with a disclaimer. Tensors originally come from mathematics and physics, where they existed long before we started using them in ML. The tensor definition in these fields differs from the ML one. For the purposes of this book, we'll only consider tensors in the ML context. Here, a tensor is a multi-dimensional array with the following properties:

      • Rank: Indicates the number of array dimensions. For example, a tensor of rank 2 is a matrix, a tensor of rank 1 is a vector, and a tensor of rank 0 is a scalar. However, a tensor has no limit on the number of dimensions; indeed, some types of NNs use tensors of rank 4.

      • Shape: The size of each dimension.

      • Data type: The type of the tensor elements. These can vary between libraries, but typically include 16-, 32-, and 64-bit floats and 8-, 16-, 32-, and 64-bit integers.

Contemporary DL libraries such as TensorFlow and PyTorch use tensors as their main data structure.

You can find a thorough discussion on the nature of tensors here: https://stats.stackexchange.com/questions/198061/why-the-sudden-fascination-with-tensors. You can also check the TensorFlow (https://www.tensorflow.org/guide/tensors) and PyTorch (https://pytorch.org/docs/stable/tensors.html) tensor definitions.
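To make these definitions concrete, here is a minimal sketch (using PyTorch; TensorFlow offers the equivalent tf.constant and tf.zeros) that builds tensors of rank 0, 1, 2, and 4 and inspects their rank, shape, and data type:

import torch

scalar = torch.tensor(5.0)              # rank 0: a single number
vector = torch.tensor([1.0, 2.0, 3.0])  # rank 1: one index per component
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])     # rank 2: row and column indices
batch = torch.zeros(32, 3, 28, 28)      # rank 4: for example, a batch of images
                                        # (batch, channels, height, width)

for t in (scalar, vector, matrix, batch):
    print(t.dim(), tuple(t.shape), t.dtype)  # rank, shape, and data type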

Now that we've introduced the types of objects in linear algebra, in the next section, we'll discuss some operations that can be applied to them.

Probability distributions

We'll start with the binomial distribution for discrete variables in binomial experiments. A binomial experiment has only two possible outcomes: success or failure. It also satisfies the following requirements:

Each trial is independent of the others.

The probability of success is always the same.

An example of a binomial experiment is the coin toss experiment.

Now, let's assume that the experiment consists of n trials, x of which are successful, while the probability of success at each trial is p. The formula for the binomial PMF of the variable X (not to be confused with x) is as follows:

\[P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}\]

Here, \(\binom{n}{x}\) is the binomial coefficient. This is the number of combinations of x successful trials that we can select from the n total trials. If n=1, then we have a special case of the binomial distribution called the Bernoulli distribution.
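As a quick sanity check, the following sketch evaluates this PMF directly; the values n=10 and p=0.5 (10 fair coin tosses) are purely illustrative:

from math import comb

def binomial_pmf(x, n, p):
    # P(X = x): x successes in n independent trials with success probability p
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binomial_pmf(5, 10, 0.5))  # exactly 5 heads in 10 fair tosses: ~0.246
print(binomial_pmf(1, 1, 0.3))   # n = 1 reduces to the Bernoulli distribution: 0.3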

Next, let's discuss the normal (or Gaussian) distribution for continuous variables, which closely approximates many natural processes. The normal distribution is defined with the following exponential PDF formula, known as the normal equation (one of the most popular notations):

\[f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}\]

Here, x is the value of the random variable, μ is the mean, σ is the standard deviation, and σ² is the variance. The preceding equation produces a bell-shaped curve, which is shown in the following diagram:

Normal distribution

Let's discuss some of the properties of the normal distribution, in no particular order:

  • The curve is symmetric along its center, which is also the maximum value.

  • The shape and location of the curve are fully described by the mean and the standard deviation, where we have the following:

      • The center of the curve (and its maximum value) is equal to the mean. That is, the mean determines the location of the curve along the x axis.

      • The width of the curve is determined by the standard deviation.

In the following diagram, we can see examples of normal distributions with different μ and σ values:

Examples of normal distributions with different μ and σ values

  • The normal distribution approaches 0 toward ±infinity, but it never becomes 0. Therefore, a random variable under the normal distribution can have any value (albeit some values only with a tiny probability).

  • The surface area under the curve is equal to 1, which is ensured by the normalizing constant, \(\frac{1}{\sigma\sqrt{2\pi}}\), before the exponent.

  • The term \(\frac{x - \mu}{\sigma}\) (located in the exponent) is called the standard score (or z-score). A standardized normal variable has a mean of 0 and a standard deviation of 1. Once transformed, the random variable participates in the equation in its standardized form.
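The following sketch evaluates the PDF and the z-score from the preceding formulas and numerically confirms that the area under the curve is 1; the values of μ and σ are arbitrary:

import numpy as np

def normal_pdf(x, mu, sigma):
    # the normal PDF, straight from the formula above
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def z_score(x, mu, sigma):
    # standardizes x to a mean of 0 and a standard deviation of 1
    return (x - mu) / sigma

mu, sigma = 2.0, 0.5
x = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 10001)
dx = x[1] - x[0]
print(np.sum(normal_pdf(x, mu, sigma)) * dx)  # ~1.0: the area under the curve
print(z_score(2.5, mu, sigma))                # 1.0: one standard deviation above the mean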

In the next section, we'll introduce the multidisciplinary field of information theory, which will help us use probability theory in the context of NNs. 

A short introduction to NNs

A NN is a function (let's denote it with f) that tries to approximate another target function, g. We can describe this relationship with the following equation:

\[f(\mathbf{x}; \theta) \approx g(\mathbf{x})\]

Here, x is the input data and θ are the NN parameters (weights). The goal is to find the parameters θ for which f best approximates g. This generic definition applies to both regression (approximating the exact value of g) and classification (assigning the input to one of multiple possible classes) tasks. Alternatively, the NN function can be denoted as \(f_{\theta}(\mathbf{x})\).
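To make this definition tangible, here is a deliberately tiny sketch in which a one-parameter model f(x; θ) = θx learns to approximate the target g(x) = 2x via gradient descent on the squared error. The target, the data points, and the learning rate are illustrative assumptions, not code from the book:

def g(x):
    return 2.0 * x    # the target function we want to approximate

def f(x, theta):
    return theta * x  # our one-parameter "network"

theta = 0.0           # the initial parameter (weight)
learning_rate = 0.1

for _ in range(100):
    for x in [1.0, 2.0, 3.0]:
        error = f(x, theta) - g(x)
        theta -= learning_rate * 2 * error * x  # gradient of (f - g)**2 w.r.t. theta

print(theta)          # converges to ~2.0, so f now approximates g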

We'll start our discussion from the smallest building block of the NN—the neuron.

Layers as operations

The next level in the NN organizational structure is the layer of units, where we combine the scalar outputs of multiple units in a single output vector. The units in a layer are not connected to each other. This organizational structure makes sense for the following reasons:

  • We can generalize multivariate regression to a layer, as opposed to only linear or logistic regression for a single unit. In other words, we can approximate multiple values with a layer, as opposed to a single value with a unit. This happens in the case of classification output, where each output unit represents the probability that the input belongs to a certain class.

  • A unit can convey limited information because its output is a scalar. By combining the unit outputs, instead of a single activation, we can now consider the vector in its entirety. In this way, we can convey a lot more information, not only because the vector has multiple values, but also because the relative ratios between them carry additional meaning.

  • Because the units in a layer have no connections to each other, we can parallelize the computation of their outputs (thereby increasing the computational speed). This ability is one of the major reasons for the success of DL in recent years.
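In code, such a layer is just a matrix-vector multiplication followed by an elementwise activation: each row of the weight matrix holds the weights of one unit, so the outputs of all units are computed in a single, parallelizable operation. Here is a minimal NumPy sketch with illustrative sizes (four inputs, three units):

import numpy as np

rng = np.random.default_rng(42)
n_inputs, n_units = 4, 3

W = rng.normal(size=(n_units, n_inputs))  # one row of weights per unit
b = np.zeros(n_units)                     # one bias per unit
x = rng.normal(size=n_inputs)             # the layer's input vector

z = W @ x + b           # the pre-activations of all units at once
y = np.maximum(z, 0.0)  # an elementwise activation (ReLU, as an example)
print(y.shape)          # (3,): the layer's combined output vector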