Concepts, tools, and techniques to explore deep learning architectures and methodologies
Key Features
Book Description
Deep learning architectures are composed of multilevel nonlinear operations that represent high-level abstractions; this allows you to learn useful feature representations from the data. This book will help you learn and implement deep learning architectures to resolve various deep learning research problems.
Hands-On Deep Learning Architectures with Python explains the essential learning algorithms used for deep and shallow architectures. Packed with practical implementations and ideas to help you build efficient artificial intelligence (AI) systems, this book will help you learn how neural networks play a major role in building deep architectures. You will understand various deep learning architectures (such as AlexNet, VGG Net, and GoogleNet) with easy-to-follow code and diagrams. In addition to this, the book will also guide you in building and training various deep architectures, such as the Boltzmann machine, autoencoders, convolutional neural networks (CNNs), recurrent neural networks (RNNs), natural language processing (NLP) models, generative adversarial networks (GANs), and more, all with practical implementations.
By the end of this book, you will be able to construct deep models using popular frameworks and datasets with the required design patterns for each architecture. You will be ready to explore the potential of deep architectures in today's world.
What you will learn
Who this book is for
If you're a data scientist, machine learning developer/engineer, or deep learning practitioner, or are curious about AI and want to upgrade your knowledge of various deep learning architectures, this book will appeal to you. You are expected to have some knowledge of statistics and machine learning algorithms to get the best out of this book.
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Sunith Shetty
Acquisition Editor: Porous Godhaa
Content Development Editor: Karan Thakkar
Technical Editor: Sushmeeta Jena
Copy Editor: Safis Editing
Project Coordinator: Hardik Bhinde
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Jisha Chirayil
Production Coordinator: Arvindkumar Gupta
First published: April 2019
Production reference: 1300419
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78899-808-6
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Yuxi (Hayden) Liu is the author of a series of machine learning books and an education enthusiast. His first book, the first edition of Python Machine Learning By Example, was a #1 bestseller on Amazon India in 2017 and 2018. It, along with his other book, R Deep Learning Projects, was published by Packt Publishing.
He is an experienced data scientist who is focused on developing machine learning and deep learning models and systems. He has worked in a variety of data-driven domains and has applied his machine learning expertise to computational advertising, recommendations, and network anomaly detection. He published five first-authored IEEE transactions and conference papers during his master's research at the University of Toronto.
Saransh Mehta has cross-domain experience of working with texts, images, and audio using deep learning. He has been building artificial intelligence-based solutions, including a generative chatbot, an attendee-matching recommendation system, and audio keyword recognition systems for multiple start-ups. He is very familiar with the Python language, and has extensive knowledge of deep learning libraries such as TensorFlow and Keras. He has been in the top 10% of entrants to deep learning challenges hosted by Microsoft and Kaggle.
Antonio L. Amadeu is a data science consultant and is passionate about data, artificial intelligence, and neural networks, in particular, using machine learning and deep learning algorithms in daily challenges to solve all types of issues in any business field and industry. He has worked for large companies, including Unilever, Lloyds Bank, TE Connectivity, and Microsoft.
As an aspiring astrophysicist, he does research with the Virtual Observatory group at Sao Paulo University in Brazil, a member of the International Virtual Observatory Alliance (IVOA).
Junho Kim received a BS in mathematics and computer science engineering in 2015, and an MS in computer science engineering in 2017, from Chung-Ang University, Seoul, South Korea. After graduation, he worked as an artificial intelligence research intern at AIRI, Lunit, and Naver Webtoon. Currently, he is working for NCSOFT as an artificial intelligence research scientist.
His research interests include deep learning in computer vision, especially in relation to generative models with GANs, and image-to-image translation. He likes to read papers and implement deep learning in a simple way that others can understand easily. All his works are shared on GitHub (@taki0112). His dream is to make everyone's life more fun using AI.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Deep Learning Architectures with Python
About Packt
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: The Elements of Deep Learning
Getting Started with Deep Learning
Artificial intelligence
Machine learning
Supervised learning
Regression
Classification
Unsupervised learning
Reinforcement learning
Deep learning
Applications of deep learning
Self-driving cars
Image translation
Machine translation
Encoder-decoder structure
Chatbots
Building the fundamentals
Biological inspiration 
ANNs
Activation functions
Linear activation 
Sigmoid activation
Tanh activation
ReLU activation
Softmax activation
TensorFlow and Keras
Setting up the environment
Introduction to TensorFlow
Installing TensorFlow CPU
Installing TensorFlow GPU
Testing your installation
Getting to know TensorFlow 
Building a graph
Creating a Session
Introduction to Keras
Sequential API
Functional API
Summary
Deep Feedforward Networks
Evolutionary path to DFNs
Architecture of DFN
Training
Loss function
Regression loss
Mean squared error (MSE)
Mean absolute error
Classification loss
Cross entropy
Gradient descent
Types of gradient descent
Batch gradient descent
Stochastic gradient descent
Mini-batch gradient descent
Backpropagation
Optimizers
Train, test, and validation
Training set
Validation set
Test set
Overfitting and regularization
L1 and L2 regularization
Dropout
Early stopping
Building our first DFN
MNIST fashion data
Getting the data 
Visualizing data
Normalizing and splitting data
Model parameters
One-hot encoding
Building a model graph
Adding placeholders
Adding layers
Adding loss function
Adding an optimizer
Calculating accuracy
Running a session to train
The easy way
Summary
Restricted Boltzmann Machines and Autoencoders
What are RBMs?
The evolution path of RBMs
RBM architectures and applications
RBMs and their implementation in TensorFlow
RBMs for movie recommendation
DBNs and their implementation in TensorFlow
DBNs for image classification
What are autoencoders?
The evolution path of autoencoders
Autoencoder architectures and applications
Vanilla autoencoders
Deep autoencoders
Sparse autoencoders
Denoising autoencoders
Contractive autoencoders
Summary
Exercise
Acknowledgements
Section 2: Convolutional Neural Networks
CNN Architecture
Problem with deep feedforward networks
Evolution path to CNNs
Architecture of CNNs
The input layer
The convolutional layer
The maxpooling layer
The fully connected layer
Image classification with CNNs
VGGNet
InceptionNet
ResNet
Building our first CNN 
CIFAR
Data loading and pre-processing
Object detection with CNN
R-CNN
Faster R-CNN
You Only Look Once (YOLO)
Single Shot Multibox Detector
TensorFlow object detection zoo
Summary
Mobile Neural Networks and CNNs
Evolution path to MobileNets
Architecture of MobileNets
Depth-wise separable convolution
The need for depth-wise separable convolution
Structure of MobileNet
MobileNet with Keras
MobileNetV2
Motivation behind MobileNetV2
Structure of MobileNetV2
Linear bottleneck layer
Expansion layer
Inverted residual block
Overall architecture
Implementing MobileNetV2
Comparing the two MobileNets
SSD MobileNetV2
Summary
Section 3: Sequence Modeling
Recurrent Neural Networks
What are RNNs?
The evolution path of RNNs
RNN architectures and applications
Architectures by input and output
Vanilla RNNs
Vanilla RNNs for text generation
LSTM RNNs
LSTM RNNs for text generation
GRU RNNs
GRU RNNs for stock price prediction
Bidirectional RNNs
Bidirectional RNNs for sentiment classification
Summary
Section 4: Generative Adversarial Networks (GANs)
Generative Adversarial Networks
What are GANs?
Generative models
Adversarial – training in an adversarial manner
The evolution path of GANs
GAN architectures and implementations
Vanilla GANs
Deep convolutional GANs
Conditional GANs
InfoGANs
Summary
Section 5: The Future of Deep Learning and Advanced Artificial Intelligence
New Trends in Deep Learning
New trends in deep learning
Bayesian neural networks
What our deep learning models don't know – uncertainty
How we can obtain uncertainty information – Bayesian neural networks
Capsule networks
What convolutional neural networks fail to do
Capsule networks – incorporating orientational and relative spatial relationships
Meta-learning
One big challenge in deep learning – training data
Meta-learning – learning to learn
Metric-based meta-learning
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Deep learning architectures are composed of multilevel nonlinear operations that represent high-level abstractions. This allows you to learn useful feature representations from data. Hands-On Deep Learning Architectures with Python explains the essential learning algorithms used for deep and shallow architectures. Packed with practical implementations and ideas to build efficient artificial intelligence (AI) systems, this book will help you learn how neural networks play a major role in building deep architectures.
You will gain an understanding of various deep learning architectures, such as AlexNet, VGG Net, GoogleNet, and many more, with easy-to-follow code and diagrams. In addition to this, the book will also guide you in building and training various deep architectures, such as the Boltzmann machine, autoencoders, convolutional neural networks (CNNs), recurrent neural networks (RNNs), natural language processing (NLP) models, generative adversarial networks (GANs), and others, with practical implementations.
By the end of this book, you will be able to construct deep models using popular frameworks and datasets with the required design patterns for each architecture. You will be ready to explore the possibilities of deep architectures in today's world.
If you're a data scientist, machine learning developer/engineer, deep learning practitioner, or are curious about the field of AI and want to upgrade your knowledge of various deep learning architectures, this book will appeal to you. You are expected to have some knowledge of statistics and machine learning algorithms to get the most out of this book.
Chapter 1, Getting Started with Deep Learning, covers the evolution of intelligence in machines, from artificial intelligence to machine learning and, eventually, deep learning. We'll then look at some applications of deep learning and set up our environment for coding our way through deep learning models.
Chapter 2, Deep Feedforward Networks, covers the evolution history of deep feedforward networks and their architecture. We will also demonstrate how to load and preprocess data for training a deep learning network.
Chapter 3, Restricted Boltzmann Machines and Autoencoders, explains restricted Boltzmann machines (RBMs), the algorithm behind the scenes, and their evolutionary path. We will then dig deeper into the logic behind them, implement RBMs in TensorFlow, and apply them to build a movie recommender. We'll then learn about autoencoders and briefly look at their evolutionary path. We will also illustrate a variety of autoencoders, categorized by their architectures or forms of regularization.
Chapter 4, CNN Architecture, covers an important class of deep learning networks for images, called convolutional neural networks (CNNs). We will also discuss the benefits of CNNs over deep feedforward networks. We will then learn more about some famous image classification CNNs and build our first CNN image classifier on the CIFAR-10 dataset. Then, we'll move on to object detection with CNNs and the TensorFlow object detection model zoo.
Chapter 5, Mobile Neural Networks and CNNs, discusses the need for mobile neural networks for doing CNN work in a real-time application. We will also talk about the two benchmark MobileNet architectures introduced by Google—MobileNet and MobileNetV2. Later, we'll discuss the successful combination of MobileNet with object detection networks such as SSD to achieve object detection on mobile devices.
Chapter 6, Recurrent Neural Networks, explains one of the most important deep learning models, recurrent neural networks (RNNs), their architecture, and their evolutionary path. Later, we'll discuss a variety of architectures categorized by the recurrent layer, including vanilla RNNs, LSTM, GRU, and bidirectional RNNs, and apply the vanilla architecture to write our own War and Peace (a bit nonsensical though). We'll also introduce the bidirectional architecture that allows the model to preserve information from both past and future contexts of the sequence.
Chapter 7, Generative Adversarial Networks, explains one of the most interesting deep learning models, generative adversarial networks (GANs), and their evolutionary path. We will illustrate a variety of GAN architectures with examples of image generation, exploring four architectures in particular: vanilla GANs, deep convolutional GANs, conditional GANs, and information-maximizing GANs.
Chapter 8, New Trends in Deep Learning, talks about a few deep learning ideas that we have found impactful and that we expect to become more prominent in the future. We'll also learn that Bayesian deep learning combines the merits of both Bayesian learning and deep learning.
Readers will require prior knowledge of Python, TensorFlow, and Keras.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Deep-Learning-Architectures-with-Python. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781788998086_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."
A block of code is set as follows:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
Any command-line input or output is written as follows:
conda activate test_env
conda install tensorflow
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
In this section, you will get an overview of deep learning with Python, and will also learn about the architectures of the deep feedforward network, the Boltzmann machine, and autoencoders. We will also practice concrete examples based on DFNs and applications of the Boltzmann machine and autoencoders, built with Python-based deep learning frameworks and libraries, along with their benchmarks.
This section consists of the following chapters:
Chapter 1, Getting Started with Deep Learning
Chapter 2, Deep Feedforward Networks
Chapter 3, Restricted Boltzmann Machines and Autoencoders
Welcome to Hands-On Deep Learning Architectures with Python! If you are completely unfamiliar with deep learning, you can begin your journey right here with this book. And for readers who already have an idea about it, we have covered almost every aspect of deep learning, so you are definitely going to learn a lot more about it from this book.
The book is laid out in a cumulative manner; that is, it begins with the basics and builds on them repeatedly to reach advanced levels. In this chapter, we discuss how humans started creating intelligence in machines and how artificial intelligence gradually evolved into machine learning and, eventually, deep learning. We then see some nice applications of deep learning. Moving back to the fundamentals, we will learn how artificial neurons work and, in the end, set up our environment for coding our way through deep learning models. After completing this chapter, you will have learned about the following:
What artificial intelligence is, and how machine learning and deep learning relate to it
The types of machine learning tasks
Information about some interesting deep learning applications
What an artificial neural network is, and how it works
Setting up TensorFlow and Keras with Python
Let's begin with a short discussion on artificial intelligence and the relationships between artificial intelligence, machine learning, and deep learning.
Ever since the beginning of the computer era, humans have been trying to mimic the brain in machines. Researchers have been developing methods that would make machines not only compute but also decide like we humans do. This quest of ours gave birth to artificial intelligence around the 1960s. By definition, artificial intelligence means developing systems that are capable of accomplishing tasks without a human explicitly programming every decision. In 1956, the first program for playing checkers was written by Arthur Samuel. Since then, researchers tried to mimic human intelligence by defining sets of handwritten rules that didn't involve any learning. Artificial intelligence programs that played games such as chess were nothing but sets of manually defined moves and strategies. In 1959, Arthur Samuel coined the term machine learning. Machine learning started using various concepts of probability and Bayesian statistics to perform pattern recognition, feature extraction, classification, and so on. In the 1980s, inspired by the neural structure of the human brain, artificial neural networks (ANNs) were introduced. In the 2000s, ANNs evolved into today's so-called deep learning! The following is a timeline for the evolution of artificial intelligence through machine learning and deep learning:
Artificial intelligence prior to machine learning was just about writing rules that the machine used to process the provided data. Machine learning made a transition: now, just by providing the data and expected output to the machine learning algorithm, the computer returns an optimized set of rules for the task. Machine learning uses historic data to train a system and test it on unknown but similar data, beginning the journey of machines learning how to make decisions without being hard-coded. In the early 1990s, machine learning emerged as the new face of artificial intelligence. Larger datasets were developed and made public to allow more people to build and train machine learning models, and very soon a huge community of machine learning scientists and engineers developed. Although machine learning algorithms draw inference from statistics, what makes them powerful is the error minimization approach. A machine learning algorithm tries to minimize the error between the expected output provided by the dataset and the algorithm's predicted output, in order to discover the optimized rules. This is the learning part of machine learning. We won't be covering machine learning algorithms in this book, but they are essentially divided into three categories: supervised, unsupervised, and reinforcement. Since deep learning is also a subset of machine learning, these categories apply to deep learning as well.
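To make the error minimization idea concrete, here is a minimal sketch (our own illustration, not from the book's code bundle) that fits a one-parameter linear model by repeatedly nudging the parameter in the direction that reduces the mean squared error between the expected and predicted outputs:
import numpy as np

# toy dataset: the expected output is roughly 3 * input
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

w = 0.0    # the "rule" being learned
lr = 0.01  # learning rate

for step in range(1000):
    y_pred = w * x                 # the algorithm's predicted output
    error = y_pred - y             # difference from the expected output
    grad = 2 * np.mean(error * x)  # gradient of the mean squared error w.r.t. w
    w -= lr * grad                 # update w to reduce the error

print(w)  # converges close to 3, the rule hidden in the data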
Regression deals with learning continuous mapping functions that can predict values given various input features. The function can be linear or non-linear. If the function is linear, it is referred to as linear regression, and if it is non-linear, it is commonly called polynomial regression. When predicting values from multiple input features (variables), we call it multivariate regression. A very typical example of regression is the house price prediction problem. Provided with the various parameters of a house, such as build area, locality, number of rooms, and so on, the selling price of the house can be accurately predicted using historic data.
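For instance, a multivariate house price regression could look like the following sketch with scikit-learn (the feature values here are made up purely for illustration):
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical historic data: [build area in sq ft, number of rooms]
X = np.array([[850, 2], [1200, 3], [1500, 3], [2100, 4], [2600, 5]])
y = np.array([120000, 165000, 200000, 270000, 330000])  # selling prices

# learn a linear mapping function from features to price
model = LinearRegression().fit(X, y)

# predict the selling price of an unseen 1,800 sq ft, 4-room house
print(model.predict(np.array([[1800, 4]])))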
When the target output values are categories instead of raw values, as in regression, it is a classification task. For example, we could classify different species of flowers based on input features such as petal length, petal width, sepal length, and sepal width, with the output categories being versicolor, setosa, and virginica. Algorithms such as logistic regression, decision trees, and naive Bayes are classification algorithms. We will be covering the details of classification in Chapter 2, Deep Feedforward Networks.
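The flower example above is the classic iris dataset, which ships with scikit-learn; a minimal classification sketch (ours, not the book's) could look like this:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# petal/sepal measurements as input features; species as target categories
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# fit a logistic regression classifier and check accuracy on unseen flowers
clf = LogisticRegression(max_iter=200).fit(X_train, y_train)
print(clf.score(X_test, y_test))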
Unsupervised learning is used when we don't have the corresponding target output values for the input. It is used to understand the data distribution and discover similarities of some kind between the data points. As there is no target output to learn from, unsupervised algorithms rely on initializers to generate initial decision boundaries and update them as they go through the data. After going through the data multiple times, the algorithms arrive at optimized decision boundaries that group data points based on similarity. This method is known as clustering, and algorithms such as k-means are used for it.
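As a quick illustration of clustering, the following sketch (with made-up points) lets k-means discover two groups without ever seeing target outputs:
import numpy as np
from sklearn.cluster import KMeans

# unlabeled points forming two loose groups; no target outputs provided
points = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
                   [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]])

# k-means starts from initial centers and updates them over several passes
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # group assignment for each point
print(kmeans.cluster_centers_)  # the optimized cluster centers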
Remember how you learned to ride a bicycle in your childhood? It was a trial and error process, right? You tried to balance yourself, and each time you did something wrong, you fell off the bicycle. But you learned from your mistakes, and eventually, you were able to ride without falling. Reinforcement learning works in the same way! An agent is exposed to an environment, where it takes an action from a list of possible actions, which leads to a change in the state of the agent. A state is the current situation of the environment the agent is in. For every action, the agent receives a reward. Whenever the received reward is positive, it signifies that the agent has taken the correct step, and when the reward is negative, it signifies a mistake. The agent follows a policy, the reinforcement learning algorithm through which the agent determines its next action, considering the current state. Reinforcement learning is the true form of artificial intelligence, inspired by the human way of learning through trial and error. Think of yourself as the agent and the bicycle as the environment! Discussing reinforcement learning algorithms here is beyond the scope of this book, so let's shift focus back to deep learning!
Though machine learning has provided computers with the capability to learn decision boundaries, it misses out on robustness. Machine learning models have to be very specifically designed for every particular application. People spent hours deciding what features to select for optimal learning. As datasets grew and the non-linearity in data increased, machine learning models struggled to produce accurate results. Scientists soon realized that a much more powerful tool was required to keep pace with this growth. In the 1980s, the concept of ANNs was reborn, and with faster computing capabilities, deeper versions of ANNs were developed, providing us with the powerful tool we were looking for: deep learning!
The leverage of a technology is decided by the robustness of its applications. Deep learning has created a great amount of commotion in the tech as well as the non-tech market, owing to its powerful applications. So, in this section, we discuss some of the amazing applications of deep learning, which will keep you motivated all through the book.
This is probably the coolest and most promising application of deep learning. An autonomous vehicle has a number of cameras attached to it. The output video streams are fed into deep learning networks that recognize, as well as segment, the different objects present around the car. NVIDIA has introduced End to End Learning for Self-Driving Cars, a convolutional neural network that takes in input images from the cameras and predicts what action should be taken, in the form of a steering angle or acceleration. To train the network, the steering angles, throttle values, and camera views are recorded while a human drives, documenting the actions the driver takes in response to the changes occurring around the car. The network's parameters are then updated by backpropagating (backpropagation is discussed in detail in Chapter 2, Deep Feedforward Networks) the error between the human input and the network's predictions.
Generative adversarial networks (GANs) are the most notorious deep learning architectures. This is due to their capability to generate outputs from random noise input vectors. A GAN has two networks: a generator and a discriminator. The job of the generator is to take a random vector as input and generate sample output data. The discriminator takes input from both the real data and the fake data created by the generator. The job of the discriminator is to determine whether the input is coming from the real data or is a fake from the generator. You can visualize the scenario by imagining that the discriminator is a bank trying to distinguish between real and counterfeit currency, while the generator is a fraudster trying to pass counterfeit currency to the bank. Both the generator and the discriminator learn through their mistakes, and the generator eventually produces results that imitate the real data very precisely.
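To make the two-network setup concrete, here is a minimal Keras sketch (the layer sizes and the flattened 28 x 28 image dimension are assumptions made for this illustration, not the book's implementation):
from tensorflow.keras import layers, models

latent_dim, data_dim = 100, 784  # assumed sizes, e.g. flattened 28 x 28 images

# generator: random noise vector in, fake data sample out
generator = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(latent_dim,)),
    layers.Dense(data_dim, activation='tanh'),
])

# discriminator: data sample in, probability that it is real out
discriminator = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(data_dim,)),
    layers.Dense(1, activation='sigmoid'),
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# stacked model: trains the generator to fool the (frozen) discriminator
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer='adam', loss='binary_crossentropy')
In training, the discriminator would alternately be fit on real and generated batches, while the stacked model pushes the generator toward outputs the discriminator accepts as real.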
One of the interesting applications of GANs is image-to-image translation. It is based on conditional GANs (we will discuss GANs in detail in Chapter 7, Generative Adversarial Networks). Given a pair of images holding some relation, say I1 and I2, a conditional GAN learns how to convert I1 into I2. A dedicated piece of software called pix2pix has been created to demonstrate the applications of this concept. It can be used to fill in colors in black-and-white images, create maps from satellite images, generate object images from mere sketches, and what not!
The following is the link to the actual paper published by Phillip Isola for image-to-image translation and a sample image from pix2pix depicting various applications of image-to-image translation (https://arxiv.org/abs/1611.07004):
There are more than 4,000 languages in this world and billions of people communicating through them. You can imagine the scale at which language translation is required. Most translation used to be done by human translators, because the classical rule-based translations made by machines were quite often meaningless. Deep learning came up with a solution to this: it can learn a language like we do and generate translations that are more natural. This is commonly referred to as neural machine translation (NMT).
Neural machine translation models are recurrent neural networks (RNNs), arranged in an encoder-decoder fashion. The encoder network takes a variable-length input sequence through an RNN and encodes the sequence into a fixed-size vector. The decoder begins with this encoded vector and generates the translation word by word, until it predicts the end of the sentence. The whole architecture is trained end to end with input sentences and their correct output translations. The major advantage of these systems (apart from their capability to handle variable input sizes) is that they learn the context of a sentence and predict accordingly, rather than making a word-to-word translation. Neural machine translation can best be seen in action on Google Translate, as in the following screenshot:
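As a rough illustration of the encoder-decoder arrangement, here is a minimal Keras sketch (the vocabulary sizes, embedding dimension, and hidden size are assumptions made for this illustration):
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model

src_vocab, tgt_vocab, embed_dim, hidden = 5000, 5000, 64, 128  # assumed sizes

# encoder: variable-length source sequence in, fixed-size state vector out
enc_inputs = Input(shape=(None,))
enc_embed = Embedding(src_vocab, embed_dim)(enc_inputs)
_, state_h, state_c = LSTM(hidden, return_state=True)(enc_embed)

# decoder: starts from the encoder state, predicts the translation word by word
dec_inputs = Input(shape=(None,))
dec_embed = Embedding(tgt_vocab, embed_dim)(dec_inputs)
dec_outputs = LSTM(hidden, return_sequences=True)(
    dec_embed, initial_state=[state_h, state_c])
outputs = Dense(tgt_vocab, activation='softmax')(dec_outputs)

# trained end to end on (source sentence, shifted target sentence) pairs
model = Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')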
You might find this the coolest application! Computers talking to us like humans do has been a long-standing fascination; it gives us a sense of computers being intelligent. However, most of the chatbot systems built earlier were based on a knowledge base and rules that define which response to pick from it. This made chatbots a very closed domain, and they sounded quite unnatural. But with a little tweaking of the encoder-decoder architecture used in machine translation, a chatbot can actually generate a response on its own. The encodings learn the context of the input sentences and, if the whole architecture is trained on sample queries and responses, whenever the system sees a new query, it can generate a response based on its learning. A lot of platforms, such as IBM Watson, Bottr, and rasa, are building deep learning powered tools for creating chatbots for business purposes.
This section is where you will begin the journey of being a deep learning architect. Deep learning stands on the pillar of ANNs. Our first step should be to understand how they work. In this section, we describe the biological inspiration behind the artificial neuron and the mathematical model to create an ANN. We have tried keeping the mathematics to a minimum and focused more on concepts. However, we assume you are familiar with basic algebra and calculus.
As we mentioned earlier, deep learning is inspired by the human brain. This seems a good idea indeed: to develop the intelligence of the brain inside a machine, you need the machine to mimic the brain! Now, if you are slightly aware of how a human brain learns and memorizes things so fast, you must know that this is possible due to millions of neurons developing an interconnected network, sending signals to each other, which makes up the memory. The neuron has two major components: dendrites and an axon. The dendrites act as receptors and combine all the signals that the neuron is receiving. The axon is connected to the dendrites of other neurons through synapses at its end. Once the incoming signals cross a threshold, they flow through the axon and synapses to pass the signal on to the connected neurons. The structure in which the neurons are connected to each other decides the network's capabilities. The following is a diagram of what a biological neuron might look like:
Hence, the artificial model of a neural network should be a parallel network of interconnected nodes, which take in inputs from various other nodes and pass on an output when activated. This activation phenomenon must be controlled by some sort of mathematical operation. Let's see the operations and equations next!
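As a preview, here is a minimal sketch (our own illustration) of the operation a single artificial neuron performs: combine the weighted incoming signals, add a bias, and pass the result through an activation function such as the sigmoid:
import numpy as np

def sigmoid(z):
    # squashes the combined signal into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

# incoming signals from three connected neurons, and the synapse weights
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.4, 0.7, -0.2])
bias = 0.1

# the "dendrite" combines the weighted signals; the activation then decides
# how strongly the neuron fires and passes its output onward
combined = np.dot(weights, inputs) + bias
output = sigmoid(combined)
print(output)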
