Description

Deep learning powers the most intelligent systems in the world, such as Google Voice, Siri, and Alexa. Advancements in powerful hardware such as GPUs, in software frameworks such as PyTorch, Keras, TensorFlow, and CNTK, and the availability of big data have made it easier to implement solutions to problems in the areas of text, vision, and advanced analytics.

This book will get you up and running with one of the most cutting-edge deep learning libraries: PyTorch. PyTorch is grabbing the attention of deep learning researchers and data science professionals due to its accessibility, efficiency, and more Pythonic approach to development. You'll start off by installing PyTorch, then quickly move on to the various fundamental blocks that power modern deep learning. You will also learn how to use CNNs, RNNs, LSTMs, and other networks to solve real-world problems. This book explains the concepts of various state-of-the-art deep learning architectures, such as ResNet, DenseNet, Inception, and Seq2Seq, without diving deep into the math behind them. You will also learn about GPU computing during the course of the book. You will see how to train a model with PyTorch and dive into complex neural networks such as generative networks for producing text and images.

By the end of the book, you'll be able to implement deep learning applications in PyTorch with ease.




Deep Learning with PyTorch
A practical approach to building neural network models using PyTorch
Vishnu Subramanian
BIRMINGHAM - MUMBAI

Deep Learning with PyTorch

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Veena Pagare
Acquisition Editor: Aman Singh
Content Development Editor: Snehal Kolte
Technical Editor: Sayli Nikalje
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Tania Dutta
Production Coordinator: Deepika Naik

First published: February 2018

Production reference: 1210218

Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.

ISBN 978-1-78862-433-6

www.packtpub.com

To Jeremy Howard and Rachel Thomas for inspiring me to write this book, and to my family for their love.
– Vishnu Subramanian
mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Foreword

I have been working with Vishnu Subramanian for the last few years. Vishnu comes across as a passionate techno-analytical expert who has the rigor one requires to achieve excellence. His points of view on big data/machine learning/AI are well informed and carry his own analysis and appreciation of the landscape of problems and solutions. Having known him closely, I'm glad to be writing this foreword in my capacity as the CEO of Affine.

Increased success through deep learning solutions for our Fortune 500 clients clearly necessitates quick prototyping. PyTorch (a year-old deep learning framework) allows rapid prototyping for analytical projects without worrying too much about the complexity of the framework. This leads to an augmentation of the best of human capabilities with frameworks that can help deliver solutions faster. As an entrepreneur delivering advanced analytical solutions, building this capability in my teams happens to be the primary objective for me. In this book, Vishnu takes you through the fundamentals of building deep learning solutions using PyTorch while helping you build a mindset geared towards modern deep learning techniques.

The first half of the book introduces several fundamental building blocks of deep learning and PyTorch. It also covers key concepts such as overfitting and underfitting, and techniques that help us deal with them.

In the second half of the book, Vishnu covers advanced concepts such as CNNs, RNNs, and LSTMs, transfer learning using pre-convoluted features, and one-dimensional convolutions, along with real-world examples of how these techniques can be applied. The last two chapters introduce you to modern deep learning architectures such as Inception, ResNet, and DenseNet, to model ensembling, and to generative techniques such as style transfer, GANs, and language modeling.

With all the practical examples covered and with solid explanations, this is one of the best books for readers who want to become proficient in deep learning. The rate at which technology evolves today is unparalleled. To a reader looking to develop mature deep learning solutions, I would like to point out that the right framework also drives the right mindset.

To all those reading this book, I wish you a happy exploration of new horizons!

Wishing Vishnu and this book a roaring success, which they both deserve.

Manas Agarwal

CEO, Co-Founder of Affine Analytics,

Bengaluru, India

Contributors

About the author

Vishnu Subramanian has experience in leading, architecting, and implementing several big data analytics projects (artificial intelligence, machine learning, and deep learning). He specializes in machine learning, deep learning, distributed machine learning, and visualization. He has experience in the retail, finance, and travel domains, and is good at understanding and coordinating between business, AI, and engineering teams.

This book would not have been possible without the inspiration and MOOC by Jeremy Howard and Rachel Thomas of fast.ai. Thanks to them for the important role they are playing in democratizing AI/deep learning.

About the reviewer

Poonam Ligade is a freelancer who specializes in big data tools such as Spark, Flink, and Cassandra, as well as scalable machine learning and deep learning. She is also a top Kaggle kernels writer.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Deep Learning with PyTorch

Dedication

Packt Upsell

Why subscribe?

PacktPub.com

Foreword

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Getting Started with Deep Learning Using PyTorch

Artificial intelligence

The history of AI

Machine learning

Examples of machine learning in real life

Deep learning

Applications of deep learning

Hype associated with deep learning

The history of deep learning

Why now?

Hardware availability

Data and algorithms

Deep learning frameworks

PyTorch

Summary

Building Blocks of Neural Networks

Installing PyTorch

Our first neural network

Data preparation

Scalars (0-D tensors)

Vectors (1-D tensors)

Matrices (2-D tensors)

3-D tensors

Slicing tensors

4-D tensors

5-D tensors

Tensors on GPU

Variables

Creating data for our neural network

Creating learnable parameters

Neural network model

Network implementation

Loss function

Optimize the neural network

Loading data

Dataset class

DataLoader class

Summary

Diving Deep into Neural Networks

Deep dive into the building blocks of neural networks

Layers – fundamental blocks of neural networks

Non-linear activations

Sigmoid

Tanh

ReLU

Leaky ReLU

PyTorch non-linear activations

The PyTorch way of building deep learning algorithms

Model architecture for different machine learning problems

Loss functions

Optimizing network architecture

Image classification using deep learning

Loading data into PyTorch tensors

Loading PyTorch tensors as batches

Building the network architecture

Training the model

Summary

Fundamentals of Machine Learning

Three kinds of machine learning problems

Supervised learning

Unsupervised learning

Reinforcement learning

Machine learning glossary

Evaluating machine learning models

Training, validation, and test split

Simple holdout validation

K-fold validation

K-fold validation with shuffling

Data representativeness

Time sensitivity

Data redundancy

Data preprocessing and feature engineering

Vectorization

Value normalization

Handling missing values

Feature engineering

Overfitting and underfitting

Getting more data

Reducing the size of the network

Applying weight regularization

Dropout

Underfitting

Workflow of a machine learning project

Problem definition and dataset creation

Measure of success

Evaluation protocol

Prepare your data

Baseline model

A model large enough to overfit

Applying regularization

Learning rate picking strategies

Summary

Deep Learning for Computer Vision

Introduction to neural networks

MNIST – getting data

Building a CNN model from scratch

Conv2d

Pooling

Nonlinear activation – ReLU

View

Linear layer

Training the model

Classifying dogs and cats – CNN from scratch

Classifying dogs and cats using transfer learning

Creating and exploring a VGG16 model

Freezing the layers

Fine-tuning VGG16

Training the VGG16 model

Calculating pre-convoluted features

Understanding what a CNN model learns

Visualizing outputs from intermediate layers

Visualizing weights of the CNN layer

Summary

Deep Learning with Sequence Data and Text

Working with text data

Tokenization

Converting text into characters

Converting text into words

N-gram representation

Vectorization

One-hot encoding

Word embedding

Training word embedding by building a sentiment classifier

Downloading IMDB data and performing text tokenization

torchtext.data

torchtext.datasets

Building vocabulary

Generate batches of vectors

Creating a network model with embedding

Training the model

Using pretrained word embeddings

Downloading the embeddings

Loading the embeddings in the model

Freeze the embedding layer weights

Recurrent neural networks

Understanding how RNN works with an example

LSTM

Long-term dependency

LSTM networks

Preparing the data

Creating batches

Creating the network

Training the model

Convolutional network on sequence data

Understanding one-dimensional convolution for sequence data

Creating the network

Training the model

Summary

Generative Networks

Neural style transfer

Loading the data

Creating the VGG model

Content loss

Style loss

Extracting the losses

Creating a loss function for each layer

Creating the optimizer

Training

Generative adversarial networks

Deep convolutional GAN

Defining the generator network

Transposed convolutions

Batch normalization

Generator

Defining the discriminator network

Defining loss and optimizer

Training the discriminator

Training the discriminator with real images

Training the discriminator with fake images

Training the generator network

Training the complete network

Inspecting the generated images

Language modeling

Preparing the data

Generating the batches

Batches

Backpropagation through time

Defining a model based on LSTM

Defining the train and evaluate functions

Training the model

Summary

Modern Network Architectures

Modern network architectures

ResNet

Creating PyTorch datasets

Creating loaders for training and validation

Creating a ResNet model

Extracting convolutional features

Creating a custom PyTorch dataset class for the pre-convoluted features and loader

Creating a simple linear model

Training and validating the model

Inception

Creating an Inception model

Extracting convolutional features using register_forward_hook

Creating a new dataset for the convoluted features

Creating a fully connected model

Training and validating the model

Densely connected convolutional networks – DenseNet

DenseBlock

DenseLayer

Creating a DenseNet model

Extracting DenseNet features

Creating a dataset and loaders

Creating and training a fully connected model

Model ensembling

Creating models

Extracting the image features

Creating a custom dataset along with data loaders

Creating an ensembling model

Training and validating the model

Encoder-decoder architecture

Encoder

Decoder

Summary

What Next?

What next?

Overview

Interesting ideas to explore

Object detection

Image segmentation

OpenNMT in PyTorch

AllenNLP

fast.ai – making neural nets uncool again

Open Neural Network Exchange

How to keep yourself updated

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

PyTorch is grabbing the attention of data science professionals and deep learning practitioners due to its flexibility and ease of use. This book introduces the fundamental building blocks of deep learning and PyTorch. It demonstrates how to solve real-world problems using a practical approach. You will also learn some of the modern architectures and techniques that are used to crack some cutting-edge research problems.

This book provides the intuition behind various state-of-the-art deep learning architectures, such as ResNet, DenseNet, Inception, and Seq2Seq, without diving deep into the math. It also shows how to do transfer learning, how to speed up transfer learning using pre-computed features, and how to do text classification using embeddings, pretrained embeddings, LSTM, and one-dimensional convolutions.

By the end of the book, you will be a proficient deep learning practitioner who will be able to solve some business problems using the different techniques learned here.

Who this book is for

This book is for engineers, data analysts, and data scientists interested in deep learning, as well as those looking to explore and implement advanced algorithms with PyTorch. Knowledge of machine learning is helpful but not mandatory. Knowledge of Python programming is expected.

What this book covers

Chapter 1, Getting Started with Deep Learning Using PyTorch, goes over the history of artificial intelligence (AI) and machine learning and looks at the recent growth of deep learning. We will also cover how various improvements in hardware and algorithms triggered huge success in the implementation of deep learning across different applications. Finally, we will introduce the beautiful PyTorch Python library, built on top of Torch by Facebook.

Chapter 2, Building Blocks of Neural Networks, covers the various building blocks of PyTorch, such as variables, tensors, and nn.Module, and how they are used to develop neural networks.

Chapter 3, Diving Deep into Neural Networks, covers the different processes involved in training a neural network, such as data preparation, data loaders for batching tensors, the torch.nn package for creating network architectures, and the use of PyTorch loss functions and optimizers.

Chapter 4, Fundamentals of Machine Learning, covers different types of machine learning problems, along with challenges such as overfitting and underfitting. We also cover different techniques such as data augmentation, adding dropouts, and using batch normalization to prevent overfitting.

Chapter 5, Deep Learning for Computer Vision, explains the building blocks of Convolutional Neural Networks (CNNs), such as one-dimensional and two-dimensional convolutions, max pooling, average pooling, basic CNN architectures, transfer learning, and using pre-convoluted features to train faster.

Chapter 6, Deep Learning with Sequence Data and Text, covers word embeddings, how to use pretrained embeddings, RNN, LSTM, and one-dimensional convolutions for text classification on the IMDB dataset.

Chapter 7, Generative Networks, explains how to use deep learning to generate artistic images, new images with DCGAN, and text using language modeling.

Chapter 8, Modern Network Architectures, explores architectures such as ResNet, Inception, and DenseNet that power modern computer vision applications. We will have a quick introduction to encoder-decoder architectures that power modern systems such as language translations and image captioning.

Chapter 9, What Next?, summarizes what we have learned and looks at how to keep yourself updated in the field of deep learning.

To get the most out of this book

All the chapters (except Chapter 1, Getting Started with Deep Learning Using PyTorch, and Chapter 9, What Next?) have associated Jupyter Notebooks in the book's GitHub repository. The imports required for the code to run may not be included in the text, to save space. You should be able to run all of the code from the Notebooks.

The book focuses on practical illustrations, so run the Jupyter Notebooks as you read the chapters.

Access to a computer with a GPU will help run the code quickly. There are companies such as paperspace.com and www.crestle.com that abstract a lot of the complexity required to run deep learning algorithms.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packtpub.com.

Select the SUPPORT tab.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Deep-Learning-with-PyTorch. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/DeepLearningwithPyTorch_ColorImages.pdf

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Getting Started with Deep Learning Using PyTorch

Deep learning (DL) has revolutionized industry after industry. It was once famously described by Andrew Ng on Twitter:

Artificial Intelligence is the new electricity!

Electricity transformed countless industries; artificial intelligence (AI) will now do the same.

AI and DL are often used as synonyms, but there are substantial differences between the two. Let's demystify the terminology used in the industry so that you, as a practitioner, will be able to differentiate between signal and noise.

In this chapter, we will cover the following different parts of AI:

AI itself and its origins

Machine learning in the real world

Applications of deep learning

Why deep learning now?

Deep learning framework: PyTorch

Artificial intelligence

Countless articles discussing AI are published every day. The trend has increased in the last two years. There are several definitions of AI floating around the web, my favorite being the automation of intellectual tasks normally performed by humans.

The history of AI

The term artificial intelligence was first coined by John McCarthy in 1956, when he held the first academic conference on the subject. The quest to answer whether machines can think, however, started much earlier than that. In the early days of AI, machines were able to solve problems that were difficult for humans.

For example, the Enigma machine was used by Germany during World War II to encrypt military communications. Alan Turing built an electromechanical machine that helped to crack the Enigma code. Cracking the Enigma code was a very challenging task for a human, and it could take an analyst weeks; Turing's machine was able to crack the code in hours.

Computers have a tough time solving problems that are intuitive to us, such as differentiating between dogs and cats, telling whether your friend is angry at you for arriving late at a party (emotions), differentiating between a truck and a car, taking notes during a seminar (speech recognition), or converting notes to another language for your friend who does not understand your language (for example, French to English). Most of these tasks are intuitive to us, but we were unable to program or hard code a computer to do these kinds of tasks. Most of the intelligence in early AI machines was hard coded, such as a computer program playing chess.

In the early years of AI, a lot of researchers believed that AI could be achieved by hard coding rules. This kind of AI is called symbolic AI and was useful in solving well-defined, logical problems, but it was almost incapable of solving complex problems such as image recognition, object detection, object segmentation, language translation, and natural-language-understanding tasks. Newer approaches to AI, such as machine learning and DL, were developed to solve these kinds of problems.

To better understand the relationships among AI, ML, and DL, let's visualize them as concentric circles: AI, the idea that came first, is the largest; machine learning, which blossomed later, sits inside it; and DL, which is driving today's AI explosion, fits inside both:

How AI, machine learning, and DL fit together

Machine learning

Machine learning (ML) is a sub-field of AI that has become popular in the last 10 years; at times, the two terms are used interchangeably. AI has many other sub-fields besides machine learning. ML systems are built by showing them lots of examples, unlike symbolic AI, where we hard code rules to build the system. At a high level, machine learning systems look at tons of data and come up with rules to predict outcomes for unseen data:

Machine learning versus traditional programming

Most ML algorithms perform well on structured data, in tasks such as sales prediction, recommendation systems, and marketing personalization. An important factor for any ML algorithm is feature engineering, and data scientists need to spend a lot of time getting the features right for ML algorithms to perform well. In certain domains, such as computer vision and natural language processing (NLP), feature engineering is challenging because the data suffers from high dimensionality.

Until recently, problems like these were challenging for organizations to solve using typical machine learning techniques, such as linear regression and random forests, for reasons such as feature engineering and high dimensionality. Consider an image of size 224 x 224 x 3 (height x width x channels), where the 3 represents the values of the red, green, and blue color channels. To store this image in computer memory, our matrix will contain 150,528 values for a single image. If you want to build a classifier on top of 1,000 images of size 224 x 224 x 3, the total grows to 1,000 times 150,528 values. A special branch of machine learning called deep learning allows you to handle these problems using modern techniques and hardware.
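As a quick sanity check of this arithmetic, here is a minimal PyTorch sketch; the tensor sizes match the example above, and the batch of 1,000 images is purely hypothetical:

import torch

# One 224 x 224 RGB image: 224 * 224 * 3 = 150,528 values.
image = torch.rand(224, 224, 3)
print(image.numel())   # 150528

# A hypothetical stack of 1,000 such images: just over 150 million values.
images = torch.rand(1000, 224, 224, 3)
print(images.numel())  # 150528000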

Examples of machine learning in real life

The following are some cool products that are powered by machine learning:

Example 1: Google Photos uses a specific form of machine learning, called deep learning, for grouping photos

Example 2: Recommendation systems, which are a family of ML algorithms, are used for recommending movies, music, and products by major companies such as Netflix, Amazon, and iTunes

Deep learning

Traditional ML algorithms rely on hand-crafted feature extraction to train models, while DL algorithms use modern techniques to extract these features automatically.

For example, in a DL algorithm that predicts whether an image contains a face, the first layer detects edges, the second layer detects shapes such as noses and eyes, and the final layer detects face shapes or more complex structures. Each layer trains on the representation of the data produced by the previous layer. It's OK if you find this explanation hard to understand; the later chapters of the book will help you to intuitively build and inspect such networks:

Visualizing the output of intermediate layers (Image source: https://www.cs.princeton.edu/~rajeshr/papers/cacm2011-researchHighlights-convDBN.pdf)
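As a taste of what is coming, the following is a minimal, hypothetical PyTorch sketch of such a layered feature extractor. The layer sizes are arbitrary and this is not a model from the book; later chapters build real versions of this idea:

import torch.nn as nn

# Each layer consumes the representation produced by the layer before it:
# edges first, then simple shapes, then more complex structures.
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),   # layer 1: low-level features such as edges
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3),  # layer 2: combinations of edges, such as shapes
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3),  # layer 3: higher-level structures
    nn.ReLU(),
)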

The use of DL has grown tremendously in the last few years with the rise of GPUs, big data, cloud providers such as Amazon Web Services (AWS) and Google Cloud, and frameworks such as Torch, TensorFlow, Caffe, and PyTorch. In addition to this, large companies share algorithms trained on huge datasets, thus helping startups to build state-of-the-art systems on several use cases with little effort.

Applications of deep learning

Some popular applications that were made possible using DL are as follows:

Near-human-level image classification

Near-human-level speech recognition

Machine translation

Autonomous cars

Siri, Google Voice, and Alexa have become more accurate in recent years

A Japanese farmer sorting cucumbers

Lung cancer detection

Language translation beating human-level accuracy

The following screenshot shows a short example of summarization, where the computer takes a large paragraph of text and summarizes it in a few lines:

Summary of a sample paragraph generated by computer

In the following image, a computer has been given a plain image, without being told what it shows, and, using object detection and some help from a dictionary, you get back an image caption stating that two young girls are playing with Lego toys. Isn't it brilliant?

Object detection and image captioning (Image source: https://cs.stanford.edu/people/karpathy/cvpr2015.pdf)

Hype associated with deep learning

People in the media, and others outside the field who are not real practitioners of AI and DL, have been suggesting that things like the storyline of the film Terminator 2: Judgment Day could become reality as AI/DL advances. Some of them even talk about a time in which we will be controlled by robots, where robots decide what is good for humanity. At present, the abilities of AI are exaggerated far beyond its true capabilities. Currently, most DL systems are deployed in very controlled environments and are given limited decision boundaries.

My guess is that only when these systems can learn to make intelligent decisions, rather than merely performing pattern matching, and when hundreds or thousands of DL algorithms can work together, might we see robots that behave like the ones in science fiction movies. In reality, we are nowhere near artificial general intelligence, where machines can do anything without being told to do so. The current state of DL is more about finding patterns in existing data to predict future outcomes. As DL practitioners, we need to differentiate between signal and noise.

The history of deep learning

Though deep learning has become popular in recent years, the theory behind it has been evolving since the 1950s. The following table shows some of the most popular techniques used today in DL applications and their approximate timeline:

Technique                         Year
Neural networks                   1943
Backpropagation                   Early 1960s
Convolutional neural networks     1979
Recurrent neural networks         1980
Long short-term memory (LSTM)     1997

Deep learning has been given several names over the years. It was called cybernetics in the 1940s to 1960s, connectionism in the 1980s to 1990s, and is now known as deep learning or neural networks. We will use DL and neural networks interchangeably. Neural networks are often described as algorithms inspired by the workings of the human brain. However, as practitioners of DL, we need to understand that the field is largely inspired by, and backed by, strong theories in math (linear algebra and calculus), statistics (probability), and software engineering.

Why now?

Why has DL become so popular now? Some of the crucial reasons are as follows:

Hardware availability

Data and algorithms

Deep learning frameworks

Hardware availability

Deep learning requires complex mathematical operations to be performed on millions, sometimes billions, of parameters. Existing CPUs take a long time to perform these kinds of operations, although they have improved over the last several years. A different kind of hardware, the graphics processing unit (GPU), can complete these huge mathematical operations, such as matrix multiplication, orders of magnitude faster.
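You can get a rough feel for this with the following illustrative timing sketch. It is not a rigorous benchmark: the matrix sizes are arbitrary, and the GPU branch runs only if CUDA is available on your machine:

import time
import torch

a = torch.rand(4096, 4096)
b = torch.rand(4096, 4096)

# Matrix multiplication on the CPU
start = time.time()
torch.mm(a, b)
print('CPU seconds:', time.time() - start)

# The same multiplication on the GPU, if one is available
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()   # wait for the copies to finish
    start = time.time()
    torch.mm(a_gpu, b_gpu)
    torch.cuda.synchronize()   # wait for the kernel to finish
    print('GPU seconds:', time.time() - start)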

GPUs were initially built for the gaming industry by companies such as Nvidia and AMD. It turned out that this hardware is extremely efficient not only for rendering high-quality video games, but also for speeding up DL algorithms. With one recent GPU from Nvidia, the 1080 Ti, it takes only a few days to build an image-classification system on top of the ImageNet dataset, a task that previously could have taken around a month.

If you are planning to buy hardware for running deep learning, I would recommend choosing a GPU from Nvidia based on your budget. Choose one with a good amount of memory; remember, your computer's memory and GPU memory are two different things. The 1080 Ti comes with 11 GB of memory and costs around $700.

You can also use various cloud providers, such as AWS, Google Cloud, or Floyd (a company that offers GPU machines optimized for DL). Using a cloud provider is economical if you are just starting out with DL, or if you are setting up machines for organizational use where you may have more financial freedom.

Performance can vary depending on how well these systems are optimized.