Deep learning powers the most intelligent systems in the world, such as Google Voice, Siri, and Alexa. Advancements in powerful hardware such as GPUs, software frameworks such as PyTorch, Keras, TensorFlow, and CNTK, along with the availability of big data, have made it easier to implement solutions to problems in the areas of text, vision, and advanced analytics.
This book will get you up and running with one of the most cutting-edge deep learning libraries: PyTorch. PyTorch is grabbing the attention of deep learning researchers and data science professionals due to its accessibility, efficiency, and more Pythonic approach to development. You'll start off by installing PyTorch, then quickly move on to the fundamental building blocks that power modern deep learning. You will also learn how to use CNNs, RNNs, LSTMs, and other networks to solve real-world problems. This book explains the concepts of various state-of-the-art deep learning architectures, such as ResNet, DenseNet, Inception, and Seq2Seq, without diving deep into the math behind them. You will also learn about GPU computing during the course of the book. You will see how to train a model with PyTorch and dive into complex neural networks, such as generative networks for producing text and images.
By the end of the book, you'll be able to implement deep learning applications in PyTorch with ease.
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Veena Pagare
Acquisition Editor: Aman Singh
Content Development Editor: Snehal Kolte
Technical Editor: Sayli Nikalje
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Tania Dutta
Production Coordinator: Deepika Naik
First published: February 2018
Production reference: 1210218
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78862-433-6
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
I have been working with Vishnu Subramanian for the last few years. Vishnu comes across as a passionate techno-analytical expert who has the rigor one requires to achieve excellence. His points of view on big data/machine learning/AI are well informed and carry his own analysis and appreciation of the landscape of problems and solutions. Having known him closely, I'm glad to be writing this foreword in my capacity as the CEO of Affine.
Increased success through deep learning solutions for our Fortune 500 clients clearly necessitates quick prototyping. PyTorch (a year-old deep learning framework) allows rapid prototyping for analytical projects without worrying too much about the complexity of the framework. This leads to an augmentation of the best of human capabilities with frameworks that can help deliver solutions faster. As an entrepreneur delivering advanced analytical solutions, building this capability in my teams happens to be the primary objective for me. In this book, Vishnu takes you through the fundamentals of building deep learning solutions using PyTorch while helping you build a mindset geared towards modern deep learning techniques.
The first half of the book introduces several fundamental building blocks of deep learning and PyTorch. It also covers key concepts such as overfitting and underfitting, and techniques that help us deal with them.
In the second half of the book, Vishnu covers advanced concepts such as CNNs, RNNs, and LSTMs; transfer learning using pre-convoluted features; and one-dimensional convolutions, along with real-world examples of how these techniques can be applied. The last two chapters introduce you to modern deep learning architectures, such as Inception, ResNet, and DenseNet, as well as model ensembling and generative techniques such as style transfer, GANs, and language modeling.
With all the practical examples covered and with solid explanations, this is one of the best books for readers who want to become proficient in deep learning. The rate at which technology evolves is unparalleled today. To readers looking to develop mature deep learning solutions, I would point out that the right framework also drives the right mindset.
To all those reading this book, enjoy exploring new horizons!
Wishing Vishnu and this book a roaring success, which they both deserve.
Manas Agarwal
CEO, Co-Founder of Affine Analytics,
Bengaluru, India
Vishnu Subramanian has experience in leading, architecting, and implementing several big data analytical projects (artificial intelligence, machine learning, and deep learning). He specializes in machine learning, deep learning, distributed machine learning, and visualization. He has experience in retail, finance, and travel. He is good at understanding and coordinating between businesses, AI, and engineering teams.
Poonam Ligade is a freelancer who specializes in big data tools such as Spark, Flink, and Cassandra, as well as scalable machine learning and deep learning. She is also a top Kaggle kernel writer.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Deep Learning with PyTorch
Dedication
Packt Upsell
Why subscribe?
PacktPub.com
Foreword
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Getting Started with Deep Learning Using PyTorch
Artificial intelligence
The history of AI
Machine learning
Examples of machine learning in real life
Deep learning
Applications of deep learning
Hype associated with deep learning
The history of deep learning
Why now?
Hardware availability
Data and algorithms
Deep learning frameworks
PyTorch
Summary
Building Blocks of Neural Networks
Installing PyTorch
Our first neural network
Data preparation
Scalar (0-D tensors)
Vectors (1-D tensors)
Matrix (2-D tensors)
3-D tensors
Slicing tensors
4-D tensors
5-D tensors
Tensors on GPU
Variables
Creating data for our neural network
Creating learnable parameters
Neural network model
Network implementation
Loss function
Optimize the neural network
Loading data
Dataset class
DataLoader class
Summary
Diving Deep into Neural Networks
Deep dive into the building blocks of neural networks
Layers – fundamental blocks of neural networks
Non-linear activations
Sigmoid
Tanh
ReLU
Leaky ReLU
PyTorch non-linear activations
The PyTorch way of building deep learning algorithms
Model architecture for different machine learning problems
Loss functions
Optimizing network architecture
Image classification using deep learning
Loading data into PyTorch tensors
Loading PyTorch tensors as batches
Building the network architecture
Training the model
Summary
Fundamentals of Machine Learning
Three kinds of machine learning problems
Supervised learning
Unsupervised learning
Reinforcement learning
Machine learning glossary
Evaluating machine learning models
Training, validation, and test split
Simple holdout validation
K-fold validation
K-fold validation with shuffling
Data representativeness
Time sensitivity
Data redundancy
Data preprocessing and feature engineering
Vectorization
Value normalization
Handling missing values
Feature engineering
Overfitting and underfitting
Getting more data
Reducing the size of the network
Applying weight regularization
Dropout
Underfitting
Workflow of a machine learning project
Problem definition and dataset creation
Measure of success
Evaluation protocol
Prepare your data
Baseline model
Model large enough to overfit
Applying regularization
Learning rate picking strategies
Summary
Deep Learning for Computer Vision
Introduction to neural networks
MNIST – getting data
Building a CNN model from scratch
Conv2d
Pooling
Nonlinear activation – ReLU
View
Linear layer
Training the model
Classifying dogs and cats – CNN from scratch
Classifying dogs and cats using transfer learning
Creating and exploring a VGG16 model
Freezing the layers
Fine-tuning VGG16
Training the VGG16 model
Calculating pre-convoluted features
Understanding what a CNN model learns
Visualizing outputs from intermediate layers
Visualizing weights of the CNN layer
Summary
Deep Learning with Sequence Data and Text
Working with text data
Tokenization
Converting text into characters
Converting text into words
N-gram representation
Vectorization
One-hot encoding
Word embedding
Training word embedding by building a sentiment classifier
Downloading IMDB data and performing text tokenization
torchtext.data
torchtext.datasets
Building vocabulary
Generate batches of vectors
Creating a network model with embedding
Training the model
Using pretrained word embeddings
Downloading the embeddings
Loading the embeddings in the model
Freeze the embedding layer weights
Recurrent neural networks
Understanding how RNN works with an example
LSTM
Long-term dependency
LSTM networks
Preparing the data
Creating batches
Creating the network
Training the model
Convolutional network on sequence data
Understanding one-dimensional convolution for sequence data
Creating the network
Training the model
Summary
Generative Networks
Neural style transfer
Loading the data
Creating the VGG model
Content loss
Style loss
Extracting the losses
Creating a loss function for each layer
Creating the optimizer
Training
Generative adversarial networks
Deep convolutional GAN
Defining the generator network
Transposed convolutions
Batch normalization
Generator
Defining the discriminator network
Defining loss and optimizer
Training the discriminator
Training the discriminator with real images
Training the discriminator with fake images
Training the generator network
Training the complete network
Inspecting the generated images
Language modeling
Preparing the data
Generating the batches
Batches
Backpropagation through time
Defining a model based on LSTM
Defining the train and evaluate functions
Training the model
Summary
Modern Network Architectures
Modern network architectures
ResNet
Creating PyTorch datasets
Creating loaders for training and validation
Creating a ResNet model
Extracting convolutional features
Creating a custom PyTorch dataset class for the pre-convoluted features and loader
Creating a simple linear model
Training and validating the model
Inception
Creating an Inception model
Extracting convolutional features using register_forward_hook
Creating a new dataset for the convoluted features
Creating a fully connected model
Training and validating the model
Densely connected convolutional networks – DenseNet
DenseBlock
DenseLayer
Creating a DenseNet model
Extracting DenseNet features
Creating a dataset and loaders
Creating a fully connected model and train
Model ensembling
Creating models
Extracting the image features
Creating a custom dataset along with data loaders
Creating an ensembling model
Training and validating the model
Encoder-decoder architecture
Encoder
Decoder
Summary
What Next?
What next?
Overview
Interesting ideas to explore
Object detection
Image segmentation
OpenNMT in PyTorch
AllenNLP
fast.ai – making neural nets uncool again
Open Neural Network Exchange
How to keep yourself updated
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
PyTorch is grabbing the attention of data science professionals and deep learning practitioners due to its flexibility and ease of use. This book introduces the fundamental building blocks of deep learning and PyTorch. It demonstrates how to solve real-world problems using a practical approach. You will also learn some of the modern architectures and techniques that are used to crack some cutting-edge research problems.
This book provides the intuition behind various state-of-the-art deep learning architectures, such as ResNet, DenseNet, Inception, and Seq2Seq, without diving deep into the math. It also shows how to do transfer learning, how to speed up transfer learning using pre-computed features, and how to do text classification using embeddings, pretrained embeddings, LSTM, and one-dimensional convolutions.
By the end of the book, you will be a proficient deep learning practitioner who will be able to solve some business problems using the different techniques learned here.
This book is for engineers, data analysts, and data scientists interested in deep learning, as well as those looking to explore and implement advanced algorithms with PyTorch. Knowledge of machine learning is helpful but not mandatory. Knowledge of Python programming is expected.
Chapter 1, Getting Started with Deep Learning Using PyTorch, goes over the history of artificial intelligence (AI) and machine learning and looks at the recent growth of deep learning. We will also cover how various improvements in hardware and algorithms triggered huge success in the implementation of deep learning across different applications. Finally, we will introduce the beautiful PyTorch Python library, built on top of Torch by Facebook.
Chapter 2, Building Blocks of Neural Networks, discusses the knowledge of various building blocks of PyTorch, such as variables, tensors, and nn.module, and how they are used to develop neural networks.
Chapter 3, Diving Deep into Neural Networks, covers the different processes involved in training a neural network, such as data preparation, data loaders for batching tensors, the torch.nn package for creating network architectures, and the use of PyTorch loss functions and optimizers.
Chapter 4, Fundamentals of Machine Learning, covers different types of machine learning problems, along with challenges such as overfitting and underfitting. We also cover different techniques such as data augmentation, adding dropouts, and using batch normalization to prevent overfitting.
Chapter 5, Deep Learning for Computer Vision, explains the building blocks of Convolutional Neural Networks (CNNs), such as one-dimensional and two-dimensional convolutions, max pooling, average pooling, basic CNN architectures, transfer learning, and using pre-convoluted features to train faster.
Chapter 6, Deep Learning with Sequence Data and Text, covers word embeddings, how to use pretrained embeddings, RNN, LSTM, and one-dimensional convolutions for text classification on the IMDB dataset.
Chapter 7, Generative Networks, explains how to use deep learning to generate artistic images, new images with DCGAN, and text using language modeling.
Chapter 8, Modern Network Architectures, explores architectures such as ResNet, Inception, and DenseNet that power modern computer vision applications. We will have a quick introduction to encoder-decoder architectures that power modern systems such as language translations and image captioning.
Chapter 9, What Next?, summarizes what we have learned and looks at how to keep yourself updated in the field of deep learning.
All the chapters (except Chapter 1, Getting Started with Deep Learning Using PyTorch and Chapter 9, What Next) have associated Jupyter Notebooks in the book's GitHub repository. The imports required for the code to run may not be included in the text to save space. You should be able to run all of the code from the Notebooks.
The book focuses on practical illustrations, so run the Jupyter Notebooks as you read the chapters.
Access to a computer with a GPU will help run the code quickly. There are companies such as paperspace.com and www.crestle.com that abstract a lot of the complexity required to run deep learning algorithms.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Deep-Learning-with-PyTorch. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/DeepLearningwithPyTorch_ColorImages.pdf
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Deep learning (DL) has revolutionized industry after industry. It was once famously described by Andrew Ng on Twitter:
Electricity transformed countless industries; artificial intelligence (AI) will now do the same.
AI and DL are often used as synonyms, but there are substantial differences between the two. Let's demystify the terminology used in the industry so that you, as a practitioner, will be able to differentiate between signal and noise.
In this chapter, we will cover the following different parts of AI:
AI itself and its origins
Machine learning in the real world
Applications of deep learning
Why deep learning now?
Deep learning framework: PyTorch
Countless articles discussing AI are published every day. The trend has increased in the last two years. There are several definitions of AI floating around the web, my favorite being the automation of intellectual tasks normally performed by humans.
The term artificial intelligence was first coined by John McCarthy in 1956, when he held the first academic conference on the subject. The question of whether machines can think started much earlier than that. In the early days of AI, machines were able to solve problems that were difficult for humans to solve.
For example, the Enigma machine was used by Germany during World War II to encrypt military communications. Alan Turing helped build a machine that cracked the Enigma code. Cracking the Enigma code was a very challenging task for a human, and it could take weeks for an analyst to do; the machine was able to crack the code in hours.
Computers have a tough time solving problems that are intuitive to us, such as differentiating between dogs and cats, telling whether your friend is angry at you for arriving late at a party (emotions), differentiating between a truck and a car, taking notes during a seminar (speech recognition), or converting notes to another language for your friend who does not understand your language (for example, French to English). Most of these tasks are intuitive to us, but we were unable to program or hard code a computer to do these kinds of tasks. Most of the intelligence in early AI machines was hard coded, such as a computer program playing chess.
In the early years of AI, a lot of researchers believed that AI could be achieved by hard coding rules. This kind of AI is called symbolic AI and was useful in solving well-defined, logical problems, but it was almost incapable of solving complex problems such as image recognition, object detection, object segmentation, language translation, and natural-language-understanding tasks. Newer approaches to AI, such as machine learning and DL, were developed to solve these kinds of problems.
To better understand the relationships among AI, ML, and DL, let's visualize them as concentric circles: AI, the idea that came first, is the largest; machine learning, which blossomed later, sits inside it; and DL, which is driving today's AI explosion, fits inside both:
Machine learning (ML) is a sub-field of AI and has become popular in the last 10 years and, at times, the two are used interchangeably. AI has a lot of other sub-fields aside from machine learning. ML systems are built by showing lots of examples, unlike symbolic AI, where we hard code rules to build the system. At a high level, machine learning systems look at tons of data and come up with rules to predict outcomes for unseen data:
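The idea above can be sketched in a few lines of plain Python: instead of hard coding the rule relating inputs to outputs, we let a tiny program discover it from example pairs. The toy data, learning rate, and number of passes here are illustrative choices of mine, not from the book:

```python
# Learning a rule from examples rather than hard coding it.
# The hidden rule behind this toy data is y = 2x; the program
# is never told that, only shown (input, output) pairs.

examples = [(1, 2), (2, 4), (3, 6), (4, 8)]

w = 0.0    # the "rule" the system will learn: y = w * x
lr = 0.01  # learning rate (illustrative value)

for _ in range(1000):
    for x, y in examples:
        pred = w * x
        # nudge w to reduce the squared error (gradient descent)
        w -= lr * 2 * (pred - y) * x

print(round(w, 2))  # prints 2.0, the rule recovered from data
```

Real ML systems run exactly this kind of loop, but over millions of parameters and far larger datasets.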
Most ML algorithms perform well on structured data, for tasks such as sales prediction, recommendation systems, and marketing personalization. An important factor for any ML algorithm is feature engineering, and data scientists need to spend a lot of time getting the features right for ML algorithms to perform well. In certain domains, such as computer vision and natural language processing (NLP), feature engineering is challenging because the data suffers from high dimensionality.
Until recently, problems like this were challenging for organizations to solve using typical machine learning techniques, such as linear regression and random forests, for reasons such as feature engineering and high dimensionality. Consider an image of size 224 x 224 x 3 (height x width x channels), where the 3 represents the values of the red, green, and blue color channels in a color image. To store this image in computer memory, our matrix will contain 150,528 values for a single image. Assume you want to build a classifier on top of 1,000 images of size 224 x 224 x 3; the dimensionality becomes 1,000 times 150,528. A special branch of machine learning called deep learning allows you to handle these problems using modern techniques and hardware.
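To make the arithmetic above concrete, here is the calculation in plain Python; the numbers come straight from the text:

```python
# Raw dimensionality of the image data described above.
height, width, channels = 224, 224, 3

dims_per_image = height * width * channels
print(dims_per_image)  # 150528 values for a single image

num_images = 1000
print(num_images * dims_per_image)  # 150528000 values for 1,000 images
```

A classical ML model would have to treat each of those 150,528 values as a separate input feature, which is why hand-crafted feature engineering becomes impractical at this scale.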
The following are some cool products that are powered by machine learning:
Example 1: Google Photos uses a specific form of machine learning called deep learning for grouping photos
Example 2: Recommendation systems, which are a family of ML algorithms, are used for recommending movies, music, and products by major companies such as Netflix, Amazon, and iTunes
Traditional ML algorithms rely on hand-crafted feature extraction to train on, while DL algorithms extract these features automatically.
For example, a DL algorithm predicting whether an image contains a face extracts features layer by layer: the first layer detects edges, the second layer detects shapes such as noses and eyes, and the final layer detects face shapes or more complex structures. Each layer trains based on the previous layer's representation of the data. It's OK if you find this explanation hard to understand; the later chapters of the book will help you to intuitively build and inspect such networks:
The use of DL has grown tremendously in the last few years with the rise of GPUs, big data, cloud providers such as Amazon Web Services (AWS) and Google Cloud, and frameworks such as Torch, TensorFlow, Caffe, and PyTorch. In addition to this, large companies share algorithms trained on huge datasets, thus helping startups to build state-of-the-art systems on several use cases with little effort.
Some popular applications that were made possible using DL are as follows:
Near-human-level image classification
Near-human-level speech recognition
Machine translation
Autonomous cars
Siri, Google Voice, and Alexa have become more accurate in recent years
A Japanese farmer using DL to sort cucumbers
Lung cancer detection
Language translation approaching human-level accuracy
The following screenshot shows a short example of summarization, where the computer takes a large paragraph of text and summarizes it in a few lines:
In the following image, a computer has been given a plain image without being told what it shows and, using object detection and some help from a dictionary, you get back an image caption stating two young girls are playing with a Lego toy. Isn't it brilliant?
People in the media, and others outside the field who are not real practitioners of AI and DL, have suggested that things like the storyline of the film Terminator 2: Judgment Day could become reality as AI/DL advances. Some of them even talk about a time in which we will be controlled by robots, where robots decide what is good for humanity. At present, the ability of AI is exaggerated far beyond its true capabilities. Currently, most DL systems are deployed in a very controlled environment and are given a limited decision boundary.
My guess is that only when these systems can learn to make intelligent decisions, rather than merely performing pattern matching, and when hundreds or thousands of DL algorithms can work together, will we perhaps see robots that behave like the ones in science fiction movies. In reality, we are nowhere close to artificial general intelligence, where machines can do anything without being told to do so. The current state of DL is more about finding patterns in existing data to predict future outcomes. As DL practitioners, we need to differentiate between signal and noise.
Though deep learning has become popular in recent years, the theory behind deep learning has been evolving since the 1950s. The following table shows some of the most popular techniques used today in DL applications and their approximate timeline:
Technique                         Year
Neural networks                   1943
Backpropagation                   Early 1960s
Convolutional neural networks     1979
Recurrent neural networks         1980
Long Short-Term Memory (LSTM)     1997
Deep learning has been given several names over the years. It was known as cybernetics in the 1940s to the 1960s, connectionism in the 1980s to the 1990s, and now it is known as either deep learning or neural networks. We will use DL and neural networks interchangeably. Neural networks are often described as algorithms inspired by the workings of the human brain. However, as practitioners of DL, we need to understand that they are only loosely inspired by the brain, and are backed by strong theories in math (linear algebra and calculus), statistics (probability), and software engineering.
Why has DL become so popular now? Some of the crucial reasons are as follows:
Hardware availability
Data and algorithms
Deep learning frameworks
Deep learning requires complex mathematical operations to be performed on millions, sometimes billions, of parameters. Existing CPUs take a long time to perform these kinds of operations, although this has improved over the last few years. A different kind of hardware, the graphics processing unit (GPU), can complete these huge mathematical operations, such as matrix multiplications, orders of magnitude faster.
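To see why these operations are so expensive, here is a naive matrix multiplication in plain Python; for two n x n matrices it performs on the order of n cubed multiply-adds, and it is exactly this kind of massively repetitive workload that a GPU parallelizes. This sketch is my own illustration, not code from the book:

```python
# Naive matrix multiplication: three nested loops, so roughly
# n * m * p multiply-add operations for an (n x m) times (m x p) product.
def matmul(a, b):
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                out[i][j] += a[i][k] * b[k][j]
    return out

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

At n = 1,000, a single product already needs about a billion multiply-adds, and training a deep network performs many such products per step; a GPU executes these independent multiply-adds across thousands of cores at once.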
GPUs were initially built for the gaming industry by companies such as Nvidia and AMD. It turned out that this hardware is extremely efficient not only for rendering high-quality video games, but also for speeding up DL algorithms. With one recent GPU from Nvidia, the 1080 Ti, building an image-classification system on top of the ImageNet dataset takes a few days, where it previously could have taken around a month.
If you are planning to buy hardware for running deep learning, I would recommend choosing an Nvidia GPU based on your budget, with a good amount of memory. Remember, your computer's memory and GPU memory are two different things. The 1080 Ti comes with 11 GB of memory and costs around $700.
You can also use cloud providers such as AWS, Google Cloud, or Floyd (this company offers GPU machines optimized for DL). Using a cloud provider is economical if you are just starting out with DL, or if you are setting up machines for organizational use, where you may have more financial flexibility.