Apply deep learning techniques and neural network methodologies to build, train, and optimize generative network models
Key Features
Book Description
With continuously evolving research and development, Generative Adversarial Networks (GANs) are the next big thing in the field of deep learning. This book highlights the key improvements in GANs over generative models and guides you in making the best use of GANs with the help of hands-on examples.
This book starts by taking you through the core concepts necessary to understand how each component of a GAN model works. You'll build your first GAN model to understand how generator and discriminator networks function. As you advance, you'll delve into a range of examples and datasets to build a variety of GAN networks using PyTorch functionalities and services, and become well-versed with architectures, training strategies, and evaluation methods for image generation, translation, and restoration. You'll even learn how to apply GAN models to solve problems in areas such as computer vision, multimedia, 3D models, and natural language processing (NLP). The book covers how to overcome the challenges faced while building generative models from scratch. Finally, you'll also discover how to train your GAN models to generate adversarial examples to attack other CNN and GAN models.
By the end of this book, you will have learned how to build, train, and optimize next-generation GAN models and use them to solve a variety of real-world problems.
What you will learn
Who this book is for
This GAN book is for machine learning practitioners and deep learning researchers looking to get hands-on guidance in implementing GAN models using PyTorch. You'll become familiar with state-of-the-art GAN architectures with the help of real-world examples. Working knowledge of Python programming language is necessary to grasp the concepts covered in this book.
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Sunith Shetty
Acquisition Editor: Devika Battike
Content Development Editor: Roshan Kumar
Senior Editor: Jack Cummings
Technical Editor: Dinesh Chaudhary
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Deepika Naik
First published: December 2019
Production reference: 1111219
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78953-051-3
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
John Hany received his master's and bachelor's degrees in computational mathematics from the University of Electronic Science and Technology of China. His major is pattern recognition, and he has years of experience in machine learning and computer vision. He has taken part in several practical projects, including intelligent transport systems and facial recognition systems. His current research interests lie in reducing the computation costs of deep neural networks while improving their performance on image classification and detection tasks. He is enthusiastic about open source projects and has contributed to many of them.
Greg Walters has been involved with computers and computer programming since 1972. He is well-versed in Visual Basic, Visual Basic .NET, Python, and SQL and is an accomplished user of MySQL, SQLite, Microsoft SQL Server, Oracle, C++, Delphi, Modula-2, Pascal, C, 80x86 Assembler, COBOL, and Fortran. He is a programming trainer and has trained numerous people on many pieces of computer software, including MySQL, Open Database Connectivity, Quattro Pro, Corel Draw!, Paradox, Microsoft Word, Excel, DOS, Windows 3.11, Windows for Workgroups, Windows 95, Windows NT, Windows 2000, Windows XP, and Linux. He is semi-retired and has written over 100 articles for Full Circle Magazine. He is also a musician and loves to cook. He is open to working as a freelancer on various projects.
Sarit Ritwiruneis holds a BSc in Physics and an MSc in Computer Science from Mahidol University. Sarit's project track record mainly covers POS (point of sale) systems and IoT. Sarit recently got back into computer science research after doing some Python programming as a senior software engineer, and currently works in a small office of a big insurance company in Thailand. Sarit dreams about building intelligent and friendly websites using AI.
Sandeep Singh Kushwaha has a master's degree from the Indian Institute of Technology, Kanpur, and currently works with Aegon as Assistant Vice President, Analytics. As part of the Center of Excellence in Aegon, he is driving analytics, ML/AI, digitization, and innovation for several countries in Europe and Asia. He is passionate about data science, ML/AI, InsurTech, and MarTech, and has developed many AI solutions using deep learning algorithms to accelerate businesses. As a real-life problem solver using machine learning, Sandeep also echoes the view that GANs are the coolest idea in machine learning in the last twenty years, and he believes that readers are going to learn a lot from this book about developing AI solutions using GANs.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Generative Adversarial Networks with PyTorch 1.x
About Packt
Why subscribe?
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Introduction to GANs and PyTorch
Generative Adversarial Networks Fundamentals
Fundamentals of machine learning
Machine learning – classification and generation
Introducing adversarial learning
Generator and discriminator networks
Mathematical background of GANs
Using NumPy to train a sine signal generator
Designing the network architectures
Defining activation functions and the loss function
Working on forward pass and backpropagation
Training our GAN model
What GAN we do?
Image processing
Image synthesis
Image translation
Video synthesis and translation
NLP
3D modeling
Summary
References and useful reading list
Getting Started with PyTorch 1.3
What's new in PyTorch 1.3?
Easy switching from eager mode to graph mode
The C++ frontend
The redesigned distributed library
Better research reproducibility
Miscellaneous
The PyTorch ecosystem
Cloud support
Migrating your previous code to 1.x
CUDA – GPU acceleration for fast training and evaluation
Installing NVIDIA driver
Installing CUDA
Installing cuDNN
Evaluating your CUDA installation
Installing PyTorch on Windows and Linux
Setting up the Python environment
Installing Python
Installing Anaconda Python
Prerequisites before we move on
Installing PyTorch
Installing official binaries
Building PyTorch from source
Evaluating your PyTorch installation
Bonus: setting up VS Code for Python coding
Configuring VS Code for Python development
Recommended VS Code extensions
References and useful reading list
Summary
Best Practices for Model Design and Training
Model design cheat sheet
Overall model architecture design
Choosing a convolution operation method
Choosing a downsampling operation method
More on model design
Model training cheat sheet
Parameter initialization
Adjusting the loss function
Choosing an optimization method
Adjusting the learning rate
Gradient clipping, weight clipping, and more
Efficient coding in Python
Reinventing the wheel wisely
Advice for beginners in deep learning
Summary 
Section 2: Typical GAN Models for Image Synthesis
Building Your First GAN with PyTorch
Introduction to Deep Convolutional GANs
The architecture of a generator
The architecture of a discriminator
Creating a DCGAN with PyTorch
Generator network
Discriminator network
Model training and evaluation
Training iteration
Visualizing generated samples
Checking GPU usage information
Moving to larger datasets
Generating human faces from the CelebA dataset
Generating bedroom photos from the LSUN dataset
Having fun with the generator network
Image interpolation
Semantic vector arithmetic
Summary
References and useful reading list
Generating Images Based on Label Information
CGANs – how are labels used?
Combining labels with the generator
Integrating labels into the discriminator
Generating images from labels with the CGAN
One-stop model training API
Argument parsing and model training
Working with Fashion-MNIST
InfoGAN – unsupervised attribute extraction
Network definitions of InfoGAN
Training and evaluation of InfoGAN
References and useful reading list
Summary
Image-to-Image Translation and Its Applications
Using pixel-wise labels to translate images with pix2pix
Generator architecture
Discriminator architecture
Training and evaluation of pix2pix
Pix2pixHD – high-resolution image translation
Model architecture
Model training
CycleGAN – image-to-image translation from unpaired collections
Cycle consistency-based model design
Model training and evaluation
Summary
Further reading
Image Restoration with GANs
Image super-resolution with SRGAN
Creating a generator
Creating the discriminator
Defining training loss
Training SRGAN to generate high-resolution images
Generative image inpainting
Efficient convolution – from im2col to nn.Unfold
WGAN – understanding the Wasserstein distance
Analyzing the problems with vanilla GAN loss
The advantages of Wasserstein distance
Training GAN for image inpainting
Model design for image inpainting
Implementation of Wasserstein loss
Summary
Useful reading list and references
Training Your GANs to Break Different Models
Adversarial examples – attacking deep learning models
What are adversarial examples and how are they created?
Adversarial attacking with PyTorch
Generative adversarial examples
Preparing an ensemble classifier for Kaggle's Cats vs. Dogs
Breaking the classifier with advGAN
Summary
References and further reading list
Image Generation from Description Text
Text-to-image synthesis with GANs
Quick introduction to word embedding
Translating text to image with zero-shot transfer learning
Zero-shot learning
GAN architecture and training
Generating photo-realistic images with StackGAN++
High-resolution text-to-image synthesis with StackGAN
From StackGAN to StackGAN++
Training StackGAN++ to generate images with better quality
Summary 
Further reading
Sequence Synthesis with GANs
Text generation via SeqGAN – teaching GANs how to tell jokes
Design of SeqGAN – GAN, LSTM, and RL
A quick introduction to RNN and LSTM
Reinforcement learning versus supervised learning
Architecture of SeqGAN
Creating your own vocabulary for training
Speech quality enhancement with SEGAN
SEGAN architecture
Training SEGAN to enhance speech quality
Summary
Further reading
Reconstructing 3D Models with GANs
Fundamental concepts in computer graphics
Representation of 3D objects
Attributes of a 3D object
Camera and projection
Designing GANs for 3D data synthesis
Generators and discriminators in 3D-GAN
Training 3D-GAN
Summary
Further reading
Other Books You May Enjoy
Leave a review - let other readers know what you think
With continuously evolving research and development, Generative Adversarial Networks (GANs) are the next big thing in the field of deep learning. This book highlights the key improvements in GANs over traditional generative models and shows you how to make the best out of GANs with the help of hands-on examples.
This book will help you understand how GAN architecture works using PyTorch. You will get familiar with the most flexible deep learning toolkit and use it to transform ideas into actual working code. You will apply GAN models to areas such as computer vision, multimedia, and natural language processing using a sample-generation methodology.
This book is for machine learning practitioners and deep learning researchers looking to get hands-on guidance on implementing GAN models using PyTorch 1.0. You'll become familiar with state-of-the-art GAN architectures with the help of real-world examples. Working knowledge of the Python programming language is necessary to grasp the concepts covered in this book.
Chapter 1, Generative Adversarial Networks Fundamentals, introduces the fundamentals of GANs and adversarial learning, including the roles of the generator and discriminator networks. You will also learn how to build a simple GAN with NumPy to generate sine signals.
Chapter 2, Getting Started with PyTorch 1.3, introduces how to install CUDA in order to take advantage of the GPU for faster training and evaluation. We will also look into the step-by-step installation process of PyTorch on Windows and Ubuntu and build PyTorch from source.
Chapter 3, Best Practices in Model Design and Training, looks at the overall design of the model architecture and the steps that need to be followed to choose the required convolutional operation.
Chapter 4, Building Your First GAN with PyTorch, introduces you to a classic and well-performing GAN model, called DCGAN, for generating 2D images. You will also be introduced to the architecture of DCGANs and learn how to train and evaluate them. Following this, you will learn how to use a DCGAN to generate hand-written digits and human faces, and take a look at adversarial learning with autoencoders. You will also be shown how to efficiently organize your source code for easy adjustments and extensions.
Chapter 5, Generating Images Based on Label Information, shows how to use a CGAN to generate images based on a given label and how to implement adversarial learning with autoencoders.
Chapter 6, Image-to-Image Translation and Its Applications, shows how to use pixel-wise label information to perform image-to-image translation with pix2pix and how to translate high-resolution images with pix2pixHD. You will also learn how to flexibly design model architectures to accomplish your goals, including generating larger images and transferring textures between different types of images.
Chapter 7, Image Restoration with GANs, shows you to how to perform image super-resolution with SRGAN to generate high-resolution images from low-resolution ones and how to use a data prefetcher to speed up data loading and increase your GPU's efficiency during training. You will also learn how to train a GAN model to perform image inpainting and fill in the missing parts of an image.
Chapter 8, Training Your GANs to Break Different Models, looks into the fundamentals of adversarial examples and how to attack and confuse a CNN model with FGSM (Fast Gradient Sign Method). After this, we will look at how to use an accimage library to speed up your image loading even more and train a GAN model to generate adversarial examples and fool the image classifier.
Chapter 9, Image Generation from Description Text, provides basic knowledge of word embeddings and how they are used in the NLP field. You will also learn how to design a text-to-image GAN model to generate images based on one sentence of description text.
Chapter 10, Sequence Synthesis with GANs, covers commonly used techniques in the NLP field, such as RNN and LSTM. You will also learn some of the basic concepts of reinforcement learning and see how it differs from supervised learning (such as SGD-based CNNs). Finally, you will learn how to use SEGAN to remove background noise and enhance the quality of speech audio.
Chapter 11, Reconstructing 3D Models with GANs, shows how 3D objects are represented in computer graphics (CG). We will also look at the fundamental concepts of CG, including camera and projection matrices. You will then learn how to construct a 3D-GAN model with 3D convolutions and train it to generate 3D objects.
You should have basic knowledge of Python and PyTorch.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Generative-Adversarial-Networks-with-PyTorch-1.x. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789530513_ColorImages.pdf.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
In this section, you will be introduced to the basic concepts of GANs, how to install PyTorch 1.0, and how you can build your own models with PyTorch.
This section contains the following chapters:
Chapter 1, Generative Adversarial Networks Fundamentals
Chapter 2, Getting Started with PyTorch 1.3
Chapter 3, Best Practices in Model Design and Training
Generative Adversarial Networks (GANs) have brought about a revolutionary storm in the machine learning (ML) community. They have, to some extent, changed the way people solve practical problems in Computer Vision (CV) and Natural Language Processing (NLP). Before we dive right into the storm, let's prepare you with the fundamental insights of GANs.
In this chapter, you will understand the idea behind adversarial learning and the basic components of a GAN model. You will also get a brief understanding of how GANs work and how one can be built with NumPy.
Before we start exploring the new features of PyTorch, we will first learn how to build a simple GAN with NumPy to generate sine signals, so that you gain a deeper understanding of the mechanism underlying GANs. By the end of this chapter, you can relax a little as we walk you through several showcases of how GANs are used to address practical problems in the CV and NLP fields.
The following topics will be covered in this chapter:
Fundamentals of machine learning
Generator and discriminator networks
What GAN we do?
References and a useful reading list
To introduce how GANs work, let's use an analogy:
ML is the study of recognizing patterns from data without hardcoded rules given by humans. Recognizing patterns (Pattern Recognition, or PR) is the automatic discovery of the similarities and differences among raw data, which is an essential step toward realizing the kind of Artificial Intelligence (AI) that so far exists only in novels and movies. Although it is hard to tell exactly when true AI will come into being, the development of ML in recent years has given us much confidence. ML is already vastly used in many fields, such as CV, NLP, recommendation systems, Intelligent Transportation Systems (ITS), medical diagnosis, robotics, and advertising.
An ML model is typically described as a system that takes in data and gives certain outputs based on the parameters it contains. Learning, for such a model, is actually the process of adjusting those parameters to get better outputs. As illustrated in the following diagram, we feed training data into the model and get a certain output. We then use one or several criteria to measure the output, to tell how well our model performs. In this step, a set of desired outputs (or ground truth) with respect to the training data would be very helpful. If ground truth data is used in training, this process is often called supervised learning. If not, it is often regarded as unsupervised learning.
We constantly adjust the model's parameters based on its performance (in other words, whether it gives us the results we want) so that it yields better results in the future. This process is called model training. Training goes on for as long as we see fit; typically, we stop after a certain number of iterations or when the performance is good enough. When the training process has finished, we apply the trained model to predict on new data (testing data). This process is called model testing. Sometimes, people use different sets of data for training and testing to see how well the model performs on samples it has never seen before, which is known as its generalization capability. Sometimes, an additional step called model evaluation is involved, when the parameters of the model are so complicated that we need another set of data to see whether the model or the training process has been designed well:
What types of problems a model can solve is essentially determined by the types of input and output data we want. For example, a classification model takes an input of any number of dimensions (audio, text, image, or video) and gives a 1-dimensional output (a single value indicating the predicted label). A generative model typically takes a 1-dimensional input (a latent vector) and generates high-dimensional outputs (images, videos, or 3D models). It maps low-dimensional data to high-dimensional data while trying to make the output samples look as convincing as possible. However, it is worth pointing out that we'll meet generative models that don't obey this rule in later chapters. Until Chapter 5, Generating Images Based on Label Information, it's a simple rule to bear in mind.
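To make this difference in dimensionality concrete, here is a minimal NumPy sketch; the array sizes and the use of random arrays as stand-ins for real model outputs are purely illustrative:

import numpy as np

# A classifier maps a high-dimensional input to a single predicted label.
image = np.random.rand(64 * 64 * 3)             # a flattened 64x64 RGB image (12,288 values)
class_scores = np.random.rand(10)               # stand-in for the output of a 10-class classifier
predicted_label = int(np.argmax(class_scores))  # the 1-dimensional result: one label

# A generative model maps a low-dimensional latent vector to a high-dimensional sample.
latent_vector = np.random.randn(100)            # 100-dimensional random noise
fake_image = np.random.rand(64, 64, 3)          # stand-in for a generated 64x64 RGB image

print(image.shape, predicted_label)             # (12288,) -> a single label
print(latent_vector.shape, fake_image.shape)    # (100,)   -> (64, 64, 3)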
Traditionally, generative problems are solved by statistics-based methods such as the Boltzmann machine, Markov chain, or variational autoencoder. As mathematically profound as they are, the samples they generate are still far from perfect. A classification model maps high-dimensional data to a low-dimensional representation, while a generative model often maps low-dimensional data to high-dimensional data. People in both fields have been working hard to improve their models. Let's look back to the little made-up opening story. Can we get the two different models to work against each other and improve themselves at the same time? If we take the output of a generative model as the input of the classification model, we can measure the performance of the generative model (the armor) with the classification model (the sword). At the same time, we can improve the classification model (the sword) by feeding it generated samples (the armor) along with real samples, since we can agree that more data is often better for the training of ML models.
The training process where the two models try to weaken each other and, as a result, improve each other is called adversarial learning. As demonstrated in the following diagram, the models, A and B, have totally opposite agendas (for example, classification and generation). However, during each step of the training, the output of Model A improves Model B, and the output of Model B improves Model A:
GANs are designed based on this very idea, which was proposed by Goodfellow, Pouget-Abadie, Mirza, et al in 2014. Now, GANs have become the most thriving and popular method to synthesize audio, text, images, video, and 3D models in the ML community. In this book, we will walk you through the basic components and mechanisms of different types of GANs and learn how to use them to address various practical problems. In the next section, we will introduce the basic structure of GANs to show you how and why they work so well.
Here, we will show you the basic components of GANs and explain how they work with and against each other to achieve our goal of generating realistic samples. A typical structure of a GAN is shown in the following diagram. It contains two different networks: a generator network and a discriminator network. The generator network typically takes random noise as input and generates fake samples. Our goal is to make the fake samples as close to the real samples as possible. That's where the discriminator comes in. The discriminator is, in fact, a classification network, whose job is to tell whether a given sample is fake or real. The generator tries its best to trick and confuse the discriminator into making the wrong decision, while the discriminator tries its best to distinguish the fake samples from the real ones.
In this process, the differences between fake and real samples are used to improve the generator. Therefore, the generator gets better at generating realistic-looking samples while the discriminator gets better at picking them out. Since real samples are used to train the discriminator, the training process is supervised. Even though the generator always produces fake samples without knowledge of the ground truth, the overall training of a GAN is still supervised:
Let's take a look at the math behind this process to get a better understanding of the mechanism. Let $G$ and $D$ represent the generator and discriminator networks, respectively. Let $V(G, D)$ represent the performance criterion of the system. The optimization objective is described as follows:

$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

In this equation, $x$ is the real sample, $G(z)$ is the generated sample, and $z$ is the random noise that $G$ uses to generate fake samples. $\mathbb{E}_{x \sim p_{data}(x)}$ is the expectation over $x$, which means the average value of any function, $f(x)$, over all samples drawn from $p_{data}$.

As mentioned before, the goal of the discriminator, $D$, is to maximize the prediction confidence of real samples. Therefore, $D$ needs to be trained with gradient ascent (the $\max_D$ operator in the objective). The update rule for $\theta_D$ is as follows:

$$\theta_D \leftarrow \theta_D + \eta \nabla_{\theta_D} \frac{1}{m} \sum_{i=1}^{m} \left[\log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)\right]$$

In this formula, $\theta_D$ denotes the parameters of $D$ (such as convolution kernels and weights in fully-connected layers), $\eta$ is the learning rate, $m$ is the size of the mini-batch (or batch size for short), and $i$ is the index of the sample in the mini-batch. Here, we assume that we are using mini-batches to feed the training data, which is fairly reasonable since it's the most commonly used and empirically effective strategy. Therefore, the gradients need to be averaged over $m$ samples.

The goal of the generator network, $G$, is to fool the discriminator, $D$, and let $D$ believe that the generated samples are real. Therefore, the training of $G$ is to maximize $\mathbb{E}_{z}\left[\log D(G(z))\right]$, or equivalently, to minimize $\mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right]$. Therefore, $G$ needs to be trained with gradient descent (the $\min_G$ operator in the objective). The update rule for $\theta_G$ is as follows:

$$\theta_G \leftarrow \theta_G - \eta \nabla_{\theta_G} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$$

In this formula, $\theta_G$ denotes the parameters of $G$, $m$ is the size of the mini-batch, and $i$ is the index of the sample in the mini-batch.
To some, the math may be even more confusing than a big chunk of code. Now, let's look at some code to digest the equations we've thrown at you. Here, we will use Python to implement a very simple adversarial learning example to generate sine (sin) signals.
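Before we design the networks in detail, here is a minimal, hypothetical sketch of the alternating training procedure that such an example follows; the generator and discriminator objects and their forward/update methods below are placeholders for illustration, not the book's actual code:

import numpy as np

def train_gan(generator, discriminator, real_data, num_steps=1000, batch_size=8):
    # Alternate between updating the discriminator and the generator.
    for step in range(num_steps):
        # Sample a mini-batch of real samples and a mini-batch of random noise.
        idx = np.random.randint(0, len(real_data), batch_size)
        real_batch = real_data[idx]
        noise = np.random.randn(batch_size, 1)

        # Discriminator step (gradient ascent on the objective):
        # push D(x) toward 1 for real samples and D(G(z)) toward 0 for fake ones.
        fake_batch = generator.forward(noise)
        discriminator.update(real_batch, fake_batch)

        # Generator step (gradient descent on the objective):
        # push D(G(z)) toward 1, that is, try to fool the discriminator.
        noise = np.random.randn(batch_size, 1)
        generator.update(discriminator, noise)

The rest of this chapter fills in what these placeholder objects actually do with explicit NumPy code.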
The architecture of the generator network is described in the following diagram. It takes a 1-dimensional random value as input and gives a 10-dimensional vector as output. It has 2 hidden layers, each containing 10 neurons. The calculation in each layer is a matrix multiplication. Therefore, the network is, in fact, a Multilayer Perceptron (MLP):
The architecture of the discriminator network is described in the following diagram. It takes a 10-dimensional vector as input and gives a 1-dimensional value as output. The output is the predicted label (real or fake) of the input sample. The discriminator network is also an MLP, with two hidden layers, each containing 10 neurons:
Now, let's create our generator and discriminator networks. We put the code in the same simple_gan.py file as well:
Define the parameters of the generator network:
class Generator(object):
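    # NOTE: the body of this class continues in the following pages of the book.
    # As a rough, hypothetical sketch only (the attribute names, weight shapes, and
    # initialization scheme below are assumptions for illustration, not the book's
    # exact code), the parameters of the 1 -> 10 -> 10 -> 10 MLP described above
    # could be initialized like this, assuming NumPy has been imported as np
    # earlier in simple_gan.py:
    def __init__(self, input_size=1, hidden_size=10, output_size=10):
        self.w1 = np.random.randn(input_size, hidden_size) * 0.1
        self.b1 = np.zeros(hidden_size)
        self.w2 = np.random.randn(hidden_size, hidden_size) * 0.1
        self.b2 = np.zeros(hidden_size)
        self.w3 = np.random.randn(hidden_size, output_size) * 0.1
        self.b3 = np.zeros(output_size)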
