Machines are excelling at creative human skills such as painting, writing, and composing music. Could you be more creative than generative AI?
In this book, you’ll explore the evolution of generative models, from restricted Boltzmann machines and deep belief networks to VAEs and GANs. You’ll learn how to implement models yourself in TensorFlow and get to grips with the latest research on deep neural networks.
There's been an explosion in potential use cases for generative models. You'll look at OpenAI's news generator, deepfakes, and training deep learning agents to navigate a simulated environment.
Recreate the code that’s under the hood and uncover surprising links between text, image, and music generation.
Generative AI with Python and TensorFlow 2
Create images, text, and music with VAEs, GANs, LSTMs, Transformer models
Joseph Babcock
Raghav Bali
BIRMINGHAM - MUMBAI
Generative AI with Python and TensorFlow 2
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Producer: Tushar Gupta
Acquisition Editor – Peer Reviews: Suresh Jain, Saby D'silva
Content Development Editors: Lucy Wan, Joanne Lovell
Technical Editor: Gaurav Gavas
Project Editor: Janice Gonsalves
Copy Editor: Safis Editing
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Presentation Designer: Pranit Padwal
First published: April 2021
Production reference: 3070721
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-80020-088-3
www.packt.com
Joseph Babcock has over a decade of experience in machine learning and developing big data solutions. He applied predictive modeling to drug discovery and genomics during his doctoral studies in neurosciences, and has since worked in and led data science teams in the streaming media, e-commerce, and financial services industries. He previously authored Mastering Predictive Analytics with Python and Python: Advanced Predictive Analytics with Packt.
I would like to acknowledge my family for their support during the composition of this book.
Raghav Bali is a data scientist and a published author. He has led advanced analytics initiatives working with several Fortune 500 companies like Optum (UHG), Intel, and American Express. His work involves research and development of enterprise solutions leveraging machine learning and deep learning. He holds a Master of Technology degree (gold medalist) from IIIT Bangalore, with specializations in machine learning and software engineering. Raghav has authored several books on R, Python, machine learning, and deep learning, including Hands-On Transfer Learning with Python.
To my wife, parents, and brother, without whom this would not have been possible. To all the researchers whose work continues to inspire me to learn. And to my co-author, reviewers, and the Packt team (especially Tushar, Janice, and Lucy) for their hard work in transforming our work into this amazing book.
Hao-Wen Dong is currently a PhD student in Computer Science and Engineering at the University of California, San Diego, working with Prof. Julian McAuley and Prof. Taylor Berg-Kirkpatrick. His research interests lie at the intersection of music and machine learning, with a recent focus on music generation. He is interested in building tools that could lower the barrier to entry for music composition and potentially lead to the democratization of music creation. Previously, he did a research internship in the R&D Division at Yamaha Corporation. Before that, he was a research assistant in the Music and AI Lab directed by Dr. Yi-Hsuan Yang at Academia Sinica. He received his bachelor's degree in Electrical Engineering from National Taiwan University.
Gokula Krishnan Santhanam is a Python developer who lives in Zurich, Switzerland. He has been working with deep learning techniques for more than 5 years. He has worked on problems in generative modeling, adversarial attacks, interpretability, and predictive maintenance while working at IBM Research and interning at Google. He finished his master's in Computer Science at ETH Zurich and his bachelor's at BITS Pilani. When he's not working, you can find him enjoying board games with his wife or hiking in the beautiful Alps.
I would like to thank my wife, Sadhana, for her continuous help and support and for always being there when I need her.
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
An Introduction to Generative AI: "Drawing" Data from Models
Applications of AI
Discriminative and generative models
Implementing generative models
The rules of probability
Discriminative and generative modeling and Bayes' theorem
Why use generative models?
The promise of deep learning
Building a better digit classifier
Generating images
Style transfer and image transformation
Fake news and chatbots
Sound composition
The rules of the game
Unique challenges of generative models
Summary
References
Setting Up a TensorFlow Lab
Deep neural network development and TensorFlow
TensorFlow 2.0
VSCode
Docker: A lightweight virtualization solution
Important Docker commands and syntax
Connecting Docker containers with docker-compose
Kubernetes: Robust management of multi-container applications
Important Kubernetes commands
Kustomize for configuration management
Kubeflow: An end-to-end machine learning lab
Running Kubeflow locally with MiniKF
Installing Kubeflow in AWS
Installing Kubeflow in GCP
Installing Kubeflow on Azure
Installing Kubeflow using Terraform
A brief tour of Kubeflow's components
Kubeflow notebook servers
Kubeflow pipelines
Using Kubeflow Katib to optimize model hyperparameters
Summary
References
Building Blocks of Deep Neural Networks
Perceptrons – a brain in a function
From tissues to TLUs
From TLUs to tuning perceptrons
Multi-layer perceptrons and backpropagation
Backpropagation in practice
The shortfalls of backpropagation
Varieties of networks: Convolution and recursive
Networks for seeing: Convolutional architectures
Early CNNs
AlexNet and other CNN innovations
AlexNet architecture
Networks for sequence data
RNNs and LSTMs
Building a better optimizer
Gradient descent to ADAM
Xavier initialization
Summary
References
Teaching Networks to Generate Digits
The MNIST database
Retrieving and loading the MNIST dataset in TensorFlow
Restricted Boltzmann Machines: generating pixels with statistical mechanics
Hopfield networks and energy equations for neural networks
Modeling data with uncertainty with Restricted Boltzmann Machines
Contrastive divergence: Approximating a gradient
Stacking Restricted Boltzmann Machines to generate images: the Deep Belief Network
Creating an RBM using the TensorFlow Keras layers API
Creating a DBN with the Keras Model API
Summary
References
Painting Pictures with Neural Networks Using VAEs
Creating separable encodings of images
The variational objective
The reparameterization trick
Inverse Autoregressive Flow
Importing CIFAR
Creating the network from TensorFlow 2
Summary
References
Image Generation with GANs
The taxonomy of generative models
Generative adversarial networks
The discriminator model
The generator model
Training GANs
Non-saturating generator cost
Maximum likelihood game
Vanilla GAN
Improved GANs
Deep Convolutional GAN
Vector arithmetic
Conditional GAN
Wasserstein GAN
Progressive GAN
The overall method
Progressive growth-smooth fade-in
Minibatch standard deviation
Equalized learning rate
Pixelwise normalization
TensorFlow Hub implementation
Challenges
Training instability
Mode collapse
Uninformative loss and evaluation metrics
Summary
References
Style Transfer with GANs
Paired style transfer using pix2pix GAN
The U-Net generator
The Patch-GAN discriminator
Loss
Training pix2pix
Use cases
Unpaired style transfer using CycleGAN
Overall setup for CycleGAN
Adversarial loss
Cycle loss
Identity loss
Overall loss
Hands-on: Unpaired style transfer with CycleGAN
Generator setup
Discriminator setup
GAN setup
The training loop
Related works
DiscoGAN
DualGAN
Summary
References
Deepfakes with GANs
Deepfakes overview
Modes of operation
Replacement
Re-enactment
Editing
Key feature set
Facial Action Coding System (FACS)
3D Morphable Model
Facial landmarks
Facial landmark detection using OpenCV
Facial landmark detection using dlib
Facial landmark detection using MTCNN
High-level workflow
Common architectures
Encoder-Decoder (ED)
Generative Adversarial Networks (GANs)
Replacement using autoencoders
Task definition
Dataset preparation
Autoencoder architecture
Training our own face swapper
Results and limitations
Re-enactment using pix2pix
Dataset preparation
Pix2pix GAN setup and training
Results and limitations
Challenges
Ethical issues
Technical challenges
Generalization
Occlusions
Temporal issues
Off-the-shelf implementations
Summary
References
The Rise of Methods for Text Generation
Representing text
Bag of Words
Distributed representation
Word2vec
GloVe
FastText
Text generation and the magic of LSTMs
Language modeling
Hands-on: Character-level language model
Decoding strategies
Greedy decoding
Beam search
Sampling
Hands-on: Decoding strategies
LSTM variants and convolutions for text
Stacked LSTMs
Bidirectional LSTMs
Convolutions and text
Summary
References
NLP 2.0: Using Transformers to Generate Text
Attention
Contextual embeddings
Self-attention
Transformers
Overall architecture
Multi-head self-attention
Positional encodings
BERT-ology
GPT 1, 2, 3…
Generative pre-training: GPT
GPT-2
Hands-on with GPT-2
Mammoth GPT-3
Summary
References
Composing Music with Generative Models
Getting started with music generation
Representing music
Music generation using LSTMs
Dataset preparation
LSTM model for music generation
Music generation using GANs
Generator network
Discriminator network
Training and results
MuseGAN – polyphonic music generation
Jamming model
Composer model
Hybrid model
Temporal model
MuseGAN
Generators
Critic
Training and results
Summary
References
Play Video Games with Generative AI: GAIL
Reinforcement learning: Actions, agents, spaces, policies, and rewards
Deep Q-learning
Inverse reinforcement learning: Learning from experts
Adversarial learning and imitation
Running GAIL on PyBullet Gym
The agent: Actor-Critic network
The discriminator
Training and results
Summary
References
Emerging Applications in Generative AI
Finding new drugs with generative models
Searching chemical space with generative molecular graph networks
Folding proteins with generative models
Solving partial differential equations with generative modeling
Few shot learning for creating videos from images
Generating recipes with deep learning
Summary
References
Why subscribe?
Other Books You May Enjoy
Index
Now that you have seen all the amazing applications of generative models in Chapter 1, An Introduction to Generative AI: "Drawing" Data from Models, you might be wondering how to get started implementing projects that use these kinds of algorithms. In this chapter, we will walk through a number of tools that we will use throughout the rest of the book to implement the deep neural networks used in various generative AI models. Our primary tool is the TensorFlow 2.0 framework, developed by Google [1, 2]; however, we will also use a number of additional resources to make the implementation process easier (summarized in Table 2.1).
We can broadly categorize these tools:
- Resources for replicable dependency management (Docker, Anaconda)
- Exploratory tools for data munging and algorithm hacking (Jupyter)
- Utilities to deploy these resources to the cloud and manage their lifecycle (Kubernetes, Kubeflow, Terraform)

| Tool | Project site | Use |
| --- | --- | --- |
| Docker | https://www.docker.com/ | Application runtime dependency encapsulation |
| Anaconda | https://www.anaconda.com/ | Python language package management |
| Jupyter | https://jupyter.org/ | Interactive Python runtime and plotting/data exploration tool |
| Kubernetes | https://kubernetes.io/ | Docker container orchestration and resource management |
| Kubeflow | https://www.kubeflow.org/ | Machine learning workflow engine developed on Kubernetes |
| Terraform | https://www.terraform.io/ | Infrastructure scripting language for configurable and consistent deployments of Kubeflow and Kubernetes |
| VSCode | https://code.visualstudio.com/ | Integrated development environment (IDE) |
Table 2.1: Tech stack for generative adversarial model development
On our journey to bring our code from our laptops to the cloud in this chapter, we will first describe some background on how TensorFlow works when running locally. We will then describe a wide array of software tools that will make it easier to run an end-to-end TensorFlow lab locally or in the cloud, such as notebooks, containers, and cluster managers. Finally, we will walk through a simple practical example of setting up a reproducible research environment, running local and distributed training, and recording our results. We will also examine how we might parallelize TensorFlow across multiple CPU/GPU units within a machine (vertical scaling) and multiple machines in the cloud (horizontal scaling) to accelerate training. By the end of this chapter, we will be ready to extend this laboratory framework to implement projects using various generative AI models.
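As a brief preview of what that scaling looks like in code, TensorFlow 2 ships with distribution strategies that replicate training across devices. The following is a minimal sketch under our own assumptions, not an example from this chapter's lab: the toy model, the random data, and the hyperparameters are purely illustrative.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates model variables across all GPUs visible on this
# machine (vertical scaling); MultiWorkerMirroredStrategy extends the same
# pattern across multiple machines (horizontal scaling).
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored across devices
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Training proceeds as usual; each batch is split across the replicas and
# the resulting gradients are averaged before the update is applied
x, y = np.random.rand(256, 10), np.random.rand(256, 1)
model.fit(x, y, epochs=1, batch_size=32)
```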
First, let's start by diving more into the details of TensorFlow, the library we will use to develop models throughout the rest of this book. What problem does TensorFlow solve for neural network model development? What approaches does it use? How has it evolved over the years? To answer these questions, let us review some of the history behind deep neural network libraries that led to the development of TensorFlow.
As we will see in Chapter 3, Building Blocks of Deep Neural Networks, a deep neural network in essence consists of matrix operations (addition, subtraction, multiplication), nonlinear transformations, and gradient-based updates computed by using the derivatives of these components.
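To make these building blocks concrete, here is a minimal sketch in TensorFlow 2; the shapes, the toy loss, and the learning rate are illustrative assumptions rather than an example taken from Chapter 3:

```python
import tensorflow as tf

# A toy "layer": a matrix operation (multiplication plus bias addition)
# followed by a nonlinear transformation
W = tf.Variable(tf.random.normal([3, 2]))   # weights
b = tf.Variable(tf.zeros([2]))              # biases
x = tf.constant([[1.0, 2.0, 3.0]])          # a single input row vector

with tf.GradientTape() as tape:
    h = tf.nn.relu(tf.matmul(x, W) + b)     # matrix op + nonlinearity
    loss = tf.reduce_sum(tf.square(h))      # a toy scalar loss

# Gradient-based update: derivatives of the loss with respect to the
# parameters, applied as a simple gradient-descent step
grads = tape.gradient(loss, [W, b])
for var, grad in zip([W, b], grads):
    var.assign_sub(0.01 * grad)
```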
In the world of academia, researchers have historically often used efficient prototyping tools such as MATLAB [3] to run models and prepare analyses. While this approach allows for rapid experimentation, it lacks elements of industrial software development, such as object-oriented (OO) design, that allow for reproducibility and the clean software abstractions needed for tools to be adopted by large organizations. These tools also had difficulty scaling to large datasets and could carry heavy licensing fees for such industrial use cases. Prior to 2006, this type of computational tooling was largely sufficient for most use cases. However, as the datasets being tackled with deep neural network algorithms grew, groundbreaking results were achieved, such as:
- Image classification on the ImageNet dataset [4]
- Large-scale unsupervised discovery of image patterns in YouTube videos [5]
- The creation of artificial agents capable of playing Atari video games and the Asian board game Go with human-like skill [6, 7]
- State-of-the-art language translation via the BERT model developed by Google [8]

The models developed in these studies exploded in complexity along with the size of the datasets they were applied to (see Table 2.2 to get a sense of the immense scale of some of these models). As industrial use cases required robust and scalable frameworks to develop and deploy new neural networks, several academic groups and large technology companies invested in the development of generic toolkits for the implementation of deep learning models. These software libraries codified common patterns into reusable abstractions, often allowing even complex models to be embodied in relatively simple experimental scripts.
| Model Name | Year | # Parameters |
| --- | --- | --- |
| AlexNet | 2012 | 61M |
| YouTube CNN | 2012 | 1B |
| Inception | 2014 | 5M |
| VGG-16 | 2014 | 138M |
| BERT | 2018 | 340M |
| GPT-3 | 2020 | 175B |