Convolutional Neural Networks (CNNs) are one of the most popular architectures used in computer vision applications. This book introduces CNNs through solving real-world problems in deep learning, while teaching you how to implement them in the popular Python library TensorFlow. By the end of the book, you will be training CNNs in no time!
We start with an overview of popular machine learning and deep learning models, and then get you set up with a TensorFlow development environment. This environment is the basis for implementing and training deep learning models in later chapters. Then, you will use Convolutional Neural Networks to work on problems such as image classification, object detection, and semantic segmentation.
After that, you will use transfer learning to see how these models can solve other deep learning problems. You will also get a taste of implementing generative models such as autoencoders and generative adversarial networks.
Later on, you will see useful tips on machine learning best practices and troubleshooting. Finally, you will learn how to apply your models on large datasets of millions of images.
You can read the e-book in Legimi apps or in any app that supports the following format:
Page count: 272
Year of publication: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Siddharth Mandal
Content Development Editor: Aditi Gour
Technical Editor: Vaibhav Dwivedi
Copy Editor: Safis Editing
Project Coordinator: Hardik Bhinde
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Jason Monteiro
Production Coordinator: Deepika Naik
First published: August 2018
Production reference: 1240818
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78913-033-1
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Iffat Zafar was born in Pakistan. She received her Ph.D. in Computer Vision and Machine Learning from Loughborough University in 2008. After her Ph.D., she worked as a research associate at the Department of Computer Science, Loughborough University, for about four years. She currently works in industry as an AI engineer, researching and developing machine learning and deep learning algorithms for object detection and general deep learning tasks in edge and cloud-based applications.
Giounona Tzanidou holds a PhD in computer vision from Loughborough University, UK, where she developed algorithms for runtime surveillance video analytics. She then worked as a research fellow at Kingston University, London, on a project aimed at the prediction, detection, and understanding of terrorist interest through intelligent video surveillance. She also taught computer vision and embedded systems modules at Loughborough University. Now an engineer in industry, she investigates the application of deep learning techniques for object detection and recognition in videos.
Richard Burton graduated from the University of Leicester with a master's degree in mathematics. After graduating, he worked as a research engineer at the University of Leicester for a number of years, where he developed deep learning object detection models for their industrial partners. Now, he is working as a software engineer in the industry, where he continues to research the applications of deep learning in computer vision.
Nimesh Patel graduated from the University of Leicester with an MSc in applied computation and numerical modeling. During this time, he undertook a project in collaboration with one of the University of Leicester's partners, dealing with machine learning for hand gesture recognition. Since then, he has worked in industry, researching machine learning for computer vision tasks, such as depth estimation.
Leonardo Araujo is just your regular curious Brazilian engineer, who has worked in industry for the past 19 years (yes, in Brazil, people work before graduation), doing HW/SW development and research on the topics of control engineering and computer vision. For the past 6 years, he has focused more on machine learning methods. His passions are too many to list in this book.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Convolutional Neural Networks with TensorFlow
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the authors
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
Setup and Introduction to TensorFlow
The TensorFlow way of thinking
Setting up and installing TensorFlow
Conda environments
Checking whether your installation works
TensorFlow API levels
Eager execution
Building your first TensorFlow model
One-hot vectors
Splitting into training and test sets
Creating TensorFlow graphs
Variables
Operations
Feeding data with placeholders
Initializing variables
Training our model
Loss functions
Optimization
Evaluating a trained model
The session
Summary
Deep Learning and Convolutional Neural Networks
AI and ML
Types of ML
Old versus new ML
Artificial neural networks
Activation functions
The XOR problem
Training neural networks
Backpropagation and the chain rule
Batches
Loss functions
The optimizer and its hyperparameters
Underfitting versus overfitting
Feature scaling
Fully connected layers
A TensorFlow example for the XOR problem
Convolutional neural networks
Convolution
Input padding
Calculating the number of parameters (weights)
Calculating the number of operations
Converting convolution layers into fully connected layers
The pooling layer
1x1 Convolution
Calculating the receptive field
Building a CNN model in TensorFlow
TensorBoard
Other types of convolutions
Summary
Image Classification in TensorFlow
CNN model architecture
Cross-entropy loss (log loss)
Multi-class cross entropy loss
The train/test dataset split
Datasets
ImageNet
CIFAR
Loading CIFAR
Image classification with TensorFlow
Building the CNN graph
Learning rate scheduling
Introduction to the tf.data API
The main training loop
Model Initialization
Do not initialize all weights with zeros
Initializing with a mean zero distribution
Xavier-Bengio and the Initializer
Improving generalization by regularizing
L2 and L1 regularization
Dropout
The batch norm layer
Summary
Object Detection and Segmentation
Image classification with localization
Localization as regression
TensorFlow implementation
Other applications of localization
Object detection as classification – Sliding window
Using heuristics to guide us (R-CNN)
Problems
Fast R-CNN
Faster R-CNN
Region Proposal Network
RoI Pooling layer
Conversion from traditional CNN to Fully Convnets
Single Shot Detectors – You Only Look Once
Creating training set for Yolo object detection
Evaluating detection (Intersection Over Union)
Filtering output
Anchor Box
Testing/Predicting in Yolo
Detector Loss function (YOLO loss)
Loss Part 1
Loss Part 2
Loss Part 3
Semantic segmentation
Max Unpooling
Deconvolution layer (Transposed convolution)
The loss function
Labels
Improving results
Instance segmentation
Mask R-CNN
Summary
VGG, Inception Modules, Residuals, and MobileNets
Substituting big convolutions
Substituting the 3x3 convolution
VGGNet
Architecture
Parameters and memory calculation
Code
More about VGG
GoogLeNet
Inception module
More about GoogLeNet
Residual Networks
MobileNets
Depthwise separable convolution
Control parameters
More about MobileNets
Summary
Autoencoders, Variational Autoencoders, and Generative Adversarial Networks
Why generative models
Autoencoders
Convolutional autoencoder example
Uses and limitations of autoencoders
Variational autoencoders
Parameters to define a normal distribution
VAE loss function
Kullback-Leibler divergence
Training the VAE
The reparameterization trick
Convolutional Variational Autoencoder code
Generating new data
Generative adversarial networks
The discriminator
The generator
GAN loss function
Generator loss
Discriminator loss
Putting the losses together
Training the GAN
Deep convolutional GAN
WGAN
BEGAN
Conditional GANs
Problems with GANs
Loss interpretability
Mode collapse
Techniques to improve GANs' trainability
Minibatch discriminator
Summary
Transfer Learning
When?
How? An overview
How? Code example
TensorFlow useful elements
An autoencoder without the decoder
Selecting layers
Training only some layers
Complete source
Summary
Machine Learning Best Practices and Troubleshooting
Building Machine Learning Systems
Data Preparation
Split of Train/Development/Test set
Mismatch of the Dev and Test set
When to Change Dev/Test Set
Bias and Variance
Data Imbalance
Collecting more data
Look at your performance metric
Data synthesis/Augmentation
Resample Data
Loss function Weighting
Evaluation Metrics
Code Structure best Practice
Singleton Pattern
Recipe for CNN creation
Summary
Training at Scale
Storing data in TFRecords
Making a TFRecord
Storing encoded images
Sharding
Making efficient pipelines
Parallel calls for map transformations
Getting a batch
Prefetching
Tracing your graph
Distributed computing in TensorFlow
Model/data parallelism
Synchronous/asynchronous SGD
When data does not fit on one computer
The advantages of NoSQL systems
Installing Cassandra (Ubuntu 16.04)
The CQLSH tool
Creating databases, tables, and indexes
Doing queries in Python
Populating tables in Python
Doing backups
Scaling computation in the cloud
EC2
AMI
Storage (S3)
SageMaker
Summary
References
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 7
Chapter 9
Other Books You May Enjoy
Leave a review - let other readers know what you think
This book is all about giving a practical, hands-on introduction to machine learning with the aim of enabling anyone to start working in the field. We'll focus mainly on deep learning methods and how they can be used to solve important computer vision problems, but the knowledge acquired here can be transferred to many different domains. Along the way, the reader will also get a grip on how to use the popular deep learning library, TensorFlow.
Anyone interested in a practical guide to machine learning, specifically deep learning and computer vision, will particularly benefit from reading this book. In addition, the following people will also benefit:
Machine learning engineers
Data scientists
Developers interested in learning about the deep learning and computer vision fields
Students studying machine learning
Chapter 1, Setup and Introduction to TensorFlow, covers setting up and installing TensorFlow, along with writing a simple TensorFlow model for machine learning.
Chapter 2, Deep Learning and Convolutional Neural Networks, introduces you to machine learning and artificial intelligence, as well as artificial neural networks and how to train them. It also covers CNNs and how to use TensorFlow to train your own CNN.
Chapter 3, Image Classification in TensorFlow, talks about building CNN models and how to train them to classify the CIFAR10 dataset. It also looks at ways to improve the quality of our trained model through different methods of initialization and regularization.
Chapter 4, Object Detection and Segmentation, teaches the basics of object localization, detection, and segmentation, and the most famous algorithms related to those topics.
Chapter 5, VGG, Inception Modules, Residuals, and MobileNets, introduces you to different convolutional neural network designs, such as VGGNet, GoogLeNet, and MobileNet.
Chapter 6, Autoencoders, Variational Autoencoders, and Generative Adversarial Networks, introduces you to generative models, generative adversarial networks, and different types of encoders.
Chapter 7, Transfer Learning, covers transfer learning and how to implement it in our own tasks.
Chapter 8, Machine Learning Best Practices and Troubleshooting, introduces preparing a dataset, splitting it into subsets, and performing meaningful tests. The chapter also talks about underfitting and overfitting, along with best practices for addressing them.
Chapter 9, Training at Scale, teaches you how to train TensorFlow models across multiple GPUs and machines. It also covers best practices for storing your data and feeding it to your model.
To get the most out of this book, the reader should have some knowledge of the Python programming language and know how to install the required packages. Everything else is covered by the book in an accessible style. Installation instructions are given in the book and in the repository.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-on-Convolutional-Neural-Networks-with-Tensorflow. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
TensorFlow is an open source software library created by Google that allows you to build and execute data flow graphs for numerical computation. In these graphs, every node represents some computation or function to be executed, and the graph edges connecting the nodes represent the data flowing between them. In TensorFlow, this data consists of multi-dimensional arrays called Tensors. Tensors flow around the graph, hence the name TensorFlow.
Machine learning (ML) models, such as convolutional neural networks, can be represented with these kinds of graphs, and this is exactly what TensorFlow was originally designed for.
In this chapter, we'll cover the following topics:
Understanding the TensorFlow way of thinking
Setting up and installing TensorFlow
Introduction to TensorFlow API levels
Building and training a linear classifier in TensorFlow
Evaluating a trained model
Using TensorFlow requires a slightly different approach to programming than you might be used to, so let's explore what makes it different.
At their core, all TensorFlow programs have two main parts:
Construction of a computational graph, called a tf.Graph
Running the computational graph, using a tf.Session
In TensorFlow, a computational graph is a series of TensorFlow operations arranged into a graph structure. The TensorFlow graph contains two main types of components:
Operations: More commonly called ops for short, these are the nodes in your graph. Ops carry out any computation that needs to be done in your graph. Generally, they consume and produce Tensors. Some ops are special and can have certain side effects when they run.
Tensors: These are the edges of your graph; they connect up the nodes and represent the data that flows through it. Most TensorFlow ops will produce and consume these tf.Tensors.
In TensorFlow, the main object that you work with is called a Tensor. Tensors are the generalization of vectors and matrices. Whereas vectors are one-dimensional and matrices are two-dimensional, a Tensor can be n-dimensional. TensorFlow represents Tensors as n-dimensional arrays of a user-specified data type, for example, float32.
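As an illustration of rank and shape, the following sketch uses NumPy n-dimensional arrays, which is also how Tensor values look once they are evaluated (NumPy stands in here purely for illustration; the shapes and dtype are our own example values):

```python
import numpy as np

# Tensors generalize scalars, vectors, and matrices to n dimensions.
scalar = np.float32(1.0)                              # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0], dtype=np.float32)  # rank 1, shape (3,)
matrix = np.zeros((2, 3), dtype=np.float32)           # rank 2, shape (2, 3)
tensor3 = np.zeros((2, 3, 4), dtype=np.float32)       # rank 3, shape (2, 3, 4)

print(vector.ndim, matrix.ndim, tensor3.ndim)  # 1 2 3
```

The rank is simply the number of dimensions, and the shape gives the size along each of them.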
TensorFlow programs work by first building a graph of computation. This graph will produce some tf.Tensor output. To evaluate this output, you must run it within a tf.Session by calling tf.Session.run on your output Tensor. When you do this, TensorFlow will execute all the parts of your graph that need to be executed in order to evaluate the tf.Tensor you asked it to run.
TensorFlow is supported on the latest versions of Ubuntu and Windows. On Windows, TensorFlow supports only Python 3, while on Ubuntu both Python 2 and Python 3 can be used. We recommend using Python 3, and that is what we will use for the code examples in this book.
There are several ways you can install TensorFlow on your system, and here we will go through two of the main ways. The easiest is by simply using the pip package manager. Issuing the following command from a terminal will install the CPU-only version of TensorFlow to your system Python:
$ pip3 install --upgrade tensorflow
To install the version of Tensorflow that supports using your Nvidia GPU, simply type the following:
$ pip3 install --upgrade tensorflow-gpu
One of the advantages of TensorFlow is that it allows you to write code that can run directly on your GPU. With a few exceptions, almost all the major operations in TensorFlow can be run on a GPU to accelerate their execution speed. We will see that this is going to be essential in order to train the large convolutional neural networks described later in this book.
Using pip may be the quickest way to get started, but we find that the most convenient method involves using conda environments.
Conda environments allow you to create isolated Python environments, which are completely separate from your system Python or any other Python programs. This way, there is no chance of your TensorFlow installation messing with anything already installed, and vice versa.
To use conda, you must download Anaconda from here: https://www.anaconda.com/download/. This will include conda with it. Once you've installed Anaconda, installing TensorFlow can be done by entering a few commands at your command prompt. First, enter the following:
$ conda create -n tf_env pip python=3.5
This will create a conda environment named tf_env; the environment will use Python 3.5, and pip will also be installed for us to use.
Once this environment is created, you can start using it by entering the following on Windows:
$ activate tf_env
If you are using Ubuntu, enter the following command:
$ source activate tf_env
It should now display (tf_env) next to your command prompt. To install TensorFlow, we simply do a pip install as before, depending on whether you want the CPU-only version or GPU support:
(tf_env)$ pip install --upgrade tensorflow
(tf_env)$ pip install --upgrade tensorflow-gpu
At the time of this writing, Google had just introduced the eager execution API to TensorFlow. Eager execution is TensorFlow's answer to another deep learning library called PyTorch. It allows you to bypass the usual TensorFlow way of working, where you must first define a computational graph and then execute it to get a result; this is known as static graph computation. With eager execution, you can instead create so-called dynamic graphs that are defined on the fly as your program runs. This allows for a more traditional, imperative style of programming in TensorFlow. Unfortunately, eager execution is still under development, with some features missing, and will not be featured in this book. More information on eager execution can be found on the TensorFlow website.
After shuffling, we do some preprocessing on the data labels. The labels loaded with the dataset are just a 150-element vector of integers indicating which target class each datapoint belongs to, either 1, 2, or 3 in this case. When creating machine learning models, we like to transform our labels into a new form that is easier to work with by doing something called one-hot encoding.
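As a sketch of that transformation (the label values below are made up for illustration), one-hot encoding replaces each integer label with a vector that is all zeros except for a single one at the position of that class:

```python
import numpy as np

# Hypothetical integer labels in {1, 2, 3}, as described above
labels = np.array([1, 2, 3, 2, 1])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i + 1,
# so indexing with (labels - 1) picks out one row per datapoint.
one_hot = np.eye(num_classes, dtype=np.float32)[labels - 1]
print(one_hot[0])  # [1. 0. 0.]
```

Each row now has exactly one nonzero entry, which plays nicely with losses such as cross-entropy.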
Rather than a single number being the label for each datapoint, we use
