Transfer learning is a machine learning (ML) technique where knowledge gained while training on one set of problems can be used to solve other, similar problems.
The purpose of this book is two-fold. First, we focus on detailed coverage of deep learning (DL) and transfer learning, comparing and contrasting the two with easy-to-follow concepts and examples. Second, we work through real-world examples and research problems using TensorFlow, Keras, and the Python ecosystem, with hands-on code throughout.
The book starts with the key essential concepts of ML and DL, followed by coverage of important DL architectures such as convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and capsule networks. Our focus then shifts to transfer learning concepts such as model freezing, fine-tuning, and pre-trained models, including VGG, Inception, and ResNet, and we show with practical examples how these techniques can outperform DL models trained from scratch. In the concluding chapters, we focus on a multitude of real-world case studies and problems in areas such as computer vision, audio analysis, and natural language processing (NLP).
By the end of this book, you will be able to implement both DL and transfer learning principles in your own systems.
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Sunith Shetty
Acquisition Editor: Tushar Gupta
Content Development Editor: Unnati Guha
Technical Editor: Sayli Nikalje
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Jisha Chirayil
Production Coordinator: Shantanu Zagade
First published: August 2018
Production reference: 1300818
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78883-130-7
www.packtpub.com
This book wouldn't have been possible without several people who made this from a mere concept into reality. I would like to thank my parents, Digbijoy and Sampa, my partner, Durba, my pets, family, and friends for supporting me constantly in my endeavors. A big thank you to the entire team at Packt especially Tushar, Sayli, and Unnati for working tirelessly and supporting us throughout our journey. Also thanks to Matthew Mayo for gracing our book with his foreword and doing great things with KDnuggets.
Thanks to Adrian Rosebrock and PyImageSearch for some excellent visuals and content around pretrained models for computer vision; to Federico Baldassarre, Diego Gonzalez-Morin, Lucas Rodes-Guirao, and Emil Wallner for some excellent strategies and implementations for image colorization; to Anurag Mishra for giving tips on building an efficient image captioning model; to François Chollet for building Keras and writing some very useful and engaging content on transfer learning; and to the entire Python AI ecosystem for helping the community democratize deep learning and artificial intelligence for everyone.
Finally, I would like to thank my managers and mentors Gopalan, Sanjeev, and Nagendra, and all my friends and colleagues at Intel for encouraging me and giving me the opportunity to explore new domains in the world of AI. A shoutout also to the folks at Springboard, especially Srdjan Santic, not just for giving me an opportunity to learn and interact with some amazing people, but also for their passion, zeal, and vision for educating more people on data science and AI. Thanks to Towards Data Science and Ludovic Benistant for helping me learn and share more about AI with the rest of the world, and for helping me explore cutting-edge research and work in these domains. Last but not least, I owe a ton of gratitude to my co-authors Raghav and Tamoghna and our reviewer Nitin Panwar for embarking on this journey with me; without them this book wouldn't have been possible!
– Dipanjan Sarkar
I would like to take this opportunity to express gratitude to my parents, Sunil and Neeru, my wife, Swati, my brother, Rajan, family, teachers, friends, colleagues, and mentors who have encouraged, supported and taught me over the years. I would also like to thank my co-authors and good friends Dipanjan Sarkar and Tamoghna Ghosh, for taking me along on this amazing journey. A big thanks to my managers and mentors Vineet, Ravi, and Vamsi along with all my teammates at Optum for their support and encouragement to explore new domains in the Data Science world.
I would like to thank Tushar Gupta, Aaryaman Singh, Sayli Nikalje, Unnati Guha, and Packt for the opportunity and their support throughout this journey. This book wouldn't have been complete without Nitin Panwar's insightful feedback and suggestions. Last but not least, special thanks to François Chollet for Keras, to the Python ecosystem and community, and to fellow authors and researchers who are striving every day to bring these amazing technologies and tools to our fingertips.
– Raghav Bali
I would like to thank the entire Packt team for giving me this unique opportunity and for guiding me throughout the journey. For this book, my co-authors also acted as my mentors; they helped me with their insightful suggestions and guidance. Thanks to Nitin for patiently reviewing this book and providing great feedback. I would like to thank my wife, Doyel, my son, Anurag, and my parents for being a constant source of inspiration and for tolerating my extended working hours. I am also grateful to my Intel managers for their encouragement and support.
– Tamoghna Ghosh
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Chances are you are familiar with the recent and seemingly endless machine learning innovations, but do you know what goes into training a machine learning model? Generally, a given machine learning model is trained on specific data for a particular task. This training process can be exceptionally resource- and time-consuming, and since the resulting models are task-specific, their maximum potential is often not realized.
Optimally-performing neural network models, for example, are often the result of many iterations of fine-tuning from researchers or practitioners. Could these trained models not be additionally exploited for a wider assortment of tasks? Transfer learning involves the leveraging of existing machine learning models for use in scenarios in which the models were not originally trained.
Much as humans do not discard everything they have previously learned and start afresh each time they take up a new task, transfer learning allows a machine learning model to port the knowledge it has acquired during training to new tasks, extending the reach of the computation and expertise that fueled the original model. Simply put, transfer learning can save training time and extend the usefulness of existing machine learning models. It is also an invaluable technique for tasks where the large amounts of training data typically required for training a model from scratch are not available.
Becoming familiar with complex concepts and implementing these concepts in practice are two very different things, and this is where Hands-On Transfer Learning with Python shines. The book starts with a deep dive into both deep learning and transfer learning, conceptually. This is followed by practical implementations of these concepts with real-world examples and research problems, using modern deep learning tools from the Python ecosystem, such as TensorFlow and Keras. Dipanjan, Raghav, and Tamoghna excel at elegantly marrying the theoretical and the practical, a remarkable advantage for the reader of such a well-crafted publication.
Transfer learning has shown much promise of late in many domains, and is a very active area of contemporary machine learning research. If you are looking for a complete guide to both deep learning and transfer learning, starting from zero, Hands-On Transfer Learning with Python should be your first stop.
Matthew Mayo
Editor, KDnuggets
@mattmayo13
Dipanjan (DJ) Sarkar is a Data Scientist at Intel, leveraging data science, machine learning, and deep learning to build large-scale intelligent systems. He holds a master of technology degree with specializations in Data Science and Software Engineering.
He has been an analytics practitioner for several years now, specializing in machine learning, NLP, statistical methods, and deep learning. He is passionate about education and also acts as a Data Science Mentor at various organizations like Springboard, helping people learn data science. He is also a key contributor and editor for Towards Data Science, a leading online journal on AI and Data Science. He has also authored several books on R, Python, machine learning, NLP, and deep learning.
Raghav Bali is a Data Scientist at Optum (United Health Group). His work involves research and development of enterprise-level solutions based on machine learning, deep learning, and NLP for healthcare- and insurance-related use cases. In his previous role at Intel, he was involved in enabling proactive, data-driven IT initiatives. He has also worked in the ERP and finance domains with some of the leading organizations in the world. Raghav has also authored multiple books with leading publishers.
Raghav has a master's degree (gold medalist) in Information Technology from International Institute of Information Technology, Bangalore. He loves reading and is a shutterbug capturing moments when he isn't busy solving problems.
Tamoghna Ghosh is a machine learning engineer at Intel Corporation. He has 11 years of work experience overall, including 4 years of core research experience at Microsoft Research (MSR) India. At MSR, he worked as a research assistant in the cryptanalysis of block ciphers.
His areas of technical expertise are big data, machine learning, NLP, information retrieval, data visualization, and software development. He received his M.Tech (Computer Science) degree from the Indian Statistical Institute, Kolkata, and his M.Sc. (Mathematics) from the University of Calcutta, with a specialization in functional analysis and mathematical modeling/dynamical systems. He is passionate about teaching and conducts internal data science training at various levels at Intel.
Nitin Panwar has a master's degree in Computer Science from the Indian Institute of Information Technology, Gwalior. He is a Technical Lead (Data Science) at Naukri, India's No. 1 job site, where he works on data science, machine learning, and text analytics. He has also worked as a data scientist at Intel, the world's largest silicon company. Nitin's interests include learning about new technology, AI-powered start-ups, and data science.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Transfer Learning with Python
Dedication
Packt Upsell
Why subscribe?
PacktPub.com
Foreword
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Machine Learning Fundamentals
Why ML?
Formal definition
Shallow and deep learning
ML techniques
Supervised learning
Classification
Regression
Unsupervised learning
Clustering
Dimensionality reduction
Association rule mining
Anomaly detection
CRISP-DM
Business understanding
Data understanding
Data preparation
Modeling
Evaluation
Deployment
Standard ML workflow
Data retrieval
Data preparation
Exploratory data analysis
Data processing and wrangling
Feature engineering and extraction
Feature scaling and selection
Modeling
Model evaluation and tuning
Model evaluation
Bias variance trade-off
Bias
Variance
Trade-off
Underfitting
Overfitting
Generalization
Model tuning
Deployment and monitoring
Exploratory data analysis
Feature extraction and engineering
Feature engineering strategies
Working with numerical data
Working with categorical data
Working with image data
Deep learning based automated feature extraction
Working with text data
Text preprocessing
Feature engineering
Feature selection
Summary
Deep Learning Essentials
What is deep learning?
Deep learning frameworks
Setting up a cloud-based deep learning environment with GPU support
Choosing a cloud provider
Setting up your virtual server
Configuring your virtual server
Installing and updating deep learning dependencies
Accessing your deep learning cloud environment
Validating GPU-enablement on your deep learning environment
Setting up a robust, on-premise deep learning environment with GPU support
Neural network basics
A simple linear neuron
Gradient-based optimization
The Jacobian and Hessian matrices
Chain rule of derivatives
Stochastic Gradient Descent
Non-linear neural units
Learning a simple non-linear unit – logistic unit
Loss functions
Data representations
Tensor examples
Tensor operations
Multilayered neural networks
Backprop – training deep neural networks
Challenges in neural network learning
Ill-conditioning
Local minima and saddle points
Cliffs and exploding gradients
Initialization – bad correspondence between the local and global structure of the objective
Inexact gradients
Initialization of model parameters
Initialization heuristics
Improvements of SGD
The momentum method
Nesterov momentum
Adaptive learning rate – separate for each connection
AdaGrad
RMSprop
Adam
Overfitting and underfitting in neural networks
Model capacity
How to avoid overfitting – regularization
Weight-sharing
Weight-decay
Early stopping
Dropout
Batch normalization
Do we need more data?
Hyperparameters of the neural network
Automatic hyperparameter tuning
Grid search
Summary
Understanding Deep Learning Architectures
Neural network architecture
Why different architectures are needed
Various architectures
MLPs and deep neural networks
Autoencoder neural networks
Variational autoencoders
Generative Adversarial Networks
Text-to-image synthesis using the GAN architecture
CNNs
The convolution operator
Stride and padding mode in convolution
The convolution layer
LeNet architecture
AlexNet
ZFNet
GoogLeNet (inception network)
VGG
Residual Neural Networks
Capsule networks
Recurrent neural networks
LSTMs
Stacked LSTMs
Encoder-decoder – Neural Machine Translation
Gated Recurrent Units
Memory Neural Networks
MemN2Ns
Neural Turing Machine
Selective attention
Read operation
Write operation
The attention-based neural network model
Summary
Transfer Learning Fundamentals
Introduction to transfer learning
Advantages of transfer learning
Transfer learning strategies
Transfer learning and deep learning
Transfer learning methodologies
Feature-extraction
Fine-tuning
Pretrained models
Applications
Deep transfer learning types
Domain adaptation
Domain confusion
Multitask learning
One-shot learning
Zero-shot learning
Challenges of transfer learning
Negative transfer
Transfer bounds
Summary
Unleashing the Power of Transfer Learning
The need for transfer learning
Formulating our real-world problem
Building our dataset
Formulating our approach
Building CNN models from scratch
Basic CNN model
CNN model with regularization
CNN model with image augmentation
Leveraging transfer learning with pretrained CNN models
Understanding the VGG-16 model
Pretrained CNN model as a feature extractor
Pretrained CNN model as a feature extractor with image augmentation
Pretrained CNN model with fine-tuning and image augmentation
Evaluating our deep learning models
Model predictions on a sample test image
Visualizing what a CNN model perceives
Evaluation model performance on test data
Summary
Image Recognition and Classification
Deep learning-based image classification
Benchmarking datasets
State-of-the-art deep image classification models
Image classification and transfer learning
CIFAR-10
Building an image classifier
Transferring knowledge
Dog Breed Identification dataset
Exploratory analysis
Data preparation
Dog classifier using transfer learning
Summary
Text Document Categorization
Text categorization
Traditional text categorization
Shortcomings of BoW models
Benchmark datasets
Word representations
Word2vec model
Word2vec using gensim
GloVe model
CNN document model
Building a review sentiment classifier
What has embedding changed most?
Transfer learning – application to the IMDB dataset
Training on the full IMDB dataset with Word2vec embeddings
Creating document summaries with CNN model
Multiclass classification with the CNN model
Visualizing document embeddings
Summary
Audio Event Identification and Classification
Understanding audio event classification
Formulating our real-world problem
Exploratory analysis of audio events
Feature engineering and representation of audio events
Audio event classification with transfer learning
Building datasets from base features
Transfer learning for feature extraction
Building the classification model
Evaluating the classifier performance
Building a deep learning audio event identifier
Summary
DeepDream
Introduction
Algorithmic pareidolia in computer vision
Visualizing feature maps
DeepDream
Examples
Summary
Style Transfer
Understanding neural style transfer
Image preprocessing methodology
Building loss functions
Content loss
Style loss
Total variation loss
Overall loss function
Constructing a custom optimizer
Style transfer in action
Summary
Automated Image Caption Generator
Understanding image captioning
Formulating our objective
Understanding the data
Approach to automated image captioning
Conceptual approach
Practical hands-on approach
Image feature extractor – DCNN model with transfer learning
Text caption generator – sequence-based language model with LSTM
Encoder-decoder model
Image feature extraction with transfer learning
Building a vocabulary for our captions
Building an image caption dataset generator
Building our image language encoder-decoder deep learning model
Training our image captioning deep learning model
Evaluating our image captioning deep learning model
Loading up data and models
Understanding greedy and beam search
Implementing a beam search-based caption generator
Understanding and implementing BLEU scoring
Evaluating model performance on test data
Automated image captioning in action!
Captioning sample images from outdoor scenes
Captioning sample images from popular sports
Future scope for improvement
Summary
Image Colorization
Problem statement
Color images
Color theory
Color models and color spaces
RGB
YUV
LAB
Problem statement revisited
Building a coloring deep neural network
Preprocessing
Standardization
Loss function
Encoder
Transfer learning – feature extraction
Fusion layer
Decoder
Postprocessing
Training and results
Challenges
Further improvements
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
With the world moving towards digitization and automation, as a technologist/programmer it is important to keep oneself updated and learn how to leverage these tools and techniques. This book, Hands-On Transfer Learning with Python, is an attempt to help practitioners get acquainted with and equipped to use these advancements in their respective domains. This book is structured broadly into three sections:
Deep learning foundations
Essentials of transfer learning
Transfer learning case studies
Transfer learning is a machine learning (ML) technique where knowledge gained while training on one set of ML problems can be used to solve other, similar types of problems.
The purpose of this book is two-fold. We will focus on detailed coverage of deep learning and transfer learning, comparing and contrasting the two with easy-to-follow concepts and examples. The second area of focus will be on real-world examples and research problems using TensorFlow, Keras, and the Python ecosystem with hands-on examples.
The book starts with the core essential concepts of ML and deep learning, followed by coverage of important deep learning architectures, such as CNNs, DNNs, RNNs, LSTMs, and capsule networks. Our focus then shifts to transfer learning concepts and pretrained state-of-the-art networks such as VGG, Inception, and ResNet. We also learn how these networks can be leveraged to improve the performance of our deep learning models. Finally, we focus on a multitude of real-world case studies and problems in areas such as computer vision, audio analysis, and natural language processing (NLP).
By the end of this book, you will be ready to implement both deep learning and transfer learning principles in your own systems.
Hands-On Transfer Learning with Python is for data scientists, ML engineers, analysts, and developers with an interest in data and applying state-of-the-art transfer learning methodologies to solve tough real-world problems.
Basic proficiency in ML and Python is required.
Chapter 1, Machine Learning Fundamentals, introduces the CRISP-DM model, which presents an industry standard framework/workflow for any data science, ML, or deep learning project. We will also touch upon various important concepts covering the fundamentals in the ML landscape such as exploratory data analysis, feature extraction and engineering, evaluation metrics, and so on.
Chapter 2, Deep Learning Essentials, provides a whirlwind tour of deep learning essentials, giving an overview of the basic building blocks of neural networks and how deep neural networks are trained. Starting from how a single neural unit works, important concepts such as activation functions, loss functions, optimizers, and neural network hyperparameters are covered. Special emphasis is also placed on setting up on-premise and cloud-based deep learning environments.
Chapter 3, Understanding Deep Learning Architectures, focuses on understanding the various standard model architectures present today in deep learning. We have come a long way since traditional ANNs in the 1960s, and essential model architectures such as fully connected deep neural networks (DNNs), Convolutional Neural Networks (CNNs), recurrent neural networks (RNNs), Long Short-Term Memory (LSTM) networks, and the most recent Capsule Networks will be covered, to name a few.
Chapter 4, Transfer Learning Fundamentals, looks at the core concepts, terminology, and model architectures associated with the concept of transfer learning. Concepts and architectures pertaining to pretrained models will be discussed in detail. We will also compare and contrast transfer learning with deep learning and talk about types and strategies of transfer learning.
Chapter 5, Unleashing the Power of Transfer Learning, takes an actual example with a dataset from Kaggle, leverages deep learning models on it, and gives readers an understanding of the challenges faced when we have a small number of data points, and how transfer learning can unleash its true power and potential to give us superior models in these scenarios. We will tackle the very popular dogs and cats classification task here with the twist of a less data availability constraint.
Chapter 6, Image Recognition and Classification, is the first in a series of real-world applications/case studies of concepts discussed in detail in the previous two parts of the book. The chapter begins with an introduction to the task of image classification, and goes on to discuss and implement some of the popular, state-of-the-art deep learning models on diverse image classification problems.
Chapter 7, Text Document Categorization, discusses the application of transfer learning to a very popular natural language processing problem, text document categorization. The chapter begins with a high-level introduction to the multi-class text classification problem, traditional models, and benchmark text classification datasets such as 20 Newsgroups, along with their performance. Later, it introduces deep learning document models for text classification and their advantages over traditional models. We learn about word feature representation using dense vectors and how to leverage them to apply transfer learning to our text categorization problem, where the source and target domains might be different. Other unsupervised tasks, such as document summarization, are also depicted.
Chapter 8, Audio Event Identification and Classification, solves the tough problem of identifying and classifying very short audio clips. Here we leverage transfer learning, innovatively applying the power of pre-trained deep learning models from the computer vision domain to the totally different domain of audio identification.
Chapter 9, DeepDream, provides a gentle introduction to the domain of generative deep learning, which is one of the core ideas at the forefront of true artificial intelligence. We will focus on how convnets (CNNs) think or dream and visualize patterns in images by leveraging transfer learning. First released by Google in 2015, DeepDream became a viral sensation due to the interesting patterns deep networks started to generate from images, as if thinking and dreaming on their own!
Chapter 10, Style Transfer, leverages concepts from deep learning, transfer learning, and generative learning to showcase artistic image neural style transfer with hands-on examples on different content images and styles.
Chapter 11, Automated Image Caption Generator, covers one of the most complex problems at the intersection of computer vision and natural language generation: image captioning. While classifying images into fixed categories is challenging yet tractable, this is a considerably more complex task that involves generating human-like, natural language textual captions for any photo or scene. Leveraging the power of transfer learning, natural language processing, and generative models, you will learn how to build your own automated image captioning system from scratch.
Chapter 12, Image Colorization, presents a unique case-study where the task is to colorize black and white or grayscale images. This chapter introduces readers to the basics of various color scales and why image colorization is such a difficult task.
It would be great if you have basic proficiency in ML and Python.
An avid interest in data analysis, ML, and deep learning would be beneficial.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at www.packtpub.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Transfer-Learning-with-Python. If there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/HandsOnTransferLearningwithPython_ColorImages.pdf.
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
This quote may seem exaggerated to the core and difficult to digest, yet, with the pace at which technology and science are improving, who knows? We as a species have always dreamt of creating intelligent, self-aware machines. With recent advancements in research, technology, and the democratization of computing power, artificial intelligence (AI), machine learning (ML), and deep learning have gotten enormous attention and hype amongst technologists and the population in general. Though Hollywood's promised future is debatable, we have started to see and use glimpses of intelligent systems in our daily lives. From intelligent conversational engines, such as Google Now, Siri, Alexa, and Cortana, to self-driving cars, we are gradually accepting such smart technologies in our daily routines.
As we step into the new era of learning machines, it is important to understand that the fundamental ideas and concepts have existed for some time and have constantly been improved upon by intelligent people across the planet. It is well known that 90% of the world's data has been created in just the last couple of years, and we continue to create far more data at ever increasing rates. The realm of ML, deep learning, and AI helps us utilize these massive amounts of data to solve various real-world problems.
This book is divided into three sections. In this first section, we will get started with the basic concepts and terminologies associated with AI, ML, and deep learning, followed by in-depth details on deep learning architectures.
This chapter provides our readers with a quick primer on the basic concepts of ML before we get started with deep learning in subsequent chapters. This chapter covers the following aspects:
Introduction to ML
ML methodologies
CRISP-DM—workflow for ML projects
ML pipelines
Exploratory data analysis
Feature extraction and engineering
Feature selection
Every chapter of the book builds upon concepts and techniques from the previous chapters. Readers who are well-versed in the basics of ML and deep learning may pick and choose topics as they deem necessary, yet it is advised to go through the chapters sequentially. The code for this chapter is available for quick reference in the Chapter 1 folder of the GitHub repository at https://github.com/dipanjanS/hands-on-transfer-learning-with-python, which you can refer to as needed to follow along with the chapter.
We live in a world where our daily routine involves multiple contact points with the digital world. We have computers assisting us with communication, travel, entertainment, and whatnot. The digital online products (apps, websites, software, and so on) that we use seamlessly all the time help us avoid mundane and repetitive tasks. This software has been developed using programming languages (such as C, C++, Python, Java, and so on) by programmers who have explicitly programmed each instruction to enable the software to perform defined tasks. A typical interaction between a computing device (computer, phone, and so on) and an explicitly programmed software application, with inputs and defined outputs, is depicted in the following diagram:
Though the current paradigm has helped us develop amazingly complex software and systems to address tasks from different domains quite efficiently, it requires somebody to define and code explicit rules for such programs to work. These are tasks that are easy for a computer to solve but difficult or time-consuming for humans. For instance, performing complex calculations, storing massive amounts of data, searching through huge databases, and so on are tasks that can be performed efficiently by a computer once the rules are defined.
Yet, there is another class of problems that can be solved intuitively by humans but are difficult to program. Problems like object identification, playing games, and so on are natural to us yet difficult to define with a set of rules. Alan Turing, in his landmark paper Computing Machinery and Intelligence (https://www.csee.umbc.edu/courses/471/papers/turing.pdf), which introduced the Turing test, discussed general purpose computers and whether they could be capable of such tasks.
This line of thought, which embodies the idea of general purpose computing, is what gave rise to AI in the broader sense. The new paradigm, better termed the ML paradigm, is one where computers or machines learn from experience (analogous to human learning) to solve tasks, rather than being explicitly programmed to do so.
AI is thus an encompassing field of research, with ML and deep learning being specific subfields of study within it. AI is a general field that includes other subfields as well, which may or may not involve learning (for instance, see symbolic AI). In this book we will concentrate our time and efforts upon ML and deep learning only. The scope of artificial intelligence, machine learning, and deep learning can be visualized as follows:
A formal definition of ML, as stated by Tom Mitchell, is as follows.
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
This definition beautifully captures the essence of what ML is in a very concise manner. Let's take an example from the real world to understand it better. Let's consider that the task (T) is to identify spam emails. We may now present many examples (or experience, E) of spam and non-spam emails to a system, from which it learns rather than being explicitly programmed. The program or system can then be measured on its performance (P) on the learned task of identifying spam emails. Interesting, isn't it?
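To make T, E, and P concrete, here is a minimal, purely illustrative scikit-learn sketch: the task T is identifying spam, the experience E is a handful of labeled messages (made up for this example), and the performance measure P is accuracy on new messages.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Experience E: a few labeled example messages (1 = spam, 0 = not spam)
messages = [
    "win a free prize now", "limited offer, claim your reward",
    "meeting rescheduled to monday", "please review the attached report",
]
labels = [1, 1, 0, 0]

# Task T: learn to identify spam from the examples
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

# Performance P: accuracy on new, unseen messages
test_messages = ["claim your free reward", "see the report before the meeting"]
test_labels = [1, 0]
print("Accuracy (P):", model.score(test_messages, test_labels))
```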
ML is thus the task of identifying patterns from training examples and applying these learned patterns (or representations) to new, unseen data. ML is also sometimes termed shallow learning because of its nature of learning single-layered representations (in most cases). This brings us to the questions: what are layers of representation, and what is deep learning? We will answer these questions in subsequent chapters. First, let's have a quick overview of deep learning.
Deep learning is a subfield of ML that is concerned with learning successive meaningful representations from training examples to solve a given task. Deep learning is closely associated with artificial neural networks that consist of multiple layers stacked one after the other, which capture successive representations.
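As a rough illustration of layers stacked one after the other, here is a minimal Keras sketch (assuming TensorFlow's bundled Keras; the layer sizes are arbitrary). Each layer transforms the representation produced by the layer before it:

```python
from tensorflow.keras import layers, models

# Each Dense layer learns a representation of the previous layer's output
model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(100,)),  # first-level representation
    layers.Dense(32, activation='relu'),                      # deeper representation
    layers.Dense(1, activation='sigmoid'),                    # task-specific output
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```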
Do not worry if this was difficult to digest and understand; as mentioned, we will cover these ideas in considerable depth in subsequent chapters.
ML has become a buzzword thanks to the amount of data we are generating and collecting along with faster compute. Let's look at ML in more depth in the following sections.
ML is a popular subfield of AI, one that covers a very wide scope. One of the reasons for this popularity is the comprehensive toolbox of sophisticated algorithms, techniques, and methodologies under its ambit. This toolbox has been developed and improved over the years, and new additions are being researched on an ongoing basis. To understand and use the ML toolbox wisely, consider the following few ways of categorizing it.
Categorization based on amount of human supervision:
Supervised learning: This class of learning involves a high degree of human supervision. The algorithms under supervised learning utilize the training data and associated outputs to learn a mapping between the two, and apply that mapping to unseen data. Classification and regression are two major types of supervised learning algorithms.
Unsupervised learning: This class of algorithms attempts to learn inherent latent structures, patterns, and relationships from the input data without any associated outputs/labels (human supervision). Clustering, dimensionality reduction, association rule mining, and so on are a few major types of unsupervised learning algorithms.
Semi-supervised learning: This class of algorithms is a hybrid of supervised and unsupervised learning. In this case, the algorithms work with small amounts of labeled training data and larger amounts of unlabeled data, thus making creative use of both supervised and unsupervised methods to solve a given task.
Reinforcement learning: This class of algorithms is a bit different from supervised and unsupervised learning methods. The central entity here is an agent, which trains over a period of time while interacting with its environment to maximize some reward. The agent iteratively learns and changes strategies/policies based on the rewards/penalties it receives from interacting with the environment.
Categorization based on data availability:
Batch learning: This is also termed offline learning. This type of learning is utilized when the required training data is available upfront, and a model can be trained and fine-tuned before being deployed to production/the real world.
Online learning: As the name suggests, in this case learning does not stop once the initially available data has been used. Rather, data is fed into the system in mini-batches, and the training process continues with each new batch of data.
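The following is a minimal, hypothetical sketch of such incremental training using scikit-learn's SGDClassifier; the mini-batches and the simple labeling rule are synthetic, invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(42)
classes = np.array([0, 1])
clf = SGDClassifier()

# Data arrives in mini-batches; the model is updated incrementally
for batch in range(5):
    X_batch = rng.randn(100, 10)
    # a simple synthetic rule so the batches share a learnable pattern
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)

X_new = rng.randn(3, 10)
print(clf.predict(X_new))
```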
The previously discussed categorizations give us an abstract view of how ML algorithms can be organized, understood, and utilized. The most common way to categorize them is into supervised and unsupervised learning algorithms. Let's go into a bit more detail about these two categories as this should help us get started for further advanced topics to be introduced later.
Supervised learning algorithms are a class of algorithms that utilize data samples (also called training samples) and corresponding outputs (or labels) to infer a mapping function between the two. The inferred mapping function, or the learned function, is the output of this training process. The learned function is then utilized to map new, unseen data points (input elements) to outputs, and its performance on this unseen data is used to evaluate it.
Some key concepts for supervised learning algorithms are as follows:
Training dataset: The training samples and corresponding outputs utilized during the training process are termed training data. Formally, a training dataset is a two-element tuple consisting of an input element (usually a vector) and a corresponding output element or signal.
Test dataset: The unseen dataset that is utilized to test the performance of the learned function. This dataset is also a two-element tuple containing input data points and corresponding output signals. Data points in this set are not used during the training phase (this dataset is sometimes further divided into a validation set as well; we will discuss this in more detail in subsequent chapters).
Learned function: This is the output of the training phase, also termed the inferred function or the model. This function is inferred from the training examples (input data points and their corresponding outputs) in the training dataset. An ideal model/learned function would learn the mapping in such a way that the results generalize well to unseen data.
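The following minimal scikit-learn sketch ties these concepts together on the bundled Iris dataset: the data is split into a training dataset and a test dataset, and the fitted estimator plays the role of the learned function. It is illustrative only; the split ratio and model choice are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Training dataset: (input, output) pairs used to infer the mapping
# Test dataset: held-out pairs used only to assess the learned function
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

learned_function = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy on unseen data:", learned_function.score(X_test, y_test))
```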
There are various supervised learning algorithms available. Based on the use case requirements, they can be broadly categorized into classification and regression models.
In the simplest terms, these algorithms help us answer objective questions or make yes/no predictions. For instance, they are useful in scenarios like is it going to rain today? or can this tumour be cancerous?, and so on.
Formally, the key objective of classification algorithms is to predict output labels, based on the input data points, that are categorical in nature; namely, each label belongs to a discrete class or category.
Logistic regression, Support Vector Machines (SVMs), Neural Networks, Random Forests, k-Nearest Neighbours (KNN), Decision Trees, and so on are some of the popular classification algorithms.
Suppose we have a real-world use case to evaluate different car models. To keep things simple, let's assume that the model is expected to predict an output for every car model as either acceptable or unacceptable based on multiple input training samples. The input training samples have attributes such as buying price, number of doors, capacity (in number of persons), and safety.
The class label denotes each data point as either acceptable or unacceptable. The following diagram depicts the binary classification problem at hand. The classification algorithm takes the training samples as input to prepare a supervised model. This model is then utilized to predict the evaluation label for a new data point:
Since output labels are discrete classes in classification problems, the task is termed a binary classification problem if there are only two possible output classes, and a multi-class classification problem otherwise. Predicting whether or not it will rain tomorrow would be a binary classification problem (with the output being a yes or a no), while predicting a numeric digit from scanned handwritten images would be a multi-class classification problem with 10 labels (zero to nine being the possible output labels).
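As a small, hypothetical illustration of such a classifier, the following sketch builds a decision tree on a handful of made-up car records with the attributes described earlier (buying price, number of doors, capacity, and safety). The records and values are invented purely for demonstration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training samples mirroring the car evaluation attributes
cars = pd.DataFrame({
    'buying_price': ['high', 'low', 'medium', 'low', 'high'],
    'doors':        [2, 4, 4, 2, 2],
    'capacity':     [2, 5, 5, 4, 2],
    'safety':       ['low', 'high', 'medium', 'high', 'low'],
    'label':        ['unacceptable', 'acceptable', 'acceptable',
                     'acceptable', 'unacceptable'],
})

X = pd.get_dummies(cars.drop(columns='label'))  # one-hot encode categorical inputs
y = cars['label']

clf = DecisionTreeClassifier().fit(X, y)

# Predict the evaluation label for a new car (encoded the same way)
new_car = pd.get_dummies(pd.DataFrame({
    'buying_price': ['low'], 'doors': [4], 'capacity': [5], 'safety': ['high'],
})).reindex(columns=X.columns, fill_value=0)
print(clf.predict(new_car))
```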
This class of supervised learning algorithms helps us answer quantitative questions of the type how many? or how much?. Formally, the key objective for regression models is value estimation. In this case, the output labels are continuous in nature (as opposed to being discrete in classification).
In the case of regression problems, the input data points are termed as independent or explanatory variables, while the output is termed as a dependent variable. Regression models are also trained using training data samples consisting of input (or independent) data points along with output (or dependent) signals. Linear regression, multivariate regression, regression trees, and so on are a few supervised regression algorithms.
Simple linear regression models work with single independent and single dependent variables. Ordinary Least Squares (OLS) regression is a popular linear regression model. Multiple regression or multivariate regression is where there is a single dependent variable, while each observation is a vector composed of multiple explanatory variables.
Polynomial regression models are a special case of multivariate regression. Here, the dependent variable is modeled as an nth-degree polynomial of the independent variable. Since polynomial regression models fit or map nonlinear relationships between dependent and independent variables, these are also termed nonlinear regression models.
The following is an example of linear regression:
To understand different regression types, let's consider a real-world use case of estimating the stopping distance of a car, based on its speed. Here, based on the training data we have, we can model the stopping distance as a linear function of speed or as a polynomial function of the speed of the car. Remember that the main objective is to minimize the error without overfitting the training data itself.
The preceding graph depicts a linear fit, while the following one depicts a polynomial fit for the same dataset:
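The following sketch contrasts a linear fit with a degree-2 polynomial fit using scikit-learn. The speed and stopping distance values are hypothetical numbers chosen only to illustrate the idea.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Hypothetical speed (mph) and stopping distance (ft) observations
speed = np.array([4, 7, 10, 15, 20, 25])    # independent variable
dist = np.array([2, 9, 18, 38, 64, 98])     # dependent variable
X = speed.reshape(-1, 1)

linear_model = LinearRegression().fit(X, dist)
poly_model = make_pipeline(PolynomialFeatures(degree=2),
                           LinearRegression()).fit(X, dist)

print("Linear fit R^2:    ", linear_model.score(X, dist))
print("Polynomial fit R^2:", poly_model.score(X, dist))
```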
As the name suggests, this class of algorithms learns/infers concepts without supervision. Unlike supervised learning algorithms, which infer a mapping function based on a training dataset consisting of input data points and output signals, unsupervised algorithms are tasked with finding patterns and relationships in the training data without any output signals being available. This class of algorithms utilizes the input dataset to detect patterns, mine rules, and group/cluster data points so as to extract meaningful insights from the raw input dataset.
Unsupervised algorithms come in handy when we do not have the liberty of a training set that contains corresponding output signals or labels. In many real-world scenarios, datasets are available without output signals and it is difficult to manually label them. Thus, unsupervised algorithms are helpful in plugging such gaps.
Similar to supervised learning algorithms, unsupervised algorithms can also be categorized for ease of understanding and learning. The following are different categories of unsupervised learning algorithms.
The unsupervised equivalent of classification is termed as clustering. These algorithms help us cluster or group data points into different groups or categories, without the availability of any output label in the input/training dataset. These algorithms try to find patterns and relationships from the input dataset, utilizing inherent features to group them into various groups based on some similarity measure, as shown in the following diagram:
A real-world example to help understand clustering could be news articles. There are hundreds of news articles written daily, each catering to different topics ranging from politics and sports to entertainment, and so on. An unsupervised approach to group these articles together can be achieved using clustering, as shown in the preceding figure.
There are different approaches to perform the process of clustering. The most popular ones are:
Centroid-based methods. Popular ones are K-means and K-medoids.
Agglomerative and divisive hierarchical clustering methods. Popular ones are Ward's and affinity propagation.
Data distribution-based methods, for instance, Gaussian mixture models.
Density-based methods, such as DBSCAN.
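As a quick illustration of the centroid-based approach, the following sketch runs K-means on synthetic data (generated blobs standing in for, say, article feature vectors); the number of clusters and other settings are arbitrary.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Synthetic data points standing in for, say, article feature vectors
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:10])        # cluster assignment for the first few points
print(kmeans.cluster_centers_)    # the learned centroids
```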
Data and ML are the best of friends, yet a lot of issues come with more and bigger data. A large number of attributes or a bloated-up feature space is one common problem. A large feature space poses problems in analyzing and visualizing the data along with issues related to training, memory, and space constraints. This is also known as the curse of dimensionality. Since unsupervised methods help us extract insights and patterns from unlabeled training datasets, they are also useful in helping us reduce dimensionality.
In other words, unsupervised methods help us reduce the feature space by selecting a representative set of features from the complete available list:
Principal Component Analysis (PCA), nearest neighbors, and discriminant analysis are some of the popular dimensionality reduction techniques.
The preceding diagram is a famous depiction of the workings of the PCA-based dimensionality reduction technique. It shows a Swiss roll shape with data represented in three-dimensional space. Applying PCA results in a transformation of the data into two-dimensional space, as shown on the right-hand side of the diagram.
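The following sketch mimics that depiction: it generates a three-dimensional Swiss roll with scikit-learn and projects it onto two dimensions with PCA. Note that PCA is a linear projection, so this is only a rough, illustrative transformation.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

# Three-dimensional Swiss roll data, as in the preceding depiction
X, _ = make_swiss_roll(n_samples=1000, random_state=42)
print("Original shape:", X.shape)      # (1000, 3)

X_2d = PCA(n_components=2).fit_transform(X)
print("Reduced shape: ", X_2d.shape)   # (1000, 2)
```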
This class of unsupervised ML algorithms helps us understand and extract patterns from transactional datasets. Also termed as Market Basket Analysis (MBA), these algorithms help us identify interesting relationships and associations between items across transactions.
Using association rule mining, we can answer questions like what items are bought together by people at a given store? or do people who buy wine also tend to buy cheese?, and many more. FP-growth, ECLAT, and Apriori are some of the most widely used algorithms for association rule mining tasks.
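As a small, hypothetical illustration, the following sketch mines frequent itemsets and rules from a few made-up baskets. It assumes the third-party mlxtend library is installed (its API may differ slightly across versions); the transactions and thresholds are invented for demonstration.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded transactions: each row is a basket, each column an item
baskets = pd.DataFrame({
    'wine':   [1, 1, 0, 1, 0],
    'cheese': [1, 1, 0, 1, 1],
    'bread':  [0, 1, 1, 1, 1],
}).astype(bool)

frequent_itemsets = apriori(baskets, min_support=0.4, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])
```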
Anomaly detection is the task of identifying rare events/observations based on historical data. Anomaly detection is also termed as outlier detection. Anomalies or outliers usually have characteristics such as being infrequent or occurring in short sudden bursts over time.
For such tasks, we provide a historical dataset for the algorithm so it can identify and learn the normal behavior of data in an unsupervised manner. Once learned, the algorithm helps us identify patterns that differ from this learned behavior.
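A minimal sketch of this idea uses scikit-learn's IsolationForest on synthetic data: a cloud of "normal" points plus a few injected outliers. The data and the contamination setting are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
normal = rng.normal(loc=0, scale=1, size=(500, 2))   # historical "normal" behavior
outliers = rng.uniform(low=6, high=8, size=(5, 2))   # rare, unusual observations
X = np.vstack([normal, outliers])

detector = IsolationForest(contamination=0.01, random_state=42).fit(X)
labels = detector.predict(X)      # 1 = normal, -1 = anomaly
print("Detected anomalies:", np.sum(labels == -1))
```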
Cross Industry Standard Process for Data Mining (CRISP-DM) is one of the most popular and widely used processes for data mining and analytics projects. CRISP-DM provides the required framework, which clearly outlines the necessary steps and workflows for executing a data mining and analytics project, from business requirements to the final deployment stages and everything in between.
More popularly known by the acronym itself, CRISP-DM is a tried, tested, and robust industry standard process model followed for data mining and analytics projects. CRISP-DM clearly depicts the necessary steps, processes, and workflows for executing any project, right from formalizing business requirements to testing and deploying a solution to transform data into insights. Data science, data mining, and ML are all about trying to run multiple iterative processes to extract insights and information from data. Hence, we can say that analyzing data is truly both an art as well as a science, because it is not always about running algorithms without reason; a lot of the major effort involves understanding the business, the actual value of the efforts being invested, and proper methods for articulating end results and insights.
Data science and data mining projects are iterative in nature to extract meaningful insights and information from data. Data science is as much art as science and thus a lot of time is spent understanding the business value and the data at hand before applying the actual algorithms (these again go through multiple iterations) and finally evaluations and deployment.
Similar to software engineering projects, which have different life cycle models, CRISP-DM helps us track a data mining and analytics project from start to end. This model is divided into six major steps that cover from aspects of business and data understanding to evaluation and finally deployment, all of which are iterative in nature. See the following diagram:
Let's now have a deeper look into each of the six stages to better understand the CRISP-DM model.
The first and foremost step is understanding the business. This crucial step begins with setting the business context and requirements for the problem. Defining the business requirements formally is important in order to transform them into a data science and analytics problem statement. This step is also used to set the expectations and success criteria, so that both the business and data science teams are on the same page and can track the progress of the project.
The main deliverable of this step is a detailed plan consisting of major milestones, timelines, assumptions, constraints, caveats, issues expected, and success criteria.
Data collection and understanding is the second step in the CRISP-DM framework. In this step we take a deeper dive to understand and analyze the data for the problem statement formalized in the previous step. This step begins with investigating the various sources of data outlined in the detailed project plan previously. These sources of data are then used to collect data, analyze different attributes, and make a note of data quality. This step also involves what is generally termed as exploratory data analysis.
Exploratory data analysis (EDA) is a very important sub-step. It is during EDA we analyze different attributes of data, their properties and characteristics. We also visualize data during EDA for a better understanding and uncovering patterns that might be previously unseen or ignored. This step lays down the foundation for the coming step and hence this step cannot be neglected at all.
This is the third, and most time-consuming, step in any data science project. Data preparation takes place once we have understood the business problem and explored the data available. This step involves data integration, cleaning, wrangling, feature selection, and feature engineering. First and foremost is data integration. There are times when data is available from various sources and hence needs to be combined based on certain keys or attributes for better usage.
Data cleaning and wrangling are very important steps. This involves handling missing values, data inconsistencies, fixing incorrect values, and converting data to ingestible formats such that they can be used by ML algorithms.
Data preparation is the most time-consuming step, taking over 60-70% of the overall time taken for any data science project. Apart from data integration and wrangling, this step involves selecting key features based on relevance, quality, assumptions, and constraints. This is also termed feature selection. There are also times when we have to derive or generate features from existing ones, for example, deriving age from date of birth, depending upon the use case requirements. This step is termed feature engineering and is, again, driven by the use case.
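As a tiny, hypothetical illustration of these cleaning and feature engineering steps, the following pandas sketch imputes a missing value and derives an age feature from a date of birth column; the records and the reference date are made up.

```python
import pandas as pd

# Hypothetical customer records with a missing income value
df = pd.DataFrame({
    'date_of_birth': ['1985-04-12', '1992-11-03', '1978-06-25'],
    'income': [55000, None, 72000],
})

# Data cleaning: impute the missing income with the median
df['income'] = df['income'].fillna(df['income'].median())

# Feature engineering: derive age (in years) from date of birth
df['date_of_birth'] = pd.to_datetime(df['date_of_birth'])
df['age'] = (pd.Timestamp('2018-01-01') - df['date_of_birth']).dt.days // 365

print(df[['age', 'income']])
```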
The fourth step, or the modeling step, is where the actual analysis and ML takes place. This step utilizes the clean and formatted data prepared in the previous step for modeling purposes. This is an iterative process and works in sync with the data preparation step, as models/algorithms require data in different settings/formats with varying sets of attributes.
This step involves selecting relevant tools and frameworks along with the selection of a modeling technique or algorithms. This step includes model building, evaluation, and fine-tuning of models, based on the expectations and criteria laid down during the business understanding phase.
Once the modeling step results in one or more models that satisfy the success criteria, performance benchmarks, and model evaluation metrics, a thorough evaluation step comes into the picture. In this step, we consider the following activities before moving ahead with the deployment stage:
Model result assessment based on quality and alignment with business objectives
Identifying any additional assumptions made or constraints relaxed
Data quality, missing information, and other feedback from the data science team and/or subject matter experts (SMEs)
Cost of deployment of the end-to-end ML solution
The final step of the CRISP-DM model is deployment to production. The models that have been developed, fine-tuned, validated, and tested during multiple iterations are saved and prepared for the production environment. A proper deployment plan is built, which includes details of hardware and software requirements. The deployment stage also includes putting in place checks and monitoring aspects to evaluate the model in production for results, performance, and other metrics.
The CRISP-DM model provides a high-level workflow for the management of ML and related projects. In this section, we will discuss the technical aspects and implementation of standard workflows for handling ML projects. Simply put, an ML pipeline is an end-to-end workflow consisting of the various aspects of a data-intensive project. Once the initial phases, such as business understanding, risk assessment, and ML or data mining technique selection, have been covered, we proceed towards the solution space for driving the project. A typical ML pipeline or workflow with different sub-components is shown in the following diagram:
A standard ML pipeline broadly consists of the following stages.
Data collection and extraction is where the story usually begins. Datasets come in all forms including structured and unstructured data that often includes missing or noisy data. Each data type and format needs special mechanisms for data handling as well as management. For instance, if a project concerns analysis of tweets, we need to work with Twitter APIs and develop mechanisms to extract the required tweets, which are usually in JSON format.
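As a small illustration of this kind of handling, the following sketch parses a couple of hypothetical tweet-like JSON records and extracts just the fields needed downstream; the records and field names are invented and do not reflect any particular API's exact schema.

```python
import json

# Hypothetical raw tweets, one JSON object per line (as returned by many APIs)
raw_lines = [
    '{"id": 1, "text": "Loving the new phone!", "user": {"screen_name": "alice"}}',
    '{"id": 2, "text": "Traffic is terrible today", "user": {"screen_name": "bob"}}',
]

tweets = [json.loads(line) for line in raw_lines]
for tweet in tweets:
    # Extract only the fields we need for downstream analysis
    print(tweet['user']['screen_name'], '->', tweet['text'])
```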
Other scenarios may involve already existing structured or unstructured public datasets or private ones; both may require additional permissions apart from just developing extraction mechanisms. A fairly detailed account of working with diverse data formats is given in Chapter 3 of the book Practical Machine Learning with Python (Sarkar and co-authors, Springer, 2017), in case you are interested in diving deeper into further details.
