Hands-On Transfer Learning with Python

Dipanjan Sarkar

Description

Transfer learning is a machine learning (ML) technique where knowledge gained while training on one set of problems can be used to solve other, similar problems.

The purpose of this book is two-fold; firstly, we focus on detailed coverage of deep learning (DL) and transfer learning, comparing and contrasting the two with easy-to-follow concepts and examples. The second area of focus is real-world examples and research problems using TensorFlow, Keras, and the Python ecosystem with hands-on examples.

The book starts with the key essential concepts of ML and DL, followed by depiction and coverage of important DL architectures such as convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs), long short-term memory (LSTM), and capsule networks. Our focus then shifts to transfer learning concepts such as model freezing, fine-tuning, and pretrained models including VGG, Inception, and ResNet, and how these systems can perform better than DL models trained from scratch, with practical examples. In the concluding chapters, we will focus on a multitude of real-world case studies and problems associated with areas such as computer vision, audio analysis, and natural language processing (NLP).

By the end of this book, you will be able to implement both DL and transfer learning principles in your own systems.




Hands-On Transfer Learning with Python
Implement advanced deep learning and neural network models using TensorFlow and Keras
Dipanjan Sarkar
Raghav Bali
Tamoghna Ghosh
BIRMINGHAM - MUMBAI

Hands-On Transfer Learning with Python

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Sunith Shetty
Acquisition Editor: Tushar Gupta
Content Development Editor: Unnati Guha
Technical Editor: Sayli Nikalje
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Jisha Chirayil
Production Coordinator: Shantanu Zagade

First published: August 2018

Production reference: 1300818

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK

ISBN 978-1-78883-130-7

www.packtpub.com

This book wouldn't have been possible without the several people who turned it from a mere concept into reality. I would like to thank my parents, Digbijoy and Sampa, my partner, Durba, my pets, family, and friends for constantly supporting me in my endeavors. A big thank you to the entire team at Packt, especially Tushar, Sayli, and Unnati, for working tirelessly and supporting us throughout our journey. Also, thanks to Matthew Mayo for gracing our book with his foreword and doing great things with KDnuggets.

Thanks to Adrian Rosebrock and PyImageSearch for some excellent visuals and content around pretrained models for computer vision; Federico Baldassarre, Diego Gonzalez-Morin, Lucas Rodes-Guirao, and Emil Wallner for some excellent strategies and implementations for image colorization; Anurag Mishra for giving tips for building an efficient image captioning model; François Chollet for building Keras and writing some very useful and engaging content on transfer learning; and the entire Python AI ecosystem for helping the community democratize deep learning and artificial intelligence for everyone.

Finally, I would like to thank my managers and mentors Gopalan, Sanjeev, and Nagendra, and all my friends and colleagues at Intel, for encouraging me and giving me the opportunity to explore new domains in the world of AI. A shoutout also to the folks from Springboard, especially Srdjan Santic, for not just giving me an opportunity to learn and interact with some amazing people, but also for their passion, zeal, and vision in educating more people on data science and AI. To Towards Data Science and Ludovic Benistant, thank you for helping me learn and share more about AI with the rest of the world, and for helping me explore cutting-edge research and work in these domains. Last but not least, I owe a ton of gratitude to my co-authors Raghav and Tamoghna, and our reviewer Nitin Panwar, for embarking on this journey with me; without them this book wouldn't have been possible!

– Dipanjan Sarkar

I would like to take this opportunity to express gratitude to my parents, Sunil and Neeru, my wife, Swati, my brother, Rajan, family, teachers, friends, colleagues, and mentors who have encouraged, supported and taught me over the years. I would also like to thank my co-authors and good friends Dipanjan Sarkar and Tamoghna Ghosh, for taking me along on this amazing journey. A big thanks to my managers and mentors Vineet, Ravi, and Vamsi along with all my teammates at Optum for their support and encouragement to explore new domains in the Data Science world.

I would like to thank Tushar Gupta, Aaryaman Singh, Sayli Nikalje, Unnati Guha, and Packt for the opportunity and their support throughout this journey. This book wouldn't have been complete without Nitin Panwar's insightful feedback and suggestions. Last but not least, special thanks to François Chollet for Keras, to the Python ecosystem and community, and to fellow authors and researchers who are striving every day to bring these amazing technologies and tools to our fingertips.

– Raghav Bali

I would like to thank the entire Packt team for giving me this unique opportunity and for guiding me throughout the journey. For this book, my co-authors also acted as my mentors; they helped me with their insightful suggestions and guidance. Thanks to Nitin for patiently reviewing this book and providing great feedback. I would like to thank my wife, Doyel, my son, Anurag, and my parents for being a constant source of inspiration and for tolerating my extended working hours. I am also grateful to my managers at Intel for their encouragement and support.

– Tamoghna Ghosh

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Foreword

Chances are you are familiar with the recent and seemingly endless machine learning innovations, but do you know what goes into training a machine learning model? Generally, a given machine learning model is trained on specific data for a particular task. This training process can be exceptionally resource- and time-consuming, and since the resulting models are task-specific, their maximum potential is not realized.

Optimally-performing neural network models, for example, are often the result of many iterations of fine-tuning from researchers or practitioners. Could these trained models not be additionally exploited for a wider assortment of tasks? Transfer learning involves the leveraging of existing machine learning models for use in scenarios in which the models were not originally trained.

Much as humans do not discard everything they have previously learned and start afresh each time they take up a new task, transfer learning allows a machine learning model to port the knowledge it has acquired during training to new tasks, extending the reach of the computation and expertise that served as fuel for the original model. Simply put, transfer learning can save training time and extend the usefulness of existing machine learning models. It is also an invaluable technique for tasks where the large amounts of training data typically required for training a model from scratch are not available.

Becoming familiar with complex concepts and implementing these concepts in practice are two very different things, and this is where Hands-On Transfer Learning with Python shines. The book starts with a deep dive into both deep learning and transfer learning, conceptually. This is followed by practical implementations of these concepts with real-world examples and research problems, using modern deep learning tools from the Python ecosystem, such as TensorFlow and Keras. Dipanjan, Raghav, and Tamoghna excel at elegantly marrying the theoretical and the practical, a remarkable advantage for the reader of such a well-crafted publication.

Transfer learning has shown much promise of late in many domains, and is a very active area of contemporary machine learning research. If you are looking for a complete guide to both deep learning and transfer learning, starting from zero, Hands-On Transfer Learning with Python should be your first stop.

Matthew Mayo

Editor, KDnuggets

@mattmayo13

Contributors

About the authors

Dipanjan (DJ) Sarkar is a Data Scientist at Intel, leveraging data science, machine learning, and deep learning to build large-scale intelligent systems. He holds a master of technology degree with specializations in Data Science and Software Engineering.

He has been an analytics practitioner for several years now, specializing in machine learning, NLP, statistical methods, and deep learning. He is passionate about education and also acts as a Data Science Mentor at various organizations like Springboard, helping people learn data science. He is also a key contributor and editor for Towards Data Science, a leading online journal on AI and Data Science. He has also authored several books on R, Python, machine learning, NLP, and deep learning.

Raghav Bali is a Data Scientist at Optum (United Health Group). His work involves research and development of enterprise-level solutions based on machine learning, deep learning, and NLP for Healthcare and Insurance related use cases. In his previous role at Intel, he was involved in enabling proactive data driven IT initiatives. He has also worked in ERP and finance domains with some of the leading organizations in the world. Raghav has also authored multiple books with leading publishers.

Raghav has a master's degree (gold medalist) in Information Technology from International Institute of Information Technology, Bangalore. He loves reading and is a shutterbug capturing moments when he isn't busy solving problems.

Tamoghna Ghosh is a machine learning engineer at Intel Corporation. He has 11 years of work experience overall, including four years of core research experience at Microsoft Research (MSR) India, where he worked as a research assistant on the cryptanalysis of block ciphers.

His areas of technical expertise are big data, machine learning, NLP, information retrieval, data visualization, and software development. He received an M.Tech (Computer Science) degree from the Indian Statistical Institute, Kolkata, and an M.Sc. (Mathematics) from the University of Calcutta, with specializations in functional analysis and mathematical modeling/dynamical systems. He is passionate about teaching and conducts internal data science training at various levels for Intel.

About the reviewer

Nitin Panwar has a master's degree in Computer Science from the Indian Institute of Information Technology, Gwalior. He is a Technical Lead (data science) at Naukri, India's No. 1 job site, where he works on data science, machine learning, and text analytics. He has also worked as a data scientist at Intel, the world's largest silicon company. Nitin's interests include learning about new technology, AI-powered start-ups, and data science.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Hands-On Transfer Learning with Python

Dedication

Packt Upsell

Why subscribe?

PacktPub.com

Foreword

Contributors

About the authors

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Machine Learning Fundamentals

Why ML?

Formal definition

Shallow and deep learning

ML techniques

Supervised learning

Classification

Regression

Unsupervised learning

Clustering

Dimensionality reduction

Association rule mining

Anomaly detection

CRISP-DM

Business understanding

Data understanding

Data preparation

Modeling

Evaluation

Deployment

Standard ML workflow

Data retrieval

Data preparation

Exploratory data analysis

Data processing and wrangling

Feature engineering and extraction

Feature scaling and selection

Modeling

Model evaluation and tuning

Model evaluation

Bias variance trade-off

Bias

Variance

Trade-off

Underfitting

Overfitting

Generalization

Model tuning

Deployment and monitoring

Exploratory data analysis

Feature extraction and engineering

Feature engineering strategies

Working with numerical data

Working with categorical data

Working with image data

Deep learning based automated feature extraction

Working with text data

Text preprocessing

Feature engineering

Feature selection

Summary

Deep Learning Essentials

What is deep learning?

Deep learning frameworks

Setting up a cloud-based deep learning environment with GPU support

Choosing a cloud provider

Setting up your virtual server

Configuring your virtual server

Installing and updating deep learning dependencies

Accessing your deep learning cloud environment

Validating GPU-enablement on your deep learning environment

Setting up a robust, on-premise deep learning environment with GPU support

Neural network basics

A simple linear neuron

Gradient-based optimization

The Jacobian and Hessian matrices

Chain rule of derivatives

Stochastic Gradient Descent

Non-linear neural units

Learning a simple non-linear unit – logistic unit

Loss functions

Data representations

Tensor examples

Tensor operations

Multilayered neural networks

Backprop – training deep neural networks

Challenges in neural network learning

Ill-conditioning

Local minima and saddle points

Cliffs and exploding gradients

Initialization – bad correspondence between the local and global structure of the objective

Inexact gradients

Initialization of model parameters

Initialization heuristics

Improvements of SGD

The momentum method

Nesterov momentum

Adaptive learning rate – separate for each connection

AdaGrad

RMSprop

Adam

Overfitting and underfitting in neural networks

Model capacity

How to avoid overfitting – regularization

Weight-sharing

Weight-decay

Early stopping

Dropout

Batch normalization

Do we need more data?

Hyperparameters of the neural network

Automatic hyperparameter tuning

Grid search

Summary

Understanding Deep Learning Architectures

Neural network architecture

Why different architectures are needed

Various architectures

MLPs and deep neural networks

Autoencoder neural networks

Variational autoencoders

Generative Adversarial Networks

Text-to-image synthesis using the GAN architecture

CNNs

The convolution operator

Stride and padding mode in convolution

The convolution layer

LeNet architecture

AlexNet

ZFNet

GoogLeNet (inception network)

VGG

Residual Neural Networks

Capsule networks

Recurrent neural networks

LSTMs

Stacked LSTMs

Encoder-decoder – Neural Machine Translation

Gated Recurrent Units

Memory Neural Networks

MemN2Ns

Neural Turing Machine

Selective attention

Read operation

Write operation

The attention-based neural network model

Summary

Transfer Learning Fundamentals

Introduction to transfer learning

Advantages of transfer learning

Transfer learning strategies

Transfer learning and deep learning

Transfer learning methodologies

Feature-extraction

Fine-tuning

Pretrained models

Applications

Deep transfer learning types

Domain adaptation

Domain confusion

Multitask learning

One-shot learning

Zero-shot learning

Challenges of transfer learning

Negative transfer

Transfer bounds

Summary

Unleashing the Power of Transfer Learning

The need for transfer learning

Formulating our real-world problem

Building our dataset

Formulating our approach

Building CNN models from scratch

Basic CNN model

CNN model with regularization

CNN model with image augmentation

Leveraging transfer learning with pretrained CNN models

Understanding the VGG-16 model

Pretrained CNN model as a feature extractor

Pretrained CNN model as a feature extractor with image augmentation

Pretrained CNN model with fine-tuning and image augmentation

Evaluating our deep learning models

Model predictions on a sample test image

Visualizing what a CNN model perceives

Evaluation model performance on test data

Summary

Image Recognition and Classification

Deep learning-based image classification

Benchmarking datasets

State-of-the-art deep image classification models

Image classification and transfer learning

CIFAR-10

Building an image classifier

Transferring knowledge

Dog Breed Identification dataset

Exploratory analysis

Data preparation

Dog classifier using transfer learning

Summary

Text Document Categorization

Text categorization

Traditional text categorization

Shortcomings of BoW models

Benchmark datasets

Word representations

Word2vec model

Word2vec using gensim

GloVe model

CNN document model

Building a review sentiment classifier

What has embedding changed most?

Transfer learning – application to the IMDB dataset

Training on the full IMDB dataset with Word2vec embeddings

Creating document summaries with CNN model

Multiclass classification with the CNN model

Visualizing document embeddings

Summary

Audio Event Identification and Classification

Understanding audio event classification

Formulating our real-world problem

Exploratory analysis of audio events

Feature engineering and representation of audio events

Audio event classification with transfer learning

Building datasets from base features

Transfer learning for feature extraction

Building the classification model

Evaluating the classifier performance

Building a deep learning audio event identifier

Summary

DeepDream

Introduction

Algorithmic pareidolia in computer vision

Visualizing feature maps

DeepDream

Examples

Summary

Style Transfer

Understanding neural style transfer

Image preprocessing methodology

Building loss functions

Content loss

Style loss

Total variation loss

Overall loss function

Constructing a custom optimizer

Style transfer in action

Summary

Automated Image Caption Generator

Understanding image captioning

Formulating our objective

Understanding the data

Approach to automated image captioning

Conceptual approach

Practical hands-on approach

Image feature extractor – DCNN model with transfer learning

Text caption generator – sequence-based language model with LSTM

Encoder-decoder model

Image feature extraction with transfer learning

Building a vocabulary for our captions

Building an image caption dataset generator

Building our image language encoder-decoder deep learning model

Training our image captioning deep learning model

Evaluating our image captioning deep learning model

Loading up data and models

Understanding greedy and beam search

Implementing a beam search-based caption generator

Understanding and implementing BLEU scoring

Evaluating model performance on test data

Automated image captioning in action!

Captioning sample images from outdoor scenes

Captioning sample images from popular sports

Future scope for improvement

Summary

Image Colorization

Problem statement

Color images

Color theory

Color models and color spaces

RGB

YUV

LAB

Problem statement revisited

Building a coloring deep neural network

Preprocessing

Standardization

Loss function

Encoder

Transfer learning – feature extraction

Fusion layer

Decoder

Postprocessing

Training and results

Challenges

Further improvements

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

With the world moving towards digitization and automation, as a technologist/programmer it is important to keep oneself updated and learn how to leverage these tools and techniques. This book, Hands-On Transfer Learning with Python, is an attempt to help practitioners get acquainted with and equipped to use these advancements in their respective domains. This book is structured broadly into three sections:

Deep learning foundations

Essentials of transfer learning

Transfer learning case studies

Transfer learning is a machine learning (ML) technique where knowledge gained while training on one set of ML problems can be used to solve other, similar types of problems.

The purpose of this book is two-fold. We will focus on detailed coverage of deep learning and transfer learning, comparing and contrasting the two with easy-to-follow concepts and examples. The second area of focus will be on real-world examples and research problems using TensorFlow, Keras, and the Python ecosystem with hands-on examples.

The book starts with the core essential concepts of ML and deep learning, followed by depiction and coverage of important deep learning architectures, such as CNNs, DNNs, RNNs, LSTMs, and capsule networks. Our focus then shifts to transfer learning concepts and state-of-the-art pretrained networks such as VGG, Inception, and ResNet. We also learn how these systems can be leveraged to improve the performance of our deep learning models. Finally, we focus on a multitude of real-world case studies and problems in areas such as computer vision, audio analysis, and natural language processing (NLP).

By the end of this book, you will be ready to implement both deep learning and transfer learning principles in your own systems.

Who this book is for

Hands-On Transfer Learning with Python is for data scientists, ML engineers, analysts, and developers with an interest in data and applying state-of-the-art transfer learning methodologies to solve tough real-world problems.

Basic proficiency in ML and Python is required.

What this book covers

Chapter 1, Machine Learning Fundamentals, introduces the CRISP-DM model, which presents an industry standard framework/workflow for any data science, ML, or deep learning project. We will also touch upon various important concepts covering the fundamentals in the ML landscape such as exploratory data analysis, feature extraction and engineering, evaluation metrics, and so on.

Chapter 2, Deep Learning Essentials, provides a whirlwind tour of deep learning essentials, with an overview of the basic building blocks of neural networks and how deep neural networks are trained. Starting from how a single neural unit works, important concepts such as activation functions, loss functions, optimizers, and neural network hyperparameters are covered. Special emphasis is also placed on setting up on-premise and cloud-based deep learning environments.

Chapter 3, Understanding Deep Learning Architectures, focuses on understanding the various standard model architectures present today in deep learning. We have come a long way since traditional ANNs in the 1960s, and essential model architectures such as fully connected deep neural networks (DNNs), Convolutional Neural Networks (CNNs), recurrent neural networks (RNNs), Long Short-Term Memory (LSTM) networks, and the most recent capsule networks will be covered, to name a few.

Chapter 4, Transfer Learning Fundamentals, looks at the core concepts, terminology, and model architectures associated with the concept of transfer learning. Concepts and architectures pertaining to pretrained models will be discussed in detail. We will also compare and contrast transfer learning with deep learning and talk about types and strategies of transfer learning.

Chapter 5, Unleashing the Power of Transfer Learning, takes an actual example with a dataset from Kaggle, leverages deep learning models on it, and gives readers an understanding of the challenges faced when we have a small number of data points, and how transfer learning can unleash its true power and potential to give us superior models in these scenarios. We will tackle the very popular dogs and cats classification task here, with the twist of a limited data availability constraint.

Chapter 6, Image Recognition and Classification, is the first in a series of real-world applications/case studies of concepts discussed in detail in the previous two parts of the book. The chapter begins with an introduction to the task of image classification, and goes on to discuss and implement some of the popular, state-of-the-art deep learning models on diverse image classification problems.

Chapter 7, Text Document Categorization, discusses the application of transfer learning to a very popular natural language processing problem, text document categorization. The chapter begins with a high-level introduction to the multi-class text classification problem, traditional models, benchmark text classification datasets such as 20 Newsgroups, and their performance. Later, it introduces deep learning document models for text classification and their advantages over traditional models. We learn about word feature representation using dense vectors and how to leverage them to apply transfer learning in our text categorization problem, such that our source and target domains might be different. Other unsupervised tasks, like document summarization, are also depicted.

Chapter 8, Audio Event Identification and Classification, solves the tough problem of identifying and classifying very short audio clips. Here we leverage transfer learning by innovatively applying pre-trained deep learning models from the computer vision domain to the totally different domain of audio identification.

Chapter 9, DeepDream, focuses on a gentle introduction to the domain of generative deep learning, which is one of the core ideas at the forefront of true artificial intelligence. We will be focusing on how convnets (CNNs) think or dream and visualize patterns in images by leveraging transfer learning. First released by Google in 2015, it became a viral sensation due to the interesting patterns deep networks started to generate from images, as if thinking and dreaming on their own!

Chapter 10, Style Transfer, leverages concepts from deep learning, transfer learning, and generative learning to showcase artistic image neural style transfer with hands-on examples on different content images and styles.

Chapter 11, Automated Image Caption Generator, covers one of the most complex problems in computer vision, as well as natural language generation—image captioning. While classifying images into fixed categories is challenging yet not impossible, this is a slightly more complex task involving generating human-like natural language textual captions for any photo or scene. Leveraging the power of transfer learning, natural language processing and generative models, you will learn how to build your own automated image captioning system from scratch.

Chapter 12, Image Colorization, presents a unique case-study where the task is to colorize black and white or grayscale images. This chapter introduces readers to the basics of various color scales and why image colorization is such a difficult task.

To get the most out of this book

It would be great if you have basic proficiency in ML and Python.

An avid interest in data analysis, ML, and deep learning would be beneficial.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packtpub.com.

Select the SUPPORT tab.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Transfer-Learning-with-Python. If there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/HandsOnTransferLearningwithPython_ColorImages.pdf.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Machine Learning Fundamentals

One day the AIs are going to look back on us the same way we look at fossil skeletons on the plains of Africa. An upright ape living in dust with crude language and tools, all set for extinction.
– Nathan Bateman, Ex Machina (Movie 2014)

This quote may seem exaggerated to the core and difficult to digest, yet, with the pace at which technology and science are improving, who knows? We as a species have always dreamt of creating intelligent, self-aware machines. With recent advancements in research, technology, and the democratization of computing power, artificial intelligence (AI), machine learning (ML), and deep learning have gotten enormous attention and hype amongst technologists and the population in general. Though Hollywood's promised future is debatable, we have started to see and use glimpses of intelligent systems in our daily lives. From intelligent conversational engines, such as Google Now, Siri, Alexa, and Cortana, to self-driving cars, we are gradually accepting such smart technologies in our daily routines.

As we step into the new era of learning machines, it is important to understand that the fundamental ideas and concepts have existed for some time and have constantly been improved upon by intelligent people across the planet. It is well known that 90% of the world's data has been created in just the last couple of years, and we continue to create far more data at ever increasing rates. The realm of ML, deep learning, and AI helps us utilize these massive amounts of data to solve various real-world problems.

This book is divided into three sections. In this first section, we will get started with the basic concepts and terminologies associated with AI, ML, and deep learning, followed by in-depth details on deep learning architectures.

This chapter provides our readers with a quick primer on the basic concepts of ML before we get started with deep learning in subsequent chapters. This chapter covers the following aspects:

Introduction to ML

ML methodologies

CRISP-DM—workflow for ML projects

ML pipelines

Exploratory data analysis

Feature extraction and engineering

Feature selection

Every chapter of the book builds upon concepts and techniques from the previous chapters. Readers who are well-versed in the basics of ML and deep learning may pick and choose topics as they deem necessary, yet it is advised to go through the chapters sequentially. The code for this chapter is available for quick reference in the Chapter 1 folder in the GitHub repository at https://github.com/dipanjanS/hands-on-transfer-learning-with-python, which you can refer to as needed to follow along with the chapter.

Why ML?

We live in a world where our daily routine involves multiple contact points with the digital world. We have computers assisting us with communication, travel, entertainment, and whatnot. The digital online products (apps, websites, software, and so on) that we use seamlessly all the time help us avoid mundane and repetitive tasks. This software has been developed using programming languages (such as C, C++, Python, and Java) by programmers who have explicitly coded each instruction to enable the software to perform defined tasks. A typical interaction between a computing device (computer, phone, and so on) and an explicitly programmed software application, with inputs and defined outputs, is depicted in the following diagram:

Traditional programming paradigm

Though the current paradigm has been helping us develop amazingly complex software/systems that address tasks from different domains and aspects in a pretty efficient way, it requires somebody to define and code explicit rules for such programs to work. These are tasks that are easy for a computer to solve but difficult or time-consuming for humans. For instance, performing complex calculations, storing massive amounts of data, searching through huge databases, and so on are tasks that can be performed efficiently by a computer once the rules are defined.

Yet, there is another class of problems that can be solved intuitively by humans but are difficult to program. Problems like object identification, playing games, and so on are natural to us yet difficult to define with a set of rules. Alan Turing, in his landmark paper Computing Machinery and Intelligence (https://www.csee.umbc.edu/courses/471/papers/turing.pdf), which introduced the Turing test, discussed general purpose computers and whether they could be capable of such tasks.

This thinking about general purpose computing is what gave rise to AI in a broader sense. The new paradigm, better termed the ML paradigm, is one where computers or machines learn from experience (analogous to human learning) to solve tasks, rather than being explicitly programmed to do so.

AI is thus an encompassing field of research, with ML and deep learning being specific subfields of study within it. AI is a general field that includes other subfields as well, which may or may not involve learning (for instance, see symbolic AI). In this book we will concentrate our time and efforts upon ML and deep learning only. The scope of artificial intelligence, machine learning, and deep learning can be visualized as follows:

Scope of artificial intelligence, with machine learning and deep learning as its subfields

Formal definition

A formal definition of ML, as stated by Tom Mitchell, is as follows.

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

This definition beautifully captures the essence of what ML is in a very concise manner. Let's take an example from the real world to understand it better. Let's consider that the task (T) is to identify spam emails. We may now present many examples (or experiences, E) of spam and non-spam emails to a system, from which it learns rather than being explicitly programmed. The program or system may then be measured for its performance (P) on the learned task of identifying spam emails. Interesting, isn't it?
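
To make the T/E/P framing concrete, here is a minimal sketch of such a spam identifier, assuming scikit-learn is available; the tiny inline corpus and labels are invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Experience E: labeled examples of spam and non-spam emails (invented)
emails = ["win a free prize now", "lowest price on pills",
          "meeting at 10 am tomorrow", "project status update"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# Task T: identify spam; the model learns from E instead of hand-coded rules
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# Performance P: measured on previously unseen emails
print(model.predict(["claim your free prize", "status of the project"]))
```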

Shallow and deep learning

ML is thus the task of identifying patterns from training examples and applying these learned patterns (or representations) to new, unseen data. ML is also sometimes termed shallow learning because of its nature of learning single-layered representations (in most cases). This brings us to two questions: what are layers of representation, and what is deep learning? We will answer these questions in the subsequent chapters. First, let's have a quick overview of deep learning.

Deep learning is a subfield of ML that is concerned with learning successive meaningful representations from training examples to solve a given task. Deep learning is closely associated with artificial neural networks that consist of multiple layers stacked one after the other, which capture successive representations.

Do not worry if this was difficult to digest and understand; as mentioned, we will cover these ideas in considerable depth in subsequent chapters.

ML has become a buzzword thanks to the amount of data we are generating and collecting along with faster compute. Let's look at ML in more depth in the following sections.

ML techniques

ML is a popular subfield of AI, one that covers a very wide scope. One of the reasons for this popularity is the comprehensive toolbox of sophisticated algorithms, techniques, and methodologies it offers. This toolbox has been developed and improved over the years, and new additions are being researched on an ongoing basis. To understand and use the ML toolbox wisely, consider the following few ways of categorizing it.

Categorization based on amount of human supervision:

Supervised learning: This class of learning involves a high degree of human supervision. The algorithms under supervised learning utilize the training data and associated outputs to learn a mapping between the two, and apply the same on unseen data. Classification and regression are two major types of supervised learning algorithms.

Unsupervised learning: This class of algorithms attempts to learn inherent latent structures, patterns, and relationships from the input data without any associated outputs/labels (human supervision). Clustering, dimensionality reduction, association rule mining, and so on are a few major types of unsupervised learning algorithms.

Semi-supervised learning: This class of algorithms is a hybrid of supervised and unsupervised learning. In this case, the algorithms work with small amounts of labeled training data and larger amounts of unlabeled data, thus making creative use of both supervised and unsupervised methods to solve a given task.

Reinforcement learning: This class of algorithms is a bit different from supervised and unsupervised learning methods. The central entity here is an agent, which trains over a period of time while interacting with its environment to maximize some reward. The agent iteratively learns and changes strategies/policies based on the rewards/penalties it receives from interacting with the environment.

Categorization based on data availability:

Batch learning: This is also termed offline learning. This type of learning is utilized when the required training data is available up front, and a model can be trained and fine-tuned before being deployed into production/the real world.

Online learning: As the name suggests, in this case learning does not stop once initial data has been processed. Rather, data is fed into the system in mini-batches, and the training process continues with each new batch of data.

The previously discussed categorizations give us an abstract view of how ML algorithms can be organized, understood, and utilized. The most common way to categorize them is into supervised and unsupervised learning algorithms. Let's go into a bit more detail about these two categories as this should help us get started for further advanced topics to be introduced later.

Supervised learning

Supervised learning algorithms are a class of algorithms that utilize data samples (also called training samples) and corresponding outputs (or labels) to infer a mapping function between the two. The inferred mapping function, or the learned function, is the output of this training process. The learned function is then utilized to map new, unseen data points (input elements), which is how its performance is tested.

Some key concepts for supervised learning algorithms are as follows:

Training dataset: The training samples and corresponding outputs utilized during the training process are termed training data. Formally, a training dataset is a two-element tuple consisting of an input element (usually a vector) and a corresponding output element or signal.

Test dataset: The unseen dataset that is utilized to test the performance of the learned function. This dataset is also a two-element tuple containing input data points and corresponding output signals. Data points in this set are not used during the training phase (this dataset is often further divided to carve out a validation set as well; we will discuss this in more detail in subsequent chapters).

Learned function: This is the output of the training phase, also termed the inferred function or the model. This function is inferred from the training examples (input data points and their corresponding outputs) in the training dataset. An ideal model/learned function would learn the mapping in such a way that its results generalize to unseen data as well.

There are various supervised learning algorithms available. Based on use case requirements, they can be broadly categorized into classification and regression models.

Classification

In the simplest terms, these algorithms help us answer objective questions or make yes/no predictions. For instance, they are useful in scenarios like is it going to rain today? or can this tumour be cancerous?, and so on.

Formally, the key objective of classification algorithms is to predict output labels, depending upon the input data points, where the labels are categorical in nature; namely, they each belong to a discrete class or category.

Logistic regression, Support Vector Machines (SVMs), Neural Networks, Random Forests, k-Nearest Neighbours (KNN), Decision Trees, and so on are some of the popular classification algorithms.

Suppose we have a real-world use case to evaluate different car models. To keep things simple, let's assume that the model is expected to predict an output for every car model as either acceptable or unacceptable based on multiple input training samples. The input training samples have attributes such as buying price, number of doors, capacity (in number of persons), and safety.

A class label denotes each data point as either acceptable or unacceptable. The following diagram depicts the binary classification problem at hand. The classification algorithm takes the training samples as input to prepare a supervised model. This model is then utilized to predict the evaluation label for a new data point:

Supervised learning: Binary classification for car model evaluation

Since output labels are discrete classes in the case of classification problems, the task is termed a binary classification problem if there are only two possible output classes, and a multi-class classification problem otherwise. Predicting whether it will rain tomorrow would be a binary classification problem (with the output being a yes or a no), while predicting a numeric digit from scanned handwritten images would be multi-class classification with 10 labels (zero to nine being the possible output labels).
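
The following is a hedged sketch of the car evaluation task described above, assuming scikit-learn and pandas; the handful of rows, attribute values, and labels are invented for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Training samples with the attributes described above (invented values)
data = pd.DataFrame({
    'buying_price': ['high', 'low', 'med', 'low'],
    'doors':        ['2', '4', '4', '2'],
    'capacity':     ['2', '4', '5', '4'],
    'safety':       ['low', 'high', 'high', 'med'],
})
labels = ['unacceptable', 'acceptable', 'acceptable', 'unacceptable']

# Encode the categorical attributes numerically and train a classifier
enc = OrdinalEncoder()
X = enc.fit_transform(data)
clf = DecisionTreeClassifier(random_state=42).fit(X, labels)

# Predict the evaluation label for a new car model
new_car = pd.DataFrame([['low', '4', '4', 'high']], columns=data.columns)
print(clf.predict(enc.transform(new_car)))
```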

Regression

This class of supervised learning algorithms helps us answer quantitative questions of the type how many? or how much? Formally, the key objective of regression models is value estimation. In this case, the output labels are continuous in nature (as opposed to being discrete in classification).

In the case of regression problems, the input data points are termed as independent or explanatory variables, while the output is termed as a dependent variable. Regression models are also trained using training data samples consisting of input (or independent) data points along with output (or dependent) signals. Linear regression, multivariate regression, regression trees, and so on are a few supervised regression algorithms.

Regression models can be further categorized based on how they model the relationship between dependent and independent variables.

Simple linear regression models work with a single independent and a single dependent variable. Ordinary Least Squares (OLS) regression is a popular linear regression model. Multiple regression or multivariate regression is where there is a single dependent variable, while each observation is a vector composed of multiple explanatory variables.

Polynomial regression models are a special case of multivariate regression. Here the dependent variable is modeled as an nth-degree polynomial of the independent variable. Since polynomial regression models fit or map nonlinear relationships between dependent and independent variables, they are also termed nonlinear regression models.

The following is an example of linear regression:

Supervised learning: Linear regression

To understand the different regression types, let's consider a real-world use case of estimating the stopping distance of a car based on its speed. Given the training data we have, we can model the stopping distance as a linear function of speed, or as a polynomial function of the speed of the car. Remember that the main objective is to minimize the error without overfitting the training data itself.

The preceding graph depicts a linear fit, while the following one depicts a polynomial fit for the same dataset:

Supervised learning: Polynomial regression
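
A minimal sketch of fitting both models to the speed-versus-stopping-distance example follows, assuming scikit-learn and NumPy; the measurements below are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

speed = np.array([[4], [7], [10], [15], [20], [25]])  # car speed
dist = np.array([2, 4, 18, 26, 48, 85])               # stopping distance

# A straight-line fit versus a degree-2 polynomial fit
linear = LinearRegression().fit(speed, dist)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(speed, dist)

# The polynomial model can capture curvature that the linear model misses
print(linear.predict([[18]]), poly.predict([[18]]))
```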

Unsupervised learning

As the name suggests, this class of algorithms learns/infers concepts without supervision. Unlike supervised learning algorithms, which infer a mapping function based on a training dataset consisting of input data points and output signals, unsupervised algorithms are tasked with finding patterns and relationships in the training data without any output signals being available. This class of algorithms utilizes the input dataset to detect patterns, mine rules, or group/cluster data points so as to extract meaningful insights from the raw input dataset.

Unsupervised algorithms come in handy when we do not have the luxury of a training dataset that contains corresponding output signals or labels. In many real-world scenarios, datasets are available without output signals, and it is difficult to manually label them. Thus, unsupervised algorithms are helpful in plugging such gaps.

Similar to supervised learning algorithms, unsupervised algorithms can also be categorized for ease of understanding and learning. The following are different categories of unsupervised learning algorithms.

Clustering

The unsupervised equivalent of classification is termed as clustering. These algorithms help us cluster or group data points into different groups or categories, without the availability of any output label in the input/training dataset. These algorithms try to find patterns and relationships from the input dataset, utilizing inherent features to group them into various groups based on some similarity measure, as shown in the following diagram:

Unsupervised learning: Clustering news articles

A real-world example to help understand clustering could be news articles. There are hundreds of news articles written daily, each catering to different topics ranging from politics and sports to entertainment, and so on. An unsupervised approach to group these articles together can be achieved using clustering, as shown in the preceding figure.

There are different approaches to perform the process of clustering. The most popular ones are:

Centroid based methods. Popular ones are K-means and K-medoids.

Agglomerative and divisive hierarchical clustering methods. Popular ones are Ward's and affinity propagation.

Data distribution based methods, for instance, Gaussian mixture models.

Density based methods such as DBSCAN and so on.
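
As a concrete sketch of the news-article grouping described above, the snippet below vectorizes a few invented headlines with TF-IDF and clusters them with K-means, assuming scikit-learn is installed:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented headlines standing in for a stream of news articles
headlines = [
    "election results announced by the government",
    "parliament debates the new tax bill",
    "star striker scores twice in the final",
    "home team clinches the championship title",
]

# Turn the text into TF-IDF vectors, then group them into two clusters
X = TfidfVectorizer(stop_words='english').fit_transform(headlines)
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(km.labels_)  # e.g. politics articles vs. sports articles
```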

Dimensionality reduction

Data and ML are the best of friends, yet a lot of issues come with more and bigger data. A large number of attributes or a bloated-up feature space is one common problem. A large feature space poses problems in analyzing and visualizing the data along with issues related to training, memory, and space constraints. This is also known as the curse of dimensionality. Since unsupervised methods help us extract insights and patterns from unlabeled training datasets, they are also useful in helping us reduce dimensionality.

In other words, unsupervised methods help us reduce feature space by helping us select a representative set of features from the complete available list:

Unsupervised learning: Dimensionality reduction using PCA

Principal Component Analysis (PCA), nearest neighbors, and discriminant analysis are some of the popular dimensionality reduction techniques.

The preceding diagram is a famous depiction of the workings of the PCA-based dimensionality reduction technique. It shows a Swiss roll shape, with data represented in three-dimensional space. Applying PCA transforms the data into two-dimensional space, as shown on the right-hand side of the diagram.
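
A short sketch of that reduction follows, assuming scikit-learn (which ships a Swiss roll generator); note that PCA is a linear projection, shown here simply to illustrate reducing three dimensions to two:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

# 1,000 three-dimensional points arranged in a Swiss roll shape
X, _ = make_swiss_roll(n_samples=1000, random_state=42)

# Project the data onto its two leading principal components
X_2d = PCA(n_components=2).fit_transform(X)
print(X.shape, X_2d.shape)  # (1000, 3) (1000, 2)
```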

Association rule mining

This class of unsupervised ML algorithms helps us understand and extract patterns from transactional datasets. Also termed as Market Basket Analysis (MBA), these algorithms help us identify interesting relationships and associations between items across transactions.

Using association rule mining, we can answer questions like what items are bought together by people at a given store? or do people who buy wine also tend to buy cheese?, and many more. FP-growth, ECLAT, and Apriori are some of the most widely used algorithms for association rule mining tasks.
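
A hedged sketch of the wine-and-cheese question using the Apriori algorithm follows, assuming the third-party mlxtend package is installed; the transactions are invented:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Invented shopping baskets, one list of items per transaction
transactions = [['wine', 'cheese'], ['wine', 'cheese', 'bread'],
                ['bread', 'milk'], ['wine', 'cheese', 'milk']]

# One-hot encode the transactions, then mine frequent itemsets and rules
te = TransactionEncoder()
basket = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)
itemsets = apriori(basket, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric='confidence', min_threshold=0.7)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])
```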

Anomaly detection

Anomaly detection is the task of identifying rare events/observations based on historical data. Anomaly detection is also termed as outlier detection. Anomalies or outliers usually have characteristics such as being infrequent or occurring in short sudden bursts over time.

For such tasks, we provide a historical dataset for the algorithm so it can identify and learn the normal behavior of data in an unsupervised manner. Once learned, the algorithm helps us identify patterns that differ from this learned behavior.
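
Below is a minimal outlier-detection sketch using scikit-learn's IsolationForest; the synthetic points stand in for historical data and a couple of rare events:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # "normal" behavior
rare = np.array([[6.0, 6.0], [-7.0, 5.0]])              # sudden, rare events

# Learn normal behavior without labels, then flag deviations from it
model = IsolationForest(contamination=0.01, random_state=42).fit(normal)
print(model.predict(rare))  # -1 marks anomalies, 1 marks inliers
```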

CRISP-DM

Cross Industry Standard Process for Data Mining (CRISP-DM) is one of the most popular and widely used processes for data mining and analytics projects. CRISP-DM provides the required framework, which clearly outlines the necessary steps and workflows for executing a data mining and analytics project, from business requirements to the final deployment stages and everything in between.

More popularly known by its acronym, CRISP-DM is a tried, tested, and robust industry-standard process model followed for data mining and analytics projects. CRISP-DM clearly depicts the necessary steps, processes, and workflows for executing any project, right from formalizing business requirements to testing and deploying a solution to transform data into insights. Data science, data mining, and ML are all about running multiple iterative processes to extract insights and information from data. Hence, we can say that analyzing data is truly both an art and a science, because it is not always about running algorithms without reason; a lot of the major effort involves understanding the business, the actual value of the efforts being invested, and proper methods for articulating end results and insights.

Data science and data mining projects are iterative in nature to extract meaningful insights and information from data. Data science is as much art as science and thus a lot of time is spent understanding the business value and the data at hand before applying the actual algorithms (these again go through multiple iterations) and finally evaluations and deployment.

Similar to software engineering projects, which have different life cycle models, CRISP-DM helps us track a data mining and analytics project from start to end. This model is divided into six major steps, covering aspects from business and data understanding to evaluation and, finally, deployment, all of which are iterative in nature. See the following diagram:

CRISP-DM model depicting workflow for ML projects

Let's now have a deeper look into each of the six stages to better understand the CRISP-DM model.

Business understanding

The first and foremost step is understanding the business. This crucial step begins with setting the business context and requirements for the problem. Defining the business requirements formally is important to transform them into a data science and analytics problem statement. This step is also used to set expectations and success criteria, so that both business and data science teams are on the same page and can track the progress of the project.

The main deliverable of this step is a detailed plan consisting of major milestones, timelines, assumptions, constraints, caveats, issues expected, and success criteria.

Data understanding

Data collection and understanding is the second step in the CRISP-DM framework. In this step, we take a deeper dive to understand and analyze the data for the problem statement formalized in the previous step. This step begins with investigating the various sources of data outlined previously in the detailed project plan. These sources are then used to collect data, analyze different attributes, and make a note of data quality. This step also involves what is generally termed exploratory data analysis.

Exploratory data analysis (EDA) is a very important sub-step. It is during EDA that we analyze different attributes of the data and their properties and characteristics. We also visualize data during EDA for a better understanding, uncovering patterns that might previously have been unseen or ignored. This step lays the foundation for the steps to come, and hence it cannot be neglected at all.

Data preparation

This is the third and the most time-consuming step in any data science project. Data preparation takes place once we have understood the business problem and explored the data available. This step involves data integration, cleaning, wrangling, feature selection, and feature engineering. First and foremost is data integration: there are times when data is available from various sources and hence needs to be combined based on certain keys or attributes for better usage.

Data cleaning and wrangling are very important steps. This involves handling missing values, data inconsistencies, fixing incorrect values, and converting data to ingestible formats such that they can be used by ML algorithms.

Data preparation is the most time-consuming step, taking 60-70% of the overall time spent on any data science project. Apart from data integration and wrangling, this step involves selecting key features based on relevance, quality, assumptions, and constraints; this is termed feature selection. There are also times when we have to derive or generate features from existing ones, for example, deriving age from date of birth, depending upon the use case requirements. This step is termed feature engineering and is, again, applied based on the use case.
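
A tiny sketch of the date-of-birth-to-age derivation mentioned above, using pandas; the column names and reference date are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({'date_of_birth': ['1985-03-14', '1992-11-02']})
df['date_of_birth'] = pd.to_datetime(df['date_of_birth'])

# Engineer an age feature (in whole years) from the raw date attribute
snapshot = pd.Timestamp('2018-08-01')  # assumed reference date
df['age'] = (snapshot - df['date_of_birth']).dt.days // 365
print(df)
```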

Modeling

The fourth step, or the modeling step, is where the actual analysis and ML take place. This step utilizes the clean and formatted data prepared in the previous step for modeling purposes. This is an iterative process that works in sync with the data preparation step, as models/algorithms require data in different settings/formats with varying sets of attributes.

This step involves selecting relevant tools and frameworks along with the selection of a modeling technique or algorithms. This step includes model building, evaluation, and fine-tuning of models, based on the expectations and criteria laid down during the business understanding phase.

Evaluation

Once the modeling step results in a model (or models) that satisfies the success criteria, performance benchmarks, and model evaluation metrics, a thorough evaluation step comes into the picture. In this step, we consider the following activities before moving ahead with the deployment stage:

Model result assessment based on quality and alignment with business objectives

Identifying any additional assumptions made or constraints relaxed

Data quality, missing information, and other feedback from the data science team and/or subject matter experts (SMEs)

Cost of deployment of the end-to-end ML solution

Deployment

The final step of the CRISP-DM model is deployment to production. The models that have been developed, fine-tuned, validated, and tested during multiple iterations are saved and prepared for the production environment. A proper deployment plan is built, which includes details of hardware and software requirements. The deployment stage also includes putting checks and monitoring in place to evaluate the model in production for results, performance, and other metrics.

Standard ML workflow

The CRISP-DM model provides a high-level workflow for the management of ML and related projects. In this section, we will discuss the technical aspects and implementation of standard workflows for handling ML projects. Simply put, an ML pipeline is an end-to-end workflow consisting of the various aspects of a data-intensive project. Once the initial phases, such as business understanding, risk assessment, and ML or data mining technique selection, have been covered, we proceed toward the solution space of the project. A typical ML pipeline or workflow, with its different sub-components, is shown in the following diagram:

Typical ML pipeline

A standard ML pipeline broadly consists of the following stages.

Data retrieval

Data collection and extraction is where the story usually begins. Datasets come in all forms, including structured and unstructured data, often with missing or noisy values. Each data type and format needs special mechanisms for data handling as well as management. For instance, if a project concerns the analysis of tweets, we need to work with the Twitter APIs and develop mechanisms to extract the required tweets, which are usually in JSON format.

Other scenarios may involve already existing structured or unstructured public or private datasets; both may require additional permissions, apart from just developing extraction mechanisms. A fairly detailed account of working with diverse data formats is given in Chapter 3 of the book Practical Machine Learning with Python (Sarkar and co-authors, Springer, 2017), in case you are interested in diving deeper.
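
As a hedged sketch of the tweet-ingestion idea, the snippet below flattens tweet-like JSON records into a pandas DataFrame; the records and field names are invented and do not reflect the real Twitter API schema:

```python
import json
import pandas as pd

# A stand-in for the JSON payload an extraction mechanism might return
raw = ('[{"id": 1, "text": "loving this book", "lang": "en"},'
       ' {"id": 2, "text": "transfer learning ftw", "lang": "en"}]')

records = json.loads(raw)
tweets = pd.json_normalize(records)  # flatten JSON records into columns
print(tweets[['id', 'text']])
```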

Data preparation