Deep learning is a popular subset of machine learning, and it allows you to build complex models that are faster and give more accurate predictions. This book is your companion to take your first steps into the world of deep learning, with hands-on examples to boost your understanding of the topic.
This book starts with a quick overview of the essential concepts of data science and machine learning that are required to get started with deep learning. It introduces you to TensorFlow, one of the most widely used machine learning libraries for training deep learning models. You will then work on your first deep learning problem by training a deep feed-forward neural network for digit classification, and move on to tackle other real-world problems in computer vision, language processing, sentiment analysis, and more. Advanced deep learning models such as generative adversarial networks and their applications are also covered in this book.
By the end of this book, you will have a solid understanding of all the essential concepts in deep learning. With the help of the examples and code provided in this book, you will be equipped to train your own deep learning models with more confidence.
Page count: 460
Publication year: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Vedika Naik
Acquisition Editor: Tushar Gupta
Content Development Editor: Aishwarya Pandere
Technical Editor: Sagar Sawant
Copy Editor: Vikrant Phadke, Safis Editing
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Graphics: Tania Dutta
Production Coordinator: Aparna Bhagat
First published: February 2018
Production reference: 1260218
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78839-990-6
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Ahmed Menshawy is a Research Engineer at Trinity College Dublin, Ireland. He has more than 5 years of working experience in the area of ML and NLP. He holds an MSc in Advanced Computer Science. He started his career as a Teaching Assistant at the Department of Computer Science, Helwan University, Cairo, Egypt, where he taught several advanced courses, including ML and Image Processing. He was involved in implementing a state-of-the-art system for Arabic text to speech. He was the main ML specialist at the industrial research and development lab at IST Networks, based in Egypt.
Md. Rezaul Karim is a Research Scientist at Fraunhofer FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Germany. Before joining FIT, he worked as a Researcher at the Insight Centre for Data Analytics, Ireland. Earlier, he worked as a Lead Engineer at Samsung Electronics, Korea.
He has 9 years of R&D experience with C++, Java, R, Scala, and Python. He has published research papers concerning bioinformatics, big data, and deep learning. He has practical working experience with Spark, Zeppelin, Hadoop, Keras, Scikit-Learn, TensorFlow, DeepLearning4j, MXNet, and H2O.
Doug Ortiz is an experienced Enterprise Cloud, Big Data, Data Analytics and Solutions Architect who has architected, designed, developed, re-engineered and integrated enterprise solutions. Other expertise: Amazon Web Services, Azure, Google Cloud, Business Intelligence, Hadoop, Spark, NoSQL Databases, SharePoint to mention a few.
He is the founder of Illustris, LLC, and is reachable at [email protected].
Huge thanks to my wonderful wife Milla, Maria, Nikolay and our children for all their support.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Deep Learning By Example
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Data Science - A Birds' Eye View
Understanding data science by an example
Design procedure of data science algorithms
Data pre-processing
Data cleaning
Data pre-processing
Feature selection
Model selection
Learning process
Evaluating your model
Getting to learn
Challenges of learning
Feature extraction – feature engineering
Noise
Overfitting
Selection of a machine learning algorithm
Prior knowledge
Missing values
Implementing the fish recognition/detection model
Knowledge base/dataset
Data analysis pre-processing
Model building
Model training and testing
Fish recognition – all together
Different learning types
Supervised learning
Unsupervised learning
Semi-supervised learning
Reinforcement learning
Data size and industry needs
Summary
Data Modeling in Action - The Titanic Example
Linear models for regression
Motivation
Advertising – a financial example
Dependencies
Importing data with pandas
Understanding the advertising data
Data analysis and visualization
Simple regression model
Learning model coefficients
Interpreting model coefficients
Using the model for prediction
Linear models for classification
Classification and logistic regression
Titanic example – model building and training
Data handling and visualization
Data analysis – supervised machine learning
Different types of errors
Apparent (training set) error
Generalization/true error
Summary
Feature Engineering and Model Complexity – The Titanic Example Revisited
Feature engineering
Types of feature engineering
Feature selection
Dimensionality reduction
Feature construction
Titanic example revisited
Missing values
Removing any sample with missing values in it
Missing value inputting
Assigning an average value
Using a regression or another simple model to predict the values of missing variables
Feature transformations
Dummy features
Factorizing
Scaling
Binning
Derived features
Name
Cabin
Ticket
Interaction features
The curse of dimensionality
Avoiding the curse of dimensionality
Titanic example revisited – all together
Bias-variance decomposition
Learning visibility
Breaking the rule of thumb
Summary
Get Up and Running with TensorFlow
TensorFlow installation
TensorFlow GPU installation for Ubuntu 16.04
Installing NVIDIA drivers and CUDA 8
Installing TensorFlow
TensorFlow CPU installation for Ubuntu 16.04
TensorFlow CPU installation for macOS X
TensorFlow GPU/CPU installation for Windows
The TensorFlow environment
Computational graphs
TensorFlow data types, variables, and placeholders
Variables
Placeholders
Mathematical operations
Getting output from TensorFlow
TensorBoard – visualizing learning
Summary
TensorFlow in Action - Some Basic Examples
Capacity of a single neuron
Biological motivation and connections
Activation functions
Sigmoid
Tanh
ReLU
Feed-forward neural network
The need for multilayer networks
Training our MLP – the backpropagation algorithm
Step 1 – forward propagation
Step 2 – backpropagation and weight updation
TensorFlow terminologies – recap
Defining multidimensional arrays using TensorFlow
Why tensors?
Variables
Placeholders
Operations
Linear regression model – building and training
Linear regression with TensorFlow
Logistic regression model – building and training
Utilizing logistic regression in TensorFlow
Why use placeholders?
Set model weights and bias
Logistic regression model
Training
Cost function
Summary
Deep Feed-forward Neural Networks - Implementing Digit Classification
Hidden units and architecture design
MNIST dataset analysis
The MNIST data
Digit classification – model building and training
Data analysis
Building the model
Model training
Summary
Introduction to Convolutional Neural Networks
The convolution operation
Motivation
Applications of CNNs
Different layers of CNNs
Input layer
Convolution step
Introducing non-linearity
The pooling step
Fully connected layer
Logits layer
CNN basic example – MNIST digit classification
Building the model
Cost function
Performance measures
Model training
Summary
Object Detection – CIFAR-10 Example
Object detection
CIFAR-10 – modeling, building, and training
Used packages
Loading the CIFAR-10 dataset
Data analysis and preprocessing
Building the network
Model training
Testing the model
Summary
Object Detection – Transfer Learning with CNNs
Transfer learning
The intuition behind TL
Differences between traditional machine learning and TL
CIFAR-10 object detection – revisited
Solution outline
Loading and exploring CIFAR-10
Inception model transfer values
Analysis of transfer values
Model building and training
Summary
Recurrent-Type Neural Networks - Language Modeling
The intuition behind RNNs
Recurrent neural networks architectures
Examples of RNNs
Character-level language models
Language model using Shakespeare data
The vanishing gradient problem
The problem of long-term dependencies
LSTM networks
Why does LSTM work?
Implementation of the language model
Mini-batch generation for training
Building the model
Stacked LSTMs
Model architecture
Inputs
Building an LSTM cell
RNN output
Training loss
Optimizer
Building the network
Model hyperparameters
Training the model
Saving checkpoints
Generating text
Summary
Representation Learning - Implementing Word Embeddings
Introduction to representation learning
Word2Vec
Building Word2Vec model
A practical example of the skip-gram architecture
Skip-gram Word2Vec implementation
Data analysis and pre-processing
Building the model
Training
Summary
Neural Sentiment Analysis
General sentiment analysis architecture
RNNs – sentiment analysis context
Exploding and vanishing gradients - recap
Sentiment analysis – model implementation
Keras
Data analysis and preprocessing
Building the model
Model training and results analysis
Summary
Autoencoders – Feature Extraction and Denoising
Introduction to autoencoders
Examples of autoencoders
Autoencoder architectures
Compressing the MNIST dataset
The MNIST dataset
Building the model
Model training
Convolutional autoencoder
Dataset
Building the model
Model training
Denoising autoencoders
Building the model
Model training
Applications of autoencoders
Image colorization
More applications
Summary
Generative Adversarial Networks
An intuitive introduction
Simple implementation of GANs
Model inputs
Variable scope
Leaky ReLU
Generator
Discriminator
Building the GAN network
Model hyperparameters
Defining the generator and discriminator
Discriminator and generator losses
Optimizers
Model training
Generator samples from training
Sampling from the generator
Summary
Face Generation and Handling Missing Labels
Face generation
Getting the data
Exploring the Data
Building the model
Model inputs
Discriminator
Generator
Model losses
Model optimizer
Training the model
Semi-supervised learning with Generative Adversarial Networks (GANs)
Intuition
Data analysis and preprocessing
Building the model
Model inputs
Generator
Discriminator
Model losses
Model optimizer
Model training
Summary
Implementing Fish Recognition
Code for fish recognition
Other Books You May Enjoy
Leave a review - let other readers know what you think
This book will start off by introducing the foundations of machine learning, what makes learning visible, demonstrating the traditional machine learning techniques with some examples and eventually deep learning. You will then move to creating machine learning models that will eventually lead you to neural networks. You will get familiar with the basics of deep learning and explore various tools that enable deep learning in a powerful yet user-friendly manner. With a very low starting point, this book will enable a regular developer to get hands-on experience with deep learning. You will learn all the essentials needed to explore and understand what deep learning is and will perform deep learning tasks first-hand. Also, we will be using one of the most widely used deep learning frameworks. TensorFlow has big community support that is growing day by day, which makes it a good option for building your complex deep learning applications.
This book is a starting point for those who are keen on knowing about deep learning and implementing it but do not have an extensive background in machine learning, complex statistics, and linear algebra.
Chapter 1, Data science - Bird's-eye view, explains that data science or machine learning is the process of giving machines the ability to learn from a dataset without being explicitly told or programmed. For instance, it is extremely hard to write a program that takes an image of a hand-written digit as input and outputs a value from 0-9 according to the digit written in that image. The same applies to the task of classifying incoming emails as spam or non-spam. To solve such tasks, data scientists use learning methods and tools from the field of data science or machine learning to teach the computer how to automatically recognize digits by giving it some explanatory features that can distinguish each digit from another. The same goes for the spam/non-spam problem: instead of using regular expressions and writing hundreds of rules to classify the incoming emails, we can teach the computer through specific learning algorithms how to distinguish between spam and non-spam emails.
Chapter 2, Data Modeling in Action - The Titanic Example, explains that linear models are the basic learning algorithms in the field of data science. Understanding how a linear model works is crucial in your journey of learning data science because it's the basic building block for most of the sophisticated learning algorithms out there, including neural networks.
Chapter 3, Feature Engineering and Model Complexity – Titanic Example Revisited, covers model complexity and assessment. This is an important step towards building a successful data science system. There are lots of tools that you can use to assess and choose your model. In this chapter, we are going to address some of the tools that can help you to increase the value of your data by adding more descriptive features and extracting meaningful information from existing ones. We are also going to address other tools related to the optimal number of features and learn why it's a problem to have a large number of features and few training samples/observations.
Chapter 4, Get Up and Running with TensorFlow, gives an overview of one of the most widely used deep learning frameworks. TensorFlow has big community support that is growing day by day, which makes it a good option for building your complex deep learning applications.
Chapter 5, TensorFlow in Action - Some Basic Examples, explains the main computational concept behind TensorFlow, which is the computational graph model, and gets you on track by implementing linear regression and logistic regression.
Chapter 6, Deep Feed-forward Neural Networks - Implementing Digit Classification, explains that a feed-forward neural network (FNN) is a special type of neural network wherein links/connections between neurons do not form a cycle. As such, it is different from other neural network architectures that we will study later in this book (recurrent-type neural networks). The FNN is a widely used architecture and it was the first and simplest type of neural network. In this chapter, we will go through the architecture of a typical FNN, and we will be using the TensorFlow library for this. After covering these concepts, we will give a practical example of digit classification. The question in this example is: given a set of images that contain handwritten digits, how can you classify these images into 10 different classes (0-9)?
Chapter 7, Introduction to Convolutional Neural Networks, explains that in data science, a convolutional neural network (CNN) is a specific kind of deep learning architecture that uses the convolution operation to extract relevant explanatory features from the input image. CNN layers are connected as an FNN while using this convolution operation to mimic how the human brain functions when trying to recognize objects. Individual cortical neurons respond to stimuli in a restricted region of space known as the receptive field. In particular, biomedical imaging problems can be challenging, but in this chapter we'll see how to use a CNN to discover patterns in such images.
Chapter 8, Object Detection – CIFAR-10 Example, covers the basics and the intuition/motivation behind CNNs, before demonstrating this on one of the most popular datasets available for object detection. We'll also see how the initial layers of the CNN get very basic features about our objects, but the final convolutional layers will get more semantic-level features that are built up from those basic features in the first layers.
Chapter 9, Object Detection – Transfer Learning with CNNs, explains that transfer learning (TL) is a research problem in data science that is mainly concerned with persisting knowledge acquired while solving a specific task and using this acquired knowledge to solve a different but similar task. In this chapter, we will demonstrate one of the modern practices and common themes used in the field of data science with TL. The idea here is to transfer help from domains with very large datasets to domains that have smaller datasets. Finally, we will revisit our object detection example of CIFAR-10 and try to reduce both the training time and the performance error via TL.
Chapter 10, Recurrent-Type Neural Networks - Language Modeling, explains that recurrent neural networks (RNNs) are a class of deep learning architectures that are widely used for natural language processing. This set of architectures enables us to provide contextual information for current predictions and also includes specific architectures that deal with long-term dependencies in any input sequence. In this chapter, we'll demonstrate how to make a sequence-to-sequence model, which will be useful in many applications in NLP. We will demonstrate these concepts by building a character-level language model and see how our model generates sentences similar to the original input sequences.
Chapter 11, Representation Learning - Implementing Word Embeddings, explains that machine learning is a science that is mainly based on statistics and linear algebra. Applying matrix operations is very common among most machine learning or deep learning architectures because of backpropagation. This is the main reason deep learning, or machine learning in general, accepts only real-valued quantities as input. This conflicts with many applications, such as machine translation and sentiment analysis, which have text as an input. So, in order to use deep learning for these applications, we need to put the text in a form that deep learning accepts. In this chapter, we are going to introduce the field of representation learning, which is a way to learn a real-valued representation from text while preserving the semantics of the actual text. For example, the representation of love should be very close to the representation of adore because they are used in very similar contexts.
Chapter 12, Neural Sentiment Analysis, addresses one of the hot and trendy applications in natural language processing, which is called sentiment analysis. Most people nowadays express their opinions through social media platforms, and making use of this vast amount of text to keep track of customer satisfaction is crucial for companies and even governments.
In this chapter, we are going to use RNNs to build a sentiment analysis solution.
Chapter 13, Autoencoders – Feature Extraction and Denoising, explains that the autoencoder network is nowadays one of the most widely used deep learning architectures. It's mainly used for unsupervised learning of efficient encodings. It can also be used for dimensionality reduction by learning an encoding or a representation for a specific dataset. Using autoencoders in this chapter, we'll show how to denoise your dataset by constructing another dataset with the same dimensions but less noise. To use this concept in practice, we will extract the important features from the MNIST dataset and see how the performance is enhanced by this.
Chapter 14, Generative Adversarial Networks, covers generative adversarial networks (GANs). They are deep neural net architectures that consist of two networks pitted against each other (hence the name adversarial). GANs were introduced in a paper (https://arxiv.org/abs/1406.2661) by Ian Goodfellow and other researchers, including Yoshua Bengio, at the University of Montreal in 2014. Referring to GANs, Facebook's AI research director, Yann LeCun, called adversarial training the most interesting idea in the last 10 years in machine learning. The potential of GANs is huge, because they can learn to mimic any distribution of data. That is, GANs can be taught to create worlds eerily similar to our own in any domain: images, music, speech, or prose. They are robot artists in a sense, and their output is impressive (https://www.nytimes.com/2017/08/14/arts/design/google-how-ai-creates-new-music-and-new-artists-project-magenta.html), and poignant too.
Chapter 15, Face Generation and Handling Missing Labels, shows that the list of interesting applications that we can use GANs for is endless. In this chapter, we are going to demonstrate another promising application of GANs, which is face generation based on the CelebA database. We'll also demonstrate how to use GANs for semi-supervised learning setups where we've got a poorly labeled dataset with some missing labels.
Appendix, Implementing Fish Recognition, includes the complete code for the fish recognition example.
Inform the reader of the things that they need to know before they start, and spell out what knowledge you are assuming.
Any additional installation instructions and information they need for getting set up.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at www.packtpub.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Deep-Learning-By-Example. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/DeepLearningByExample_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."
A block of code is set as follows:
html, body, #map { height: 100%; margin: 0; padding: 0}
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
Any command-line input or output is written as follows:
$ mkdir css
$ cd css
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Data science or machine learning is the process of giving machines the ability to learn from a dataset without being explicitly told or programmed. For instance, it is extremely hard to write a program that can take an image of a hand-written digit as input and output a value from 0-9 according to the digit that's written in the image. The same applies to the task of classifying incoming emails as spam or non-spam. To solve such tasks, data scientists use learning methods and tools from the field of data science or machine learning to teach the computer how to automatically recognize digits, by giving it some explanatory features that can distinguish one digit from another. The same goes for the spam/non-spam problem: instead of using regular expressions and writing hundreds of rules to classify the incoming email, we can teach the computer through specific learning algorithms how to distinguish between spam and non-spam emails.
You are probably using applications of data science on a daily basis, often without knowing it. For example, your country might be using a system to detect the ZIP code of your posted letter in order to automatically forward it to the correct area. If you are using Amazon, they often recommend things for you to buy and they do this by learning what sort of things you often search for or buy.
Building a learned/trained machine learning algorithm requires a base of raw historical data samples, from which it learns how to distinguish between different examples and comes up with some knowledge and trends. After that, the learned/trained algorithm can be used for making predictions on unseen data.
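The train-then-predict cycle can be illustrated with a minimal sketch. It is not the book's fish recognition model: the nearest-centroid rule, the function names `train_centroids` and `predict`, and the toy measurements are all assumptions made for illustration.

```python
def train_centroids(samples):
    """Learn one mean feature value per class from labelled historical samples."""
    totals, counts = {}, {}
    for value, label in samples:
        totals[label] = totals.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


def predict(centroids, value):
    """Assign an unseen value to the class whose learned mean is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Hypothetical historical data: (measurement, class label).
historical = [(20.0, "small"), (25.0, "small"), (60.0, "large"), (70.0, "large")]

model = train_centroids(historical)   # the "knowledge" extracted from the data
prediction = predict(model, 30.0)     # a prediction for a sample not seen in training
print(model, prediction)
```

The learned means play the role of the knowledge extracted from the historical data; prediction only consults those means, never the raw samples.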
In this chapter, we are going to have a bird's-eye view of data science, how it works as a black box, and the challenges that data scientists face on a daily basis. We are going to cover the following topics:
Understanding data science by an example
Design procedure of data science algorithms
Getting to learn
Implementing the fish recognition/detection model
Different learning types
Data size and industry needs
Different learning systems usually follow the same design procedure. They start by acquiring the knowledge base and selecting the relevant explanatory features from the data, then go through a bunch of candidate learning algorithms while keeping an eye on the performance of each one, and finish with the evaluation process, which measures how successful the training process was.
In this section, we are going to address all these different design steps in more detail:
This component of the learning cycle represents the knowledge base of our algorithm. So, in order to help the learning algorithm give accurate decisions about the unseen data, we need to provide this knowledge base in the best form. Thus, our data may need a lot of cleaning and pre-processing (conversions).
Most datasets require this step, in which you get rid of errors, noise, and redundancies. We need our data to be accurate, complete, reliable, and unbiased, as there are lots of problems that may arise from using a bad knowledge base, such as:
Inaccurate and biased conclusions
Increased error
Reduced generalizability, which is the model's ability to perform well over the unseen data that it didn't train on previously
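As a rough sketch of what getting rid of errors and redundancies can look like in practice, the snippet below filters a toy list of records. The record layout, the validity rule for heights, and the helper name `clean` are assumptions for illustration, not part of the book's code.

```python
# Toy records: (height_cm, weight_kg, label). None marks a missing value.
raw_records = [
    (170.0, 65.0, "a"),
    (170.0, 65.0, "a"),   # exact duplicate (redundancy)
    (None, 70.0, "b"),    # missing value (incomplete sample)
    (165.0, 58.0, "b"),
    (-1.0, 80.0, "a"),    # impossible height (error)
]


def clean(records):
    """Drop duplicates, rows with missing values, and rows with invalid heights."""
    seen = set()
    cleaned = []
    for row in records:
        if row in seen:
            continue                      # redundancy: already kept or rejected
        seen.add(row)
        if any(value is None for value in row):
            continue                      # incomplete sample
        if row[0] <= 0:
            continue                      # assumed validity rule: height must be positive
        cleaned.append(row)
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```

Dropping incomplete rows is only one option; filling in missing values is another, and which is appropriate depends on the dataset.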
The number of explanatory features (input variables) of a sample can be enormous: you get $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{id})$ as a training sample (observation/example), where d is very large. An example of this is a document classification task, where you get 10,000 different words and the input variables are the numbers of occurrences of the different words.
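To make the word-count representation concrete, here is a minimal sketch that turns a document into a vector of occurrence counts. The four-word vocabulary and the function name `bag_of_words` are illustrative assumptions; a real task would use thousands of words and more careful tokenization.

```python
from collections import Counter


def bag_of_words(document, vocabulary):
    """Return one occurrence count per vocabulary word, in vocabulary order."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary with d = 4 input variables.
vocabulary = ["deep", "learning", "fish", "data"]

x_i = bag_of_words("Deep learning needs data and more data", vocabulary)
print(x_i)
```

Each position of `x_i` corresponds to one input variable, so the length of the vector grows with the vocabulary size.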
This enormous number of input variables can be problematic and sometimes a curse because we have many input variables and few training samples to help us in the learning procedure. To avoid this curse of having an enormous number of input variables (curse of dimensionality), data scientists use dimensionality reduction techniques in order to select a subset from the input variables. For example, in the text classification task they can do the following:
Extracting relevant inputs (for instance, mutual information measure)
Principal component analysis (PCA)
Grouping (cluster) similar words (this uses a similarity measure)
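The first option, selecting relevant inputs by a mutual information measure, can be sketched as follows. The toy word-presence features, the labels, and the choice to keep only the top-ranked word are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Estimate I(X; Y) in bits from paired observations of a feature and a label."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    count_x = Counter(feature_values)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi


labels = ["spam", "spam", "ham", "ham"]

# Hypothetical word-presence features: 1 if the word occurs in the document.
features = {
    "free": [1, 1, 0, 0],    # tracks the label exactly
    "the": [1, 1, 1, 1],     # occurs everywhere, so it carries no information
    "hello": [1, 0, 1, 0],   # independent of the label
}

scores = {word: mutual_information(values, labels) for word, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
selected = ranked[:1]        # keep only the most informative input variable
print(scores, selected)
```

Words with a score near zero tell the learner almost nothing about the class, which is why they are natural candidates for removal.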
This step comes after selecting a proper subset of your input variables by using any dimensionality reduction technique. Choosing a proper subset of the input variables will make the rest of the learning process much simpler.
In this step, you are trying to figure out the right model to learn.
If you have prior experience with data science and with applying learning methods to different domains and kinds of data, you will find this step easy. It requires prior knowledge of how your data looks and which assumptions could fit its nature, and based on this you choose the proper learning method. If you don't have any prior knowledge, that's also fine, because you can do this step by guessing: try different learning methods with different parameter settings and choose the one that gives you better performance over the test set.
Also, initial data analysis and visualization will help you to make a good guess about the form of the distribution and nature of your data.
By learning, we mean the optimization criteria that you are going to use to select the best model parameters. There are various optimization criteria for that:
Mean square error (MSE)
Maximum likelihood (ML) criterion
Maximum a posteriori probability (MAP)
The optimization problem may be hard to solve, but the right choice of model and error function makes a difference.
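For reference, the three criteria can be written in a common notation. The symbols here are assumptions introduced for this sketch and are not taken from the text above: f(x; θ) is the model's prediction with parameters θ, (x_i, y_i) for i = 1, ..., N are independent training pairs, and p(θ) is a prior over the parameters.

```latex
\begin{align*}
\hat{\theta}_{\mathrm{MSE}} &= \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - f(x_i; \theta)\bigr)^2 \\
\hat{\theta}_{\mathrm{ML}}  &= \arg\max_{\theta} \sum_{i=1}^{N} \log p(y_i \mid x_i; \theta) \\
\hat{\theta}_{\mathrm{MAP}} &= \arg\max_{\theta} \Bigl[ \sum_{i=1}^{N} \log p(y_i \mid x_i; \theta) + \log p(\theta) \Bigr]
\end{align*}
```

Written this way, the relationship between them is visible: MAP reduces to ML when the prior p(θ) is flat, and ML reduces to MSE when the outputs are modeled as f(x; θ) plus Gaussian noise of fixed variance.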
In this step, we try to measure the generalization error of our model on the unseen data. Since we only have the specific data without knowing any unseen data beforehand, we can randomly select a test set from the data and never use it in the training process so that it acts like valid unseen data. There are different ways you can evaluate the performance of the selected model:
Simple holdout method, which is dividing the data into training and testing sets
Other complex methods, based on cross-validation and random subsampling
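The cross-validation idea can be sketched in a few lines. This is a minimal k-fold index generator under the assumption that samples are addressed by position; the function name `k_fold_indices` and the fixed seed are illustrative choices.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold

splits = list(k_fold_indices(n_samples=10, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each sample lands in the test fold exactly once, so averaging the k test errors uses all the data for evaluation without ever testing on a sample the model was trained on in that round.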
Our objective in this step is to compare the predictive performance for different models trained on the same data and choose the one with a better (smaller) testing error, which will give us a better generalization error over the unseen data. You can also be more certain about the generalization error by using a statistical method to test the significance of your results.
Building a machine learning system comes with some challenges and issues; we will try to address them in this section. Many of these issues are domain specific and others aren't.
The following is an overview of the challenges and issues that you will typically face when trying to build a learning system.
Feature extraction is one of the crucial steps toward building a learning system. If you did a good job in this challenge by selecting the proper/right number of features, then the rest of the learning process will be easy. Also, feature extraction is domain dependent and it requires prior knowledge to have a sense of what features could be important for a particular task. For example, the features for our fish recognition system will be different from the ones for spam detection or identifying fingerprints.
The feature extraction step starts from the raw data that you have. You then build derived variables/values (features) that are informative about the learning task and facilitate the next steps of learning and evaluation (generalization).
Some tasks will have a vast number of features and fewer training samples (observations) to facilitate the subsequent learning and generalization processes. In such cases, data scientists use dimensionality reduction techniques to reduce the vast number of features to a smaller set.
In the fish recognition task, you can see that the length, weight, fish color, as well as the boat color may vary, and there could be shadows, images with low resolution, and other objects in the image. All these issues affect the significance of the proposed explanatory features that should be informative about our fish classification task.
Work-arounds will be helpful in this case. For example, someone might think of detecting the boat ID and masking out certain parts of the boat that most likely won't contain any fish to be detected by our system. This work-around will limit our search space.
As we have seen in our fish recognition task, we have tried to enhance our model's performance by increasing the model complexity and perfectly classifying every single instance of the training samples. As we will see later, such models do not work over unseen data (such as the data that we will use for testing the performance of our model). Having trained models that work perfectly over the training samples but fail to perform well over the testing samples is called overfitting.
Recall that we build a learning system with the objective of using the training samples as a knowledge base from which our model learns and then generalizes over the unseen data. The error of the trained model over the training data is of no interest to us; rather, we are interested in the performance (generalization) error of the trained model over the testing samples that haven't been involved in the training phase.
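The gap between training error and testing error can be seen in a small experiment. The sketch below does not use the fish images; as a stand-in, it uses a 1-nearest-neighbor classifier on synthetic one-dimensional data with some flipped labels, because that model memorizes every training sample. The data generator, noise level, and function names are all assumptions made for illustration.

```python
import random

def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the label of the closest training point (1-nearest neighbor)."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]

def error_rate(train_x, train_y, xs, ys):
    """Fraction of the samples (xs, ys) that the memorizing model gets wrong."""
    wrong = sum(
        nearest_neighbor_predict(train_x, train_y, x) != y for x, y in zip(xs, ys)
    )
    return wrong / len(xs)

def make_samples(n, rng, noise=0.2):
    """True rule: label is 1 when x > 0.5; a fraction of the labels is flipped."""
    xs = [rng.random() for _ in range(n)]
    ys = [int(x > 0.5) ^ int(rng.random() < noise) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_samples(200, rng)
test_x, test_y = make_samples(200, rng)

training_error = error_rate(train_x, train_y, train_x, train_y)
testing_error = error_rate(train_x, train_y, test_x, test_y)
print(training_error, testing_error)
```

The model reproduces every training label, including the noisy ones, so its training error says nothing about how it behaves on samples it has not seen.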
Sometimes you are unsatisfied with the performance of the model that you have used for a particular task and you need a different class of models. Each learning method has its own assumptions about the data it will use as a learning base. As a data scientist, you have to find out which assumptions fit your data best; this lets you decide which class of models to try and which to reject.
As discussed under model selection and feature extraction, both issues can be dealt with if you have prior knowledge about:
The appropriate features
The appropriate class of models
Having prior knowledge of the explanatory features in the fish recognition system enabled us to differentiate between different types of fish. We can go further by trying to visualize our data and get some sense of how the data is distributed for the different fish categories. On the basis of this prior knowledge, an appropriate family of models can be chosen.
Missing features mainly occur because of a lack of data or because someone chose a prefer-not-to-tell option. How can we handle such a case in the learning process? For example, imagine we find that the width of a specific fish type is missing for some reason. There are many ways to handle these missing features.
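One simple option, shown here only as an example and not as the method this chapter prescribes, is mean imputation: replace each missing value with the average of the observed values of that feature. The `fish_widths` list and the use of `None` to mark a missing entry are assumptions for this sketch.

```python
from statistics import mean

def impute_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a feature with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical widths, with None marking a missing measurement.
fish_widths = [12.0, None, 15.0, 9.0, None]
filled = impute_with_mean(fish_widths)
print(filled)
```

Mean imputation keeps every sample usable but shrinks the variance of the feature, so it is a reasonable first attempt rather than a universal fix.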
To introduce the power of machine learning and deep learning in particular, we are going to implement the fish recognition example. No understanding of the inner details of the code will be required. The point of this section is to give you an overview of a typical machine learning pipeline.
Our knowledge base for this task will be a bunch of images, each of which is labeled as opah or tuna. For this implementation, we are going to use one of the deep learning architectures that made a breakthrough in the area of imaging and computer vision in general. This architecture is called convolutional neural networks (CNNs). It is a family of deep learning architectures that use the convolution operation of image processing to extract features from the images that can explain the object that we want to classify. For now, you can think of it as a magic box that will take our images and learn from them how to distinguish between our two classes (opah and tuna). We will then test the learning process of this box by feeding it unlabeled images and seeing if it's able to tell which type of fish is in the image.
In this example, we will be using Keras. For the moment, you can think of Keras as an API that makes building and using deep learning way easier than usual. So let's get started! From the Keras website we have:
As we mentioned earlier, we need a historical base of data that will be used to teach the learning algorithm about the task that it's supposed to do later. But we also need another dataset for testing its ability to perform the task after the learning process. So to sum up, we need two types of datasets during the learning process:
The first one is the knowledge base where we have the input data and their corresponding labels such as the fish images and their corresponding labels (opah or tuna). This data will be fed to the learning algorithm to learn from it and try to discover the patterns/trends that will help later on for classifying unlabeled images.
The second one is mainly for testing the ability of the model to apply what it learned from the knowledge base to unlabeled images or unseen data, in general, and see if it's working well.
As you can see, we only have the data that we will use as a knowledge base for our learning method. All of the data we have at hand will have the correct output associated with it. So we need to somehow make up this data that does not have any correct output associated with it (the one that we are going to apply the model to).
While performing data science, we'll be doing the following:
Training phase: We present the data from our knowledge base and train our learning method/model by feeding it the input data along with its correct output.
Validation/test phase: In this phase, we measure how well the trained model is doing, using a performance measure suited to the kind of model (R-squared score for regression, classification error for classifiers, recall and precision for IR models, and so on).
The validation/test phase is usually split into two steps:
In the first step, we use different learning methods/models and choose the best performing one based on our validation data (validation step)
Then we measure and report the accuracy of the selected model based on the test set (test step)
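The two steps can be sketched as follows. The "models" here are hypothetical stand-ins (fixed length thresholds rather than trained networks), and the tiny validation and test sets are invented, so only the selection logic should be read as the point of the example.

```python
def classification_error(model, samples):
    """Fraction of (x, label) pairs that the model gets wrong."""
    return sum(model(x) != label for x, label in samples) / len(samples)

def select_and_report(models, validation_set, test_set):
    """Pick the model with the lowest validation error, then score it on the test set."""
    best_name = min(
        models, key=lambda name: classification_error(models[name], validation_set)
    )
    return best_name, classification_error(models[best_name], test_set)

# Hypothetical candidates: length thresholds (in cm) for telling tuna from opah.
models = {
    "threshold_50": lambda length: "tuna" if length > 50 else "opah",
    "threshold_80": lambda length: "tuna" if length > 80 else "opah",
}
validation_set = [(40, "opah"), (60, "opah"), (90, "tuna"), (100, "tuna")]
test_set = [(30, "opah"), (70, "opah"), (85, "tuna"), (75, "tuna")]

best_name, test_error = select_and_report(models, validation_set, test_set)
print(best_name, test_error)
```

The test set is touched only once, after the choice has been made, which is why the reported error can differ from the validation error that drove the selection.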
Now let's see how we get this data to which we are going to apply the model and see how well trained it is.
Since we don't have any training samples without the correct output, we can make up one from the original training samples that we will be using. So we can split our data samples into three different sets (as shown in Figure 1.9):
Train set: This will be used as a knowledge base for our model. Usually, it will be 70% of the original data samples.
Validation set: This will be used to choose the best performing model among a set of models. Usually, this will be 10% of the original data samples.
Test set: This will be used to measure and report the accuracy of the selected model. Usually, it will be as big as the validation set.
If you are using only one learning method, you can drop the validation set and re-split your data into train and test sets only. Usually, data scientists use 75/25 as percentages, or 70/30.
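A minimal sketch of such a split is shown below. The function name `split_dataset` is hypothetical, the samples are stand-in integers, and the 70/15/15 fractions in the first call are an assumption chosen so that the three parts cover all the samples; the second call shows the 75/25 train/test split without a validation set.

```python
import random

def split_dataset(samples, train_fraction, validation_fraction=0.0, seed=0):
    """Shuffle the samples and cut them into train, validation, and test lists."""
    if train_fraction <= 0 or validation_fraction < 0:
        raise ValueError("fractions must be positive")
    if train_fraction + validation_fraction >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test

samples = list(range(100))  # stand-ins for labeled fish images

# Three-way split for choosing among several models.
train, validation, test = split_dataset(samples, 0.7, validation_fraction=0.15)

# Two-way split when there is only one learning method to evaluate.
train_only, no_validation, test_only = split_dataset(samples, 0.75)

print(len(train), len(validation), len(test))
print(len(train_only), len(no_validation), len(test_only))
```

Shuffling before cutting matters when the original samples are ordered, for example all opah images followed by all tuna images; without it, one of the sets could end up containing a single class.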
