Description

Deep learning is a popular subset of machine learning, and it allows you to build complex models that are faster and give more accurate predictions. This book is your companion to take your first steps into the world of deep learning, with hands-on examples to boost your understanding of the topic.
This book starts with a quick overview of the essential concepts of data science and machine learning that are required to get started with deep learning. It introduces you to TensorFlow, the most widely used machine learning library for training deep learning models. You will then work on your first deep learning problem by training a deep feed-forward neural network for digit classification, and move on to tackle other real-world problems in computer vision, language processing, sentiment analysis, and more. Advanced deep learning models, such as generative adversarial networks, and their applications are also covered in this book.
By the end of this book, you will have a solid understanding of all the essential concepts in deep learning. With the help of the examples and code provided in this book, you will be equipped to train your own deep learning models with more confidence.




Deep Learning By Example
A hands-on guide to implementing advanced machine learning algorithms and neural networks
Ahmed Menshawy
BIRMINGHAM - MUMBAI

Deep Learning By Example

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Vedika Naik
Acquisition Editor: Tushar Gupta
Content Development Editor: Aishwarya Pandere
Technical Editor: Sagar Sawant
Copy Editor: Vikrant Phadke, Safis Editing
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Graphics: Tania Dutta
Production Coordinator: Aparna Bhagat

First published: February 2018

Production reference: 1260218

Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.

ISBN 978-1-78839-990-6

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Ahmed Menshawy is a Research Engineer at Trinity College Dublin, Ireland. He has more than 5 years of working experience in the areas of ML and NLP. He holds an MSc in Advanced Computer Science. He started his career as a Teaching Assistant at the Department of Computer Science, Helwan University, Cairo, Egypt, where he taught several advanced courses in ML and NLP, as well as image processing. He was involved in implementing a state-of-the-art Arabic text-to-speech system, and was the main ML specialist at the industrial research and development lab at IST Networks, based in Egypt.

I want to thank the people who have been close to me and supported me, especially my wife Sara and my parents

About the reviewers

Md. Rezaul Karim is a Research Scientist at Fraunhofer FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Germany. Before joining FIT, he worked as a Researcher at the Insight Centre for Data Analytics, Ireland. Earlier, he worked as a Lead Engineer at Samsung Electronics, Korea.

He has 9 years of R&D experience with C++, Java, R, Scala, and Python. He has published research papers concerning bioinformatics, big data, and deep learning. He has practical working experience with Spark, Zeppelin, Hadoop, Keras, Scikit-Learn, TensorFlow, DeepLearning4j, MXNet, and H2O.

Doug Ortiz is an experienced enterprise cloud, big data, data analytics, and solutions architect who has architected, designed, developed, re-engineered, and integrated enterprise solutions. His other areas of expertise include Amazon Web Services, Azure, Google Cloud, business intelligence, Hadoop, Spark, NoSQL databases, and SharePoint, to mention a few.

He is the founder of Illustris, LLC, and is reachable at [email protected].

Huge thanks to my wonderful wife Milla, Maria, Nikolay and our children for all their support.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Deep Learning By Example

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Data Science - A Bird's-Eye View

Understanding data science by an example

Design procedure of data science algorithms

Data pre-processing

Data cleaning

Data pre-processing

Feature selection

Model selection

Learning process

Evaluating your model

Getting to learn

Challenges of learning

Feature extraction – feature engineering

Noise

Overfitting

Selection of a machine learning algorithm

Prior knowledge

Missing values

Implementing the fish recognition/detection model

Knowledge base/dataset

Data analysis pre-processing

Model building

Model training and testing

Fish recognition – all together

Different learning types

Supervised learning

Unsupervised learning

Semi-supervised learning

Reinforcement learning

Data size and industry needs

Summary

Data Modeling in Action - The Titanic Example

Linear models for regression

Motivation

Advertising – a financial example

Dependencies

Importing data with pandas

Understanding the advertising data

Data analysis and visualization

Simple regression model

Learning model coefficients

Interpreting model coefficients

Using the model for prediction

Linear models for classification

Classification and logistic regression

Titanic example – model building and training

Data handling and visualization

Data analysis – supervised machine learning

Different types of errors

Apparent (training set) error

Generalization/true error

Summary

Feature Engineering and Model Complexity – The Titanic Example Revisited

Feature engineering

Types of feature engineering

Feature selection

Dimensionality reduction

Feature construction

Titanic example revisited

Missing values

Removing any sample with missing values in it

Missing value imputation

Assigning an average value

Using a regression or another simple model to predict the values of missing variables

Feature transformations

Dummy features

Factorizing

Scaling

Binning

Derived features

Name

Cabin

Ticket

Interaction features

The curse of dimensionality

Avoiding the curse of dimensionality

Titanic example revisited – all together

Bias-variance decomposition

Learning visibility

Breaking the rule of thumb

Summary

Get Up and Running with TensorFlow

TensorFlow installation

TensorFlow GPU installation for Ubuntu 16.04

Installing NVIDIA drivers and CUDA 8

Installing TensorFlow

TensorFlow CPU installation for Ubuntu 16.04

TensorFlow CPU installation for macOS X

TensorFlow GPU/CPU installation for Windows

The TensorFlow environment

Computational graphs

TensorFlow data types, variables, and placeholders

Variables

Placeholders

Mathematical operations

Getting output from TensorFlow

TensorBoard – visualizing learning

Summary

TensorFlow in Action - Some Basic Examples

Capacity of a single neuron

Biological motivation and connections

Activation functions

Sigmoid

Tanh

ReLU

Feed-forward neural network

The need for multilayer networks

Training our MLP – the backpropagation algorithm

Step 1 – forward propagation

Step 2 – backpropagation and weight update

TensorFlow terminologies – recap

Defining multidimensional arrays using TensorFlow

Why tensors?

Variables

Placeholders

Operations

Linear regression model – building and training

Linear regression with TensorFlow

Logistic regression model – building and training

Utilizing logistic regression in TensorFlow

Why use placeholders?

Set model weights and bias

Logistic regression model

Training

Cost function

Summary

Deep Feed-forward Neural Networks - Implementing Digit Classification

Hidden units and architecture design

MNIST dataset analysis

The MNIST data

Digit classification – model building and training

Data analysis

Building the model

Model training

Summary

Introduction to Convolutional Neural Networks

The convolution operation

Motivation

Applications of CNNs

Different layers of CNNs

Input layer

Convolution step

Introducing non-linearity

The pooling step

Fully connected layer

Logits layer

CNN basic example – MNIST digit classification

Building the model

Cost function

Performance measures

Model training

Summary

Object Detection – CIFAR-10 Example

Object detection

CIFAR-10 – modeling, building, and training

Used packages

Loading the CIFAR-10 dataset

Data analysis and preprocessing

Building the network

Model training

Testing the model

Summary

Object Detection – Transfer Learning with CNNs

Transfer learning

The intuition behind TL

Differences between traditional machine learning and TL

CIFAR-10 object detection – revisited

Solution outline

Loading and exploring CIFAR-10

Inception model transfer values

Analysis of transfer values

Model building and training

Summary

Recurrent-Type Neural Networks - Language Modeling

The intuition behind RNNs

Recurrent neural networks architectures

Examples of RNNs

Character-level language models

Language model using Shakespeare data

The vanishing gradient problem

The problem of long-term dependencies

LSTM networks

Why does LSTM work?

Implementation of the language model

Mini-batch generation for training

Building the model

Stacked LSTMs

Model architecture

Inputs

Building an LSTM cell

RNN output

Training loss

Optimizer

Building the network

Model hyperparameters

Training the model

Saving checkpoints

Generating text

Summary

Representation Learning - Implementing Word Embeddings

Introduction to representation learning

Word2Vec

Building Word2Vec model

A practical example of the skip-gram architecture

Skip-gram Word2Vec implementation

Data analysis and pre-processing

Building the model

Training

Summary

Neural Sentiment Analysis

General sentiment analysis architecture

RNNs – sentiment analysis context

Exploding and vanishing gradients - recap

Sentiment analysis – model implementation

Keras

Data analysis and preprocessing

Building the model

Model training and results analysis

Summary

Autoencoders – Feature Extraction and Denoising

Introduction to autoencoders

Examples of autoencoders

Autoencoder architectures

Compressing the MNIST dataset

The MNIST dataset

Building the model

Model training

Convolutional autoencoder

Dataset

Building the model

Model training

Denoising autoencoders

Building the model

Model training

Applications of autoencoders

Image colorization

More applications

Summary

Generative Adversarial Networks

An intuitive introduction

Simple implementation of GANs

Model inputs

Variable scope

Leaky ReLU

Generator

Discriminator

Building the GAN network

Model hyperparameters

Defining the generator and discriminator

Discriminator and generator losses

Optimizers

Model training

Generator samples from training

Sampling from the generator

Summary

Face Generation and Handling Missing Labels

Face generation

Getting the data

Exploring the Data

Building the model

Model inputs

Discriminator

Generator

Model losses

Model optimizer

Training the model

Semi-supervised learning with Generative Adversarial Networks (GANs)

Intuition

Data analysis and preprocessing

Building the model

Model inputs

Generator

Discriminator

Model losses

Model optimizer

Model training

Summary

Implementing Fish Recognition

Code for fish recognition

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

This book starts by introducing the foundations of machine learning and what makes learning visible, demonstrating traditional machine learning techniques with some examples before moving on to deep learning. You will then move on to creating machine learning models that will eventually lead you to neural networks. You will get familiar with the basics of deep learning and explore various tools that enable deep learning in a powerful yet user-friendly manner. Starting from a very low baseline, this book will enable a regular developer to get hands-on experience with deep learning. You will learn all the essentials needed to explore and understand what deep learning is, and will perform deep learning tasks first-hand. We will also be using one of the most widely used deep learning frameworks, TensorFlow, which has big community support that is growing day by day, making it a good option for building your complex deep learning applications.

Who this book is for

This book is a starting point for those who are keen on knowing about deep learning and implementing it but do not have an extensive background in machine learning, complex statistics, and linear algebra.

What this book covers

Chapter 1, Data Science - A Bird's-Eye View, explains that data science or machine learning is the process of giving machines the ability to learn from a dataset without being explicitly programmed. For instance, it would be extremely hard to write a program that takes a hand-written digit as an input image and outputs a value from 0 to 9 according to the number written in the image. The same applies to the task of classifying incoming emails as spam or non-spam. To solve such tasks, data scientists use learning methods and tools from the field of data science or machine learning to teach the computer how to automatically recognize digits, by giving it some explanatory features that can distinguish each digit from another. Likewise, for the spam/non-spam problem, instead of using regular expressions and writing hundreds of rules to classify incoming emails, we can teach the computer, through specific learning algorithms, how to distinguish between spam and non-spam emails.

Chapter 2, Data Modeling in Action - The Titanic Example, explains that linear models are the basic learning algorithms in the field of data science. Understanding how a linear model works is crucial to your journey of learning data science, because it is the basic building block for most of the sophisticated learning algorithms out there, including neural networks.

Chapter 3, Feature Engineering and Model Complexity – The Titanic Example Revisited, covers model complexity and assessment, an important step towards building a successful data science system. There are lots of tools that you can use to assess and choose your model. In this chapter, we are going to address some of the tools that can help you to increase the value of your data by adding more descriptive features and extracting meaningful information from existing ones. We are also going to address tools related to choosing the optimal number of features, and learn why it's a problem to have a large number of features and fewer training samples/observations.

Chapter 4, Get Up and Running with TensorFlow, gives an overview of one of the most widely used deep learning frameworks: TensorFlow. It has big community support that is growing day by day, which makes it a good option for building your complex deep learning applications.

Chapter 5, TensorFlow in Action - Some Basic Examples, explains the main computational concept behind TensorFlow, the computational graph model, and demonstrates how to get on track by implementing linear regression and logistic regression.

Chapter 6, Deep Feed-forward Neural Networks - Implementing Digit Classification, explains that a feed-forward neural network (FNN) is a special type of neural network wherein links/connections between neurons do not form a cycle. As such, it is different from other neural network architectures that we will study later in this book (recurrent-type neural networks). The FNN is a widely used architecture, and it was the first and simplest type of neural network. In this chapter, we will go through the architecture of a typical FNN, using the TensorFlow library for this. After covering these concepts, we will give a practical example of digit classification. The question of this example is: given a set of images that contain handwritten digits, how can you classify these images into 10 different classes (0-9)?

Chapter 7, Introduction to Convolutional Neural Networks, explains that in data science, a convolutional neural network (CNN) is a specific kind of deep learning architecture that uses the convolution operation to extract relevant explanatory features from the input image. CNN layers are connected like those of an FNN, while using the convolution operation to mimic how the human brain functions when trying to recognize objects. Individual cortical neurons respond to stimuli in a restricted region of space known as the receptive field. Biomedical imaging problems, in particular, can sometimes be challenging, but in this chapter, we'll see how to use a CNN to discover patterns in such images.

Chapter 8, Object Detection – CIFAR-10 Example, covers the basics and the intuition/motivation behind CNNs, before demonstrating this on one of the most popular datasets available for object detection. We'll also see how the initial layers of the CNN get very basic features about our objects, but the final convolutional layers will get more semantic-level features that are built up from those basic features in the first layers.

Chapter 9, Object Detection – Transfer Learning with CNNs, explains that transfer learning (TL) is a research problem in data science that is mainly concerned with persisting knowledge acquired while solving a specific task and using this knowledge to solve another, different but similar, task. In this chapter, we will demonstrate one of the modern practices and common themes in the field of data science with TL. The idea is to leverage domains that have very large datasets to help domains that have smaller datasets. Finally, we will revisit our CIFAR-10 object detection example and use TL to try to reduce both the training time and the performance error.

Chapter 10, Recurrent-Type Neural Networks - Language Modeling, explains that recurrent neural networks (RNNs) are a class of deep learning architectures that are widely used for natural language processing. This set of architectures enables us to provide contextual information for current predictions, and also has specific architectures that deal with long-term dependencies in any input sequence. In this chapter, we'll demonstrate how to make a sequence-to-sequence model, which will be useful in many applications in NLP. We will demonstrate these concepts by building a character-level language model and seeing how our model generates sentences similar to the original input sequences.

Chapter 11, Representation Learning - Implementing Word Embeddings, explains that machine learning is a science mainly based on statistics and linear algebra. Applying matrix operations is very common in most machine learning or deep learning architectures because of backpropagation. This is the main reason deep learning, or machine learning in general, accepts only real-valued quantities as input. This fact conflicts with many applications, such as machine translation and sentiment analysis, which have text as input. So, in order to use deep learning for such applications, we need to have the text in a form that deep learning accepts! In this chapter, we are going to introduce the field of representation learning, which is a way to learn a real-valued representation of text while preserving the semantics of the actual text. For example, the representation of love should be very close to the representation of adore, because they are used in very similar contexts.

Chapter 12, Neural Sentiment Analysis, addresses one of the hot and trendy applications in natural language processing, which is called sentiment analysis. Most people nowadays express their opinions about something through social media platforms, and making use of this vast amount of text to keep track of customer satisfaction about something is very crucial for companies or even governments.

In this chapter, we are going to use RNNs to build a sentiment analysis solution.

Chapter 13, Autoencoders – Feature Extraction and Denoising, explains that an autoencoder network is nowadays one of the most widely used deep learning architectures. It is mainly used for the unsupervised learning of efficient data codings. It can also be used for dimensionality reduction by learning an encoding, or representation, of a specific dataset. Using autoencoders in this chapter, we'll show how to denoise your dataset by constructing another dataset with the same dimensions but less noise. To use this concept in practice, we will extract the important features from the MNIST dataset and try to see how performance is significantly enhanced by this.

Chapter 14, Generative Adversarial Networks, covers Generative Adversarial Networks (GANs). They are deep neural net architectures that consist of two networks pitted against each other (hence the name adversarial). GANs were introduced in a paper (https://arxiv.org/abs/1406.2661) by Ian Goodfellow and other researchers, including Yoshua Bengio, at the University of Montreal in 2014. Referring to GANs, Facebook's AI research director, Yann LeCun, called adversarial training the most interesting idea in the last 10 years in machine learning. The potential of GANs is huge, because they can learn to mimic any distribution of data. That is, GANs can be taught to create worlds eerily similar to our own in any domain: images, music, speech, or prose. They are robot artists in a sense, and their output is impressive (https://www.nytimes.com/2017/08/14/arts/design/google-how-ai-creates-new-music-and-new-artists-project-magenta.html), and poignant too.

Chapter 15, Face Generation and Handling Missing Labels, shows that the list of interesting applications that we can use GANs for is endless. In this chapter, we are going to demonstrate another promising application of GANs, which is face generation based on the CelebA database. We'll also demonstrate how to use GANs for semi-supervised learning setups where we've got a poorly labeled dataset with some missing labels.

Appendix, Implementing Fish Recognition, includes the entire code for the fish recognition example.

To get the most out of this book

This book assumes no extensive background in machine learning, complex statistics, or linear algebra, but basic familiarity with Python will help you follow the code examples.

Installation instructions for TensorFlow (GPU and CPU, on Ubuntu 16.04, macOS X, and Windows) are covered in Chapter 4, Get Up and Running with TensorFlow.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at

www.packtpub.com

.

Select the

SUPPORT

tab.

Click on

Code Downloads & Errata

.

Enter the name of the book in the

Search

box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Deep-Learning-By-Example. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/DeepLearningByExample_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

html, body, #map { height: 100%; margin: 0; padding: 0}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css

$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Data Science - A Bird's-Eye View

Data science or machine learning is the process of giving machines the ability to learn from a dataset without being explicitly programmed. For instance, it is extremely hard to write a program that can take a hand-written digit as an input image and output a value from 0 to 9 according to the digit written in the image. The same applies to the task of classifying incoming emails as spam or non-spam. To solve such tasks, data scientists use learning methods and tools from the field of data science or machine learning to teach the computer how to automatically recognize digits, by giving it some explanatory features that can distinguish one digit from another. Likewise, for the spam/non-spam problem, instead of using regular expressions and writing hundreds of rules to classify incoming email, we can teach the computer, through specific learning algorithms, how to distinguish between spam and non-spam emails.

For the spam filtering application, you could code it with a rule-based approach, but it won't be good enough to be used in production, like the one in your mail server. Building a learning system is an ideal solution for that.

You are probably using applications of data science on a daily basis, often without knowing it. For example, your country might be using a system to detect the ZIP code of your posted letter in order to automatically forward it to the correct area. If you are using Amazon, they often recommend things for you to buy and they do this by learning what sort of things you often search for or buy.

Building a learned/trained machine learning algorithm requires a base of historical data samples, from which the algorithm learns how to distinguish between different examples and comes up with some knowledge and trends from that data. After that, the learned/trained algorithm can be used for making predictions on unseen data.

In this chapter, we are going to have a bird's-eye view of data science, how it works as a black box, and the challenges that data scientists face on a daily basis. We are going to cover the following topics:

Understanding data science by an example

Design procedure of data science algorithms

Getting to learn

Implementing the fish recognition/detection model

Different learning types

Data size and industry needs

Design procedure of data science algorithms

Different learning systems usually follow the same design procedure. They start by acquiring the knowledge base, then selecting the relevant explanatory features from the data, then going through a bunch of candidate learning algorithms while keeping an eye on the performance of each one, and finish with the evaluation process, which measures how successful the training process was.

In this section, we are going to address all these different design steps in more detail:

Figure 1.11: Model learning process outline

Data pre-processing

This component of the learning cycle represents the knowledge base of our algorithm. So, in order to help the learning algorithm give accurate decisions about the unseen data, we need to provide this knowledge base in the best form. Thus, our data may need a lot of cleaning and pre-processing (conversions).

Data cleaning

Most datasets require this step, in which you get rid of errors, noise, and redundancies (a minimal cleaning sketch follows the list below). We need our data to be accurate, complete, reliable, and unbiased, as there are lots of problems that may arise from using a bad knowledge base, such as:

Inaccurate and biased conclusions

Increased error

Reduced generalizability, which is the model's ability to perform well over the unseen data that it didn't train on previously
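As a rough illustration of this step, here is a minimal cleaning sketch using pandas; the file and column names are hypothetical stand-ins for illustration, not the book's dataset:

import pandas as pd

# Hypothetical raw measurements; the file and column names are assumptions
df = pd.read_csv('fish_measurements.csv')

df = df.drop_duplicates()                    # remove redundant rows
df = df.dropna(subset=['length', 'weight'])  # drop rows missing key fields
df = df[df['length'] > 0]                    # discard obviously erroneous values

print(df.describe())                         # quick sanity check of the cleaned data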

Feature selection

The number of explanatory features (input variables) of a sample can be enormous: you get xi = (xi1, xi2, xi3, ..., xid) as a training sample (observation/example), where d is very large. An example of this is a document classification task, where you have 10,000 different words and the input variables are the numbers of occurrences of those words.

This enormous number of input variables can be problematic and sometimes a curse, because we have many input variables and few training samples to help us in the learning procedure. To avoid this curse of having an enormous number of input variables (the curse of dimensionality), data scientists use dimensionality reduction techniques in order to select a subset of the input variables. For example, in the text classification task, they can do the following (see the PCA sketch after this list):

Extracting relevant inputs (for instance, using a mutual information measure)

Principal component analysis (PCA)

Grouping (clustering) similar words (this uses a similarity measure)
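As a rough sketch of how such a reduction looks in practice, here is scikit-learn's PCA applied to random stand-in data (the shapes are illustrative assumptions, since no actual document matrix is shown here):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 10000)   # 100 samples, 10,000 raw input variables (stand-in data)

pca = PCA(n_components=50)       # keep only 50 derived features
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                       # (100, 50)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained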

Model selection

This step comes after selecting a proper subset of your input variables by using any dimensionality reduction technique. Choosing the proper subset of the input variable will make the rest of the learning process very simple.

In this step, you are trying to figure out the right model to learn.

If you have any prior experience with data science and applying learning methods to different domains and different kinds of data, then you will find this step easy as it requires prior knowledge of how your data looks and what assumptions could fit the nature of your data, and based on this you choose the proper learning method. If you don't have any prior knowledge, that's also fine because you can do this step by guessing and trying different learning methods with different parameter settings and choose the one that gives you better performance over the test set.

Also, initial data analysis and visualization will help you to make a good guess about the form of the distribution and nature of your data.

Learning process

By learning, we mean the optimization criteria that you are going to use to select the best model parameters. There are various optimization criteria for that:

Mean square error (MSE)

Maximum likelihood (ML) criterion

Maximum a posteriori probability (MAP)

The optimization problem may be hard to solve, but the right choice of model and error function makes a difference.
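For instance, the MSE criterion simply averages the squared differences between the model's predictions and the true targets; a minimal NumPy illustration (the numbers are made up):

import numpy as np

y_true = np.array([3.0, 2.5, 4.0, 5.1])   # observed target values
y_pred = np.array([2.8, 2.9, 4.2, 4.7])   # a model's predictions

mse = np.mean((y_true - y_pred) ** 2)     # mean square error
print(mse)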

Evaluating your model

In this step, we try to measure the generalization error of our model on the unseen data. Since we only have the specific data without knowing any unseen data beforehand, we can randomly select a test set from the data and never use it in the training process, so that it acts like valid unseen data. There are different ways you can evaluate the performance of the selected model:

Simple holdout method, which is dividing the data into training and testing sets

Other complex methods, based on cross-validation and random subsampling

Our objective in this step is to compare the predictive performance for different models trained on the same data and choose the one with a better (smaller) testing error, which will give us a better generalization error over the unseen data. You can also be more certain about the generalization error by using a statistical method to test the significance of your results.
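The sketch below illustrates both ideas on random stand-in data with scikit-learn: a simple holdout split, then 5-fold cross-validation as a more robust estimate. The model choice (logistic regression) is just an assumption for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X = np.random.rand(200, 5)         # stand-in features
y = np.random.randint(0, 2, 200)   # stand-in binary labels

# Simple holdout: divide the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

model = LogisticRegression()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # accuracy on the held-out test set

# Cross-validation: averages the testing error over several random splits
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(scores.mean())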

Getting to learn

Building a machine learning system comes with some challenges and issues; we will try to address them in this section. Many of these issues are domain specific and others aren't.

Challenges of learning

The following is an overview of the challenges and issues that you will typically face when trying to build a learning system.

Feature extraction – feature engineering

Feature extraction is one of the crucial steps toward building a learning system. If you did a good job in this challenge by selecting the proper/right number of features, then the rest of the learning process will be easy. Also, feature extraction is domain dependent and it requires prior knowledge to have a sense of what features could be important for a particular task. For example, the features for our fish recognition system will be different from the ones for spam detection or identifying fingerprints.

The feature extraction step starts from the raw data that you have. You then build derived variables/values (features) that are informative about the learning task and that facilitate the next steps of learning and evaluation (generalization).

Some tasks will have a vast number of features and fewer training samples (observations) to facilitate the subsequent learning and generalization processes. In such cases, data scientists use dimensionality reduction techniques to reduce the vast number of features to a smaller set.

Noise

In the fish recognition task, you can see that the length, weight, fish color, as well as the boat color may vary, and there could be shadows, images with low resolution, and other objects in the image. All these issues affect the significance of the proposed explanatory features that should be informative about our fish classification task.

Work-arounds will be helpful in this case. For example, someone might think of detecting the boat ID and mask out certain parts of the boat that most likely won't contain any fish to be detected by our system. This work-around will limit our search space.

Overfitting

As we have seen in our fish recognition task, we have tried to enhance our model's performance by increasing the model complexity and perfectly classifying every single instance of the training samples. As we will see later, such models do not work over unseen data (such as the data that we will use for testing the performance of our model). Having trained models that work perfectly over the training samples but fail to perform well over the testing samples is called overfitting.

As we will see in the latter part of the chapter, we build a learning system whose objective is to use the training samples as a knowledge base for our model, in order to learn from them and generalize over unseen data. The performance error of the trained model over the training data is of no interest to us; rather, we are interested in the performance (generalization) error of the trained model over the testing samples that haven't been involved in the training phase.

Selection of a machine learning algorithm

Sometimes you are unsatisfied with the performance of the model that you have used for a particular task, and you need a different class of models. Each learning method has its own assumptions about the data it will use as a learning base. As a data scientist, you need to discover which assumptions fit your data best; knowing this, you will be able to try one class of models and reject another.

Prior knowledge

As discussed in the concepts of model selection and feature extraction, the two issues can be dealt with, if you have prior knowledge about:

The appropriate feature

Model selection parts

Having prior knowledge of the explanatory features in the fish recognition system enabled us to differentiate between different types of fish. We can go further by trying to visualize our data and get some sense of the data types of the different fish categories. On the basis of this prior knowledge, an apt family of models can be chosen.

Missing values

Missing features mainly occur because of a lack of data or choosing the prefer-not-to-tell option. How can we handle such a case in the learning process? For example, imagine we find that the width of a specific fish type is missing for some reason. There are many ways to handle these missing features.
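As a small, hypothetical illustration of two common options (the measurements below are made up for the sketch):

import numpy as np
import pandas as pd

# Hypothetical fish measurements with one missing width value
df = pd.DataFrame({'length': [24.0, 30.5, 28.1],
                   'width':  [7.2, np.nan, 6.8]})

dropped = df.dropna()          # option 1: remove any sample with missing values
filled = df.fillna(df.mean())  # option 2: assign an average value

print(filled)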

Implementing the fish recognition/detection model

To introduce the power of machine learning and deep learning in particular, we are going to implement the fish recognition example. No understanding of the inner details of the code will be required. The point of this section is to give you an overview of a typical machine learning pipeline.

Our knowledge base for this task will be a bunch of images, each one of them labeled as opah or tuna. For this implementation, we are going to use one of the deep learning architectures that made a breakthrough in the area of imaging and computer vision in general. This architecture is called Convolutional Neural Networks (CNNs). CNNs are a family of deep learning architectures that use the convolution operation of image processing to extract features from images that can explain the object we want to classify. For now, you can think of a CNN as a magic box that will take our images, learn from them how to distinguish between our two classes (opah and tuna), and then let us test its learning by feeding it unlabeled images and seeing if it's able to tell which type of fish is in the image.

Different types of learning will be addressed in a later section, so you will understand later on why our fish recognition task is under the supervised learning category.

In this example, we will be using Keras. For the moment, you can think of Keras as an API that makes building and using deep learning models much easier than usual. So let's get started! From the Keras website we have:

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
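To make this concrete, below is a minimal, hypothetical sketch of the kind of Keras model we'll build for the two-class fish problem; the 64x64 RGB input size and the layer configuration are illustrative assumptions, not the book's final architecture:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# A tiny illustrative CNN for two-class (opah vs. tuna) image input
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))   # probability of one of the two classes

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()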

Knowledge base/dataset

As we mentioned earlier, we need a historical base of data that will be used to teach the learning algorithm about the task that it's supposed to do later. But we also need another dataset for testing its ability to perform the task after the learning process. So to sum up, we need two types of datasets during the learning process:

The first one is the knowledge base where we have the input data and their corresponding labels such as the fish images and their corresponding labels (opah or tuna). This data will be fed to the learning algorithm to learn from it and try to discover the patterns/trends that will help later on for classifying unlabeled images.

The second one is mainly for testing the ability of the model to apply what it learned from the knowledge base to unlabeled images or unseen data, in general, and see if it's working well.

As you can see, we only have the data that we will use as a knowledge base for our learning method, and all of the data we have at hand comes with the correct output associated with it. So we need to somehow simulate data that does not have any correct output associated with it (the data that we are going to apply the model to).

While performing data science, we'll be doing the following:

Training phase

: We present our data from our knowledge base and train our learning method/model by feeding the input data along with its correct output to the model.

Validation/test phase

: In this phase, we are going to measure how well the trained model is doing. We also use different performance measures, depending on the type of model, in order to assess our trained model (R-squared score for regression, classification errors for classifiers, recall and precision for IR models, and so on).

The validation/test phase is usually split into two steps:

In the first step, we use different learning methods/models and choose the best performing one based on our validation data (validation step)

Then we measure and report the accuracy of the selected model based on the test set (test step)

Now let's see how we get this data to which we are going to apply the model and see how well trained it is.

Since we don't have any training samples without the correct output, we can make up one from the original training samples that we will be using. So we can split our data samples into three different sets (as shown in Figure 1.9):

Train set

: This will be used as a knowledge base for our model. Usually, this will be 70% of the original data samples.

Validation set

: This will be used to choose the best performing model among a set of models. Usually this will be 10% of the original data samples.

Test set

: This will be used to measure and report the accuracy of the selected model. Usually, it will be as big as the validation set.

Figure 1.9: Splitting data into train, validation, and test sets

In case you have only one learning method that you are using, you can drop the validation set and re-split your data into train and test sets only. Usually, data scientists use 75/25 or 70/30 as the percentages.
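A minimal sketch of such a three-way split with scikit-learn on random stand-in data; the exact proportions (here roughly 70/15/15) are a judgment call, as noted above:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)          # stand-in features
y = np.random.randint(0, 2, 1000)    # stand-in labels

# First reserve 70% of the samples for the train set
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.7)

# Split the remaining 30% evenly into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5)

print(len(X_train), len(X_val), len(X_test))   # roughly 700, 150, 150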