E-Book
45,59 €

Predictive Analytics with TensorFlow E-Book

Md. Rezaul Karim

Publisher: Packt Publishing
Category: Science and New Technologies
Language: English

Description

Harness the power of data in your business by building advanced predictive modeling applications with TensorFlow.

About This Book

A quick guide to gaining hands-on experience with deep learning in different domains, such as digit/image classification and text
Build your own smart, predictive models with TensorFlow using the easy-to-follow approach presented in the book
Understand deep learning and predictive analytics along with their challenges and best practices

Who This Book Is For

This book is intended for anyone who wants to build predictive models with the power of TensorFlow from scratch. If you want to build your own extensive applications that work and can predict smart decisions in the future, then this book is what you need!

What You Will Learn

Get a solid and theoretical understanding of linear algebra, statistics, and probability for predictive modeling
Develop predictive models using classification, regression, and clustering algorithms
Develop predictive models for NLP
Learn how to use reinforcement learning for predictive analytics
Use factorization machines for advanced recommendation systems
Get a hands-on understanding of deep learning architectures for advanced predictive analytics
Learn how to use deep neural networks for predictive analytics
See how to use recurrent neural networks for predictive analytics
Use convolutional neural networks for emotion recognition, image classification, and sentiment analysis

In Detail

Predictive analytics discovers hidden patterns from structured and unstructured data for automated decision-making in business intelligence.

This book will help you build, tune, and deploy predictive models with TensorFlow in three main sections. The first section covers linear algebra, statistics, and probability theory for predictive modeling.

The second section covers developing predictive models via supervised (classification and regression) and unsupervised (clustering) algorithms. It then explains how to develop predictive models for NLP and covers reinforcement learning algorithms. Lastly, this section covers developing a factorization machines-based recommendation system.

The third section covers deep learning architectures for advanced predictive analytics, including deep neural networks and recurrent neural networks for high-dimensional and sequence data. Finally, convolutional neural networks are used for predictive modeling for emotion recognition, image classification, and sentiment analysis.

Style and approach

TensorFlow, a popular library for machine learning, embraces the innovation and community engagement of open source, but has the support, guidance, and stability of a large corporation.

Details

Pages: 554

Publication year: 2017

Sample

Predictive Analytics with TensorFlow

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Basic Python and Linear Algebra for Predictive Analytics

A basic introduction to predictive analytics

Why predictive analytics?

Working principles of a predictive model

A bit of linear algebra

Programming linear algebra

Installing and getting started with Python

Installing on Windows

Installing Python on Linux

Installing and upgrading PIP (or PIP3)

Installing Python on Mac OS

Installing packages in Python

Getting started with Python

Python data types

Using strings in Python

Using lists in Python

Using tuples in Python

Using dictionary in Python

Using sets in Python

Functions in Python

Classes in Python

Vectors, matrices, and graphs

Vectors

Matrices

Matrix addition

Matrix subtraction

Multiplying two matrices

Finding the determinant of a matrix

Finding the transpose of a matrix

Solving simultaneous linear equations

Eigenvalues and eigenvectors

Span and linear independence

Principal component analysis

Singular value decomposition

Data compression in a predictive model using SVD

Predictive analytics tools in Python

Summary

2. Statistics, Probability, and Information Theory for Predictive Modeling

Using statistics in predictive modeling

Statistical models

Parametric versus nonparametric model

Parametric predictive models

Nonparametric predictive models

Population and sample

Random sampling

Expectation

Central limit theorem

Skewness and data distribution

Standard deviation and variance

Covariance and correlation

Interquartile, range, and quartiles

Hypothesis testing

Chi-square tests

Chi-square independence test

Basic probability for predictive modeling

Probability and the random variables

Generating random numbers and setting the seed

Probability distributions

Marginal probability

Conditional probability

The chain rule of conditional probability

Independence and conditional independence

Bayes' rule

Using information theory in predictive modeling

Self-information

Mutual information

Entropy

Shannon entropy

Joint entropy

Conditional entropy

Information gain

Using information theory

Using information theory in Python

Summary

3. From Data to Decisions – Getting Started with TensorFlow

Taking decisions based on data - Titanic example

Data value chain for making decisions

From disaster to decision – Titanic survival example

General overview of TensorFlow

Installing and configuring TensorFlow

Installing TensorFlow on Linux

Installing Python and NVIDIA driver

Installing NVIDIA CUDA

Installing NVIDIA cuDNN v5.1+

Installing the libcupti-dev library

Installing TensorFlow

Installing TensorFlow with native pip

Installing with virtualenv

Installing TensorFlow from source

Testing your TensorFlow installation

TensorFlow computational graph

TensorFlow programming model

Data model in TensorFlow

Tensors

Rank

Shape

Data type

Variables

Fetches

Feeds and placeholders

TensorBoard

How does TensorBoard work?

Getting started with TensorFlow – linear regression and beyond

Source code for the linear regression

Summary

4. Putting Data in Place - Supervised Learning for Predictive Analytics

Supervised learning for predictive analytics

Linear regression - revisited

Problem statement

Using linear regression for movie rating prediction

From disaster to decision - Titanic example revisited

An exploratory analysis of the Titanic dataset

Feature engineering

Logistic regression for survival prediction

Using TensorFlow contrib

Linear SVM for survival prediction

Ensemble method for survival prediction: random forest

A comparative analysis

Summary

5. Clustering Your Data - Unsupervised Learning for Predictive Analytics

Unsupervised learning and clustering

Using K-means for predictive analytics

How K-means works

Using K-means for predicting neighborhoods

Predictive models for clustering audio files

Using kNN for predictive analytics

Working principles of kNN

Implementing a kNN-based predictive model

Summary

6. Predictive Analytics Pipelines for NLP

NLP analytics pipelines

Using text analytics

Transformers and estimators

Standard transformer

Estimator transformer

StopWordsRemover

N-gram

Using BOW for predictive analytics

Bag-of-words

The problem definition

The dataset description and exploration

Spam prediction using LR and BOW with TensorFlow

TF-IDF model for predictive analytics

How to compute TF, IDF, and TFIDF?

Implementing a TF-IDF model for spam prediction

Using Word2vec for sentiment analysis

Continuous bag-of-words

Continuous skip-gram

Using CBOW for word embedding and model building

CBOW model building

Reusing the CBOW for predicting sentiment

Summary

7. Using Deep Neural Networks for Predictive Analytics

Deep learning for better predictive analytics

Artificial Neural Networks

Deep Neural Networks

DNN architectures

Multilayer perceptrons

Training an MLP

Using MLPs

DNN performance analysis

Fine-tuning DNN hyperparameters

Number of hidden layers

Number of neurons per hidden layer

Activation functions

Weight and biases initialization

Regularization

Using multilayer perceptrons for predictive analytics

Dataset description

Preprocessing

A TensorFlow implementation of MLP

Deep belief networks

Restricted Boltzmann Machines

Construction of a simple DBN

Unsupervised Pretraining

Using deep belief networks for predictive analytics

Summary

8. Using Convolutional Neural Networks for Predictive Analytics

CNNs and the drawbacks of regular DNNs

CNN architecture

Convolutional operations

Applying convolution operations in TensorFlow

Pooling layer and padding operations

Applying subsampling operations in TensorFlow

Tuning CNN hyperparameters

CNN-based predictive model for sentiment analysis

Exploring movie and product review datasets

Using CNN for predictive analytics about movie reviews

CNN model for emotion recognition

Dataset description

CNN architecture design

Testing the model on your own image

Using complex CNN for predictive analytics

Dataset description

CNN predictive model for image classification

Summary

9. Using Recurrent Neural Networks for Predictive Analytics

RNN architecture

Contextual information and the architecture of RNNs

BRNNs

LSTM networks

GRU cell

Using BRNN for image classification

Implementing an RNN for spam prediction

Developing a predictive model for time series data

Description of the dataset

Preprocessing and exploratory analysis

LSTM predictive model

Model evaluation

An LSTM predictive model for sentiment analysis

Network design

LSTM model training

Visualizing through TensorBoard

LSTM model evaluation

Summary

10. Recommendation Systems for Predictive Analytics

Recommendation systems

Collaborative filtering approaches

Content-based filtering approaches

Hybrid recommendation systems

Model-based collaborative filtering

Collaborative filtering approach for movie recommendations

The utility matrix

Dataset description

Ratings data

Movies data

User data

Exploratory analysis of the dataset

Implementing a movie recommendation engine

Training the model with available ratings

Inferencing the saved model

Generating a user-item table

Clustering similar movies

Movie rating prediction by users

Finding the top K movies

Predicting top K similar movies

Computing the user-user similarity

Evaluating the recommendation system

Factorization machines for recommendation systems

Factorization machines

The cold start problem in recommendation systems

Problem definition and formulation

Dataset description

Preprocessing

Implementing an FM model

Improved factorization machines for predictive analytics

Neural factorization machines

Dataset description

Using NFM for movie recommendations

Model training

Model evaluation

Summary

11. Using Reinforcement Learning for Predictive Analytics

Reinforcement learning

Reinforcement learning in predictive analytics

Notation, policy, and utility in RL

Policy

Utility

Developing a multiarmed bandit's predictive model

Developing a stock price predictive model

Summary

Index

Predictive Analytics with TensorFlow

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2017

Production reference: 1251017

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78839-892-3

www.packtpub.com

Credits

Author

Md. Rezaul Karim

Reviewers

Andrea Mostosi

Meng-Chieh Ling

Commissioning Editor

Sunith Shetty

Acquisition Editor

Chandan Kumar

Content Development Editor

Amrita Noronha

Technical Editor

Sayali Thanekar

Copy Editor

Safis Editing

Project Coordinator

Shweta H Birwatkar

Proofreader

Safis Editing

Indexer

Pratik Shirodkar

Graphics

Tania Dutta

Production Coordinator

Aparna Bhagat

About the Author

Md. Rezaul Karim is a Research Scientist at Fraunhofer FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Aachen, Germany. He holds a BSc and an MSc degree in Computer Science. Before joining Fraunhofer FIT, he worked as a Researcher at Insight Centre for Data Analytics, Ireland. Before this, he worked as a Lead Engineer at Samsung Electronics' distributed R&D Institutes in Korea, India, Turkey, and Bangladesh. Previously, he has worked as a Research Assistant at the database lab, Kyung Hee University, Korea. He also worked as an R&D engineer with BMTech21 Worldwide, Korea. Even before this, he worked as a Software Engineer with i2SoftTechnology, Dhaka, Bangladesh.

He has more than 8 years of experience in the area of research and development with solid understanding of algorithms and data structures in C, C++, Java, Scala, R, and Python. He has published several books, articles, and research papers concerning big data and virtualization technologies, such as Spark, Kafka, DC/OS, Docker, Mesos, Zeppelin, Hadoop, and MapReduce. He is also equally competent with deep learning technologies such as TensorFlow, DeepLearning4j, and H2O. His research interests include Machine Learning, Deep Learning, Semantic Web, Linked Data, Big Data, and Bioinformatics. Also, he is the author of the following book titles:

Large-Scale Machine Learning with Spark (Packt Publishing Ltd.)
Deep Learning with TensorFlow (Packt Publishing Ltd.)
Scala and Spark for Big Data Analytics (Packt Publishing Ltd.)

Acknowledgments

I am very grateful to my parents, who have always encouraged me to pursue knowledge. I also want to thank my wife, Saroar; son, Shadman; brother, Mamtaz; sister, Josna; and friends who have endured my long monologues about the subjects in this book and have always encouraged and listened to me. Writing this book was made easier by the amazing efforts of the open source community and the great documentation of many projects out there related to TensorFlow and Python. Further, I would like to thank the acquisition, content development, and technical editors of Packt Publishing Ltd. (and, of course, others who were involved in this book title) for their sincere cooperation and coordination. Additionally, without the work of numerous researchers and deep learning practitioners who shared their expertise in publications, lectures, and source code, this book might not have existed at all! Finally, I appreciate the efforts of the TensorFlow community and all those who have contributed to APIs, whose work ultimately brought machine learning to the masses.

About the Reviewers

Andrea Mostosi is a technology enthusiast, a husband, and a father. During the last 10 years, he led the entire life cycle of several projects across different technologies, companies, and markets. He is now working on artificial intelligence, data mining, and a lot of other scary things.

I'd like to thank my wonderful son, Ryan, for every smile, every hug, and every sleepless night he gave me since his birth. When the machines finally take over humanity, you'll be able to say that your father has contributed to making this happen, my son.

Meng-Chieh Ling holds a PhD in theoretical physics from Karlsruhe Institute of Technology. After finishing his PhD, he attended The Data Incubator Reply to move his career from theoretical physics to data science.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1788398920.

If you'd like to join our team of regular reviewers, you can email us at <[email protected]>. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Preface

The continued growth in data, coupled with the need to make increasingly complex decisions against that data, is creating massive hurdles that prevent organizations from deriving insights in a timely manner using traditional approaches. Machine learning is concerned with algorithms that transform raw data into information and then into actionable intelligence, which makes it well suited to predictive analytics. Without machine learning, it would be nearly impossible to keep up with these massive streams of information.

On the other hand, deep learning is a branch of machine learning based on learning multiple levels of representation. A deep learning algorithm is essentially the implementation of a complex, deep neural network that learns by analyzing large amounts of data. It took just a few years to develop powerful deep learning algorithms that recognize images, process natural language, and perform a myriad of other complex tasks.

Considering these motivations and requirements, this book is dedicated to developers, data analysts, machine learning practitioners, and deep learning enthusiasts who want to build powerful, robust, and accurate predictive models with the power of TensorFlow from scratch and in combination with other open source Python libraries.

The first section of this book covers applied math, statistics, and probability theory for predictive analytics. It then covers useful Python packages for getting started with data science in a practical manner. The second section shows how to develop large-scale predictive analytics pipelines using supervised learning algorithms (for example, classification and regression) and unsupervised learning algorithms (for example, clustering). It then demonstrates how to develop predictive models for NLP.

Finally, reinforcement learning and a factorization machine-based recommendation system will be used to develop predictive models. The third section covers practical mastery of deep learning architectures for advanced predictive analytics, including deep neural networks and recurrent neural networks for high-dimensional and sequence data. Finally, it'll show how to develop convolutional neural networks-based predictive models for emotion recognition, image classification, and sentiment analysis.

Happy Reading!

What this book covers

Chapter 1, Basic Python and Linear Algebra for Predictive Analytics, discusses the basic concepts in linear algebra for predictive analytics, such as vectors, matrices, tensors, linear dependence, and span. Then, we move on to a brief introduction to Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). Finally, some predictive modeling tools in Python will be discussed.

Chapter 2, Statistics, Probability, and Information Theory for Predictive Modeling, covers some statistical, probabilistic, and information theory concepts before getting started on predictive analytics: random sampling, hypothesis testing, chi-square test, correlation, expectation, variance, covariance, Bayes' rule, and so on. It then discusses the central objects of probability theory: random variables, stochastic processes, and events. Information theory, which studies the quantification, storage, and communication of information, will be discussed at the end of the chapter.

Chapter 3, From Data to Decisions - Getting Started with TensorFlow, provides a detailed description of the main TensorFlow features in a real-life problem, followed by detailed discussions about TensorFlow installation and configuration. It then covers computation graphs, data, and programming models before getting started with TensorFlow. The last part of the chapter contains an example of implementing a linear regression model for predictive analytics.

Chapter 4, Putting Data in Place - Supervised Learning for Predictive Analytics, covers some TensorFlow-based supervised learning techniques from a theoretical and practical perspective. In particular, the linear regression model for regression analysis will be covered on a real dataset. It then shows how we could solve the Titanic survival problem using logistic regression, random forests, and SVMs for predictive analytics.

Chapter 5, Clustering Your Data - Unsupervised Learning for Predictive Analytics, digs deeper into predictive analytics and finds out how we can take advantage of it to cluster records belonging to a certain group or class for a dataset of unsupervised observations. It will then provide some practical examples of unsupervised learning. In particular, clustering techniques using TensorFlow will be discussed with some hands-on examples.

Chapter 6, Predictive Analytics Pipelines for NLP, shows how to use TensorFlow for text analytics with a focus on text classification from an unstructured spam prediction and movie review dataset. Based on the spam filtering dataset, it shows how to develop predictive models using a linear regression algorithm with TensorFlow. Particularly, it will use the bag-of-words (BOW) and TF-IDF algorithms for spam prediction. Later on, it will also show how to develop large-scale predictive models for predicting sentiment from the movie review dataset using the continuous bag-of-words (CBOW) and continuous skip-gram algorithms.

Chapter 7, Using Deep Neural Networks for Predictive Analytics, demonstrates how to train DNNs and analyze the performance metrics that are needed to evaluate a DNN predictive model. It also shows how to tune the hyperparameters of DNNs for better, optimized performance. It also provides two examples of how to build robust and accurate predictive models, in particular using Deep Belief Networks (DBN) and Multilayer Perceptron (MLP) on a bank marketing dataset.

Chapter 8, Using Convolutional Neural Networks for Predictive Analytics, discusses how to develop predictive analytics applications such as emotion recognition, image classification, and text classification using the convolutional neural network algorithm on real image/text datasets. Finally, it will provide some pointers on how to tune and debug CNN-based networks for optimized performance.

Chapter 9, Using Recurrent Neural Networks for Predictive Analytics, provides some theoretical background for RNNs. Then, it shows a few examples of implementing predictive models for image classification, sentiment analysis of movie and product reviews, and spam prediction for NLP. Finally, it shows how to develop predictive models for time-series data.

Chapter 10, Recommendation Systems for Predictive Analytics, provides several examples of how to develop recommendation systems for predictive analytics, followed by some theoretical background on recommendation systems, for example, matrix factorization. Later in the chapter, an example of developing a movie recommendation engine using SVD and K-means will be shown. Finally, the chapter shows how we could use factorization machines to develop a more accurate and robust recommendation system.

Chapter 11, Using Reinforcement Learning for Predictive Analytics, talks about designing machine learning systems driven by criticism and rewards. It will show several examples of how to apply reinforcement learning algorithms for developing predictive models on real-life datasets.

What you need for this book

All the examples have been implemented in Python 2 and 3 with TensorFlow 1.2.0+. You will also need some additional software and tools. To be more specific, the following tools and libraries are required, preferably the latest version:

Python (2.7.x or 3.3+)
TensorFlow (1.0.0+)
Bazel (latest version)
pip/pip3 (latest version for Python 2 and 3 respectively)
matplotlib (latest version)
pandas (latest version)
NumPy (latest version)
SciPy (latest version)
sklearn (latest version)
yahoo_finance (latest version)
CUDA (latest version)
CuDNN (latest version)

Linux distributions are preferable (including Debian, Ubuntu, Fedora, RHEL, and CentOS) and to be more specific, for Ubuntu it is recommended to have the 14.04 (LTS) 64-bit (or later) complete installation or VMWare player 12 or VirtualBox. You can also run TensorFlow jobs on Windows (XP/7/8/10) or Mac OS X (10.4.7+).

A Core i5 or Core i7 processor with GPU support is recommended for the best results, and multicore processing provides faster data processing and better scalability for predictive analytics jobs. You should have at least 8 GB of RAM (recommended) for standalone mode, at least 32 GB of RAM for a single VM, and more for a cluster. You also need enough storage to run heavy jobs (depending on the size of the dataset you will be handling), preferably at least 50 GB of free disk space.
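As a quick sanity check before starting, the following minimal sketch reports which of the Python libraries listed above can be found in the current environment. It is an illustration only and not part of the book's code bundle: the `check_environment` helper and the `REQUIRED_MODULES` list are hypothetical names, the import names are assumed to match the package names above, and the script assumes a Python 3 interpreter. It does not check version numbers or non-Python tools such as Bazel, CUDA, or CuDNN.

```python
import importlib.util
import sys

# Import names assumed for the Python libraries listed above (hypothetical list).
REQUIRED_MODULES = [
    "tensorflow", "matplotlib", "pandas", "numpy",
    "scipy", "sklearn", "yahoo_finance",
]


def check_environment(modules=REQUIRED_MODULES):
    """Return {module_name: True/False}, True when the module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


print("Python version: %d.%d.%d" % tuple(sys.version_info[:3]))
for name, found in check_environment().items():
    print("%-15s %s" % (name, "found" if found else "MISSING"))
```

Any module reported as missing can usually be installed with pip or pip3, as described in Chapter 1.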

Who this book is for

This book is dedicated to developers, data analysts, and deep learning enthusiasts who want to build powerful, robust, and accurate predictive models with the power of TensorFlow from scratch and in combination with other open source Python libraries. If you want to build your own extensive applications that work and can predict smart decisions in the future, then this book is what you need! A good command of object-oriented programming with Python is a prerequisite. Some competence in applied mathematics, statistics, linear algebra, and information theory is a plus and would help readers understand the concepts presented in this book.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

Log in or register to our website using your e-mail address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Predictive-Analytics-with-TensorFlow. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/PredictiveAnalyticswithTensorFlow_ColorImages.pdf

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyright material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Basic Python and Linear Algebra for Predictive Analytics

Predictive analytics (PA) is the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. The goal is to go beyond knowing what has happened to provide the best assessment of what will happen in the future. However, before we start developing predictive analytics models, a working knowledge of basic linear algebra, statistics, probability, and information theory with Python is a must. We will start with the basic concepts of linear algebra with Python.

In a nutshell, the following topics will be covered in this chapter:

What are predictive analytics and why do we use them?
What is linear algebra?
Installing and getting started with Python
Vectors, matrices, and tensors
Linear dependence and span
Principal component analysis (PCA)
Singular value decomposition (SVD)
Predictive modeling tools in Python

A basic introduction to predictive analytics

We will refer to a famous definition of machine learning by Tom Mitchell, where he explained what learning really means from a computer science perspective:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E"

Based on this definition, we can conclude that a computer program or machine can:

Learn from data and histories
Improve with experience
Interactively enhance a model that can be used to predict an outcome

Typical machine learning tasks are concept learning, predictive modeling, clustering, and finding useful patterns. The ultimate goal is to improve the learning in such a way that it becomes automatic, so that no human interaction is needed anymore, or at least so that the level of human interaction is reduced as much as possible.

Predictive analytics, on the other hand, is the process of extracting useful information from historical facts and streaming data (consisting of live data objects) in order to determine hidden patterns and predict future outcomes and trends.

Tip

What doesn't predictive analytics do?

Predictive analytics does not tell you what will happen in the future. Rather, it is about creating predictive models that place a numerical value, or score, on the likelihood of a particular event happening in the future with an acceptable level of reliability. It also includes what-if scenarios and risk assessment.
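To make the idea of a likelihood score concrete, here is a minimal sketch that turns a weighted sum of features into a value between 0 and 1 using the logistic function. The function name likelihood_score, the weights, and the feature values are all hypothetical and chosen only for illustration; the text does not prescribe this particular scoring method.

```python
import math

def likelihood_score(weights, features, bias=0.0):
    """Return a score in (0, 1) for how likely an event is (logistic model, assumed here)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical example: two features describing a customer
score = likelihood_score(weights=[0.8, -0.5], features=[2.0, 1.0], bias=-0.3)
print(round(score, 3))
```

The output is a score rather than a yes/no answer, which is why such a model supports what-if analysis: changing a feature value and recomputing the score shows how the estimated likelihood moves.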

Why predictive analytics?

In the area of business intelligence, with the right operations management platform, decision-makers can manage all of the business-related inputs, events, and data that provide real-time insight at the enterprise level. Predictive models can then be used to find useful patterns in historical, transactional, and recent data in order to identify potential risks and opportunities. This is why predictive analytics is gaining much attention and wide acceptance. Furthermore, traditional reporting and monitoring tools let you move from reactive operations to proactive operations; PA helps you move beyond this to plan for the future and identify new areas of business for profit and productivity.

Working principles of a predictive model

Being at the core of predictive analytics, many machine learning methods can be formulated as a convex optimization problem: finding a minimizer of a convex function $f$ that depends on a weight vector $w$ with $d$ entries. Formally, we can write this as the optimization problem $\min_{w \in \mathbb{R}^d} f(w)$, where the objective function is of the form:

$$f(w) := \lambda R(w) + \frac{1}{n} \sum_{i=1}^{n} L(w; x_i, y_i)$$

Here the vectors $x_i \in \mathbb{R}^d$ are the training data points for $1 \le i \le n$, and $y_i \in \mathbb{R}$ are their corresponding labels that we want to predict eventually. We call the method linear if $L(w; x, y)$ can be expressed as a function of $w^T x$ and $y$.

The objective function $f$ has two components: i) a regularizer $R(w)$ that controls the complexity of the model, and ii) the loss that measures the error of the model on the training data. The loss function $L(w; \cdot)$ is typically a convex function in $w$. The fixed regularization parameter $\lambda \ge 0$ defines the trade-off between the two goals of minimizing the loss (that is, the training error) and minimizing model complexity (that is, avoiding overfitting). For more detailed discussion, interested readers should refer to Chapter 7, Using Deep Neural Networks for Predictive Analytics.
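As a minimal sketch, the following pure-Python snippet evaluates an objective of the form f(w) = λ·R(w) + (1/n)·Σ L(w; x_i, y_i) for a linear model. The squared loss, the L2 regularizer, and the function names are assumptions made for illustration; the text only requires a convex loss and some regularizer.

```python
def dot(w, x):
    """Inner product w^T x of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def squared_loss(w, x, y):
    """L(w; x, y) = 0.5 * (w^T x - y)^2, a convex loss in w (assumed choice)."""
    return 0.5 * (dot(w, x) - y) ** 2

def l2_regularizer(w):
    """R(w) = 0.5 * ||w||^2 (assumed choice of regularizer)."""
    return 0.5 * sum(wi * wi for wi in w)

def objective(w, xs, ys, lam):
    """f(w) = lam * R(w) + (1/n) * sum of L(w; x_i, y_i) over the training data."""
    n = len(xs)
    avg_loss = sum(squared_loss(w, x, y) for x, y in zip(xs, ys)) / n
    return lam * l2_regularizer(w) + avg_loss

xs = [[1.0, 0.0], [0.0, 1.0]]  # training data points x_i
ys = [1.0, 2.0]                # labels y_i
w = [1.0, 2.0]                 # weights that fit this toy data exactly

print(objective(w, xs, ys, lam=0.0))  # loss only: the fit is perfect
print(objective(w, xs, ys, lam=0.1))  # same fit, but large weights are penalized
```

With lam set to zero only the training error counts; increasing lam makes the same weights more expensive, which is the trade-off between fit and model complexity described above.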

A simpler understanding can be gained from figure 1: you have the current data or observations, and your task is to use the black box to predict the future outcome based on the current data and historical facts. In this context, all the undecided values are called parameters, and the description (that is, the black box) is a PA model:

Figure 1: the main task in predictive analytics is predictive modeling–that is, using the black box

As an engineer or a developer, you have to write an algorithm that will observe existing parameters/data/samples/examples to train the black box and figure out how to tune parameters to achieve the best model for making predictions before the deployment. Wow, that's a mouthful! Don't worry; this concept will be clearer in upcoming chapters.

In machine learning, we observe an algorithm's performance in two stages: learning and inference. The ultimate target of the learning stage is to prepare and describe the available data as feature vectors, which are used to train the model.

The learning stage is one of the most important stages, but it is also truly time-consuming. It involves transforming the training data into a list of feature vectors so that we can feed them to the learning algorithms. Training data also sometimes contains noisy or invalid records, which need pre-processing such as cleaning.

Once we have the feature vectors, the next step in this stage is preparing (or writing/reusing) the learning algorithm. The next important step is training the algorithm to prepare the predictive model. Typically (and of course depending on the data size), running an algorithm may take hours (or even days) before the model parameters converge into a useful model, as shown in the following figure:

Figure 2: Learning and training a predictive model – it shows how to generate the feature vectors from the training data to train the learning algorithm that produces a predictive model
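The learning stage can be sketched in a few lines of pure Python: clean the raw records, turn them into feature vectors, and train a model. The record fields (size, price), the helper names, and the use of batch gradient descent on a linear model are all assumptions made for this toy example, not the book's own pipeline.

```python
def clean(records):
    """Drop records with missing values (a simple stand-in for pre-processing)."""
    return [r for r in records if None not in r.values()]

def to_feature_vector(record):
    """Map a raw record to a numeric feature vector; the leading 1.0 is a bias term."""
    return [1.0, float(record["size"])]

def train(xs, ys, lr=0.1, epochs=2000):
    """Fit weights w by batch gradient descent on the mean squared error."""
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

raw = [
    {"size": 1, "price": 3.0},
    {"size": 2, "price": 5.0},
    {"size": None, "price": 4.0},  # incomplete record, removed by clean()
    {"size": 3, "price": 7.0},
]
data = clean(raw)
xs = [to_feature_vector(r) for r in data]
ys = [r["price"] for r in data]
w = train(xs, ys)
print([round(wi, 2) for wi in w])  # close to [1.0, 2.0], since price = 1 + 2 * size
```

Even on three data points the loop runs thousands of updates before the weights settle, which hints at why training on real datasets can take hours or days.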

Tip

Common predictive analytics methods

Common predictive analytics methods include regression analysis, classification, time series forecasting, association rule mining, clustering, recommendation systems, text mining, sentiment analysis, and much more. To prepare the feature vectors for these methods, we need to know a little bit about mathematics, statistics, and so on.

The second most important stage is inference, which is about making intelligent use of the model, such as predicting from never-before-seen data, making recommendations, deducing future rules, and so on. Typically, it takes less time than the learning stage and can sometimes even run in real time, as shown in the following figure:

Figure 3: Inferencing from an existing model towards predictive analytics (feature vectors are generated from unknown data for making predictions)

Thus, inferencing (see figure 4 for more) is all about testing the model against new (that is, unobserved) data and evaluating the performance of the model itself. Throughout this whole process, however, data is the first-class citizen: the success of a predictive model depends on it in every machine learning task.
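A minimal sketch of the inference stage follows: feature vectors are generated from unseen records, an already trained model makes predictions, and the predictions are compared with the true values. The weights, the record fields, and the choice of mean absolute error as the performance measure are assumptions for illustration only.

```python
def to_feature_vector(record):
    """Map a raw record to a numeric feature vector; the leading 1.0 is a bias term."""
    return [1.0, float(record["size"])]

def predict(w, x):
    """Apply a linear model: the prediction is the inner product w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mean_absolute_error(w, xs, ys):
    """Average absolute difference between predictions and true labels."""
    return sum(abs(predict(w, x) - y) for x, y in zip(xs, ys)) / len(xs)

w = [1.0, 2.0]  # assumed to be the result of an earlier learning stage

# Data the model has never seen during training
unseen = [{"size": 4, "price": 9.5}, {"size": 5, "price": 10.5}]
xs = [to_feature_vector(r) for r in unseen]
ys = [r["price"] for r in unseen]

predictions = [predict(w, x) for x in xs]
print(predictions)                     # [9.0, 11.0]
print(mean_absolute_error(w, xs, ys))  # 0.5
```

Inference here is a single pass over the data with no parameter updates, which is why it is usually much faster than learning.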

In reality, the data that we feed to our machine learning systems must be mathematical objects, such as vectors, matrices, or graphs (in later chapters, we will refer to them as tensors to make it clearer), so that the learning algorithms can consume them:

Figure 4: Feature vectors are everywhere - they are used in both learning and inferencing stages in predictive analytics

Depending on the available data and feature types, the performance of your predictive model can vary dramatically. Therefore, selecting the right features is one of the most important steps before the inferencing takes place. This is called feature engineering, which can be defined as follows:

Tip

Feature engineering

In this process, domain knowledge about the data is used to create only selective or useful features, which are then assembled into the feature vectors that a machine learning algorithm needs in order to work.

For example, when buying a car, you often see features such as model name, color, horsepower, price, and number of seats. With all these features to weigh, buying a car is not a trivial problem. The general machine learning rule of thumb is that the more data there is, the better the predictive model. However, having more features often degrades performance drastically, especially if the dataset is high-dimensional; this phenomenon is called the curse of dimensionality. We will see some examples in the following sections.
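The car example can be sketched as a small feature-engineering function. The color vocabulary, the field names, and the decision to drop the model name are hypothetical choices made for illustration; in practice they would come from domain knowledge about the data.

```python
COLORS = ["red", "blue", "black"]  # assumed vocabulary of possible colors

def car_to_features(car):
    """Turn a car record into a numeric feature vector.

    Numeric fields are used as-is, the color is one-hot encoded, and the
    model name is dropped because we assume it is not useful here.
    """
    one_hot = [1.0 if car["color"] == c else 0.0 for c in COLORS]
    numeric = [float(car["horse_power"]), float(car["price"]), float(car["seats"])]
    return numeric + one_hot

car = {"model": "Hypothetical GT", "color": "blue",
       "horse_power": 150, "price": 25000, "seats": 5}
vec = car_to_features(car)
print(vec)       # [150.0, 25000.0, 5.0, 0.0, 1.0, 0.0]
print(len(vec))  # 6
```

Note that a single categorical field already adds three dimensions. Every extra color (or every model name, if it were encoded the same way) would add one more, which is how feature vectors become high-dimensional quickly.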

In addition, we need to know how to represent and transform such objects effectively. This requires some basic (and sometimes advanced) mathematics, statistics, probability, and information theory.

For now, this is enough theory. Let's focus on some non-trivial topics of linear algebra, covering vectors, matrices, graphs, and so on. In Chapter 2, Statistics, Probability and Information Theory for Predictive Modeling, we will learn the basic statistics, probability, and information theory needed for developing PA models. These will serve as the basic building blocks for TensorFlow-based PA throughout the subsequent chapters.

Installing and getting started with Python

Python is one of the most popular programming languages. It is a high-level, interpreted, interactive, and object-oriented scripting language. Unfortunately, there has been a big split between Python versions: 2 versus 3, which could make things a bit confusing to newcomers. You can see the major difference between them at https://wiki.python.org/moin/Python2orPython3. But don't worry; I will lead you in the right direction for installing both major versions.

Installing on Windows

On the Python download page at https://www.python.org/downloads/, you'll find the latest release of Python 2 or Python 3 (2.7.13 and 3.6.1, respectively, at the time of writing). You can now select and download the installer (.exe) of either version. Installation is similar to installing other software on Windows.

Let's assume that you have installed both versions and now it's time to add the installation path to the environment variables.

To do so, click on Start, type advanced system settings, and select View advanced system settings. Then, in System Properties, go to the Advanced tab and click on the Environment Variables... button:

Figure 5: Creating a system variable for Python

Python 3 is usually listed in the User variables for [username], whereas Python 2 is listed under the System variables, as follows:

Figure 6: Showing how to add the Python installation location as system path

There are a few ways you can remedy this situation. The simplest way is to make changes that can give us access to python for Python 2 and python3 for Python 3. For this, go to the folder where you have installed Python 3. It should be something like this: C:\Users\[username]\AppData\Local\Programs\Python\Python36 by default.

Make a copy of the python.exe file, and rename that copy (not the original) to python3.exe as shown in the following screenshot:

Figure 7: Fixing Python 2 versus Python 3 issue

Open a new Command Prompt (the environmental variables refresh with each new Command Prompt you open), and type python3 --version:

Figure 8: Showing Python 2 and Python 3 version

Fantastic, now you're ready for whatever Python project you want to tackle.
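You can also confirm which interpreter a given command actually launches from inside Python itself. The following is a minimal sketch; the helper name describe_interpreter is made up for this example, while sys.version_info and sys.executable are part of the standard library.

```python
import sys

def describe_interpreter(version_info=sys.version_info):
    """Summarize the Python family, exact version, and location of the interpreter."""
    major, minor, micro = version_info[:3]
    family = "Python 3" if major >= 3 else "Python 2"
    return "{} ({}.{}.{}) at {}".format(family, major, minor, micro, sys.executable)

print(describe_interpreter())
```

Saving this as a script and running it once with python and once with python3 should report two different versions and paths if the setup above worked.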

Installing Python on Linux

If you are new to Python on Linux, note that Ubuntu typically ships with Python 2.7.x and 3.x preinstalled. Check whether Python 2 or Python 3 is installed using the following commands:

$ python -V
>> Python 2.7.13
$ which python
>> /usr/bin/python

For Python 3.3+ use the following:

$ python3 -V
>> Python 3.6.1

If you want a very specific version:

$ sudo apt-cache show python3
$ sudo apt-get install python3=3.6.1*
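The same check can be scripted. As a small sketch, the snippet below uses shutil.which from the standard library (available in Python 3.3+) to report where each interpreter command resolves on the PATH; the helper name find_interpreters and the list of command names are assumptions for illustration.

```python
import shutil

def find_interpreters(names=("python", "python2", "python3")):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

for name, path in find_interpreters().items():
    print(name, "->", path if path else "not found")
```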

Installing and upgrading PIP (or PIP3)

The pip or pip3 package manager usually comes with your Ubuntu installation. Make sure to check whether pip or pip3 is installed using the following command:

$ pip -V
>> pip 9.0.1 from /usr/local/lib/python2.7/dist-packages/pip-9.0.1-py2.7.egg (python 2.7)

For Python 3.3+ use the following:

$ pip3 -V
>> pip 1.5.4 from /usr/lib/python3/dist-packages (python 3.4)

Note that pip version 8.1+ or pip3 version 1.5+ is strongly recommended for a smooth experience. If version 8.1+ for pip or 1.5+ for pip3 is not installed, use the following command to either install or upgrade pip:

$ sudo apt-get install python-pip python-dev

For Python 3.3+, use the following command:

$ sudo apt-get install python3-pip python3-dev
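If you want to compare an installed pip against the minimum versions mentioned above, the version string can be parsed programmatically. This is a minimal sketch that works on the text printed by pip -V; the function names are made up for this example, and the sample strings are the outputs shown earlier in this section.

```python
import re

def parse_pip_version(output):
    """Extract the version as a tuple of ints from the output of `pip -V`."""
    match = re.match(r"pip (\d+(?:\.\d+)*)", output.strip())
    if match is None:
        raise ValueError("unrecognized pip output: {!r}".format(output))
    return tuple(int(part) for part in match.group(1).split("."))

def meets_minimum(output, minimum):
    """Return True if the reported pip version is at least `minimum`."""
    return parse_pip_version(output) >= minimum

pip2_output = ("pip 9.0.1 from /usr/local/lib/python2.7/dist-packages/"
               "pip-9.0.1-py2.7.egg (python 2.7)")
pip3_output = "pip 1.5.4 from /usr/lib/python3/dist-packages (python 3.4)"

print(parse_pip_version(pip2_output))      # (9, 0, 1)
print(meets_minimum(pip2_output, (8, 1)))  # True
print(meets_minimum(pip3_output, (1, 5)))  # True
```

Comparing tuples of integers avoids the pitfall of comparing version strings, where "10.0" would sort before "9.0".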

Installing Python on Mac OS

Before installing Python, you should install a C compiler. The fastest way to do so is to install the Xcode command-line tools by running the following command:

$ xcode-select --install

Alternatively, you can also download the full version of Xcode from the Mac App Store.

If you already have Xcode installed on your Mac machine, do not install OSX-GCC-Installer. In combination, you can experience some unwanted issues that are really difficult to diagnose and get rid of.

Although Mac OS comes with a large number of Unix utilities, it lacks a package manager. Homebrew fills this gap and can be installed using the following command:

$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Add the Homebrew installation path to the PATH environment variable by appending the following line to your ~/.profile file:

export PATH=/usr/local/bin:/usr/local/sbin:$PATH

Now, you're ready to install Python 2.7.x or 3.x. For Python 2.7.x issue the following command:

$ brew install python

For Python 3 issue the following command:

$ brew install python3

Installing packages in Python

Additional packages (other than built-in packages) that will be used throughout this book can be installed via the pip installer program. We have already installed Python pip for Python 2.7.x and Python 3.x. Now to install a Python package or module, you can execute pip on the command line (Windows) or terminal (Linux/Mac OS):

$ sudo pip install PackageName # For Python3 use pip3

However, already installed packages can be updated via the --upgrade flag by issuing the following command:

$ sudo pip install PackageName --upgrade # For Python3, use pip3
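After installing a package, you can verify from Python that the interpreter you are using can actually find it. The sketch below relies on importlib.util.find_spec from the Python 3 standard library; the helper name is_installed and the module names in the loop are only examples, and the check is meant for top-level module names.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the import system can locate the given top-level module."""
    return importlib.util.find_spec(module_name) is not None

for name in ["json", "numpy", "hypothetical_missing_package"]:
    print(name, "installed" if is_installed(name) else "missing")
```

This is useful when both Python 2 and Python 3 are present: a package installed with pip is not automatically visible to the interpreter served by pip3, and vice versa.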
