Data Science with Python - Rohan Chopra - E-Book

Data Science with Python E-Book

Rohan Chopra

Description

Leverage the power of the Python data science libraries and advanced machine learning techniques to analyse large unstructured datasets and predict the occurrence of a particular future event.




Key Features



  • Explore the depths of data science, from data collection through to visualization


  • Learn pandas, scikit-learn, and Matplotlib in detail


  • Study various data science algorithms using real-world datasets



Book Description



Data Science with Python begins by introducing you to data science and teaches you to install the packages you need to create a data science coding environment. You will learn three major techniques in machine learning: unsupervised learning, supervised learning, and reinforcement learning. You will also explore basic classification and regression techniques, such as support vector machines, decision trees, and logistic regression.






As you make your way through chapters, you will study the basic functions, data structures, and syntax of the Python language that are used to handle large datasets with ease. You will learn about NumPy and pandas libraries for matrix calculations and data manipulation, study how to use Matplotlib to create highly customizable visualizations, and apply the boosting algorithm XGBoost to make predictions. In the concluding chapters, you will explore convolutional neural networks (CNNs), deep learning algorithms used to predict what is in an image. You will also understand how to feed human sentences to a neural network, make the model process contextual information, and create human language processing systems to predict the outcome.






By the end of this book, you will be able to understand and implement any new data science algorithm and have the confidence to experiment with tools or libraries other than those covered in the book.




What you will learn



  • Pre-process data to make it ready to use for machine learning


  • Create data visualizations with Matplotlib


  • Use scikit-learn to perform dimension reduction using principal component analysis (PCA)


  • Solve classification and regression problems


  • Get predictions using the XGBoost library


  • Process images and create machine learning models to decode them


  • Process human language for prediction and classification


  • Use TensorBoard to monitor training metrics in real time


  • Find the best hyperparameters for your model with AutoML



Who this book is for



Data Science with Python is designed for data analysts, data scientists, database engineers, and business analysts who want to move towards using Python and machine learning techniques to analyze data and predict outcomes. Basic knowledge of Python and data analytics will prove beneficial to understand the various concepts explained through this book.

Format: EPUB

Page count: 358

Publication year: 2019




Data Science with Python

Combine Python with machine learning principles to discover hidden patterns in raw data

Rohan Chopra

Aaron England

Mohamed Noordeen Alaudeen

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Authors: Rohan Chopra, Aaron England and Mohamed Noordeen Alaudeen

Technical Reviewer: Santiago Riviriego Esbert

Managing Editor: Aritro Ghosh

Acquisitions Editors: Kunal Sawant and Koushik Sen

Production Editor: Samita Warang

Editorial Board: David Barnes, Mayank Bhardwaj, Ewan Buckingham, Simon Cox, Mahesh Dhyani, Taabish Khan, Manasa Kumar, Alex Mazonowicz, Douglas Paterson, Dominic Pereira, Shiny Poojary, Erol Staveley, Ankita Thakur, and Jonathan Wray

First Published: July 2019

Production Reference: 1090719

ISBN: 978-1-83855-286-2

Published by Packt Publishing Ltd.

Livery Place, 35 Livery Street

Birmingham B3 2PB, UK

Table of Contents

Preface

About the Book

Chapter 1: Introduction to Data Science and Data Pre-Processing

Introduction

Python Libraries

Roadmap for Building Machine Learning Models

Data Representation

Independent and Target Variables

Exercise 1: Loading a Sample Dataset and Creating the Feature Matrix and Target Matrix

Data Cleaning

Exercise 2: Removing Missing Data

Exercise 3: Imputing Missing Data

Exercise 4: Finding and Removing Outliers in Data

Data Integration

Exercise 5: Integrating Data

Data Transformation

Handling Categorical Data

Exercise 6: Simple Replacement of Categorical Data with a Number

Exercise 7: Converting Categorical Data to Numerical Data Using Label Encoding

Exercise 8: Converting Categorical Data to Numerical Data Using One-Hot Encoding

Data in Different Scales

Exercise 9: Implementing Scaling Using the Standard Scaler Method

Exercise 10: Implementing Scaling Using the MinMax Scaler Method

Data Discretization

Exercise 11: Discretization of Continuous Data

Train and Test Data

Exercise 12: Splitting Data into Train and Test Sets

Activity 1: Pre-Processing Using the Bank Marketing Subscription Dataset

Supervised Learning

Unsupervised Learning

Reinforcement Learning

Performance Metrics

Summary

Chapter 2: Data Visualization

Introduction

Functional Approach

Exercise 13: Functional Approach – Line Plot

Exercise 14: Functional Approach – Add a Second Line to the Line Plot

Activity 2: Line Plot

Exercise 15: Creating a Bar Plot

Activity 3: Bar Plot

Exercise 16: Functional Approach – Histogram

Exercise 17: Functional Approach – Box-and-Whisker Plot

Exercise 18: Scatterplot

Object-Oriented Approach Using Subplots

Exercise 19: Single Line Plot Using Subplots

Exercise 20: Multiple Line Plots Using Subplots

Activity 4: Multiple Plot Types Using Subplots

Summary

Chapter 3: Introduction to Machine Learning via Scikit-Learn

Introduction

Introduction to Linear and Logistic Regression

Simple Linear Regression

Exercise 21: Preparing Data for a Linear Regression Model

Exercise 22: Fitting a Simple Linear Regression Model and Determining the Intercept and Coefficient

Exercise 23: Generating Predictions and Evaluating the Performance of a Simple Linear Regression Model

Multiple Linear Regression

Exercise 24: Fitting a Multiple Linear Regression Model and Determining the Intercept and Coefficients

Activity 5: Generating Predictions and Evaluating the Performance of a Multiple Linear Regression Model

Logistic Regression

Exercise 25: Fitting a Logistic Regression Model and Determining the Intercept and Coefficients

Exercise 26: Generating Predictions and Evaluating the Performance of a Logistic Regression Model

Exercise 27: Tuning the Hyperparameters of a Multiple Logistic Regression Model

Activity 6: Generating Predictions and Evaluating Performance of a Tuned Logistic Regression Model

Max Margin Classification Using SVMs

Exercise 28: Preparing Data for the Support Vector Classifier (SVC) Model

Exercise 29: Tuning the SVC Model Using Grid Search

Activity 7: Generating Predictions and Evaluating the Performance of the SVC Grid Search Model

Decision Trees

Activity 8: Preparing Data for a Decision Tree Classifier

Exercise 30: Tuning a Decision Tree Classifier Using Grid Search

Exercise 31: Programmatically Extracting Tuned Hyperparameters from a Decision Tree Classifier Grid Search Model

Activity 9: Generating Predictions and Evaluating the Performance of a Decision Tree Classifier Model

Random Forests

Exercise 32: Preparing Data for a Random Forest Regressor

Activity 10: Tuning a Random Forest Regressor

Exercise 33: Programmatically Extracting Tuned Hyperparameters and Determining Feature Importance from a Random Forest Regressor Grid Search Model

Activity 11: Generating Predictions and Evaluating the Performance of a Tuned Random Forest Regressor Model

Summary

Chapter 4: Dimensionality Reduction and Unsupervised Learning

Introduction

Hierarchical Cluster Analysis (HCA)

Exercise 34: Building an HCA Model

Exercise 35: Plotting an HCA Model and Assigning Predictions

K-means Clustering

Exercise 36: Fitting k-means Model and Assigning Predictions

Activity 12: Ensemble k-means Clustering and Calculating Predictions

Exercise 37: Calculating Mean Inertia by n_clusters

Exercise 38: Plotting Mean Inertia by n_clusters

Principal Component Analysis (PCA)

Exercise 39: Fitting a PCA Model

Exercise 40: Choosing n_components Using Threshold of Explained Variance

Activity 13: Evaluating Mean Inertia by Cluster after PCA Transformation

Exercise 41: Visual Comparison of Inertia by n_clusters

Supervised Data Compression Using Linear Discriminant Analysis (LDA)

Exercise 42: Fitting LDA Model

Exercise 43: Using LDA Transformed Components in Classification Model

Summary

Chapter 5: Mastering Structured Data

Introduction

Boosting Algorithms

Gradient Boosting Machine (GBM)

XGBoost (Extreme Gradient Boosting)

Exercise 44: Using the XGBoost Library to Perform Classification

XGBoost Library

Controlling Model Overfitting

Handling Imbalanced Datasets

Activity 14: Training and Predicting the Income of a Person

External Memory Usage

Cross-validation

Exercise 45: Using Cross-validation to Find the Best Hyperparameters

Saving and Loading a Model

Exercise 46: Creating a Python Script that Predicts Based on Real-time Input

Activity 15: Predicting the Loss of Customers

Neural Networks

What Is a Neural Network?

Optimization Algorithms

Hyperparameters

Keras

Exercise 47: Installing the Keras Library for Python and Using It to Perform Classification

Keras Library

Exercise 48: Predicting Avocado Price Using Neural Networks

Categorical Variables

One-hot Encoding

Entity Embedding

Exercise 49: Predicting Avocado Price Using Entity Embedding

Activity 16: Predicting a Customer's Purchase Amount

Summary

Chapter 6: Decoding Images

Introduction

Images

Exercise 50: Classify MNIST Using a Fully Connected Neural Network

Convolutional Neural Networks

Convolutional Layer

Pooling Layer

Adam Optimizer

Cross-entropy Loss

Exercise 51: Classify MNIST Using a CNN

Regularization

Dropout Layer

L1 and L2 Regularization

Batch Normalization

Exercise 52: Improving Image Classification Using Regularization Using CIFAR-10 Images

Image Data Preprocessing

Normalization

Converting to Grayscale

Getting All Images to the Same Size

Other Useful Image Operations

Activity 17: Predict if an Image Is of a Cat or a Dog

Data Augmentation

Generators

Exercise 53: Classify CIFAR-10 Images with Image Augmentation

Activity 18: Identifying and Augmenting an Image

Summary

Chapter 7: Processing Human Language

Introduction

Text Data Processing

Regular Expressions

Exercise 54: Using RegEx for String Cleaning

Basic Feature Extraction

Text Preprocessing

Exercise 55: Preprocessing the IMDB Movie Review Dataset

Text Processing

Exercise 56: Creating Word Embeddings Using Gensim

Activity 19: Predicting Sentiments of Movie Reviews

Recurrent Neural Networks (RNNs)

LSTMs

Exercise 57: Performing Sentiment Analysis Using LSTM

Activity 20: Predicting Sentiments from Tweets

Summary

Chapter 8: Tips and Tricks of the Trade

Introduction

Transfer Learning

Transfer Learning for Image Data

Exercise 58: Using InceptionV3 to Compare and Classify Images

Activity 21: Classifying Images Using InceptionV3

Useful Tools and Tips

Train, Development, and Test Datasets

Working with Unprocessed Datasets

pandas Profiling

TensorBoard

AutoML

Exercise 59: Get a Well-Performing Network Using Auto-Keras

Model Visualization Using Keras

Activity 22: Using Transfer Learning to Predict Images

Summary

Appendix

Preface

About

This section briefly introduces the authors, what this book covers, the technical skills you'll need to get started, and the hardware and software required to complete all of the included activities and exercises.

Chapter 1

Introduction to Data Science and Data Pre-Processing

Learning Objectives

By the end of this chapter, you will be able to:

  • Use various Python machine learning libraries

  • Handle missing data and deal with outliers

  • Perform data integration to bring together data from different sources

  • Perform data transformation to convert data into a machine-readable form

  • Scale data to avoid problems with values of different magnitudes

  • Split data into train and test datasets

  • Describe the different types of machine learning

  • Describe the different performance measures of a machine learning model

This chapter introduces data science and covers the various processes included in the building of machine learning models, with a particular focus on pre-processing.
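To give a feel for the pre-processing steps this chapter focuses on, here is a minimal sketch of three of them: imputing a missing value, scaling columns, and splitting rows into train and test sets. It is an illustration under stated assumptions, not the book's own code. The toy data and the helper names (impute_mean, standardize, train_test_split) are hypothetical, and the sketch uses only the Python standard library, whereas the chapter's exercises rely on libraries such as pandas and scikit-learn.

```python
import random
import statistics

# Hypothetical toy data: (age, income) rows; a missing age is marked as None.
rows = [(25, 40_000.0), (None, 52_000.0), (35, 61_000.0),
        (45, 83_000.0), (30, 48_000.0)]
targets = [0, 0, 1, 1, 0]


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


def train_test_split(features, labels, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of the rows for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(features, train_idx), pick(features, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Step 1: fill in the missing age with the mean of the known ages.
ages = impute_mean([r[0] for r in rows])
incomes = [r[1] for r in rows]

# Step 2: put both columns on a comparable scale.
features = list(zip(standardize(ages), standardize(incomes)))

# Step 3: hold out one row in five for testing.
X_train, X_test, y_train, y_test = train_test_split(features, targets)
```

For brevity the sketch scales the data before splitting it. In practice it is usually safer to compute the scaling statistics on the training rows only and then apply them to the test rows, so that no information from the test set leaks into training.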