R: Recipes for Analysis, Visualization and Machine Learning - Viswa Viswanathan - E-Book

Viswa Viswanathan

Description

Get savvy with the R language and carry out projects in analysis, visualization, and machine learning

About This Book

  • Proficiently analyze data and apply machine learning techniques
  • Generate static visualizations, develop interactive visualizations and applications, and understand R's various data exploration functions
  • Construct predictive models using a variety of machine learning packages

Who This Book Is For

This Learning Path is ideal for those who have been exposed to R but have not yet used it extensively. It covers the basics of using R and is written for new and intermediate R users who want to learn more. This Learning Path also provides in-depth insights into professional techniques for analysis, visualization, and machine learning with R. It will help you increase your R expertise regardless of your level of experience.

What You Will Learn

  • Get data into your R environment and prepare it for analysis
  • Perform exploratory data analyses and generate meaningful visualizations of the data
  • Generate various plots in R using basic R plotting techniques
  • Create presentations and learn the basics of creating apps in R for your audience
  • Create and inspect the transaction dataset, performing association analysis with the Apriori algorithm
  • Visualize associations in various graph formats and find frequent itemsets using the ECLAT algorithm
  • Build, tune, and evaluate predictive models with different machine learning packages
  • Incorporate R and Hadoop to solve machine learning problems on big data

In Detail

The R language is a powerful, open source, functional programming language. At its core, R is a statistical programming language that provides impressive tools to analyze data and create high-level graphics. This Learning Path is chock-full of recipes. Literally! It aims to excite you with awesome projects focused on analysis, visualization, and machine learning. We'll start off with data analysis – this will show you ways to use R to generate professional analysis reports. We'll then move on to visualizing our data – this provides you with all the guidance needed to get comfortable with data visualization with R. Finally, we'll move into the world of machine learning – this introduces you to data classification, regression, clustering, association rule mining, and dimension reduction.

This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:

  • R Data Analysis Cookbook by Viswa Viswanathan and Shanthi Viswanathan
  • R Data Visualization Cookbook by Atmajitsinh Gohil
  • Machine Learning with R Cookbook by Yu-Wei Chiu (David Chiu)

Style and approach

This course creates a smooth learning path that will teach you how to analyze data and create stunning visualizations. The step-by-step instructions provided for each recipe in this comprehensive Learning Path will show you how to create machine learning projects with R.


Number of pages: 1062

Year of publication: 2016




Table of Contents

R: Recipes for Analysis, Visualization and Machine Learning
Credits
Preface
What this learning path covers
What you need for this learning path
Who this learning path is for
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Module 1
1. A Simple Guide to R
Installing packages and getting help in R
Getting ready
How to do it…
How it works…
There's more…
See also
Data types in R
How to do it…
Special values in R
How to do it…
How it works…
Matrices in R
How to do it…
How it works…
Editing a matrix in R
How to do it…
Data frames in R
How to do it…
Editing a data frame in R
How to do it...
Importing data in R
How to do it...
How it works…
Exporting data in R
How to do it…
How it works…
Writing a function in R
Getting ready
How to do it…
How it works…
See also
Writing if else statements in R
How to do it…
How it works…
Basic loops in R
How to do it…
How it works…
Nested loops in R
How to do it…
The apply, lapply, sapply, and tapply functions
How to do it…
How it works…
Using par to beautify a plot in R
How to do it…
How it works…
Saving plots
How to do it…
How it works…
2. Practical Machine Learning with R
Introduction
Downloading and installing R
Getting ready
How to do it...
How it works...
See also
Downloading and installing RStudio
Getting ready
How to do it...
How it works...
See also
Installing and loading packages
Getting ready
How to do it...
How it works...
See also
Reading and writing data
Getting ready
How to do it...
How it works...
See also
Using R to manipulate data
Getting ready
How to do it...
How it works...
There's more...
Applying basic statistics
Getting ready
How to do it...
How it works...
There's more...
Visualizing data
Getting ready
How to do it...
How it works...
See also
Getting a dataset for machine learning
Getting ready
How to do it...
How it works...
See also
3. Acquire and Prepare the Ingredients – Your Data
Introduction
Reading data from CSV files
Getting ready
How to do it...
How it works...
There's more...
Handling different column delimiters
Handling column headers/variable names
Handling missing values
Reading strings as characters and not as factors
Reading data directly from a website
Reading XML data
Getting ready
How to do it...
How it works...
There's more...
Extracting HTML table data from a web page
Extracting a single HTML table from a web page
Reading JSON data
Getting ready
How to do it...
How it works...
Reading data from fixed-width formatted files
Getting ready
How to do it...
How it works...
There's more...
Files with headers
Excluding columns from data
Reading data from R files and R libraries
Getting ready
How to do it...
How it works...
There's more...
To save all objects in a session
To selectively save objects in a session
Attaching/detaching R data files to an environment
Listing all datasets in loaded packages
Removing cases with missing values
Getting ready
How to do it...
How it works...
There's more...
Eliminating cases with NA for selected variables
Finding cases that have no missing values
Converting specific values to NA
Excluding NA values from computations
Replacing missing values with the mean
Getting ready
How to do it...
How it works...
There's more...
Imputing random values sampled from nonmissing values
Removing duplicate cases
Getting ready
How to do it...
How it works...
There's more...
Identifying duplicates (without deleting them)
Rescaling a variable to [0,1]
Getting ready
How to do it...
How it works...
There's more...
Rescaling many variables at once
See also…
Normalizing or standardizing data in a data frame
Getting ready
How to do it...
How it works...
There's more...
Standardizing several variables simultaneously
See also…
Binning numerical data
Getting ready
How to do it...
How it works...
There's more...
Creating a specified number of intervals automatically
Creating dummies for categorical variables
Getting ready
How to do it...
How it works...
There's more...
Choosing which variables to create dummies for
4. What's in There? – Exploratory Data Analysis
Introduction
Creating standard data summaries
Getting ready
How to do it...
How it works...
There's more...
Using the str() function for an overview of a data frame
Computing the summary for a single variable
Finding the mean and standard deviation
Extracting a subset of a dataset
Getting ready
How to do it...
How it works...
There's more...
Excluding columns
Selecting based on multiple values
Selecting using logical vector
Splitting a dataset
Getting ready
How to do it...
How it works...
Creating random data partitions
Getting ready
How to do it…
Case 1 – numerical target variable and two partitions
Case 2 – numerical target variable and three partitions
Case 3 – categorical target variable and two partitions
Case 4 – categorical target variable and three partitions
How it works...
There's more...
Using a convenience function for partitioning
Sampling from a set of values
Generating standard plots such as histograms, boxplots, and scatterplots
Getting ready
How to do it...
Histograms
Boxplots
Scatterplots
Scatterplot matrices
How it works...
Histograms
Boxplots
There's more...
Overlay a density plot on a histogram
Overlay a regression line on a scatterplot
Color specific points on a scatterplot
Generating multiple plots on a grid
Getting ready
How to do it...
How it works...
Graphics parameters
See also…
Selecting a graphics device
Getting ready
How to do it...
How it works...
See also…
Creating plots with the lattice package
Getting ready
How to do it...
How it works...
There's more...
Adding flair to your graphs
See also…
Creating plots with the ggplot2 package
Getting ready
How to do it...
How it works...
There's more...
Graph using qplot
Condition plots on continuous numeric variables
See also…
Creating charts that facilitate comparisons
Getting ready
How to do it...
Using base plotting system
Using ggplot2
How it works...
There's more...
Creating boxplots with ggplot2
See also…
Creating charts that help visualize a possible causality
Getting ready
How to do it...
See also…
Creating multivariate plots
Getting ready
How to do it...
How it works...
See also…
5. Where Does It Belong? – Classification
Introduction
Generating error/classification-confusion matrices
Getting ready
How to do it...
How it works...
There's more...
Visualizing the error/classification confusion matrix
Comparing the model's performance for different classes
Generating ROC charts
Getting ready
How to do it...
How it works...
There's more…
Using arbitrary class labels
Building, plotting, and evaluating – classification trees
Getting ready
How to do it...
How it works...
There's more...
Computing raw probabilities
Creating the ROC chart
See also
Using random forest models for classification
Getting ready
How to do it...
How it works...
There's more...
Computing raw probabilities
Generating the ROC chart
Specifying cutoffs for classification
See also...
Classifying using Support Vector Machine
Getting ready
How to do it...
How it works...
There's more...
Controlling scaling of variables
Determining the type of SVM model
Assigning weights to the classes
See also...
Classifying using the Naïve Bayes approach
Getting ready
How to do it...
How it works...
See also...
Classifying using the KNN approach
Getting ready
How to do it...
How it works...
There's more...
Automating the process of running KNN for many k values
Using KNN to compute raw probabilities instead of classifications
Using neural networks for classification
Getting ready
How to do it...
How it works...
There's more...
Exercising greater control over nnet
Generating raw probabilities
Classifying using linear discriminant function analysis
Getting ready
How to do it...
How it works...
There's more...
Using the formula interface for lda
See also...
Classifying using logistic regression
Getting ready
How to do it...
How it works...
Using AdaBoost to combine classification tree models
Getting ready
How to do it...
How it works...
6. Give Me a Number – Regression
Introduction
Computing the root mean squared error
Getting ready
How to do it...
How it works...
There's more...
Using a convenience function to compute the RMS error
Building KNN models for regression
Getting ready
How to do it...
How it works...
There's more...
Running KNN with cross-validation in place of validation partition
Using a convenience function to run KNN
Using a convenience function to run KNN for multiple k values
See also...
Performing linear regression
Getting ready
How to do it...
How it works...
There's more...
Forcing lm to use a specific factor level as the reference
Using other options in the formula expression for linear models
See also...
Performing variable selection in linear regression
Getting ready
How to do it...
How it works...
See also...
Building regression trees
Getting ready
How to do it...
How it works...
There's more…
Generating regression trees for data with categorical predictors
See also...
Building random forest models for regression
Getting ready
How to do it...
How it works...
There's more...
Controlling forest generation
See also...
Using neural networks for regression
Getting ready
How to do it...
How it works...
See also...
Performing k-fold cross-validation
Getting ready
How to do it...
How it works...
See also...
Performing leave-one-out-cross-validation to limit overfitting
How to do it...
How it works...
See also...
7. Can You Simplify That? – Data Reduction Techniques
Introduction
Performing cluster analysis using K-means clustering
Getting ready
How to do it...
How it works...
There's more...
Use a convenience function to choose a value for K
See also...
Performing cluster analysis using hierarchical clustering
Getting ready
How to do it...
How it works...
See also...
Reducing dimensionality with principal component analysis
Getting ready
How to do it...
How it works...
8. Lessons from History – Time Series Analysis
Introduction
Creating and examining date objects
Getting ready
How to do it...
How it works...
See also...
Operating on date objects
Getting ready
How to do it...
How it works...
See also...
Performing preliminary analyses on time series data
Getting ready
How to do it...
How it works...
See also...
Using time series objects
Getting ready
How to do it...
How it works...
See also...
Decomposing time series
Getting ready
How to do it...
How it works...
See also...
Filtering time series data
Getting ready
How to do it...
How it works...
See also...
Smoothing and forecasting using the Holt-Winters method
Getting ready
How to do it...
How it works...
See also...
Building an automated ARIMA model
Getting ready
How to do it...
How it works...
See also...
9. It's All About Your Connections – Social Network Analysis
Introduction
Downloading social network data using public APIs
Getting ready
How to do it...
How it works...
See also...
Creating adjacency matrices and edge lists
Getting ready
How to do it...
How it works...
See also...
Plotting social network data
Getting ready
How to do it...
How it works...
There's more...
Specifying plotting preferences
Plotting directed graphs
Creating a graph object with weights
Extracting the network as an adjacency matrix from the graph object
Extracting an adjacency matrix with weights
Extracting edge list from graph object
Creating bipartite network graph
Generating projections of a bipartite network
See also...
Computing important network metrics
Getting ready
How to do it...
How it works...
There's more...
Getting edge sequences
Getting immediate and distant neighbors
Adding vertices or nodes
Adding edges
Deleting isolates from a graph
Creating subgraphs
10. Put Your Best Foot Forward – Document and Present Your Analysis
Introduction
Generating reports of your data analysis with R Markdown and knitr
Getting ready
How to do it...
How it works...
There's more...
Using the render function
Adding output options
Creating interactive web applications with shiny
Getting ready
How to do it...
How it works...
There's more...
Adding images
Adding HTML
Adding tab sets
Adding a dynamic UI
Creating single file web application
Creating PDF presentations of your analysis with R Presentation
Getting ready
How to do it...
How it works...
There's more...
Using hyperlinks
Controlling the display
Enhancing the look of the presentation
11. Work Smarter, Not Harder – Efficient and Elegant R Code
Introduction
Exploiting vectorized operations
Getting ready
How to do it...
How it works...
There's more...
Processing entire rows or columns using the apply function
Getting ready
How to do it...
How it works...
There's more...
Using apply on a three-dimensional array
Applying a function to all elements of a collection with lapply and sapply
Getting ready
How to do it...
How it works...
There's more...
Dynamic output
One caution
Applying functions to subsets of a vector
Getting ready
How to do it...
How it works...
There's more...
Applying a function on groups from a data frame
Using the split-apply-combine strategy with plyr
Getting ready
How to do it...
How it works...
There's more...
Adding a new column using transform
Using summarize along with the plyr function
Concatenating the list of data frames into a big data frame
Slicing, dicing, and combining data with data tables
Getting ready
How to do it...
How it works...
There's more...
Adding multiple aggregated columns
Counting groups
Deleting a column
Joining data tables
Using symbols
12. Where in the World? – Geospatial Analysis
Introduction
Downloading and plotting a Google map of an area
Getting ready
How to do it...
How it works...
There's more...
Saving the downloaded map as an image file
Getting a satellite image
Overlaying data on the downloaded Google map
Getting ready
How to do it...
How it works...
Importing ESRI shape files into R
Getting ready
How to do it...
How it works...
Using the sp package to plot geographic data
Getting ready
How to do it...
How it works...
Getting maps from the maps package
Getting ready
How to do it...
How it works...
Creating spatial data frames from regular data frames containing spatial and other data
Getting ready
How to do it...
How it works...
Creating spatial data frames by combining regular data frames with spatial objects
Getting ready
How to do it...
How it works...
Adding variables to an existing spatial data frame
Getting ready
How to do it...
How it works...
13. Playing Nice – Connecting to Other Systems
Introduction
Using Java objects in R
Getting ready
How to do it...
How it works...
There's more...
Checking JVM properties
Displaying available methods
Using JRI to call R functions from Java
Getting ready
How to do it...
How it works...
There's more...
Using Rserve to call R functions from Java
Getting ready
How to do it...
How it works...
There's more...
Retrieving an array from R
Executing R scripts from Java
Getting ready
How to do it...
How it works...
Using the xlsx package to connect to Excel
Getting ready
How to do it...
How it works...
Reading data from relational databases – MySQL
Getting ready
How to do it...
Using RODBC
Using RMySQL
Using RJDBC
How it works...
Using RODBC
Using RMySQL
Using RJDBC
There's more...
Fetching all rows
When the SQL query is long
Reading data from NoSQL databases – MongoDB
Getting ready
How to do it...
How it works...
There's more...
Validating your JSON
2. Module 2
1. Basic and Interactive Plots
Introduction
Introducing a scatter plot
Getting ready
How to do it…
How it works…
Scatter plots with texts, labels, and lines
How to do it…
How it works…
There's more…
See also
Connecting points in a scatter plot
How to do it…
How it works…
There's more…
See also
Generating an interactive scatter plot
Getting ready
How to do it…
How it works…
There's more…
See also
A simple bar plot
How to do it…
How it works…
There's more…
See also
An interactive bar plot
Getting ready
How to do it…
How it works…
There's more…
See also
A simple line plot
Getting ready
How to do it…
How it works…
See also
Line plot to tell an effective story
Getting ready
How to do it…
How it works…
See also
Generating an interactive Gantt/timeline chart in R
Getting ready
How to do it…
See also
Merging histograms
How to do it…
How it works…
Making an interactive bubble plot
How to do it…
How it works…
There's more…
See also
Constructing a waterfall plot in R
Getting ready
How to do it…
2. Heat Maps and Dendrograms
Introduction
Constructing a simple dendrogram
Getting ready
How to do it…
How it works…
There's more...
See also
Creating dendrograms with colors and labels
Getting ready
How to do it…
How it works…
There's more…
Creating a heat map
Getting ready
How to do it…
How it works…
There's more…
See also
Generating a heat map with customized colors
Getting ready
How to do it…
How it works…
Generating an integrated dendrogram and a heat map
How to do it…
There's more…
See also
Creating a three-dimensional heat map and a stereo map
Getting ready
How to do it…
See also
Constructing a tree map in R
Getting ready
How to do it…
How it works…
There's more…
See also
3. Maps
Introduction
Introducing regional maps
Getting ready
How to do it…
How it works…
See also
Introducing choropleth maps
Getting ready
How to do it…
How it works…
There's more…
See also
A guide to contour maps
How to do it…
How it works…
There's more…
See also
Constructing maps with bubbles
Getting ready
How to do it…
How it works...
There's more…
See also
Integrating text with maps
Getting ready
How to do it…
See also
Introducing shapefiles
Getting ready
How to do it…
See also
Creating cartograms
Getting ready
How to do it…
See also
4. The Pie Chart and Its Alternatives
Introduction
Generating a simple pie chart
How to do it…
How it works…
There's more...
See also
Constructing pie charts with labels
Getting ready
How to do it…
How it works…
There's more…
Creating donut plots and interactive plots
Getting ready
How to do it...
How it works…
There's more…
See also
Generating a slope chart
Getting ready
How to do it…
How it works…
See also
Constructing a fan plot
Getting ready
How to do it…
How it works…
5. Adding the Third Dimension
Introduction
Constructing a 3D scatter plot
Getting ready
How to do it…
How it works…
There's more…
See also
Generating a 3D scatter plot with text
Getting ready
How to do it…
How it works…
There's more…
See also
A simple 3D pie chart
Getting ready
How to do it…
How it works…
A simple 3D histogram
Getting ready
How to do it…
How it works…
There's more...
Generating a 3D contour plot
Getting ready
How to do it…
How it works…
Integrating a 3D contour and a surface plot
Getting ready
How to do it…
How it works…
There's more...
See also
Animating a 3D surface plot
Getting ready
How to do it…
How it works…
There's more…
See also
6. Data in Higher Dimensions
Introduction
Constructing a sunflower plot
Getting ready
How to do it…
How it works…
See also
Creating a hexbin plot
Getting ready
How to do it…
How it works…
See also
Generating interactive calendar maps
Getting ready
How to do it…
How it works…
See also
Creating Chernoff faces in R
Getting ready
How to do it…
How it works…
Constructing a coxcomb plot in R
Getting ready
How to do it…
How it works…
See also
Constructing network plots
Getting ready
How to do it…
How it works…
There's more…
See also
Constructing a radial plot
Getting ready
How to do it…
How it works…
There's more…
See also
Generating a very basic pyramid plot
Getting ready
How to do it…
How it works…
See also
7. Visualizing Continuous Data
Introduction
Generating a candlestick plot
Getting ready
How to do it…
How it works…
There's more…
See also
Generating interactive candlestick plots
Getting ready
How to do it…
How it works…
Generating a decomposed time series
How to do it…
How it works…
There's more…
See also
Plotting a regression line
How to do it…
How it works…
See also
Constructing a box and whiskers plot
Getting ready
How to do it…
How it works…
See also
Generating a violin plot
Getting ready
How to do it…
Generating a quantile-quantile plot (QQ plot)
Getting ready
How to do it…
See also
Generating a density plot
Getting ready
How to do it…
How it works…
There's more…
See also
Generating a simple correlation plot
Getting ready
How to do it…
How it works…
There's more…
See also
8. Visualizing Text and XKCD-style Plots
Introduction
Generating a word cloud
Getting ready
How to do it…
How it works…
There's more…
See also
Constructing a word cloud from a document
Getting ready
How to do it…
How it works…
There's more…
See also
Generating a comparison cloud
Getting ready
How to do it…
How it works…
See also
Constructing a correlation plot and a phrase tree
Getting ready
How to do it…
How it works…
There's more…
See also
Generating plots with custom fonts
Getting ready
How to do it…
How it works…
See also
Generating an XKCD-style plot
Getting ready
How to do it…
See also
9. Creating Applications in R
Introduction
Creating animated plots in R
Getting ready
How to do it…
How it works…
Creating a presentation in R
Getting ready
How to do it…
How it works…
There's more…
See also
A basic introduction to API and XML
Getting ready
How to do it…
How it works…
See also
Constructing a bar plot using XML in R
Getting ready
How to do it…
How it works…
See also
Creating a very simple shiny app in R
Getting ready
How to do it…
How it works…
See also
3. Module 3
1. Data Exploration with RMS Titanic
Introduction
Reading a Titanic dataset from a CSV file
Getting ready
How to do it...
How it works...
There's more...
Converting types on character variables
Getting ready
How to do it...
How it works...
There's more...
Detecting missing values
Getting ready
How to do it...
How it works...
There's more...
Imputing missing values
Getting ready
How to do it...
How it works...
There's more...
Exploring and visualizing data
Getting ready
How to do it...
How it works...
There's more...
See also
Predicting passenger survival with a decision tree
Getting ready
How to do it...
How it works...
There's more...
Validating the power of prediction with a confusion matrix
Getting ready
How to do it...
How it works...
There's more...
Assessing performance with the ROC curve
Getting ready
How to do it...
How it works...
See also
2. R and Statistics
Introduction
Understanding data sampling in R
Getting ready
How to do it...
How it works...
See also
Operating a probability distribution in R
Getting ready
How to do it...
How it works...
There's more...
Working with univariate descriptive statistics in R
Getting ready
How to do it...
How it works...
There's more...
Performing correlations and multivariate analysis
Getting ready
How to do it...
How it works...
See also
Operating linear regression and multivariate analysis
Getting ready
How to do it...
How it works...
See also
Conducting an exact binomial test
Getting ready
How to do it...
How it works...
See also
Performing Student's t-test
Getting ready
How to do it...
How it works...
See also
Performing the Kolmogorov-Smirnov test
Getting ready
How to do it...
How it works...
See also
Understanding the Wilcoxon Rank Sum and Signed Rank test
Getting ready
How to do it...
How it works...
See also
Working with Pearson's Chi-squared test
Getting ready
How to do it...
How it works...
There's more...
Conducting a one-way ANOVA
Getting ready
How to do it...
How it works...
There's more...
Performing a two-way ANOVA
Getting ready
How to do it...
How it works...
See also
3. Understanding Regression Analysis
Introduction
Fitting a linear regression model with lm
Getting ready
How to do it...
How it works...
There's more...
Summarizing linear model fits
Getting ready
How to do it...
How it works...
See also
Using linear regression to predict unknown values
Getting ready
How to do it...
How it works...
See also
Generating a diagnostic plot of a fitted model
Getting ready
How to do it...
How it works...
There's more...
Fitting a polynomial regression model with lm
Getting ready
How to do it...
How it works...
There's more...
Fitting a robust linear regression model with rlm
Getting ready
How to do it...
How it works...
There's more...
Studying a case of linear regression on SLID data
Getting ready
How to do it...
How it works...
See also
Applying the Gaussian model for generalized linear regression
Getting ready
How to do it...
How it works...
See also
Applying the Poisson model for generalized linear regression
Getting ready
How to do it...
How it works...
See also
Applying the Binomial model for generalized linear regression
Getting ready
How to do it...
How it works...
See also
Fitting a generalized additive model to data
Getting ready
How to do it...
How it works...
See also
Visualizing a generalized additive model
Getting ready
How to do it...
How it works...
There's more...
Diagnosing a generalized additive model
Getting ready
How to do it...
How it works...
There's more...
4. Classification (I) – Tree, Lazy, and Probabilistic
Introduction
Preparing the training and testing datasets
Getting ready
How to do it...
How it works...
There's more...
Building a classification model with recursive partitioning trees
Getting ready
How to do it...
How it works...
See also
Visualizing a recursive partitioning tree
Getting ready
How to do it...
How it works...
See also
Measuring the prediction performance of a recursive partitioning tree
Getting ready
How to do it...
How it works...
See also
Pruning a recursive partitioning tree
Getting ready
How to do it...
How it works...
See also
Building a classification model with a conditional inference tree
Getting ready
How to do it...
How it works...
See also
Visualizing a conditional inference tree
Getting ready
How to do it...
How it works...
See also
Measuring the prediction performance of a conditional inference tree
Getting ready
How to do it...
How it works...
See also
Classifying data with the k-nearest neighbor classifier
Getting ready
How to do it...
How it works...
See also
Classifying data with logistic regression
Getting ready
How to do it...
How it works...
See also
Classifying data with the Naïve Bayes classifier
Getting ready
How to do it...
How it works...
See also
5. Classification (II) – Neural Network and SVM
Introduction
Classifying data with a support vector machine
Getting ready
How to do it...
How it works...
See also
Choosing the cost of a support vector machine
Getting ready
How to do it...
How it works...
See also
Visualizing an SVM fit
Getting ready
How to do it...
How it works...
See also
Predicting labels based on a model trained by a support vector machine
Getting ready
How to do it...
How it works...
There's more...
Tuning a support vector machine
Getting ready
How to do it...
How it works...
See also
Training a neural network with neuralnet
Getting ready
How to do it...
How it works...
See also
Visualizing a neural network trained by neuralnet
Getting ready
How to do it...
How it works...
See also
Predicting labels based on a model trained by neuralnet
Getting ready
How to do it...
How it works...
See also
Training a neural network with nnet
Getting ready
How to do it...
How it works...
See also
Predicting labels based on a model trained by nnet
Getting ready
How to do it...
How it works...
See also
6. Model Evaluation
Introduction
Estimating model performance with k-fold cross-validation
Getting ready
How to do it...
How it works...
There's more...
Performing cross-validation with the e1071 package
Getting ready
How to do it...
How it works...
See also
Performing cross-validation with the caret package
Getting ready
How to do it...
How it works...
See also
Ranking the variable importance with the caret package
Getting ready
How to do it...
How it works...
There's more...
Ranking the variable importance with the rminer package
Getting ready
How to do it...
How it works...
See also
Finding highly correlated features with the caret package
Getting ready
How to do it...
How it works...
See also
Selecting features using the caret package
Getting ready
How to do it...
How it works...
See also
Measuring the performance of the regression model
Getting ready
How to do it...
How it works...
There's more…
Measuring prediction performance with a confusion matrix
Getting ready
How to do it...
How it works...
See also
Measuring prediction performance using ROCR
Getting ready
How to do it...
How it works...
See also
Comparing an ROC curve using the caret package
Getting ready
How to do it...
How it works...
See also
Measuring performance differences between models with the caret package
Getting ready
How to do it...
How it works...
See also
7. Ensemble Learning
Introduction
Classifying data with the bagging method
Getting ready
How to do it...
How it works...
There's more...
Performing cross-validation with the bagging method
Getting ready
How to do it...
How it works...
See also
Classifying data with the boosting method
Getting ready
How to do it...
How it works...
There's more...
Performing cross-validation with the boosting method
Getting ready
How to do it...
How it works...
See also
Classifying data with gradient boosting
Getting ready
How to do it...
How it works...
There's more...
Calculating the margins of a classifier
Getting ready
How to do it...
How it works...
See also
Calculating the error evolution of the ensemble method
Getting ready
How to do it...
How it works...
See also
Classifying data with random forest
Getting ready
How to do it...
How it works...
There's more...
Estimating the prediction errors of different classifiers
Getting ready
How to do it...
How it works...
See also
8. Clustering
Introduction
Clustering data with hierarchical clustering
Getting ready
How to do it...
How it works...
There's more...
Cutting trees into clusters
Getting ready
How to do it...
How it works...
There's more...
Clustering data with the k-means method
Getting ready
How to do it...
How it works...
See also
Drawing a bivariate cluster plot
Getting ready
How to do it...
How it works...
There's more...
Comparing clustering methods
Getting ready
How to do it...
How it works...
See also
Extracting silhouette information from clustering
Getting ready
How to do it...
How it works...
See also
Obtaining the optimum number of clusters for k-means
Getting ready
How to do it...
How it works...
See also
Clustering data with the density-based method
Getting ready
How to do it...
How it works...
See also
Clustering data with the model-based method
Getting ready
How to do it...
How it works...
See also
Visualizing a dissimilarity matrix
Getting ready
How to do it...
How it works...
There's more...
Validating clusters externally
Getting ready
How to do it...
How it works...
See also
9. Association Analysis and Sequence Mining
Introduction
Transforming data into transactions
Getting ready
How to do it...
How it works...
See also
Displaying transactions and associations
Getting ready
How to do it...
How it works...
See also
Mining associations with the Apriori rule
Getting ready
How to do it...
How it works...
See also
Pruning redundant rules
Getting ready
How to do it...
How it works...
See also
Visualizing association rules
Getting ready
How to do it...
How it works...
See also
Mining frequent itemsets with Eclat
Getting ready
How to do it...
How it works...
See also
Creating transactions with temporal information
Getting ready
How to do it...
How it works...
See also
Mining frequent sequential patterns with cSPADE
Getting ready
How to do it...
How it works...
See also
10. Dimension Reduction
Introduction
Performing feature selection with FSelector
Getting ready
How to do it...
How it works...
See also
Performing dimension reduction with PCA
Getting ready
How to do it...
How it works...
There's more...
Determining the number of principal components using the scree test
Getting ready
How to do it...
How it works...
There's more...
Determining the number of principal components using the Kaiser method
Getting ready
How to do it...
How it works...
See also
Visualizing multivariate data using biplot
Getting ready
How to do it...
How it works...
There's more...
Performing dimension reduction with MDS
Getting ready
How to do it...
How it works...
There's more...
Reducing dimensions with SVD
Getting ready
How to do it...
How it works...
See also
Compressing images with SVD
Getting ready
How to do it...
How it works...
See also
Performing nonlinear dimension reduction with ISOMAP
Getting ready
How to do it...
How it works...
There's more...
Performing nonlinear dimension reduction with Local Linear Embedding
Getting ready
How to do it...
How it works...
See also
11. Big Data Analysis (R and Hadoop)
Introduction
Preparing the RHadoop environment
Getting ready
How to do it...
How it works...
See also
Installing rmr2
Getting ready
How to do it...
How it works...
See also
Installing rhdfs
Getting ready
How to do it...
How it works...
See also
Operating HDFS with rhdfs
Getting ready
How to do it...
How it works...
See also
Implementing a word count problem with RHadoop
Getting ready
How to do it...
How it works...
See also
Comparing the performance between an R MapReduce program and a standard R program
Getting ready
How to do it...
How it works...
See also
Testing and debugging the rmr2 program
Getting ready
How to do it...
How it works...
See also
Installing plyrmr
Getting ready
How to do it...
How it works...
See also
Manipulating data with plyrmr
Getting ready
How to do it...
How it works...
See also
Conducting machine learning with RHadoop
Getting ready
How to do it...
How it works...
See also
Configuring RHadoop clusters on Amazon EMR
Getting ready
How to do it...
How it works...
See also
A. Resources for R and Machine Learning
B. Dataset – Survival of Passengers on the Titanic
A. Bibliography
Index

R: Recipes for Analysis, Visualization and Machine Learning


Get savvy with R language and actualize projects aimed at analysis, visualization and machine learning

A course in three modules

BIRMINGHAM - MUMBAI


Copyright © 2016 Packt Publishing

All rights reserved. No part of this course may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this course to ensure the accuracy of the information presented. However, the information contained in this course is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this course.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this course by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Published on: November 2016

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78728-959-8

www.packtpub.com

Credits

Authors

Viswa Viswanathan

Shanthi Viswanathan

Atmajitsinh Gohil

Yu-Wei, Chiu (David Chiu)

Reviewers

Kenneth D. Graves

Jithin S L

Dipanjan Sarkar

Hang (Harvey) Yu

Sharan Kumar Ravindran

Kannan Kalidasan

Erik M. Rodríguez Pacheco

Arun Padmanabhan

Juan Pablo Zamora

Patric Zhao

Tarek Amr

Abir Datta (data scientist)

Saibal Dutta

Ratanlal Mahanta (senior quantitative analyst)

Ricky Shi

Jithin S.L

Content Development Editor

Mayur Pawanikar

Production Coordinator

Nilesh Mohite

Preface

Since the release of version 1.0 in 2000, R's popularity as an environment for statistical computing, data analytics, and graphing has grown exponentially. People who have been using spreadsheets and need to do things that spreadsheet packages cannot readily do, or need to handle larger data volumes than a spreadsheet program can comfortably handle, are looking to R. Similarly, people using powerful commercial analytics packages are intrigued by this free and powerful option. As a result, a large number of people now want to get things done in R quickly. Because R is an extensible system, its functionality is divided across numerous packages, each exposing a large number of functions. Even experienced users cannot expect to remember all the details off the top of their heads.

Our ability to generate data has improved tremendously with advances in technology, and the data we generate has become more complex over time. This complexity forces us to develop new tools and methods to analyze, interpret, and communicate data. Data visualization gives us the skills needed to convey the meaning of the underlying data. Data visualization is a remarkable intersection of data, science, and art, which makes it hard to define formally; a simple Google search will prove me right. The Merriam-Webster dictionary defines visualization as "formation of mental visual images".

Big data has become a popular buzzword across many industries. An increasing number of people have been exposed to the term and are looking at how to leverage big data in their own businesses to improve sales and profitability. However, collecting, aggregating, and visualizing data is just one part of the equation. Extracting useful information from the data is a separate and much more challenging task.

Traditionally, most researchers perform statistical analysis on historical samples of data. The main downside of this approach is that the conclusions it can support are limited; researchers often struggle to uncover hidden patterns and unknown correlations in the target data. Machine learning has emerged as an alternative to statistical analysis alone: the data is fed into a learning algorithm, which can yield a more accurate predictive model. Through machine learning, the analysis of business operations and processes is not limited to human-scale thinking. Machine-scale analysis enables businesses to discover hidden value in big data.

What this learning path covers

Module 1, R Data Analysis Cookbook: Aimed at users who are already familiar with the fundamentals of R, this module provides ready-made recipes for many important data analytics tasks. Instead of searching the Web or delving into numerous books when faced with a specific task, you can find the appropriate recipe and get going in a matter of minutes.

Module 2, R Data Visualization Cookbook: In this module, you will learn how to generate basic visualizations, understand the limitations and advantages of certain visualizations, develop interactive visualizations and applications, use various data exploration functions in R, and finally present the data to your audience. This module is aimed at beginner and intermediate R users who would like to go a step further and use their complex data to tell a convincing story to their audience.

Module 3, Machine Learning with R Cookbook: This module covers how to perform statistical analysis and machine learning analysis, and how to assess the resulting models. It also shows how to integrate R and Hadoop to create a big data analysis platform. The detailed illustrations provide all the information required to start applying machine learning to your own projects.

What you need for this learning path

Module 1:

We have tested all the code in this module with R versions 3.0.2 (Frisbee Sailing) and 3.1.0 (Spring Dance). When you install or load some of the packages, you may get a warning message saying that the package was built under a different R version; this does not affect any of the code in this module.

Module 2:

You need to download R to generate the visualizations. You can download and install R using the CRAN website available at http://cran.r-project.org/. All the recipes were written using RStudio. RStudio is an integrated development environment (IDE) for R and can be downloaded from http://www.rstudio.com/products/rstudio/. Many of the visualizations are created using R packages and they are discussed in their respective recipes.

In a few of the recipes, I introduce other platforms, such as ScapeToad, ArcGIS, and Mapbox. Their installation procedures are outlined in the respective recipes.

Module 3:

To follow the course's examples, you will need a computer with access to the Internet and the ability to install the R environment. You can download R from http://www.cran.r-project.org/. Detailed installation instructions are available in the first chapter.

The examples in this book were coded and tested with R version 3.1.2 on a Microsoft Windows computer. They should also work with any recent version of R on Mac OS X or a Unix-like OS.
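As a quick sketch, the version of R you are running can be checked from the console and compared against the 3.1.2 release the examples were tested with. Both R.version.string and getRversion() are part of base R:

```r
# Print the full version string of the running R, e.g. "R version 3.1.2 (...)"
R.version.string

# getRversion() returns a version object, so it can be compared with a version string
getRversion() >= "3.1.2"   # TRUE if this R is at least as new as the tested version
```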

Who this learning path is for

This Learning Path is ideal for those who have already been exposed to R but have not yet used it extensively. It gives you in-depth insight into professional techniques for analysis, visualization, and machine learning with R. It also covers the basics of using R and is written with new and intermediate R users in mind, so it is useful regardless of your level of experience.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this course—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the course's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a course, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt course, we have a number of things to help you get the most from your purchase.

Downloading the example code

You can download the example code files for this course from your account at http://www.packtpub.com. If you purchased this course elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

1. Log in or register on our website using your e-mail address and password.
2. Hover the mouse pointer over the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the course in the Search box.
5. Select the course for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this course.
7. Click on Code Download.

You can also download the code files by clicking on the Code Files button on the course's webpage at the Packt Publishing website. This page can be accessed by entering the course's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

• WinRAR / 7-Zip for Windows
• Zipeg / iZip / UnRarX for Mac
• 7-Zip / PeaZip for Linux

The code bundle for the course is also hosted on GitHub at https://github.com/PacktPublishing/R-Recipes-for-Analysis-Visualization-and-Machine-Learning. We also have other code bundles from our rich catalog of books, videos and courses available at https://github.com/PacktPublishing/. Check them out!

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this course. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your course, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this course, you can contact us at <[email protected]>, and we will do our best to address the problem.

Part 1. Module 1

R Data Analysis Cookbook

Over 80 recipes to help you breeze through your data analysis projects using R

Chapter 1. A Simple Guide to R

In this chapter, we will cover the following recipes:

• Installing packages and getting help in R
• Data types in R
• Special values in R
• Matrices in R
• Editing a matrix in R
• Data frames in R
• Editing a data frame in R
• Importing data in R
• Exporting data in R
• Writing a function in R
• Writing if else statements in R
• Basic loops in R
• Nested loops in R
• The apply, lapply, sapply, and tapply functions
• Using par to beautify a plot in R
• Saving plots

Installing packages and getting help in R

If you are a new user and have never launched R, you should start by learning how to use install.packages() and library(), and how to get help in R. R comes with a set of basic packages, but the R community is growing rapidly and active R users are constantly developing new packages.

As you read through this cookbook, you will observe that we have used a lot of packages to create different visualizations. So the question now is, how do we know what packages are available in R? In order to keep myself up-to-date with all the changes that are happening in the R community, I diligently follow these blogs:

• R-bloggers
• RStudio blog

There are many blogs, websites, and posts that I will refer to as we go through the book. We can view a list of all the packages available for R at http://cran.r-project.org/, and http://www.inside-r.org/packages provides a list along with a short description of each package.
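CRAN remains the place to browse everything that is available, but as a minimal sketch you can also ask R which packages are already installed locally. installed.packages() reads only the local library and needs no Internet connection (utils::available.packages() would instead query the CRAN repository and does need one):

```r
# installed.packages() returns a matrix with one row per installed package;
# its row names are the package names
installed <- rownames(installed.packages())
length(installed)    # how many packages are installed locally
head(installed)      # the first few package names

# Check whether a specific package is already installed before installing it
"plotrix" %in% installed
```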

Getting ready

We can start by launching RStudio, an Integrated Development Environment (IDE) for R. If you have not downloaded RStudio yet, I highly recommend going to http://www.rstudio.com/ and downloading it.

How to do it…

To install a package in R, we use the install.packages() function. After installing a package, we have to load it into the active R session; otherwise, calling its functions produces an error. The library() function loads a package into the session.
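A minimal sketch of the two steps follows. The installation line uses plotrix, the package named later in this recipe, and is commented out because it downloads from CRAN. To show the effect of library() without any download, the sketch assumes a standard R installation, where the splines package is installed with R but not attached when a session starts:

```r
# Step 1: install once per R installation (downloads from CRAN, so it is commented out here)
# install.packages("plotrix")

# Step 2: load in every new session.
# splines ships with R but is not attached at startup, so its bs() function
# is not visible until the package is loaded.
exists("bs")       # FALSE in a fresh session
library(splines)   # attach the package to the current session
exists("bs")       # TRUE: the package's functions are now on the search path
```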

How it works…

The install.packages() function accepts several additional arguments but, for the purpose of this book, we will only use the first one, that is, the name of the package. We can also install multiple packages at once by passing a character vector, for example install.packages(c("plotrix", "RColorBrewer")). The package name is the only argument we will use with the library() function. Note that, unlike install.packages(), library() loads only one package per call.
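Because library() takes one package per call, a common idiom is to loop over a character vector and pass character.only = TRUE. The sketch below wraps that idiom, together with a check for missing packages, in a small helper. install_and_load() is a hypothetical name used only for illustration; it is not a function from base R or any package:

```r
# Hypothetical helper: install any packages that are missing, then attach all of them.
install_and_load <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) for packages that are not installed
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!is_installed]
  if (length(missing) > 0) {
    install.packages(missing)   # one call can install several packages
  }
  for (p in pkgs) {
    library(p, character.only = TRUE)   # library() attaches one package per call
  }
  invisible(pkgs)
}

# Usage with packages that ship with R (no download needed):
install_and_load(c("tools", "splines"))

# Usage with the packages from this recipe (may trigger a download from CRAN):
# install_and_load(c("plotrix", "RColorBrewer"))
```

Checking with requireNamespace() first avoids reinstalling packages that are already present, which would otherwise repeat the download on every run.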

There's more…

Unless we use them all the time, it is hard to remember all the functions in R and their arguments, so we are bound to run into errors and warning messages. The best way to learn R is to draw on the active R community and the help manual built into R.

To understand any function in R or to learn about its various arguments, we can type ?<name of the function>. For example, I can learn about all the arguments of the plot() function by simply typing ?plot or ?plot() in the R console window. In RStudio, the help page then appears in the Help pane on the right side of the screen. We can also learn more about the behavior of the function from the examples at the bottom of the help page.
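Besides ?plot and help("plot"), the argument list of a function can be printed directly at the console. A short sketch using two base R functions, args() and formals():

```r
# ?plot and help("plot") both open the full help page for plot().
# args() prints only the function's signature, without opening the help page.
args(plot)

# formals() returns the arguments as a list; names() extracts just their names
plot_args <- names(formals(args(plot)))
print(plot_args)   # "x" "y" "..."
```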

If we are still unable to understand the function or how to use it, we can search for the question on Google or on the Stack Overflow website. I have always been able to resolve my errors by searching the Internet. Remember, every problem has a solution, and the possibilities with R are endless.

See also

• Flowing Data (http://flowingdata.com/): A good resource for learning visualization tools and R. Access to the tutorials requires an annual subscription.
• Stack Overflow (http://stackoverflow.com/): A great place to get help with R functions.
• Inside-R (http://www.inside-r.org/): Lists all the packages along with a short description of each.
• R-bloggers (http://www.r-bloggers.com/): A great website for learning about new R packages, books, tutorials, data scientists, and data-related jobs.
• R-Forge (https://r-forge.r-project.org/)
• The R Journal (http://journal.r-project.org/archive/2014-1/)

Exporting data in R