R for Data Science Cookbook - Yu-Wei Chiu - E-Book


Description

Over 100 hands-on recipes to effectively solve real-world data problems using the most popular R packages and techniques

About This Book

  • Gain insight into how data scientists collect, process, analyze, and visualize data using some of the most popular R packages
  • Understand how to apply useful data analysis techniques in R for real-world applications
  • An easy-to-follow guide that makes the life of a data scientist easier by addressing the problems faced while performing data analysis

Who This Book Is For

This book is for those who are already familiar with the basic operation of R, but want to learn how to efficiently and effectively analyze real-world data problems using practical R packages.

What You Will Learn

  • Get to know the functional characteristics of the R language
  • Extract, transform, and load data from heterogeneous sources
  • Understand how easily R can confront probability and statistics problems
  • Get simple R instructions to quickly organize and manipulate large datasets
  • Create professional data visualizations and interactive reports
  • Predict user purchase behavior by adopting a classification approach
  • Implement data mining techniques to discover items that are frequently purchased together
  • Group similar text documents by using various clustering methods

In Detail

This cookbook offers a range of data analysis samples in simple and straightforward R code, providing step-by-step resources and time-saving methods to help you solve data problems efficiently.

The first section deals with how to create R functions to avoid the unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL for heterogeneous data sources with R packages. An example of data manipulation is provided, illustrating how to use the “dplyr” and “data.table” packages to efficiently process larger data structures. We also focus on “ggplot2” and show you how to create advanced figures for data exploration.

In addition, you will learn how to build an interactive report using the “ggvis” package. Later chapters offer insight into time series analysis on financial data, while there is detailed information on the hot topic of machine learning, including data classification, regression, clustering, association rule mining, and dimension reduction.

By the end of this book, you will understand how to resolve issues and will be able to comfortably offer solutions to problems encountered while performing data analysis.

Style and approach

This easy-to-follow guide is full of hands-on examples of data analysis with R. Each topic is fully explained beginning with the core concept, followed by step-by-step practical examples, and concluding with detailed explanations of each concept used.


Page count: 455

Year of publication: 2016




Table of Contents

R for Data Science Cookbook
Credits
About the Author
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Functions in R
Introduction
Creating R functions
Getting ready
How to do it...
How it works...
There's more...
Matching arguments
Getting ready
How to do it...
How it works...
There's more...
Understanding environments
Getting ready
How to do it...
How it works...
There's more...
Working with lexical scoping
Getting ready
How to do it...
How it works...
There's more...
Understanding closure
Getting ready
How to do it...
How it works...
There's more...
Performing lazy evaluation
Getting ready
How to do it...
How it works...
There's more...
Creating infix operators
Getting ready
How to do it...
How it works...
There's more...
Using the replacement function
Getting ready
How to do it...
How it works...
There's more...
Handling errors in a function
Getting ready
How to do it...
How it works...
There's more...
The debugging function
Getting ready
How to do it...
How it works...
There's more...
2. Data Extracting, Transforming, and Loading
Introduction
Downloading open data
Getting ready
How to do it…
How it works…
There's more…
Reading and writing CSV files
Getting ready
How to do it…
How it works…
There's more…
Scanning text files
Getting ready
How to do it…
How it works…
There's more…
Working with Excel files
Getting ready
How to do it…
How it works…
Reading data from databases
Getting ready
How to do it…
How it works…
There's more…
Scraping web data
Getting ready
How to do it…
How it works…
There's more…
Accessing Facebook data
Getting ready
How to do it…
How it works…
There's more…
Working with twitteR
Getting ready
How to do it…
How it works…
There's more…
3. Data Preprocessing and Preparation
Introduction
Renaming the data variable
Getting ready
How to do it…
How it works…
There's more…
Converting data types
Getting ready
How to do it…
How it works…
There's more…
Working with the date format
Getting ready
How to do it…
How it works…
There's more…
Adding new records
Getting ready
How to do it…
How it works…
There's more…
Filtering data
Getting ready
How to do it…
How it works…
There's more…
Dropping data
Getting ready
How to do it…
How it works…
There's more…
Merging data
Getting ready
How to do it…
How it works…
There's more…
Sorting data
Getting ready
How to do it…
How it works…
There's more…
Reshaping data
Getting ready
How to do it…
How it works…
There's more…
Detecting missing data
Getting ready
How to do it…
How it works…
There's more…
Imputing missing data
Getting ready
How to do it…
How it works…
There's more…
4. Data Manipulation
Introduction
Enhancing a data.frame with a data.table
Getting ready
How to do it…
How it works…
There's more…
Managing data with a data.table
Getting ready
How to do it…
How it works…
There's more…
Performing fast aggregation with a data.table
Getting ready
How to do it…
How it works…
There's more…
Merging large datasets with a data.table
Getting ready
How to do it…
How it works…
There's more…
Subsetting and slicing data with dplyr
Getting ready
How to do it…
How it works…
There's more…
Sampling data with dplyr
Getting ready
How to do it…
How it works…
There's more…
Selecting columns with dplyr
Getting ready
How to do it…
How it works…
There's more…
Chaining operations in dplyr
Getting ready
How to do it…
How it works…
There's more…
Arranging rows with dplyr
Getting ready
How to do it…
How it works…
There's more…
Eliminating duplicated rows with dplyr
Getting ready
How to do it…
How it works…
There's more…
Adding new columns with dplyr
Getting ready
How to do it…
How it works…
There's more…
Summarizing data with dplyr
Getting ready
How to do it…
How it works…
There's more…
Merging data with dplyr
Getting ready
How to do it…
How it works…
There's more…
5. Visualizing Data with ggplot2
Introduction
Creating basic plots with ggplot2
Getting ready
How to do it…
How it works…
There's more…
Changing aesthetics mapping
Getting ready
How to do it…
How it works…
There's more…
Introducing geometric objects
Getting ready
How to do it…
How it works…
There's more…
Performing transformations
Getting ready
How to do it…
How it works…
There's more…
Adjusting scales
Getting ready
How to do it…
How it works…
See also
Faceting
Getting ready
How to do it…
How it works…
There's more…
Adjusting themes
Getting ready
How to do it…
How it works…
There's more…
Combining plots
Getting ready
How to do it…
How it works…
There's more…
Creating maps
Getting ready
How to do it…
How it works…
There's more…
6. Making Interactive Reports
Introduction
Creating R Markdown reports
Getting ready
How to do it…
How it works…
There's more…
Learning the markdown syntax
Getting ready
How to do it…
How it works…
There's more…
Embedding R code chunks
Getting ready
How to do it…
How it works…
There's more…
Creating interactive graphics with ggvis
Getting ready
How to do it…
How it works…
There's more…
Understanding basic syntax and grammar
Getting ready
How to do it…
How it works…
There's more…
Controlling axes and legends
Getting ready
How to do it…
How it works…
There's more…
Using scales
Getting ready
How to do it…
How it works…
There's more…
Adding interactivity to a ggvis plot
Getting ready
How to do it…
How it works…
There's more…
Creating an R Shiny document
Getting ready
How to do it…
How it works…
There's more…
Publishing an R Shiny report
Getting ready
How to do it…
How it works…
There's more…
7. Simulation from Probability Distributions
Introduction
Generating random samples
Getting ready
How to do it…
How it works…
There's more…
Understanding uniform distributions
Getting ready
How to do it…
How it works…
Generating binomial random variates
Getting ready
How to do it…
How it works…
There's more…
Generating Poisson random variates
Getting ready
How to do it…
How it works…
There's more…
Sampling from a normal distribution
Getting ready
How to do it…
How it works…
There's more…
Sampling from a chi-squared distribution
Getting ready
How to do it…
How it works…
There's more…
Understanding Student's t-distribution
Getting ready
How to do it…
How it works…
There's more…
Sampling from a dataset
Getting ready
How to do it…
How it works…
There's more…
Simulating the stochastic process
Getting ready
How to do it…
How it works…
There's more…
8. Statistical Inference in R
Introduction
Getting confidence intervals
Getting ready
How to do it…
How it works…
There's more…
Performing Z-tests
Getting ready
How to do it…
How it works…
There's more…
Performing student's T-tests
Getting ready
How to do it…
How it works…
There's more…
Conducting exact binomial tests
Getting ready
How to do it…
How it works…
There's more…
Performing Kolmogorov-Smirnov tests
Getting ready
How to do it…
How it works…
There's more…
Working with the Pearson's chi-squared tests
Getting ready
How to do it…
How it works…
There's more…
Understanding the Wilcoxon Rank Sum and Signed Rank tests
Getting ready
How to do it…
How it works…
There's more…
Conducting one-way ANOVA
Getting ready
How to do it…
How it works…
There's more…
Performing two-way ANOVA
Getting ready
How to do it…
How it works…
There's more…
9. Rule and Pattern Mining with R
Introduction
Transforming data into transactions
Getting ready
How to do it…
How it works…
There's more…
Displaying transactions and associations
Getting ready
How to do it…
How it works…
There's more…
Mining associations with the Apriori rule
Getting ready
How to do it…
How it works…
There's more…
Pruning redundant rules
Getting ready
How to do it…
How it works…
There's more…
Visualizing association rules
Getting ready
How to do it…
How it works…
See also
Mining frequent itemsets with Eclat
Getting ready
How to do it…
How it works…
There's more…
Creating transactions with temporal information
Getting ready
How to do it…
How it works…
There's more…
Mining frequent sequential patterns with cSPADE
Getting ready
How to do it…
How it works…
See also
10. Time Series Mining with R
Introduction
Creating time series data
Getting ready
How to do it…
How it works…
There's more…
Plotting a time series object
Getting ready
How to do it…
How it works…
There's more…
Decomposing time series
Getting ready
How to do it…
How it works…
There's more…
Smoothing time series
Getting ready
How to do it…
How it works…
There's more…
Forecasting time series
Getting ready
How to do it…
How it works…
There's more…
Selecting an ARIMA model
Getting ready
How to do it…
How it works…

R for Data Science Cookbook


Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2016

Production reference: 1270716

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-081-5

www.packtpub.com

Credits

Author

Yu-Wei, Chiu (David Chiu)

Reviewer

Prabhanjan Tattar

Commissioning Editor

Veena Pagare

Acquisition Editor

Tushar Gupta

Content Development Editor

Pooja Mhapsekar

Technical Editor

Madhunikita Sunil Chindarkar

Copy Editor

Priyanka Ravi

Project Coordinator

Suzanne Coutinho

Proofreader

Safis Editing

Indexer

Tejal Daruwale Soni

Graphics

Jason Monteiro

Production Coordinator

Aparna Bhagat

Cover Work

Aparna Bhagat

About the Author

Yu-Wei, Chiu (David Chiu) is the founder of LargitData (www.LargitData.com), a startup company that mainly focuses on providing big data and machine learning products. He has previously worked for Trend Micro as a software engineer, where he was responsible for building big data platforms for business intelligence and customer relationship management systems. In addition to being a start-up entrepreneur and data scientist, he specializes in using Spark and Hadoop to process big data and apply data mining techniques for data analysis. Yu-Wei is also a professional lecturer and has delivered lectures on big data and machine learning in R and Python, and given tech talks at a variety of conferences.

In 2015, Yu-Wei wrote Machine Learning with R Cookbook, Packt Publishing. In 2013, Yu-Wei reviewed Bioinformatics with R Cookbook, Packt Publishing. For more information, visit his personal website at www.ywchiu.com.

I have immense gratitude for my family and friends for supporting and encouraging me to complete this book. I would like to sincerely thank my mother, Ming-Yang Huang (Miranda Huang); my mentor, Man-Kwan Shan; the proofreader of this book, Brendan Fisher; members of LargitData; Data Science Program (DSP); and other friends who have offered their support.

About the Reviewer

Prabhanjan Tattar is currently working as a senior data scientist at Fractal Analytics, Inc. He has 8 years of experience as a statistical analyst. Survival analysis and statistical inference are his main areas of research and interest, and he has published several research papers in peer-reviewed journals, as well as authoring two books on R: R Statistical Application Development by Example, Packt Publishing, and A Course in Statistics with R, Wiley. The R packages gpk, RSADBE, and ACSWR are also maintained by him.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Preface

Big data, the Internet of Things, and artificial intelligence have become the hottest technology buzzwords in recent years. Although there are many different terms used to define these technologies, the common concept is that they're all driven by data. Simply having data is not enough; being able to unlock its value is essential. Therefore, data scientists have begun to focus on how to gain insights from raw data.

Data science has become one of the most popular subjects among academic and industry groups. However, as data science is a very broad discipline, learning how to master it can be challenging. A beginner must learn how to prepare, process, aggregate, and visualize data. More advanced techniques involve machine learning, mining various data formats (text, image, and video), and, most importantly, using data to generate business value. The role of a data scientist is challenging and requires a great deal of effort. A successful data scientist requires a useful tool to help solve day-to-day problems.

In this field, one of the tools most widely used by data scientists is the R language, which is open source and free. As a language designed for statistical computing, it provides many data processing functions, machine learning packages, and visualization functions, allowing users to analyze data on the fly. R helps users quickly perform analysis and execute machine learning algorithms on their dataset without knowing every detail of the sophisticated mathematical models.

R for Data Science Cookbook takes a practical approach to teaching you how to put data science into practice with R. The book has 12 chapters, each of which is introduced by breaking down the topic into several simple recipes. Through the step-by-step instructions in each recipe, you can apply what you have learned from the book by using a variety of packages in R.

The first section of this book deals with how to create R functions to avoid unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL operations for heterogeneous data sources with R packages. An example of data manipulation is provided that illustrates how to use the dplyr and data.table packages to process larger data structures efficiently, while there is a section focusing on ggplot2 that covers how to create advanced figures for data exploration. Also, you will learn how to build an interactive report using the ggvis package.

This book also explains how to use data mining to discover items that are frequently purchased together. Later chapters offer insight into time series analysis on financial data, while there is detailed information on the hot topic of machine learning, including data classification, regression, clustering, and dimension reduction.

With R for Data Science Cookbook in hand, I can assure you that you will find data science has never been easier.

What this book covers

Chapter 1, Functions in R, describes how to create R functions. This chapter covers the basic composition, environment, and argument matching of an R function. Furthermore, we will look at advanced topics such as closure, functional programming, and how to properly handle errors.

Chapter 2, Data Extracting, Transforming, and Loading, teaches you how to read structured and unstructured data with R. The chapter begins by collecting data from text files. Subsequently, we will look at how to connect R to a database. Lastly, you will learn how to write a web scraper to crawl through unstructured data from a web page or social media site.

Chapter 3, Data Preprocessing and Preparation, introduces you to preparing data for analysis. In this chapter, we will cover data preprocessing steps, such as type conversion, adding, filtering, dropping, merging, reshaping, and missing-value imputation, with some basic R functions.

Chapter 4, Data Manipulation, demonstrates how to manipulate data in an efficient and effective manner with the advanced R packages data.table and dplyr. The data.table package exposes you to the possibility of quickly loading and aggregating large amounts of data. The dplyr package provides the ability to manipulate data in SQL-like syntax.

Chapter 5, Visualizing Data with ggplot2, explores using ggplot2 to visualize data. This chapter begins by introducing the basic building blocks of ggplot2. Next, we will cover advanced topics on how to create a more sophisticated graph with ggplot2 functions. Lastly, we will describe how to build a map with ggmap.

Chapter 6, Making Interactive Reports, reveals how to create a professional report with R. In the beginning, the chapter discusses how to write R markdown syntax and embed R code chunks. We will also explore how to add interactive charts to the report with ggvis. Finally, we will look at how to create and publish an R Shiny report.

Chapter 7, Simulation from Probability Distributions, begins with an emphasis on sampling data from different probability distributions. As a concrete example, we will look at how to simulate a stochastic trading process with a probability function.

Chapter 8, Statistical Inference in R, begins with a discussion on point estimation and confidence intervals. Subsequently, you will be introduced to parametric and non-parametric testing methods. Lastly, we will look at how one can use ANOVA to analyze whether the salary of an engineer differs based on their job title and location.

Chapter 9, Rule and Pattern Mining with R, exposes you to the common methods used to discover associated items and underlying frequency patterns from transaction data. In this chapter, we use a real-world blog as example data so that you can learn how to perform rule and pattern mining on real-world data.

Chapter 10, Time Series Mining with R, begins by introducing you to creating and manipulating time series from a finance dataset. Subsequently, we will learn how to forecast time series with HoltWinters and ARIMA. For a more concrete example, this chapter reveals how to predict stock prices with ARIMA.

Chapter 11, Supervised Machine Learning, teaches you how to build a model that makes predictions based on labeled training data. You will learn how to use regression models to make sense of numeric relationships and apply a fitted model to data for continuous value prediction. For classification, you will learn how to fit data into a tree-based classifier.

Chapter 12, Unsupervised Machine Learning, introduces you to revealing the hidden structure of unlabeled data. Firstly, we will look at how to group similarly located hotels together with the clustering method. Subsequently, we will learn how to select and extract features on the economic freedom dataset with PCA.

What you need for this book

To follow the book's examples, you will need a computer with access to the Internet and the ability to install the R environment. You can download R from http://www.cran.r-project.org/. Detailed installation instructions are available in the first chapter.

The examples provided in this book were coded and tested with R version 3.2.4 on Microsoft Windows. The examples are likely to work with any recent version of R installed on either Mac OS X or a Unix-like OS.

Who this book is for

R for Data Science Cookbook is intended for those who are already familiar with the basic operation of R, but want to learn how to efficiently and effectively analyze real-world data problems using practical R packages.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There's more, and See also).

To give clear instructions on how to complete a recipe, we use these sections as follows:

Getting ready

This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Package and function names are shown as follows: "You can then install and load the package RCurl."

A block of code is set as follows:

> install.packages("RCurl")
> library(RCurl)

Any URL is written as follows:

http://data.worldbank.org/topic/economy-and-growth

Variable names, argument names, new terms, and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "In R, a missing value is noted with the symbol NA (not available), and an impossible value is NaN (not a number)."

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  1. Log in or register to our website using your e-mail address and password.
  2. Hover the mouse pointer on the SUPPORT tab at the top.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box.
  5. Select the book for which you're looking to download the code files.
  6. Choose from the drop-down menu where you purchased this book from.
  7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/R-for-Data-Science-Cookbook. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/RforDataScienceCookbook_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Functions in R

This chapter covers the following topics:

  • Creating R functions
  • Matching arguments
  • Understanding environments
  • Working with lexical scope
  • Understanding closure
  • Performing lazy evaluation
  • Creating infix operators
  • Using the replacement function
  • Handling errors in a function
  • The debugging function

Introduction

R is the mainstream programming language of choice for data scientists. According to polls conducted by KDnuggets, a leading data analysis website, R ranked as the most popular language for analytics, data mining, and data science in the three most recent surveys (2012 to 2014). For many data scientists, R is more than just a programming language because the software also provides an interactive environment that can perform all types of data analysis.

R has many advantages in data manipulation and analysis, and the three most well-known are as follows:

  • Open Source and free: Using SAS or SPSS requires the purchase of a usage license. One can use R for free, allowing users to easily learn how to implement statistical algorithms from the source code of each function.
  • Powerful data analysis functions: R is famous in the data science community. Many biologists, statisticians, and programmers have wrapped their models into R packages before distributing these packages worldwide through CRAN (Comprehensive R Archive Network). This allows any user to start their analysis project by downloading and installing an R package from CRAN.
  • Easy to use: As R is a self-explanatory, high-level language, programming in R is fairly easy. R users only need to know how to use the R functions and how each parameter works through its powerful documentation. We can easily conduct high-level data analysis without having knowledge of the complex underlying mathematics.

R users will most likely agree that these advantages make complicated data analysis easier and more approachable. Notably, R also allows us to act as either a basic user or a developer. As R users, we only need to know how a function works, without requiring detailed knowledge of how it is implemented. As with SPSS, we can perform various types of data analysis through R's interactive shell. As R developers, on the other hand, we can write our own functions to create a new model, or even wrap implemented functions into a package.

Instead of explaining how to write an R program from scratch, the aim of this book is to cover how to become a developer in R. The main purpose of this chapter is to show users how to define their own functions to accelerate the analysis procedure. Starting with creating a function, this chapter covers the environment of R and explains how arguments are matched. There is also content on how to perform functional programming in R, how to create advanced functions, such as infix operators and replacement functions, and how to handle errors and debug functions.
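
As a rough preview of some of the topics named above, the following sketch shows a closure, an infix operator, a replacement function, and basic error handling in base R. The names make_counter, %+%, and second<- are hypothetical, chosen only for illustration; they are not taken from the recipes in this chapter.

```r
# Closure: make_counter() returns a function that remembers `count`.
# (make_counter is a hypothetical name used only for illustration.)
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # update the variable in the enclosing environment
    count
  }
}
counter <- make_counter()
counter()
counter()

# Infix operator: a function named %something% can be called between its operands.
`%+%` <- function(a, b) paste(a, b)
"data" %+% "science"

# Replacement function: a function named `name<-` is invoked by `name(x) <- value`.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
v <- c(1, 2, 3)
second(v) <- 10
v

# Error handling: tryCatch() intercepts the error raised by stop().
result <- tryCatch(
  stop("something went wrong"),
  error = function(e) conditionMessage(e)
)
result
```

Each of these ideas has its own recipe later in the chapter; the sketch is only meant to show that all of them are ordinary R functions.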

Creating R functions

The R language is a collection of functions; a user can apply built-in functions from various packages to their project, or they can define a function for a particular purpose. In this recipe, we will show you how to create an R function.

Getting ready

If you are new to the R language, you can find a detailed introduction, language history, and functionality on the official R site (http://www.r-project.org/). When you are ready to download and install R, please connect to the Comprehensive R Archive Network (http://cran.r-project.org/).

How to do it...

Perform the following steps in order to create your first R function:

Type the following code on your R console to create your first function:
> addnum <- function(x, y){
+   s <- x + y
+   return(s)
+ }
Execute the addnum user-defined function with the following command:
> addnum(3, 7)
[1] 10

Or, you can define your function without a return statement:

> addnum2 <- function(x, y){
+   x + y
+ }
Execute the addnum2 user-defined function with the following command:
> addnum2(3, 7)
[1] 10
You can view the definition of a function by typing its function name:
> addnum2
function(x, y){
  x + y
}
Finally, you can use body, formals, and args to examine the body and formal arguments of a function:
> body(addnum2)
{
    x + y
}
> formals(addnum2)
$x


$y


> args(addnum2)
function (x, y) 
NULL

How it works...

An R function is a block of organized, reusable statements, which makes programming less repetitive by allowing you to reuse code. Additionally, by modularizing statements within a function, your R code will become more readable and maintainable.

By following these steps, you have now created two R functions, addnum and addnum2, and you can successfully add two input arguments with either function. In R, a function usually takes the following form:

FunctionName <- function(arg1, arg2) {
  body
  return(expression)
}

FunctionName is the name of the function, and arg1 and arg2 are arguments. Inside the curly braces, we can see the function body, which is a collection of valid statements, expressions, or assignments. At the bottom of the function, we can find the return statement, which passes expression back to the caller and exits the function.

The addnum function uses the standard function syntax, which contains both a body and a return statement. However, you do not necessarily need to put a return statement at the end of the function. As the addnum2 function shows, a function without a return statement returns the value of its last evaluated expression to the caller.

If you want to view the composition of the function, simply type the function name on the interactive shell. You can also examine the body and formal arguments of the function further using the body and formals functions. Alternatively, you can use the args function to obtain the argument list of the function.
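
The same inspection functions can also be used programmatically rather than just printed at the console. The following is a minimal sketch under that assumption; scale_by is a hypothetical function introduced only for illustration and is not part of the recipe:

```r
# Hypothetical helper used only for illustration (not from the recipe)
scale_by <- function(x, factor = 2) {
  x * factor
}

# formals() returns a pairlist; its names are the argument names
arg_names <- names(formals(scale_by))
print(arg_names)

# Default values are stored as the elements of that pairlist
default_factor <- formals(scale_by)$factor
print(default_factor)

# body() returns the unevaluated body; deparse() turns it into text
body_text <- deparse(body(scale_by))
print(body_text)

# No return() is needed: the last evaluated expression is returned
result <- scale_by(5)
print(result)
```

Because formals and body return ordinary R objects, they can be stored, compared, or searched like any other value.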

There's more...

If you want to see the documentation of a function in R, you can use the help function or simply type ? in front of the function name. For example, if you want to examine the documentation of the sum function, you would do the following:

> help(sum)
> ?sum

Understanding environments

Besides the function name, body, and formal arguments, the environment is another basic component of a function. In a nutshell, an environment is where R stores and manages variables. Besides the global environment, each function call creates its own environment. In this recipe, we will show you how the environment of each function works.

Getting ready

Ensure that you have completed the previous recipes and have R installed on your operating system.

How to do it...

Perform the following steps to work with the environment:

First, you can examine the current environment with the environment function:
> environment()
<environment: R_GlobalEnv>
You can also browse the global environment with .GlobalEnv and globalenv:
> .GlobalEnv
<environment: R_GlobalEnv>
> globalenv()
<environment: R_GlobalEnv>
You can compare the environment with the identical function:
> identical(globalenv(), environment())
[1] TRUE
Furthermore, you can create a new environment as follows:
> myenv <- new.env()
> myenv
<environment: 0x0000000017e3bb78>
Next, you can find the variables of different environments:
> myenv$x <- 3
> ls(myenv)
[1] "x"
> ls()
[1] "myenv"
> x
Error: object 'x' not found
At this point, you can create an addnum function and use environment to get the environment of the function:
> addnum <- function(x, y){
+   x + y
+ }
> environment(addnum)
<environment: R_GlobalEnv>
You can also see that the environment of a function from a package is that package's namespace:
> environment(lm)
<environment: namespace:stats>
Moving on, you can print the environment within a function:
> addnum2 <- function(x, y){
+   print(environment())
+   x + y
+ }
> addnum2(2, 3)
<environment: 0x0000000018468710>
[1] 5
Furthermore, you can compare the environment inside and outside a function:
> addnum3 <- function(x, y){
+   func1 <- function(x){
+     print(environment())
+   }
+   func1(x)
+   print(environment())
+   x + y
+ }
> addnum3(2, 5)
<environment: 0x000000001899beb0>
<environment: 0x000000001899cc50>
[1] 7

How it works...

We can regard an R environment as a place to store and manage variables. That is, whenever we create an object or a function in R, we add an entry to the environment. By default, the top-level environment is the R_GlobalEnv global environment, and we can determine the current environment using the environment function. Then, we can use either .GlobalEnv or globalenv to print the global environment, and we can compare the environment with the identical function.

Besides the global environment, we can also create our own environment and assign variables into it. In the example, we created the myenv environment and then assigned 3 to x in myenv by placing a dollar sign after the environment name. We can then use the ls function to list all variables in myenv and in the global environment. At this point, we find x in myenv, but we can only find myenv in the global environment.
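
Besides the dollar sign, base R offers the assign, get, and exists functions for working with a user-created environment. The following is a minimal sketch; the variable y is an assumption added only for illustration:

```r
# Create a new environment and store a value with the dollar sign
myenv <- new.env()
myenv$x <- 3

# assign() stores a value under a name given as a string
# (the variable y is hypothetical, added for illustration)
assign("y", 10, envir = myenv)

# get() retrieves a value by name from the given environment
x_value <- get("x", envir = myenv)
print(x_value)

# ls() lists the names stored in the environment
vars <- sort(ls(myenv))
print(vars)

# exists() with inherits = FALSE checks this environment only,
# without falling back to its parent environments
has_y <- exists("y", envir = myenv, inherits = FALSE)
has_z <- exists("z_not_defined_here", envir = myenv, inherits = FALSE)
print(c(has_y, has_z))
```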

Moving on, we can determine the environment of a function. After creating a function called addnum, we can use environment to get its environment. As we created the function in the global environment, the function belongs to the global environment. On the other hand, when we get the environment of the lm function, we get a package namespace instead. This means that the lm function is in the namespace of the stats package.

Furthermore, we can print out the current environment inside a function. By invoking the addnum2 function, we can see that the environment function outputs an environment different from the global environment. That is, each time we call a function, R creates a new environment for that call and links it to its parent environment, which is the environment in which the function was defined. To further examine this characteristic, we create another function, addnum3, with a nested function, func1, inside. At this point, we can print out the environment inside func1 and addnum3, and we see that they are two different environments.
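
To check that the environment seen inside a function belongs to a single call rather than to the function itself, we can capture it on two separate calls and compare the results. This is a minimal sketch; capture_env is a hypothetical helper, not part of the recipe:

```r
# Hypothetical helper: returns the environment created for this call
capture_env <- function() {
  environment()
}

env_first <- capture_env()
env_second <- capture_env()

# Each call gets its own environment, so the two are not identical
same_env <- identical(env_first, env_second)
print(same_env)

# Both share the same parent: the environment in which
# capture_env was defined
same_parent <- identical(parent.env(env_first), parent.env(env_second))
parent_is_defining_env <- identical(parent.env(env_first),
                                    environment(capture_env))
print(c(same_parent, parent_is_defining_env))
```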

There's more...

To get the parent environment, we can use the parent.env function. In the following example, we can see that the parent of the environment created when parentenv is called is R_GlobalEnv:

> parentenv <- function(){
+   e <- environment()
+   print(e)
+   print(parent.env(e))
+ }
> parentenv()
<environment: 0x0000000019456ed0>
<environment: R_GlobalEnv>
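
For a nested function, the parent environment is not the global environment but the environment of the enclosing function call. The following minimal sketch illustrates this; outer_func and inner_func are hypothetical names used only here:

```r
# Hypothetical example: outer_func defines and calls a nested function
outer_func <- function() {
  outer_env <- environment()

  inner_func <- function() {
    # Parent of the environment created for this call to inner_func
    parent.env(environment())
  }

  # TRUE when the nested function's parent is outer_func's environment
  identical(inner_func(), outer_env)
}

nested_parent_matches <- outer_func()
print(nested_parent_matches)
```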

Working with lexical scoping

Lexical scoping, also known as static binding, determines how a value binds to a free variable in a function. This is a key feature that originated from the Scheme functional programming language, and it makes R different from S. In the following recipe, we will show you how lexical scoping works in R.

Getting ready

Ensure that you have completed the previous recipes and have R installed on your operating system.

How to do it...

Perform the following steps to understand how the scoping rule works:

First, we create an x variable, and we then create a tmpfunc function with x+3 as the return:
> x <- 5
> tmpfunc <- function(){
+   x + 3
+ }
> tmpfunc()
[1] 8
We then create a function named parentfunc with a childfunc nested function and see what returns when we call the parentfunc function:
> x <- 5
> parentfunc <- function(){
+   x <- 3
+   childfunc <- function(){
+     x
+   }
+   childfunc()
+ }
> parentfunc()
[1] 3
Next, we create an x string, and then we create a localassign function to modify x within the function:
> x <- 'string'
> localassign <- function(x){
+   x <- 5
+   x
+ }
> localassign(x)
[1] 5
> x
[1] "string"
We can also create another globalassign function but reassign the x variable to 5 using the <<- notation:
> x <- 'string'
> globalassign <- function(x){
+   x <<- 5
+   x
+ }
> globalassign(x)
[1] 5
> x
[1] 5

How it works...

There are two different types of variable binding: lexical binding and dynamic binding. Lexical binding, also called static binding, resolves a free variable in the environment where the function was defined; if the variable is not found there, the search continues outward through the enclosing environments, starting with the nearest. In contrast, dynamic binding resolves a free variable using the bindings in effect when the function is called, so the variable binds to the most recently created value.
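
The difference becomes visible when a function with a free variable is called from inside another function that has a local variable of the same name. The following is a minimal sketch; show_x and caller are hypothetical names used only for illustration:

```r
x <- "global"

# show_x has a free variable x; it is defined next to the
# assignment x <- "global" above
show_x <- function() {
  x
}

# caller has its own local x and then calls show_x
caller <- function() {
  x <- "local to caller"
  show_x()
}

# Under lexical binding, show_x looks up x where show_x was defined,
# not where it was called, so the local x in caller is ignored
lexical_result <- caller()
print(lexical_result)
```

Under dynamic binding, the same call would have picked up the x created most recently, that is, the one inside caller.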

To demonstrate how lexical binding works, we first create an x variable and assign 5 to x in the global environment. Then, we can create a function named tmpfunc. The function outputs x + 3 as the return value. Even though we do not assign any value to x within the tmpfunc function, the function can still find the value of x, which is 5, in the global environment.

Next, we create another function named parentfunc. In this function, we assign 3 to x and create a childfunc nested function (a function defined within a function). At the bottom of the parentfunc body, we invoke childfunc as the function return. Here, we find that the function uses the x defined in parentfunc instead of the one defined outside parentfunc. This is because R first searches the environment in which the function was defined (here, the environment of parentfunc) for a matching symbol name. Only if no match is found there does R search the global environment and then the packages on the search list.
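
Because a nested function keeps access to the environment in which it was defined, the same rule can be used to build functions that remember a value. This is a minimal sketch of that idea; make_adder, add_two, and add_ten are hypothetical names, not part of the recipe:

```r
# Hypothetical function factory: returns a function that adds n
make_adder <- function(n) {
  force(n)  # evaluate n now, so the returned function captures its value
  function(x) {
    x + n   # n is a free variable, found in make_adder's environment
  }
}

add_two <- make_adder(2)
add_ten <- make_adder(10)

# Each returned function finds its own n in its defining environment
two_result <- add_two(5)
ten_result <- add_ten(5)
print(c(two_result, ten_result))
```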

Moving on, let's take a look at what will return if we create an x variable as a string in the global state and assign an x local variable to 5 within the function. When we invoke the localassign function, we discover that the function returns 5 instead of the string value. On the other hand, if we print out the value of x, we still see string in return. While the local variable and global variable have the same name, the assignment of the function does not alter the value of x in global state. If you need to revise the value of x in the global state, you can use the <<- notation instead.
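
Note that the <<- notation does not always target the global environment: it assigns to the nearest enclosing environment in which the variable already exists, and only falls back to the global environment when no such variable is found. The following minimal sketch shows this with a hypothetical counter, which is not part of the recipe:

```r
# Hypothetical counter factory used only for illustration
make_counter <- function() {
  count <- 0
  function() {
    # <<- updates count in make_counter's environment,
    # because that is the nearest enclosing binding
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first_count <- counter()
second_count <- counter()
print(c(first_count, second_count))

# The updated value lives in the environment of the returned function
stored_count <- get("count", envir = environment(counter))
print(stored_count)
```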

There's more...

In order to examine the search list (or path) of R, you can call the search function:

> search()
 [1] ".GlobalEnv"        "tools:rstudio"
 [3] "package:stats"     "package:graphics"
 [5] "package:grDevices" "package:utils"
 [7] "package:datasets"  "package:methods"
 [9] "Autoloads"         "package:base"

Performing lazy evaluation

R functions evaluate arguments lazily; that is, arguments are evaluated only when they are needed. Thus, lazy evaluation can reduce the time needed for computation. In the following recipe, we will demonstrate how lazy evaluation works.

Getting ready

Ensure that you have completed the previous recipes and have R installed on your operating system.

How to do it...

Perform the following steps to see how lazy evaluation works:

First, we create a lazyfunc function with x and y as arguments, but only return x:
> lazyfunc <- function(x, y){
+   x
+ }
> lazyfunc(3)
[1] 3
On the other hand, if the function returns the summation of x and y but we do not pass y into the function, an error occurs:
> lazyfunc2 <- function(x, y){
+   x + y
+ }
> lazyfunc2(3)
Error in lazyfunc2(3) : argument "y" is missing, with no default
We can also specify a default value for the y argument in the function but pass only the x argument to the function:
> lazyfunc4 <- function(x, y=2){
+   x + y
+ }
> lazyfunc4(3)
[1] 5
In addition to this, we can use lazy evaluation to perform Fibonacci computation in a function:
> fibonacci <- function(n){
+   if (n==0)
+     return(0)
+   if (n==1)
+     return(1)
+   return(fibonacci(n-1) + fibonacci(n-2))
+ }
> fibonacci(10)
[1] 55
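
The steps above rely on arguments being evaluated only when they are used. One way to make this visible is to pass an expression that would raise an error if it were ever evaluated. The following is a minimal sketch; lazy_first, with_default, and eager_first are hypothetical names used only for illustration:

```r
# y is never used, so the expression passed for y is never evaluated
lazy_first <- function(x, y) {
  x
}
unused_ok <- lazy_first(3, stop("y was evaluated"))
print(unused_ok)

# A default value is also evaluated lazily, inside the function,
# so it may refer to another argument
with_default <- function(x, y = x * 2) {
  x + y
}
default_result <- with_default(3)
print(default_result)

# force() evaluates an argument immediately, so here the error is raised
eager_first <- function(x, y) {
  force(y)
  x
}
eager_message <- tryCatch(
  eager_first(3, stop("y was evaluated")),
  error = function(e) conditionMessage(e)
)
print(eager_message)
```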

How it works...

R evaluates expressions lazily: an expression is evaluated only if its value is needed. This type of evaluation strategy has the following three advantages: