Recipes for emerging R developers and data scientists to strengthen their R programming capabilities
This book is for developers who would like to enhance their R programming skills. Basic knowledge of R programming is assumed.
R is a powerful tool for statistics, graphics, and statistical programming. It is used by tens of thousands of people daily to perform serious statistical analyses. It is a free, open source system whose implementation is the collective accomplishment of many intelligent, hard-working people. There are more than 2,000 available add-ons, and R is a serious rival to all commercial statistical packages. The objective of this book is to show how to work with different programming aspects of R. Emerging R developers and data scientists may have very good programming knowledge but a limited understanding of R syntax and semantics. This book will be a platform to develop practical solutions to real-world problems in a scalable fashion and with a very good understanding of R. You will work with various versions of R libraries that are essential for scalable data science solutions. You will learn to deal with input/output issues when working with relatively large datasets. By the end of this book, you will also learn how to work with databases from within R, and what metaprogramming is and how it helps in developing applications.
This book is a companion for R programmers and emerging developers in R programming. It contains recipes on advanced R programming that enable users to solve complex problems efficiently.
Number of pages: 229
Year of publication: 2017
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: October 2017
Production reference: 1051017
ISBN 978-1-78712-905-4
www.packtpub.com
Author
Jaynal Abedin
Copy Editor
Karuna Narayanan
Reviewer
Eilidh Troup
Project Coordinator
Vaidehi Sawant
Commissioning Editor
Aaron Lazar
Proofreader
Safis Editing
Acquisition Editor
Karan Sadawana
Indexer
Francy Puthiry
Content Development Editor
Zeeyan Pinheiro
Graphics
Abhinash Sahu
Technical Editor
Vibhuti Gawde
Production Coordinator
Nilesh Mohite
Jaynal Abedin is currently doing research as a PhD student at the Unit for Biomedical Data Analytics (BDA) of INSIGHT at the National University of Ireland Galway. His research focuses on sports science and sports medicine in a targeted project with ORRECO, an Irish startup company that provides evidence-based advice to individual athletes through biomarker and GPS data. Before joining INSIGHT as a PhD student, he led a team of statisticians at an international public health research organization (icddr,b). His primary role there was to develop internal statistical capabilities for researchers from various disciplines, and he was involved in designing and delivering statistical training to those researchers. He has bachelor's and master's degrees in statistics, and he has written two books on R programming with Packt: Data Manipulation with R and R Graphs Cookbook (Second Edition). His current research interests are predictive modeling of probable athlete injury and scoring the extremeness of multivariate data to get an early signal of an anomaly. Moreover, he has an excellent reputation as a freelance R programmer and statistician on online platforms such as Upwork.
Eilidh J. Troup is an applications consultant at EPCC in the University of Edinburgh. She is interested in making High Performance Computing accessible to new users, particularly to biologists. She works on a variety of software projects, including the Simple Parallel R INTerface (SPRINT).
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787129055.
If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
Installing and Configuring R and its Libraries
Introduction
Installing and configuring base R in Windows
Getting ready
How to do it...
How it works...
There's more...
See also
Installing and configuring base R in Linux
Getting ready
How to do it...
There's more...
See also
Installing and configuring RStudio IDE in Windows
Getting ready
How to do it…
How it works…
There's more…
See also
Installing and configuring RStudio IDE in Linux
Getting ready
How to do it…
How it works...
See also
Installing and configuring R tools for Visual Studio in Windows
Getting ready
How to do it…
How it works…
See also
Installing R libraries from various sources
Getting ready
How to do it…
The ggplot2 library
The devtools library
Installing a library from GitHub
Installing a library from the BioC repository
How it works…
There's more…
See also
Installing a specific version of R library
Getting ready
How to do it…
How it works…
Data Structures in R
Introduction
Creating a vector and accessing its properties
Getting ready
How to do it…
How it works…
There's more…
See also
Creating a matrix and accessing its properties
Getting ready
How to do it…
How it works…
There's more…
See also
Creating a data frame and accessing its properties
Getting ready
How to do it…
How it works…
There's more…
See also
Creating an array and accessing its properties
Getting ready
How to do it…
How it works…
There's more…
See also
Creating a list from a combination of vector, matrix, and data frame
Getting ready
How to do it…
How it works…
There's more…
See also
Converting a matrix to a data frame and a data frame to a matrix
Getting ready
How to do it…
How it works…
There's more…
See also
Writing Customized Functions
Introduction
Writing your first function in R
Getting ready
How to do it…
How it works…
There’s more…
Writing functions with multiple arguments and use of default values
Getting ready
How to do it…
How it works…
There’s more…
Handling data types in input arguments
Getting ready
How to do it…
How it works…
There’s more…
Producing different output types and return values
Getting ready
How to do it…
How it works…
There’s more…
Making a recursive call to a function
Getting ready
How to do it…
How it works…
There’s more…
Handling exceptions and error messages
Getting ready
How to do it…
How it works…
There’s more…
See also
Conditional and Iterative Operations
Introduction
The use of the if conditional statement
Getting ready
How to do it…
How it works…
There’s more…
The use of the if…else conditional operator
Getting ready
How to do it…
How it works…
There’s more…
The use of the ifelse vectorised conditional operator
Getting ready
How to do it…
How it works…
There’s more…
See also
Writing a function using the switch operator
Getting ready
How to do it…
How it works…
There’s more…
Comparing the performance of switch and series of the if…else statements
Getting ready
How to do it…
How it works…
Using for loop for iterations
Getting ready
How to do it…
How it works…
Vectorised operation versus for loop
Getting ready
How to do it…
How it works…
R Objects and Classes
Introduction
Defining a new S3 class
Getting ready
How to do it…
How it works…
There's more…
See also
Defining methods for the S3 class
Getting ready
How to do it…
How it works…
There's more…
See also
Creating a generic function and defining a method for the S3 class
Getting ready
How to do it…
How it works…
There's more…
Defining a new S4 class
Getting ready
How to do it…
How it works…
There's more…
See also
Defining methods for an S4 class
Getting ready
How to do it…
How it works…
There's more…
See also
Creating a function to return an object of the S4 class
Getting ready
How to do it…
How it works…
There's more…
See also
Querying, Filtering, and Summarizing
Introduction
Using the pipe operator for data processing
Getting ready
How to do it…
How it works…
There's more…
See also
Efficient and fast summarization using the dplyr verbs
Getting ready
How to do it…
How it works…
There's more…
See also
Using the customized function within the dplyr verbs
Getting ready
How to do it...
How it works...
There's more…
See also
Using the select verb for data processing
Getting ready
How to do it...
How it works...
There's more…
See also
Using the filter verb for data processing
Getting ready
How to do it...
How it works...
Using the arrange verb for data processing
Getting ready
How to do it...
How it works...
There's more…
Using mutate for data processing
Getting ready
How to do it...
How it works...
There's more…
Using summarise to summarize dataset
Getting ready
How to do it...
How it works...
R for Text Processing
Introduction
Extracting unstructured text data from a plain web page
Getting ready
How to do it…
How it works…
There’s more…
Extracting text data from an HTML page
Getting ready
How to do it…
How it works…
There's more…
Extracting text data from an HTML page using the XML library
Getting ready
How to do it…
How it works…
Extracting text data from PubMed
Getting ready
How to do it…
How it works…
There's more…
Importing unstructured text data from a plain text file
Getting ready
How to do it…
How it works…
There's more…
Importing plain text data from a PDF file
Getting ready
How to do it…
How it works…
There's more…
Pre-processing text data for topic modeling and sentiment analysis
Getting ready
How to do it…
How it works…
There's more…
Creating a word cloud to explore unstructured text data
Getting ready
How to do it…
How it works…
There's more…
Using regular expression in text processing
Getting ready
How to do it…
How it works…
There's more…
R and Databases
Introduction
Installing the PostgreSQL database server
Getting ready
How to do it…
How it works…
Creating a new user in the PostgreSQL database server
Getting ready
How to do it…
How it works…
There's more…
See also
Creating a table in a database in PostgreSQL
Getting ready
How to do it...
How it works…
There's more…
Creating a dataset in PostgreSQL from R
Getting ready
How to do it...
How it works...
Interacting with the PostgreSQL database from R
Getting ready
How to do it…
How it works…
There's more...
Creating and interacting with the SQLite database from R
Getting ready
How to do it...
How it works...
There's more…
Parallel Processing in R
Introduction
Creating an XDF file from CSV input
Getting ready
How to do it…
How it works…
There's more…
See also
Processing data as a chunk
Getting ready
How to do it…
How it works…
There's more…
See also
Comparing computation time with data frame and XDF
Getting ready
How to do it…
How it works…
There's more…
Linear regression with larger data (rxFastLinear)
Getting ready
How to do it…
How it works…
There's more…
See also
R is a high-level statistical language that is widely used among statisticians and data miners for developing statistical applications. The objective of this book is to show readers how to work with different programming aspects of R. Emerging R developers and data scientists may have very good programming knowledge, but their understanding of R syntax and semantics could be limited. This book will be a platform to develop practical solutions to real-world problems in a scalable fashion and with a very good understanding of R. You will work with various versions of R libraries that are essential for scalable data science solutions. You will learn to deal with I/O issues when working with relatively large datasets. By the end of this book, you will also learn how to work with databases from within R.
Chapter 1, Installing and Configuring R and its Libraries, covers the recipes on how to install and configure R and its libraries on Windows and Linux platforms.
Chapter 2, Data Structures in R, covers the data structures of R: how to create them, how to access their properties, and the operations specific to each structure.
Chapter 3, Writing Customized Functions, guides you to create your own customized functions and understand how to work with various data types within a function and access an output of a function.
Chapter 4, Conditional and Iterative Operations, covers the use of conditional and repetition operators in R.
Chapter 5, R Objects and Classes, guides you in creating the S3 and S4 objects and how to use them in a variety of applications.
Chapter 6, Querying, Filtering, and Summarizing, introduces you to the dplyr library for data processing. This is one of the most popular libraries in R for data processing.
Chapter 7, R for Text Processing, covers the recipes related to working with unstructured text data.
Chapter 8, R and Databases, helps you learn how to interact with a database management system to develop statistical applications.
Chapter 9, Parallel Processing in R, uses the parallel processing approach to solve memory problems with a larger dataset and uses the XDF file for processing.
This book requires the following to be set up:
Base R
RStudio IDE
Microsoft R Client
R tools for Visual Studio
PostgreSQL database server
This book is for developers who would like to enhance their R programming skills. Some basic knowledge of R programming is assumed.
In this book, you will find several headings that appear frequently (Getting ready, How to do it…, How it works…, There's more…, and See also). To give clear instructions on how to complete a recipe, we use these sections as follows:
Getting ready: This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.
How to do it…: This section contains the steps required to follow the recipe.
How it works…: This section usually consists of a detailed explanation of what happened in the previous section.
There's more…: This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.
See also: This section provides helpful links to other useful information for the recipe.
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Execute the following code to create numeric and logical vectors using the c() function."
A block of code is set as follows:
cVec <- c("Cricket", "Football", "Basketball", "Rugby")
Any command-line input or output is written as follows:
lsb_release -a
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "On this web page, you will see base under the Subdirectories category."
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important to us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply email [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:
Log in or register to our website using your email address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
You can also download the code files by clicking on the Code Files button on the book's web page at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account. Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Modern-R-Programming-Cookbook/. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
This chapter presents recipes for installing and configuring R and its libraries on Windows and Linux platforms. Specifically, you will come across the following recipes:
Installing and configuring base R in Windows
Installing and configuring base R in Linux
Installing and configuring RStudio IDE in Windows
Installing and configuring RStudio IDE in Linux
Installing and configuring R tools for Visual Studio in Windows
Installing R libraries from various sources
Installing a specific version of R library
It is expected that you have basic knowledge of installing software on the platform that you use regularly. However, it is helpful to have an overview of some technical aspects of installing R and an integrated development environment (IDE) such as RStudio. This chapter will serve as a reference point for technical issues that arise during installation and configuration of R and its libraries on Windows and Linux platforms. Each recipe contains a detailed description with the necessary screenshots so that you can follow along comfortably, even if you are not in front of your computer. After completing all the recipes of this chapter, you will be confident enough to install R and its libraries on Windows and Linux platforms. So, let's get started.
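Once an R session is available, a short check can show which libraries still need installing. The following is only a minimal sketch: `missing_packages()` is a hypothetical helper written for this illustration, not a function from base R or from the book, and the package names are examples taken from the recipe titles of this chapter.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  is_installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!is_installed]
}

# "stats" ships with base R; the other two are example names only.
wanted <- c("stats", "ggplot2", "devtools")
to_install <- missing_packages(wanted)
print(to_install)

# Installation needs network access, so it is left commented out here:
# if (length(to_install) > 0) install.packages(to_install)
```

The helper only inspects the library paths and does not load any package, so it is safe to run before deciding what to install.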
In this recipe, you will learn to install R on the Windows platform, and we will address other necessary configuration issues that are related to the Windows operating system.
To start this recipe, you will need to know your version of the Windows operating system, for example, whether it is Windows 7, 8, or 10. You also need to know the specific architecture, such as 32-bit or 64-bit. Once you know the particulars of the operating system, you are ready to install base R by following the steps in the next section. Another thing that you need to check is whether R is already installed on your computer. You can easily check by inspecting the Start menu, the taskbar, or the desktop icons. Now, let's assume that you have not previously installed R on your computer and this is the first time you are going to do so.
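If an R session already exists somewhere (for example, on another machine or after a previous installation), the same particulars can be inspected from within R. This is a small sketch using base R only. Note that the pointer size describes the R build that is running, which is not necessarily the architecture of the operating system itself.

```r
# Report the operating system and architecture seen by the current R session.
os_name <- Sys.info()[["sysname"]]     # e.g. "Windows" or "Linux"
os_release <- Sys.info()[["release"]]  # release string reported by the system

# Pointer size in bits: 64 for a 64-bit R build, 32 for a 32-bit build.
pointer_bits <- .Machine$sizeof.pointer * 8

cat("Operating system:", os_name, os_release, "\n")
cat("R build:", pointer_bits, "bit, architecture", R.version$arch, "\n")
```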
Once you get detailed information about your operating system, you will need to download the executable file for the Windows operating system. To find the latest version of R, you can visit the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/. On this web page, you will get to know the latest release of R and other related information.
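When R is already present, its version can be compared against a release number read from CRAN. The sketch below uses 3.3.3 purely as an illustrative baseline (it is the version mentioned in this recipe); replace it with whatever CRAN currently lists.

```r
# Illustrative baseline: the version mentioned in this recipe.
baseline <- numeric_version("3.3.3")

# getRversion() returns the running R version as a comparable version object.
installed <- getRversion()

# Version objects compare component by component, so 3.10.0 > 3.9.0.
if (installed < baseline) {
  message("R ", as.character(installed), " is older than ",
          as.character(baseline), "; consider updating from CRAN.")
} else {
  message("R ", as.character(installed), " is at least ",
          as.character(baseline), ".")
}
```

Comparing version objects rather than plain strings avoids mistakes such as treating "3.10.0" as smaller than "3.9.0".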
To download the latest release of R for Windows, perform the following steps:
Visit https://cran.r-project.org/bin/windows/, which will show you the following screen:
On this web page, you will see base under the Subdirectories category. As a first-time user of R, you need to download the executable file from this link.
Once you click on base, you will end up on this page, https://cran.r-project.org/bin/windows/base/, as shown in the following screenshot:
Now, click on Download R 3.x.x for Windows (the version number might differ; during preparation of this recipe, the version was 3.3.3). The executable file will be downloaded into your local storage.
Once you have downloaded the executable file, you are ready to install it on your computer. Perform the following steps and choose the options accordingly. The screenshots are for your convenience only:
Go to the folder where you stored the executable file that you downloaded by following the instructions in the previous section.
If you have administrator privileges, then just double-click on the executable file, or right-click on it and select Run as administrator:
In Windows 7, it will show a notification with the title User Account Control. In this case, you must choose Yes to proceed.
The first thing it will ask you to do is choose a language, as shown in the following screenshot. Once you select your chosen language, click on the OK button. The R Setup Wizard will appear, and you will see various options to select in different pages:
Click on Next > on this page to proceed with the installation:
On this page, the GNU GENERAL PUBLIC LICENSE information will be displayed, and you will be asked to read and understand the licensing agreement and then click on the Next > button:
Now, you will be asked to choose the destination folder where you want to store the installation files. Usually, by default, it shows C:/Program Files/, and sometimes, it also shows one more level, C:/Program Files/R/R-x.x.x. You can keep the default location, or you can choose a separate destination based on your choice. Once you have selected the destination, click on the Next > button:
At this stage, you will get the option to select the component that you want to install. You can choose either 32-bit Files only, 64-bit Files
