Learning RStudio for R Statistical Computing - Mark P. J. van der Loo - E-Book

Description

Data is coming at us faster, dirtier, and at an ever-increasing rate. The need to handle many complex statistical analysis projects is hitting statisticians and analysts across the globe. This book will show you how to deal with it like never before, giving you an edge and improving your productivity.
"Learning RStudio for R Statistical Computing" will teach you how to quickly and efficiently create and manage statistical analysis projects, import data, develop R scripts, and generate reports and graphics. R developers will learn about package development, coding principles, and version control with RStudio.
This book will help you to learn and understand RStudio features to effectively perform statistical analysis and reporting, code editing, and R development.
The book starts with a quick introduction where you will learn to load data, perform a simple analysis, plot a graph, and generate automatic reports. You will then explore the available features for effective coding, graphical analysis, R project management, and report generation.
"Learning RStudio for R Statistical Computing" is stuffed with feature-rich and easy-to-understand examples, through step-by-step instructions helping you to quickly master the most popular IDE for R development.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Number of pages: 152

Year of publication: 2012




Table of Contents

Learning RStudio for R Statistical Computing
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Getting Started
RStudio at a glance
Installing RStudio
Installing R
Installing R on Windows and Mac OS X
Installing R on Linux
Building R from source
Building R using Windows
Installing RStudio
Installing RStudio Server
Installing R packages
Overview: A first R session
Keyboard shortcuts
Getting help
What if I uninstall RStudio?
Further reading
Summary
2. Writing R Scripts and the R Console
Moving around RStudio
Keyboard shortcuts to move around RStudio
Features of the R console
Executing commands
Command history
Command completion
Completion of functions and arguments
Object completion
Completion of filenames
Bracket and quote completion
Keyboard shortcuts for the console
Features of the source editor
A few words on code quality
Editing R scripts
Keyboard shortcuts for file navigation
Keyboard shortcuts for code editing
Syntax highlighting
Adjusting the syntax highlighting theme
Indenting code
Commenting code
Find and replace
Folding, sectioning, and navigation
Code folding
Keyboard shortcuts for code folding
Code navigation
Keyboard shortcuts for code navigation
Code sections
Code execution
Keyboard shortcuts for code execution
Summary
3. Viewing and Plotting Data
Viewing data and the object browser
Plotting
Zoom
Export
Navigation
Interactive plotting with the manipulate package
The manipulate function
Using more options of manipulate
Advanced topic: retrieving plot parameters from manipulate
Summary
4. Managing R Projects
R projects
Creating an R project
Directory structure and file manipulations
Version control
Introduction to version control
Installing GIT or Subversion
Version control for single-person projects
GIT
Subversion
Working with a team
Further reading
Summary
5. Generating Reports
Prerequisites for report generation
Notebook
Notebook options
Publishing a notebook
R Markdown and Rhtml
Workflow for R Markdown
An extended example
An introduction to Markdown syntax
Rhtml
Code chunks
Chunk syntax and options
RMarkdown: .Rmd files
Rhtml: .Rhtml files
LaTeX: .Rnw files
RStudio's chunk support and keyboard shortcuts
LaTeX
Further reading
Summary
6. Using RStudio Effectively
Additional features for function writing
Function extraction
Function navigation
Introduction to package writing
Prerequisites
Basic structure and workflow
Creating the package directory structure
Documenting functions with Roxygen2
Building your package with devtools
More about the devtools package
Publishing your package
Summary
Index

Learning RStudio for R Statistical Computing

Copyright © 2012 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: December 2012

Production Reference: 1171212

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78216-060-1

www.packtpub.com

Cover Image by Tarun Singh

Credits

Authors

Mark P.J. van der Loo 

Edwin de Jonge

Reviewers

Mzabalazo Z. Ngwenya

Yihui Xie

Acquisition Editor

Kartikey Pandey

Commissioning Editor

Meeta Rajani

Technical Editors

Prasad Dalvi

Pooja Pande

Project Coordinator

Esha Thakker

Proofreader

Maria Gould

Indexer

Monica Ajmera Mehta

Production Coordinator

Prachali Bhiwandkar

Cover Work

Prachali Bhiwandkar

About the Authors

Mark P.J. van der Loo obtained his PhD from the Institute for Theoretical Chemistry at the University of Nijmegen (The Netherlands). Since 2007 he has worked at the statistical methodology department of the Dutch official statistics office (Statistics Netherlands). His research interests include automated data cleaning methods and statistical computing. At Statistics Netherlands he is responsible for the local R center of expertise, which supports and educates users on statistical computing with R. Mark has been teaching R for several years and (co)authored a number of R packages that are available via CRAN: editrules, deducorrect, rspa, and extremevalues. A list of publications can be found at www.markvanderloo.eu.

Edwin de Jonge has worked for more than 15 years at the Dutch official statistics office (Statistics Netherlands). With a background in theoretical and computational solid state physics (MSc.), he started out in the statistical computing department and currently works in the statistical methodology department. His research interests include data visualization, data analysis, and statistical computing. He has trained over 150 people in the workshop Graphical Analysis with R. Edwin has (co)authored several R packages that are available via CRAN: tabplot, tabplotd3, ffbase, whisker, editrules, and deducorrect.

About the Reviewers

Mzabalazo Z. Ngwenya has worked extensively in the field of consulting and currently works as a biometrician.

Yihui Xie (http://yihui.name) is currently a PhD student in the Department of Statistics, Iowa State University. His research interests include interactive statistical graphics, statistical computing, and reproducible research. He is the author of several R packages such as animation, cranvas, formatR, Rd2roxygen, and knitr, among which the animation package won the 2009 John M. Chambers Statistical Software Award (American Statistical Association). In 2006 he founded the Capital of Statistics (http://cos.name), which has grown into a large online community on statistics in China. He also initiated the first Chinese R conference in 2008 and has been organizing R conferences in China since then. He is a co-author of the book Reproducible Research with R (Chapman & Hall), which is under development.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books. 

Why Subscribe?

Fully searchable across every book published by Packt

Copy and paste, print and bookmark content

On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

Learning RStudio for R Statistical Computing is a comprehensive guide to the popular open source integrated development environment for R. In six chapters, we will show you how to perform reproducible statistical research with RStudio. The book covers automatic report generation, advanced R code editing, project file management, data visualization, and more.

What this book covers

Chapter 1, Getting Started: We install R and RStudio on Windows, Mac, and Linux and guide you through your first reproducible research project.

Chapter 2, Writing R Scripts and the R Console: A thorough discussion of RStudio's code editing and execution features, both interactively in the console and in scripts.

Chapter 3, Viewing and Plotting Data: RStudio facilitates inspection of R objects and visualization of data. Learn how to create interactive plots with the manipulate package.

Chapter 4, Managing R Projects: This chapter discusses RStudio's project file management features and version control integration. A short introduction to version control is provided as well.

Chapter 5, Generating Reports: Learn how to automatically transform your data analysis into a beautifully laid out HTML page or a PDF report, making it truly reproducible. RStudio offers several ways to generate reports, all of which are discussed thoroughly in this chapter.

Chapter 6, Using RStudio Effectively: This chapter is reserved for R developers who need to get the most out of RStudio—advanced code editing, code navigation, and package development are discussed in this chapter.

What you need for this book

All you need for this book is a reasonably modern computer that allows you to run R and RStudio. This book is not about learning statistics, and although we do not use any advanced statistics in this book, some basic statistical knowledge is assumed. We also expect you to have some experience with R. Although the book is not meant to teach R, some of the less commonly used features of R will be explained in detail where appropriate.

Who this book is for

The book is aimed at R developers and analysts who wish to do R statistical development while taking advantage of RStudio functionality to ease their development efforts. Familiarity with R is assumed. Those who want to get started with R development using RStudio will also find the book useful. Even if you already use R but want to create reproducible statistical analysis projects or extend R with self-written packages, this book shows how to quickly achieve this using RStudio.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "On the bottom right-hand side it shows the first 25 records of the resulting data.frame."

A block of code is set as follows:

meanLength <- mean(abalone$Length)
model <- lm(Whole.weight ~ Length + Sex, data=abalone)
x <- 1:3
cv <- function(x, na.rm=FALSE){
  sd(x, na.rm=na.rm)/mean(x, na.rm=na.rm)
}

Any command-line input or output is written as follows:

form <- as.formula(paste("Length", "Whole.weight", sep="~"))
plot(x=form, data=abalone)

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "These packages can be updated by clicking on Check for Updates".

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Some of the examples used in this book use GIT version control. You can download all extensive examples from https://github.com/rstudiobook.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Getting Started

This chapter shows how to obtain R and RStudio and introduces the concepts of reproducible research. We begin with a simple RStudio session that already produces a short, fully reproducible report. If you have ever had to analyze data for work, study, or a research project, you have probably ended up with a messy kludge of temporary files, scripts, and intermediate results that is almost impossible to untangle. If this sounds familiar, you probably also had to rewrite pieces of your report while debugging your analyses or after receiving updates to your data sets. Rerunning calculations and reinserting figures, tables, and results can take a lot of time. Moreover, as a project turns more and more into a spaghetti of files and folders, reproducing exactly what you did becomes harder and harder. Needless to say, things become even more difficult when several people collaborate on such a project.

RStudio™ is a free and open source tool that makes it easier for you to do the following:

Work with R and R's graphics interactively

Organize your code and maintain multiple projects

Make your research reproducible

Maintain the packages in your R installation

Create and share your reports

Share your code and collaborate with other users

RStudio runs on all the major operating systems, including Windows, Linux, and Mac OS X. Additionally, it can be used to run R on a remote web server. In that case, RStudio's interface will run in your browser.

This book is aimed at beginning and intermediate R users who want to get the most out of R and RStudio. In the coming chapters we will cover most of RStudio's features and emphasize some best practices in statistical data analysis. A few words about R: R is a free software tool for statistical analysis that consists of the R programming language and the R environment. Here, free means not only free of charge (as in free beer) but also free as in freedom. That is, you are allowed to download and use R, inspect or alter its source code, and redistribute it as you like. Note that this freedom is in fact a requirement for truly reproducible research, as it allows one, in principle, to check exactly how data is processed in a certain project, down to R's source code itself.

R is distributed via the Comprehensive R Archive Network (CRAN), a network of servers around the world from which you can download R and its extension packages. You can access it via www.r-project.org. A few other sites also offer extension package repositories; the most noteworthy are Bioconductor (www.bioconductor.org) and the Omega project for statistical computing (www.omegahat.org).
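As a minimal sketch of how an R session is pointed at a CRAN server: the repos option holds the repository address that install.packages() downloads from. The mirror address and the package name knitr below are illustrative choices, not prescribed by the text, and the install call is left commented out because it needs a network connection.

```r
# Choose the CRAN mirror used by this session (illustrative choice of mirror)
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm which repository install.packages() would contact
cran_url <- getOption("repos")[["CRAN"]]
print(cran_url)

# Installing an extension package then takes a single call (needs network):
# install.packages("knitr")
```

Setting the option once per session (or in a startup file) avoids being asked to pick a mirror every time a package is installed.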

The R environment is a so-called REPL, which stands for read-evaluate-print loop. That is, it offers a text-based interface where you can enter R commands. After a command is entered, the R engine processes it (evaluation) and possibly prints a result to the screen. Alternatively (and more commonly), the commands can be stored in a text file to be run by R.
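The two ways of working described above can be sketched in a few lines. This is only an illustration: the first part spells out one read-evaluate-print turn by hand with parse(), eval(), and print(), which is roughly what the console does for each command. The second part stores commands in a temporary file and runs them with source(). The variable names and the example commands are made up for this sketch.

```r
# One turn of a read-evaluate-print loop, spelled out by hand
input  <- "1 + 2 * 3"           # read: the text a user would type at the prompt
expr   <- parse(text = input)   # turn the text into an R expression
result <- eval(expr[[1]])       # evaluate the expression
print(result)                   # print: shows [1] 7

# The alternative: store commands in a text file and let R run them
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)", "m <- mean(x)"), script)

env <- new.env()                # run the script in its own environment
source(script, local = env)
print(env$m)                    # shows [1] 4
```

Running the script in a separate environment keeps its variables out of the workspace. Calling source(script) without the local argument would create x and m in the global environment instead.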

Users who are accustomed to point-and-click interfaces for using statistical functionality may find the first encounter with such an interface daunting, and to be honest, the learning curve for R can be steep at times. However, in order to make work reproducible, it is unavoidable to store the steps of your analyses as source code. Moreover, being a true programming language makes R a much more versatile and powerful tool than any point-and-click software that only offers a predefined functionality.

Fortunately for us, writing code is nothing new, and over the past decades many good ideas have been developed in the software industry to make coding and code management a lot easier. RStudio implements many of those ideas for R users. Some important tips for maintaining your R installation are as follows:

Always use the latest stable version. This is the version likely to have the fewest bugs in the older functionality. You can read about the latest features in the news file, for example by running View(news()) from the R command line. See the Installing R section for an easier way to install R.

Frequently update your installed packages. This is simply done by running the update.packages() command from your R console.
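The tips above can be checked from the console. The following is a small sketch rather than a prescribed routine: it reports the running R version and lists the installed packages, and it leaves the two calls that open a viewer or need network access as comments. The variable names are chosen for this illustration only.

```r
# Which version of R is running?
r_version <- getRversion()
print(r_version)

# Which packages are installed, and in which versions?
pkgs <- installed.packages()[, c("Package", "Version"), drop = FALSE]
print(head(pkgs))

# Read the news file for the running R version (opens a viewer in RStudio):
# View(news())

# Bring all installed packages up to date (needs network access):
# update.packages(ask = FALSE)
```

With ask = FALSE, update.packages() updates every outdated package without prompting. Leave the argument out to confirm each update individually.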

RStudio at a glance

Like R, RStudio is a free and open source project. Founded by JJ Allaire, RStudio is also a company that sells services related to their open source product, such as consulting and training.

RStudio is an Integrated Development Environment (IDE) for R. The term IDE comes from the software industry and refers to a tool that makes it easy to develop applications in one or more programming languages. Typical IDEs offer tools to easily write and document code, compile and perform tests, and offer integration with a version control tool.