Data is coming at us faster and dirtier than ever. The need to handle many complex statistical analysis projects is hitting statisticians and analysts across the globe. This book shows you how to deal with it, giving you an edge and improving your productivity.
"Learning RStudio for R Statistical Computing" will teach you how to quickly and efficiently create and manage statistical analysis projects, import data, develop R scripts, and generate reports and graphics. R developers will learn about package development, coding principles, and version control with RStudio.
This book will help you to learn and understand RStudio features to effectively perform statistical analysis and reporting, code editing, and R development.
The book starts with a quick introduction where you will learn to load data, perform simple analysis, plot a graph, and generate automatic reports. You will then be able to explore the available features for effective coding, graphical analysis, R project management, and report generation.
"Learning RStudio for R Statistical Computing" is packed with feature-rich, easy-to-understand examples and step-by-step instructions that help you quickly master the most popular IDE for R development.
Page count: 152
Year of publication: 2012
Copyright © 2012 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: December 2012
Production Reference: 1171212
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78216-060-1
www.packtpub.com
Cover Image by Tarun Singh (<[email protected]>)
Authors
Mark P.J. van der Loo
Edwin de Jonge
Reviewers
Mzabalazo Z. Ngwenya
Yihui Xie
Acquisition Editor
Kartikey Pandey
Commissioning Editor
Meeta Rajani
Technical Editors
Prasad Dalvi
Pooja Pande
Project Coordinator
Esha Thakker
Proofreader
Maria Gould
Indexer
Monica Ajmera Mehta
Production Coordinator
Prachali Bhiwandkar
Cover Work
Prachali Bhiwandkar
Mark P.J. van der Loo obtained his PhD from the Institute for Theoretical Chemistry at the University of Nijmegen (The Netherlands). Since 2007 he has worked at the statistical methodology department of the Dutch official statistics office (Statistics Netherlands). His research interests include automated data cleaning methods and statistical computing. At Statistics Netherlands he is responsible for the local R center of expertise, which supports and educates users on statistical computing with R. Mark has been teaching R for several years and (co)authored a number of R packages that are available via CRAN: editrules, deducorrect, rspa, and extremevalues. A list of publications can be found at www.markvanderloo.eu.
Edwin de Jonge has worked for more than 15 years at the Dutch official statistics office (Statistics Netherlands). With a background in theoretical and computational solid state physics (MSc.), he started working at the statistical computing department. He currently works at the statistical methodology department. His research interests include data visualization, data analysis, and statistical computing. He has trained over 150 people in the workshop Graphical Analysis with R. Edwin has (co)authored several R packages that are available via CRAN: tabplot, tabplotd3, ffbase, whisker, editrules, and deducorrect.
Mzabalazo Z. Ngwenya has worked extensively in the field of consulting and currently works as a biometrician.
Yihui Xie (http://yihui.name) is currently a PhD student in the Department of Statistics, Iowa State University. His research interests include interactive statistical graphics, statistical computing, and reproducible research. He is the author of several R packages such as animation, cranvas, formatR, Rd2roxygen, and knitr, among which the animation package won the 2009 John M. Chambers Statistical Software Award (American Statistical Association). In 2006 he founded the Capital of Statistics (http://cos.name), which has grown into a large online community on statistics in China. He also initiated the first Chinese R conference in 2008 and has been organizing R conferences in China since then. He is a co-author of the book Reproducible Research with R (Chapman & Hall), which is under development.
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Learning RStudio for R Statistical Computing is a comprehensive guide to the popular open source integrated development environment for R. In six chapters, we will show you how to perform reproducible statistical research with RStudio. The book covers automatic report generation, advanced R code editing, project file management, data visualization, and more.
Chapter 1, Getting Started: We install R and RStudio on Windows, Mac, and Linux and guide you through your first reproducible research project.
Chapter 2, Writing R Scripts and the R Console: A thorough discussion of RStudio's code editing and execution features, both interactively in the console and in scripts.
Chapter 3, Viewing and Plotting Data: RStudio facilitates inspection of R objects and visualization of data. Learn how to create interactive plots with the manipulate package.
Chapter 4, Managing R Projects: This chapter discusses RStudio's project file management features and version control integration. A short introduction to version control is provided as well.
Chapter 5, Generating Reports: Learn how to automatically transform your data analysis into a beautifully laid out HTML page or a PDF report, making it truly reproducible. RStudio offers several ways to generate reports, all of which are discussed thoroughly in this chapter.
Chapter 6, Using RStudio Effectively: This chapter is reserved for R developers who need to get the most out of RStudio—advanced code editing, code navigation, and package development are discussed in this chapter.
All you need for this book is a reasonably modern computer that allows you to run R and RStudio. This book is not about learning statistics, and although we do not use any advanced statistics in this book, some basic statistical knowledge is assumed. We also expect you to have some experience with R. Although the book is not meant to teach R, some of the less commonly used features of R will be explained in detail where appropriate.
The book is aimed at R developers and analysts who wish to do R statistical development while taking advantage of RStudio functionality to ease their development efforts. Familiarity with R is assumed. Those who want to get started with R development using RStudio will also find the book useful. Even if you already use R but want to create reproducible statistical analysis projects or extend R with self-written packages, this book shows how to quickly achieve this using RStudio.
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.
Code words in text are shown as follows: "On the bottom right-hand side it shows the first 25 records of the resulting data.frame."
A block of code is set as follows:
Any command-line input or output is written as follows:
New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "These packages can be updated by clicking on Check for Updates".
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.
Some of the examples in this book use Git version control. You can download all of the more extensive examples from https://github.com/rstudiobook.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.
This chapter shows how to obtain R and RStudio and introduces the concepts of reproducible research. We will first show a simple RStudio session that already results in a simple, fully reproducible report.
If you have ever had to analyze data for work, study, or a research project, you have probably run into a situation where you ended up with a messy kludge of temporary files, scripts, and intermediate results that is almost impossible to untangle. If this sounds familiar, you have probably also had to rewrite pieces of your report while debugging your analyses, or when receiving updates of your data sets. Re-running calculations and re-inserting figures, tables, and results can take a lot of time. Moreover, as a project turns more and more into a spaghetti of files and folders, reproducing exactly what you did becomes harder and harder. Needless to say, things can become even more difficult when collaborating with a number of people on such projects.
RStudio™ is a free and open source tool that makes it easier for you to do the following:
RStudio runs on all the major operating systems, including Windows, Linux, and Mac OS X. Additionally, it can be used to run R on a remote web server. In that case, RStudio's interface will run in your browser.
This book is aimed at beginning and intermediate R users who want to get the most out of R and RStudio. In the coming chapters we will cover most of RStudio's features, and emphasize some best practices in statistical data analysis.
A few words about R: R is a free software tool for statistical analyses, consisting of the R programming language and the R environment. Here, free means not only free of charge (as in free beer) but also free as in freedom. That is, you are allowed to download and use R, inspect or alter its source code, and redistribute it as you like. Note that this freedom is in fact a requirement for truly reproducible research, as it allows one, in principle, to check exactly how data is processed in a certain project, down to R's source code itself.
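The point about being able to inspect R's source code can be tried directly at the R prompt. The following is a minimal sketch, not an example taken from the book: it uses the base function sd() purely as an illustration, and the variable name sd_source is our own.

```r
# Typing a function's name without parentheses prints its definition.
# sd() is written in R itself, so its source code is shown in full.
sd

# The same definition as a character vector, for example to search it.
sd_source <- deparse(sd)

# sd() is defined in terms of var(), which can be verified from the source.
any(grepl("var", sd_source, fixed = TRUE))
```

Functions that are implemented in compiled code, such as sum(), print only a short stub this way. Their full source can still be read in the R source distribution available from CRAN.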
R is distributed via the Comprehensive R Archive Network (CRAN), a network of servers around the world from which you can download R and its extension packages. You can access it via www.r-project.org. There are a few other sites offering extension package repositories; the most noteworthy are Bioconductor (www.bioconductor.org) and the Omega project for statistical computing (www.omegahat.org).
The R environment is a so-called REPL, which stands for read-evaluate-print loop. That is, it offers a text-based interface where you can enter R commands. After a command is entered, the R engine processes it (evaluation) and possibly prints a result to the screen. Alternatively (and more commonly), the commands can be stored in a text file to be run by R.
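The two ways of working described above can be sketched in a few lines of R. This is an illustrative sketch only: it spells out the read, evaluate, and print steps by hand (the console performs them automatically for every command you type), and the names expr, value, script, and env, as well as the temporary file, are our own choices.

```r
# Read: turn a command given as text into an R expression,
# as the console does with the input you type.
expr <- parse(text = "1 + 2")[[1]]

# Evaluate: let the R engine compute the result.
value <- eval(expr)

# Print: show the result on the screen.
print(value)   # [1] 3

# Alternatively, store commands in a text file and let R run the file.
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)",
             "m <- mean(x)"), script)

# source() runs the commands in the file; here they are evaluated in a
# separate environment so that they do not clutter the workspace.
env <- new.env()
source(script, local = env)
env$m          # [1] 4
```

Keeping commands in a script file rather than typing them at the prompt is what makes an analysis repeatable: the file records every step and can be re-run whenever the data changes.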
Users who are accustomed to point-and-click interfaces for statistical functionality may find the first encounter with such an interface daunting, and to be honest, the learning curve for R can be steep at times. However, to make work reproducible, storing the steps of your analyses as source code is unavoidable. Moreover, being a true programming language makes R a much more versatile and powerful tool than any point-and-click software that offers only predefined functionality.
Fortunately for us, writing code is nothing new, and over the past decades many good ideas have been developed in the software industry to make coding and code management a lot easier. RStudio implements many of those ideas for R users. Important tips for maintaining your R installation are as follows:
Like R, RStudio is a free and open source project. Founded by JJ Allaire, RStudio is also a company that sells services related to its open source product, such as consulting and training.
RStudio is an Integrated Development Environment (IDE) for R. The term IDE comes from the software industry and refers to a tool that makes it easy to develop applications in one or more programming languages. Typical IDEs offer tools to easily write and document code, compile and perform tests, and offer integration with a version control tool.
