RStudio helps you to manage small to large projects by giving you a multi-functional integrated development environment, combined with the power and flexibility of the R programming language, which is becoming the bridge language of data science for developers and analysts worldwide. Mastering the use of RStudio will help you to solve real-world data problems.
This book begins by guiding you through the installation of RStudio and explaining the user interface step by step. From there, the next logical step is to use this knowledge to improve your data analysis workflow. We will do this by building up our toolbox to create interactive reports and graphs or even web applications with Shiny. To collaborate with others, we will explore how to use Git and GitHub and how to build your own packages to ensure top quality results. Finally, we put it all together in an interactive dashboard written with R.
Number of pages: 276
Year of publication: 2015
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2015
Production reference: 1251115
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-254-7
www.packtpub.com
Authors
Julian Hillebrand
Maximilian H. Nierhoff
Reviewer
Nicholas A. Yager
Commissioning Editor
Kartikey Pandey
Acquisition Editor
Tushar Gupta
Content Development Editor
Anish Dhurat
Technical Editor
Mohita Vyas
Copy Editor
Angad Singh
Project Coordinator
Harshal Ved
Proofreader
Safis Editing
Indexer
Rekha Nair
Graphics
Abhinash Sahu
Production Coordinator
Melwyn Dsa
Cover Work
Melwyn Dsa
Julian Hillebrand studied international business marketing management at the Cologne Business School in Germany. His interest in the current questions of the business world showed him the importance of data-driven decision-making. Because of the growing size of available inputs, he soon realized the great potential of R for analyzing and visualizing data. This fascination made him start a blog project about using data science, especially for social media data analysis, which can be found at http://thinktostart.com/. He managed to combine his hands-on tutorials with his marketing and business knowledge.
Julian is always looking for new technological opportunities and is also interested in the emerging field of machine learning. He completed several digital learning offerings to take his data science capabilities to the next level.
Maximilian H. Nierhoff is an analyst for online marketing with more than half a decade of experience in managing online marketing channels and digital analytics. After studying economics, cultural activities, and creative industries, he started building online marketing departments and realized quickly that future marketing forces should also have programming knowledge. He has always been passionate about everything related to the topics of data, marketing, and customer journey analysis. He has therefore specialized in R, which is his first-choice language for programming, data science, and analysis. He considers himself a lifelong learner and is an avid user of MOOCs on R and digital analytics.
Nicholas A. Yager is a biostatistician and software developer researching statistical genomics, image analysis, and infectious disease epidemiology. With an education in biochemistry and biostatistics, his experience in analyzing cutting-edge genomics data and simulating complex biological systems has given him an in-depth understanding of scientific computing and data analysis. Currently, Nicholas works for a personalized medicine company, designing medical informatics systems for next-generation personalized cancer tests. Aside from this book, Nicholas has reviewed Unsupervised Learning with R, Packt Publishing.
I would like to thank my friends, Lauren and Matt, and my mentor, Dr. Gregg Hartvigsen, for their help in reviewing this book.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
For a long time, data analysis, visualization, and the handling of complex statistical issues were reserved for universities and very few organizations. An easy-to-use and free environment that made data analysis available to a broader audience simply did not exist.
But in the early nineties, R saw the light of day, and since then, it has been on a meteoric rise. R has shaped the landscape of data science in recent years like no other programming language. Because of its open source nature, it became widely known and is often referred to as the lingua franca of data analysis. Another reason for this huge success is the availability of a sophisticated Integrated Development Environment (IDE) named RStudio.
The development of RStudio started in 2010, and now, it is the de facto, go-to IDE for everybody working with R. The mission statement of RStudio is "to provide the most widely used open source and enterprise-ready professional software for the R statistical computing environment."
But RStudio offers more than just a handy way to create R scripts; it has grown into a real ecosystem that supports package development, application development, interactive reporting, and more. In this way, RStudio has managed to bring data analysis to a broader audience. And because of its continuous drive to extend what can be done with R, it can be seen as a further development of the R language. RStudio combines the statistical power of R and the spirit of its open source community with cutting-edge user interface technologies.
This made RStudio more than just a tool for statisticians; it became the platform for everybody who wants to generate insights from data and share them with others.
Therefore, we will hereafter guide you to develop, communicate, and collaborate with R by mastering RStudio.
Chapter 1, The RStudio IDE – an Overview, describes how to install RStudio, and gives a general overview of its user interface.
Chapter 2, Communicating Your Work with R Markdown, shows how to create R Markdown documents and presentations with the help of the concept of reproducible research.
Chapter 3, R Lesson I – Graphics System, gives an introduction to the landscape of plotting packages in R and the basic process of plot creation with different packages for interactive graphs.
Chapter 4, Shiny – a Web-app Framework for R, describes how to create web applications with the Shiny framework by explaining the basic concept of reactive programming.
Chapter 5, Interactive Documents with R Markdown, explains how to create interactive R Markdown documents with the Shiny framework and other R packages.
Chapter 6, Creating Professional Dashboards with R and Shiny, introduces the concept of dashboards, and how to build a professional dashboard with the shinydashboard package.
Chapter 7, Package Development in RStudio, describes the basic process of package development in R, and how to create R packages with RStudio.
Chapter 8, Collaborating with Git and GitHub, shows the fundamentals of Git and GitHub, and how to use them with RStudio.
Chapter 9, R for your Organization – Managing the RStudio Server, describes how to install R, RStudio, and the Shiny Server on a cloud server to create a fully flexible programming environment.
Chapter 10, Extending RStudio and Your Knowledge of R, explains where you can find additional resources to improve your work with R and RStudio.
To fully apply the knowledge learned in this book, you will need a computer with access to the Internet, and the ability to install the R environment as well as the RStudio IDE. The first chapter will guide you through this process.
This book is aimed at R developers and analysts who wish to work on R statistical development while taking advantage of RStudio's functionality to ease their development efforts. We assume experience with R programming and familiarity with R's basic structures and a number of its functions.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
The number of users adopting the R programming language has been increasing faster and faster in the last few years. It is not just used for smaller analyses, but also for bigger projects, often with several people collaborating on the same project. The functions of the R console are limited when it comes to managing a lot of files, or when we want to work with version control systems. Together with the increasing adoption rate, this created the need for a better development environment. To serve this need, a team of R fans began to develop an integrated development environment (IDE) to make it easier to work on bigger projects and to collaborate with others. This IDE is named RStudio. We will introduce you to this fantastic software and show you how to take your R programming to the next level. Mastering the use of RStudio will help you solve real-world problems faster and more effectively.
In this chapter, we will introduce you to the RStudio interface and build the foundation for more advanced topics in the following chapters.
This chapter covers the following topics:
Before installing RStudio, you should install R on your computer. RStudio will then automatically search for your R installation.
RStudio is based on the R framework and requires at least R version 2.11.1, but we highly recommend that you install the latest version. As of September 2015, the latest version of R is 3.2.2.
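If R is already installed, the version requirement can be checked from the R console. The following is a minimal sketch that uses only base R; the variable names min_version and current are our own and are chosen for illustration.

```r
# Minimum R version stated in the text
min_version <- "2.11.1"

# getRversion() returns the running R version as a comparable object
current <- getRversion()

if (current >= min_version) {
  message("R ", as.character(current), " meets the minimum of ", min_version)
} else {
  message("R ", as.character(current), " is older than ", min_version)
}
```

Comparing version objects rather than plain strings avoids mistakes such as treating "2.9" as newer than "2.11".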
We assume that most readers are using Windows or Mac OS systems. The installation of R is pretty simple. Just go to http://cran.rstudio.com, download the proper version of R for your system, and install it using the default setting.
We will devote more space to installing R on Linux. As there is a huge number of different Linux distributions out there, we will focus in this book on the most widely used one: Ubuntu.
CRAN hosts repositories for Debian and Ubuntu. To install the latest version of R, you should add the CRAN repository to your system.
The supported releases are: Utopic Unicorn (14.10), Trusty Tahr (14.04; LTS), Precise Pangolin (12.04; LTS), and Lucid Lynx (10.04; LTS). However, only the latest Long Term Support (LTS) is fully supported by the R framework development team.
We will take Ubuntu 14.04 LTS as an example. Perform the following steps:
Installing RStudio on Windows and Ubuntu is pretty much the same, as RStudio offers installers for nearly all platforms. The steps are listed as follows:
As R updates continuously, it is possible that you have, even after a short time, several versions of R installed on your system. Sometimes, you also have projects that require an older version of R to run properly.
When R is installed on Windows, it automatically writes the version being installed into the registry as the current version of R. And this will also be the version that RStudio uses. You can choose the version of R that you want to use by holding the Ctrl key during the launch of RStudio.
On Linux, you can use the command which R to see which version of R RStudio uses. If you want RStudio to use another version of R (for example, an older version, or one installed in your Documents folder because you lack admin rights), you can override this setting with the following export: export RSTUDIO_WHICH_R=/usr/local/bin/R. This line has to be added to your ~/.profile file.
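From inside a running session, you can also confirm which R installation is in use. This is a small sketch with base R functions only; the variable name which_r is our own, and the environment variable will simply be empty if it was never exported.

```r
# Report the R installation the current session runs from
cat("R version:", R.version.string, "\n")
cat("R home:   ", R.home(), "\n")

# RSTUDIO_WHICH_R is empty unless it was exported (for example in ~/.profile)
which_r <- Sys.getenv("RSTUDIO_WHICH_R", unset = "")
if (nzchar(which_r)) {
  cat("RSTUDIO_WHICH_R points to:", which_r, "\n")
} else {
  cat("RSTUDIO_WHICH_R is not set\n")
}
```

Running this in the RStudio console after changing ~/.profile and logging in again is a quick way to verify that the override took effect.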
Updating RStudio is as easy as installing it. If you want to check if an update is available, navigate to Help | Check for Updates.
If an update is available, you can download the newest version and just install it. As RStudio saves all user information in the user's home directory, it will still be there after the update.
Now, we can take a look at RStudio's user interface.
When you start RStudio for the first time, you will see three main panes. If you want to customize the pane layout, you can do it by navigating to Tools | Global Options | Pane Layout.
We will explain their use, but first we need to create a new R script file by clicking on File | New File | R Script.
The new R script file is opened in a new pane and is named Untitled1.
You can see that we now have four panes. They are named as follows:
RStudio's source editor has developed into a fully functional R editor over the last few years. It has a powerful syntax highlighter that works not only with every format connected to R development, such as R scripts, R Markdown, or R documentation files, but also with C++, JavaScript, HTML, and many more.
We've already created a new R script file and can now demonstrate some of the code editor's functions. You can also open an existing R document by clicking on File | Open File, or by using the shortcut, Ctrl + O.
The code editor works with tabs, which lets you open several files at the same time, as you can see in the following screenshot. If a file has unsaved changes, its name will be highlighted in red and marked with an asterisk.
If you have several files opened, you will see a double arrow in the menu of the source code editor. This will open a small menu showing you an overview of all the opened files. You can also search for a specific file.
Under the tabs with the opened files, you can see a toolbox with tools for the code editor. For example, you have the Source on Save checkbox. This is a really handy tool especially when you are working on a reusable function. If activated, the function is automatically sourced to the global environment and we do not have to source it manually again after editing the code.
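To see what sourcing a file means in practice, consider the following sketch. It writes a small, hypothetical function (to_celsius, a name invented for this example) to a temporary script and sources it by hand, which is the step that Source on Save performs for you whenever the file is saved.

```r
# Write a small reusable function to a temporary script file
script <- tempfile(fileext = ".R")
writeLines("to_celsius <- function(f) (f - 32) * 5 / 9", script)

# source() evaluates the file and defines to_celsius in the global environment
source(script)

# The function is now available without copying its code into the console
to_celsius(212)
```

With Source on Save activated, editing and saving the script would make the updated definition available immediately, without the explicit source() call.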
Another function you can find in the toolbox is the search and replace tool. This is known from a lot of text editors and helps you find existing code and replace it. RStudio also offers different options for your search, such as In selection, to just search in the code you selected in the editor or Match case, to make the search case-sensitive. This is demonstrated in the following screenshot:
RStudio highlights parts of your code according to the R language definition. This makes your code much easier to read. The default settings are:
One of the most important menus in the source editor is the one you find when you click on the magic wand. If you forgot exactly which arguments the selected function needs, just press the Tab key and you will see a list of the arguments with a description, where one is available:
You can then scroll through the list and select the argument you want to use. This is especially useful when you have functions that can be called with a lot of different arguments; it would be very time-consuming to open the package documentation for every function call.
You can also find direct links to the help or function definition, which shows you where the current function is defined.
After that, you can find the functions, Extract Function and Extract Variable. These functions help you in creating functions. When you click on Extract Function or use the shortcut, Ctrl + Alt + X, RStudio creates a function from your selection and inserts it in the source code.
After executing the command, your code will look like this:
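As a minimal sketch, assume the selection is an expression that centers a vector. The names values and center are made up for illustration, and the exact layout RStudio produces can differ from what is shown here.

```r
# Before: the computation is written inline
values <- c(4, 8, 15, 16, 23, 42)
centered <- values - mean(values)

# After: the selected expression is wrapped in a function;
# the function name is the one you enter when prompted
center <- function(values) {
  values - mean(values)
}

# Calling the extracted function gives the same result as the inline code
centered_again <- center(values)
```

The variables used inside the selection become the arguments of the new function, which is why the selection should not rely on objects you do not want to pass in explicitly.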
The next button is the Compile Notebook button. This helps you compile your currently opened source file into a notebook in HTML, PDF, or MS Word format:
The compiled report will then open in a new window.
This is the code we used for the preceding example; if you want to reproduce it, type the following code:
On the extreme right of the source code menu, you will find the buttons needed to run the code. These buttons are:
Code regions are foldable regions of code in the code editor. We will explain later how you can create them.
If you want to execute a single line, or rather, if you want to run the current line where your cursor is, you can use the Run button or the shortcut, Ctrl + Enter. After the execution, the cursor will jump to the next line in the source file.
If you want to execute several lines of code, you can select the lines and press the Run button.
RStudio supports both automatic and user-defined folding for regions of code. This is a very handy feature, especially when you work with functions and larger scripts. It lets you hide and show blocks to make the code easier to navigate.
RStudio automatically folds the following regions in the source editor:
The output looks like this:
To define a code section on your own and to make it easier to navigate in larger source files, you can use three methods:
So, the line can start with any number of pound signs (#), but it has to end with at least four -, =, or # characters. RStudio then automatically treats the code that follows as a section. To navigate between code sections, you can use the Jump To menu at the bottom of the editor.
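The three kinds of section headers can be sketched as follows. The section titles and the variables scores and avg are hypothetical; only the trailing characters of each comment line matter for RStudio.

```r
# Load data ----
scores <- c(12, 7, 9)

## Summarise ====
avg <- mean(scores)

#### Report ####
cat("Average score:", avg, "\n")
```

Each of these comment lines appears as an entry in the Jump To menu, and the code below it can be folded up to the next section header.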
The menu at the bottom, on the right-hand side lets you choose the file format of the currently opened source file. Normally, RStudio chooses the right format automatically. If you change it manually, the code completion and the syntax highlighting will adapt to the new settings.
RStudio offers visual debuggers to help you understand code and find bugs and problems. To do so, it uses the debugging functions of R but integrates them seamlessly into the RStudio user interface. You can find these tools in the Debug tab of the menu, or by pressing Alt + D:
You can set breakpoints right in the source editor by clicking on the number of the line, or by pressing Shift + F9:
The debugger output can help you find bugs in your code in a better way. In this example, the debugger output is debug.R:10. This means that we should look into the tenth line of the source file:
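The debugging functions of base R can also be used directly from the console. The following sketch flags a hypothetical function, scale_values, for debugging with debug() and removes the flag again with undebug(); how RStudio connects its breakpoints to these functions internally is not covered here.

```r
# A hypothetical function to inspect
scale_values <- function(x) {
  rng <- max(x) - min(x)
  (x - min(x)) / rng
}

# Flag the function: the next call would pause at its first line
debug(scale_values)
isdebugged(scale_values)

# Remove the flag again so that calls run normally
undebug(scale_values)
scale_values(c(2, 4, 6))
```

While the flag is set, calling scale_values() in an interactive session steps through the function line by line, much like stopping at a breakpoint set in the source editor.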
With the default settings, this pane consists of the tabs, Environment and History. You can use the shortcut, Ctrl + 8, to switch to the Environment browser, and Ctrl + 4 to switch to the History window:
The
