R For Dummies - Andrie de Vries - E-Book

R For Dummies E-Book

Andrie de Vries

Description

Mastering R has never been easier

Picking up R can be tough, even for seasoned statisticians and data analysts. R For Dummies, 2nd Edition provides a quick and painless way to master all the R you'll ever need. Requiring no prior programming experience and packed with tons of practical examples, step-by-step exercises, and sample code, this friendly and accessible guide shows you how to know your way around lists, data frames, and other R data structures, while learning to interact with other programs, such as Microsoft Excel. You'll learn how to reshape and manipulate data, merge data sets, split and combine data, perform calculations on vectors and arrays, and so much more.

R is an open source statistical environment and programming language that has become very popular in varied fields for the management and analysis of data. R provides a wide array of statistical and graphical techniques, and has become the standard among statisticians for software development and data analysis. R For Dummies, 2nd Edition takes the intimidation out of working with R and arms you with the knowledge and know-how to master the programming language of choice among statisticians and data analysts worldwide.

* Covers downloading, installing, and configuring R
* Includes tips for getting data in and out of R
* Offers advice on fitting regression models and ANOVA
* Provides helpful hints for working with graphics

R For Dummies, 2nd Edition is an ideal introduction to R for complete beginners, as well as an excellent technical reference for experienced R programmers.


Page count: 570

Year of publication: 2015




R For Dummies®, 2nd Edition

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Copyright © 2015 by John Wiley & Sons, Inc., Hoboken, New Jersey

Media and software compilation copyright © 2015 by John Wiley & Sons, Inc. All rights reserved.

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit www.wiley.com/techsupport.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2015941928

ISBN 978-1-119-05580-8 (pbk); ISBN 978-1-119-05583-9 (epub); 978-1-119-05585-3 (epdf)

R For Dummies

Visit www.dummies.com/cheatsheet/R to view this book's cheat sheet.

Table of Contents

Cover

Introduction

About This Book

Changes in the Second Edition

Conventions Used in This Book

What You’re Not to Read

Foolish Assumptions

How This Book Is Organized

Icons Used in This Book

Beyond the Book

Where to Go from Here

Part I: Getting Started with R Programming

Chapter 1: Introducing R: The Big Picture

Recognizing the Benefits of Using R

Looking At Some of the Unique Features of R

Chapter 2: Exploring R

Working with a Code Editor

Starting Your First R Session

Sourcing a Script

Navigating the Environment

Chapter 3: The Fundamentals of R

Using the Full Power of Functions

Keeping Your Code Readable

Getting from Base R to More

Part II: Getting Down to Work in R

Chapter 4: Getting Started with Arithmetic

Working with Numbers, Infinity, and Missing Values

Organizing Data in Vectors

Getting Values in and out of Vectors

Working with Logical Vectors

Powering Up Your Math

Chapter 5: Getting Started with Reading and Writing

Using Character Vectors for Text Data

Manipulating Text

Factoring in Factors

Chapter 6: Going on a Date with R

Working with Dates

Presenting Dates in Different Formats

Adding Time Information to Dates

Formatting Dates and Times

Performing Operations on Dates and Times

Chapter 7: Working in More Dimensions

Adding a Second Dimension

Using the Indices

Naming Matrix Rows and Columns

Calculating with Matrices

Adding More Dimensions

Combining Different Types of Values in a Data Frame

Manipulating Values in a Data Frame

Combining Different Objects in a List

Part III: Coding in R

Chapter 8: Putting the Fun in Functions

Moving from Scripts to Functions

Using Arguments the Smart Way

Coping with Scoping

Dispatching to a Method

Chapter 9: Controlling the Logical Flow

Making Choices with if Statements

Doing Something Else with an if. . .else Statement

Vectorizing Choices

Making Multiple Choices

Looping Through Values

Looping without Loops: Meeting the Apply Family

Chapter 10: Debugging Your Code

Knowing What to Look For

Reading Errors and Warnings

Going Bug Hunting

Generating Your Own Messages

Recognizing the Mistakes You’re Sure to Make

Chapter 11: Getting Help

Finding Information in the R Help Files

Searching the Web for Help with R

Getting Involved in the R Community

Making a Minimal Reproducible Example

Part IV: Making the Data Talk

Chapter 12: Getting Data into and out of R

Getting Data into R

Getting Your Data out of R

Working with Files and Folders

Chapter 13: Manipulating and Processing Data

Deciding on the Most Appropriate Data Structure

Creating Subsets of Your Data

Adding Calculated Fields to Data

Combining and Merging Data Sets

Sorting and Ordering Data

Traversing Your Data with the Apply Functions

Getting to Know the Formula Interface

Whipping Your Data into Shape

Chapter 14: Summarizing Data

Starting with the Right Data

Describing Continuous Variables

Describing Categories

Describing Distributions

Describing Multiple Variables

Working with Tables

Chapter 15: Testing Differences and Relations

Taking a Closer Look at Distributions

Comparing Two Samples

Testing Counts and Proportions

Working with Models

Part V: Working with Graphics

Chapter 16: Using Base Graphics

Creating Different Types of Plots

Controlling Plot Options and Arguments

Saving Graphics to Image Files

Chapter 17: Creating Faceted Graphics with Lattice

Creating a Lattice Plot

Changing Plot Options

Plotting Different Types

Plotting Data in Groups

Printing and Saving a Lattice Plot

Chapter 18: Looking At ggplot2 Graphics

Installing and Loading ggplot2

Looking At Layers

Using Geoms and Stats

Sussing Stats

Adding Facets, Scales, and Options

Getting More Information

Part VI: The Part of Tens

Chapter 19: Ten Things You Can Do in R That You Would’ve Done in Microsoft Excel

Adding Row and Column Totals

Formatting Numbers

Sorting Data

Making Choices with If

Calculating Conditional Totals

Transposing Columns or Rows

Finding Unique or Duplicated Values

Working with Lookup Tables

Working with Pivot Tables

Using the Goal Seek and Solver

Chapter 20: Ten Tips on Working with Packages

Poking Around the Nooks and Crannies of CRAN

Finding Interesting Packages

Installing Packages

Loading Packages

Reading the Package Manual and Vignette

Updating Packages

Forging Ahead with R-Forge

Getting Packages from GitHub

Conducting Installations from BioConductor

Reading the R Manual

Appendix A: Installing R and RStudio

Installing and Configuring R

Installing and Configuring RStudio

Appendix B: The rfordummies Package

Using rfordummies

About the Authors

Cheat Sheet

Connect with Dummies

End User License Agreement

Introduction

Welcome to R For Dummies, the book that helps you learn the statistical programming language R quickly and easily.

We can’t guarantee that you’ll be a guru if you read this book, but you should be able to

Perform data analysis by using a variety of powerful tools.

Use the power of R to do statistical analysis and data-processing tasks.

Appreciate the beauty of using vector-based operations (rather than loops) to do speedy calculations.

Appreciate the meaning of the following line of code:

knowledge <- apply(theory, 1, sum)

Know how to find, download, and use code that has been contributed to R by its very active community of developers.

Know where to find extra help and resources to take your R coding skills to the next level.

Create beautiful graphs and visualizations of your data.
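To make the line of code in the list above a little more concrete, here is a minimal sketch. The matrix `theory` and its values are made up purely for illustration; only the call to `apply()` with `sum` mirrors the line shown.

```r
# A made-up 2 x 3 matrix standing in for `theory` (illustrative values only).
# matrix() fills column by column, so row 1 is 1, 3, 5 and row 2 is 2, 4, 6.
theory <- matrix(1:6, nrow = 2)

# The second argument, 1, asks apply() to work row by row;
# sum() is then applied to each row in turn.
knowledge <- apply(theory, 1, sum)
knowledge
```

With these values, `knowledge` holds one total per row: 9 and 12. For plain row totals, `rowSums(theory)` should give the same numbers.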

About This Book

R For Dummies is an introduction to the statistical programming language known as R. We start by introducing the interface and work our way from the very basic concepts of the language through more sophisticated data manipulation and analysis.

We illustrate every step with easy-to-follow examples. This book contains numerous code snippets, several write-it-yourself functions you can use later on, and complete analysis scripts. All these are for you to try out yourself.

We don’t attempt to give a technical description of how R is programmed internally, but we do focus as much on the why as on the how. R has many features that may seem surprising at first, so we believe it’s important to explain both how you should talk to R, and how the R engine interprets what you say. After reading this book, you should be able to manipulate your data in the form you want and understand how to use functions we didn’t cover in the book (as well as the ones we do cover).

This book is a reference. You don’t have to read it from beginning to end. Instead, you can use the table of contents and index to find the information you need. We cross-reference other chapters where you can find more information.

Changes in the Second Edition

Since the publication of the first edition, R has kept evolving and improving. To keep the book accurate, we updated the code to reflect any changes in the latest version of R (version 3.2.0). With the feedback from readers, students, and colleagues we could rework some sections to clarify issues and correct inaccuracies. For example, we modified the code to use double quotes instead of single quotes when using text strings. We also refer to the fundamental units of lists as components, rather than elements.

The new rfordummies package contains code examples in the book. Read all about it in Appendix B.

R and RStudio

R For Dummies can be used with any operating system that R runs on. Whether you use Mac, Linux, or Windows, this book will get you on your way with R.

R is more a programming language than an application. When you download R, you automatically download a console application that’s suitable for your operating system. However, this application has only basic functionality, and it differs to some extent from one operating system to the next.

RStudio is a cross-platform application, also known as an Integrated Development Environment (IDE) with some very neat features to support R. In this book, we don’t assume you use any specific console application. However, RStudio provides a common user interface across the major operating systems. For this reason, we use RStudio to demonstrate some of the concepts rather than any specific operating-system version of R.

What You’re Not to Read

You can use this book however works best for you, but if you’re pressed for time (or just not interested in the nitty-gritty details), you can safely skip anything marked with a Technical Stuff icon. You also can skip sidebars (text in gray boxes); they contain interesting information, but nothing critical to your understanding of the subject at hand.

Foolish Assumptions

This book makes the following assumptions about you and your computer:

You know your way around a computer.

You know how to download and install software. You know how to find information on the Internet and you have Internet access.

You’re not necessarily a programmer.

If you are a programmer, and you’re used to coding in other languages, you may want to read the notes marked by the Technical Stuff icon — there, we fill you in on how R is similar to, or different from, other common languages.

You’re not a statistician, but you understand the very basics of statistics.

R For Dummies isn’t a statistics book, although we do show you how to do some basic statistics using R. If you want to understand the statistical stuff in more depth, we recommend Statistics For Dummies, 2nd Edition, by Deborah J. Rumsey, PhD (Wiley).

You want to explore new stuff.

You like to solve problems and aren’t afraid of trying things out in the R console.

How This Book Is Organized

The book is organized in six parts. Here’s what each of the six parts covers.

Part I: Getting Started with R Programming

In this part, you write your first script. You use the powerful concept of vectors to make simultaneous calculations on many variables at once. You work with the R workspace (in other words, how to create, modify, or remove variables). You find out how to save your work and retrieve and modify script files that you wrote in previous sessions. We also introduce some fundamentals of R (for example, how to install packages).

Part II: Getting Down to Work in R

In this part, we fill you in on the three R’s: reading, ’riting, and ’rithmetic — in other words, working with text and numbers (and dates for good measure). You also get to use the very important data structures of lists and data frames.

Part III: Coding in R

R is a programming language, so you need to know how to write and understand functions. In this part, we show you how to do this, as well as how to control the logic flow of your scripts by making choices using if statements, as well as looping through your code to perform repetitive actions. We explain how to make sense of and deal with warnings and errors that you may experience in your code. Finally, we show you some tools to debug any issues that you may experience.
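As a rough preview of the ideas named in this part, the sketch below combines a function, an if...else choice, and a loop. The function name `describe_number` and the input values are hypothetical, chosen only for illustration.

```r
# A hypothetical helper (not from the book) that labels a single number
describe_number <- function(x) {
  if (x < 0) {
    "negative"
  } else {
    "non-negative"
  }
}

# A for loop repeats the same action for each value in turn
results <- character(0)
for (value in c(-2, 0, 3)) {
  results <- c(results, describe_number(value))
}
results
```

Here `results` ends up as "negative", "non-negative", "non-negative", one label per input value.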

Part IV: Making the Data Talk

In this part, we introduce the different data structures that you can use in R, such as lists and data frames. You find out how to get your data in and out of R (for example, by reading data from files or the Clipboard). You also see how to interact with other applications, such as Microsoft Excel.

Then you discover how easy it is to do some advanced data reshaping and manipulation in R. We show you how to select a subset of your data and how to sort and order it. We explain how to merge different datasets based on columns they may have in common. Finally, we show you a very powerful generic strategy of splitting and combining data and applying functions over subsets of your data. When you understand this strategy, you can use it over and over again to do sophisticated data analyses in only a few small steps.
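The splitting, applying, and combining strategy mentioned above can be sketched in a few lines of base R. The data frame `sales` and its values are invented for illustration; this is one possible way to express the idea, not necessarily the approach the later chapters take.

```r
# Made-up data: sales amounts recorded in two regions
sales <- data.frame(
  region = c("north", "south", "north", "south", "north"),
  amount = c(10, 20, 30, 40, 50)
)

# Split: one vector of amounts per region
pieces <- split(sales$amount, sales$region)

# Apply and combine: sum each piece and collect the results in a named vector
totals <- sapply(pieces, sum)
totals
```

With these values, the total for north is 90 and the total for south is 60. Swapping `sum` for another summary function, such as `mean`, reuses the same pattern.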

After reading this part, you’ll know how to describe and summarize your variables and data using R. You’ll be able to do some classical tests (for example, calculating a t-test). And you’ll know how to use random numbers to simulate some distributions.

Finally, we show you some of the basics of using linear models (for example, linear regression and analysis of variance). We also show you how to use R to predict the values of new data using models that you’ve fitted to your data.
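To give a feel for fitting a model and predicting new values, here is a minimal sketch using `lm()` and `predict()`. The data are made up so that y follows 1 + 2x exactly, which keeps the result easy to check by hand; real data would of course be noisier.

```r
# Made-up data that follow y = 1 + 2x exactly (illustrative only)
training <- data.frame(x = 1:10)
training$y <- 1 + 2 * training$x

# Fit a linear regression of y on x
fit <- lm(y ~ x, data = training)

# Predict y for a new value of x that was not in the training data
new_data <- data.frame(x = 11)
predicted <- predict(fit, newdata = new_data)
predicted
```

Because the relationship is exact, the prediction for x = 11 should be 23 (up to floating-point rounding).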

Part V: Working with Graphics

They say that a picture is worth a thousand words. This is certainly the case when you want to share your results with other people. In this part, you discover how to create basic and more sophisticated plots to visualize your data. We move on from bar charts and line charts, and show you how to present cuts of your data using facets.

Part VI: The Part of Tens

In this part, we show you how to do ten things in R that you probably use Microsoft Excel for at the moment (for example, how to do the equivalent of pivot tables and lookup tables). We also give you ten tips for working with packages that are not part of base R.
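As a hedged preview, the sketch below shows one base R way to get something like a pivot table and something like a lookup table. The data frames `orders` and `prices` are invented for illustration, and the book's own chapter may well use different functions.

```r
# Made-up order data
orders <- data.frame(
  product = c("apple", "pear", "apple", "pear"),
  quarter = c("Q1", "Q1", "Q2", "Q2"),
  units   = c(5, 3, 7, 2)
)

# Pivot-table-style summary: total units per product (rows) and quarter (columns)
pivot <- with(orders, tapply(units, list(product, quarter), sum))
pivot

# Lookup-table-style match: find each product's price in a second table
prices <- data.frame(product = c("pear", "apple"), price = c(0.8, 0.5))
orders$price <- prices$price[match(orders$product, prices$product)]
orders
```

`match()` returns the row position of each product in `prices`, which plays roughly the role of a lookup formula in a spreadsheet.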

Icons Used in This Book

As you read this book, you’ll find little pictures in the margins. These pictures, or icons, mark certain types of text:

When you see the Tip icon, you can be sure to find a way to do something more easily or quickly.

You don’t have to memorize this book, but the Remember icon points out some useful things that you really should remember. Usually this indicates a design pattern or idiom that you’ll encounter in more than one chapter.

When you see the Warning icon, listen up. It points out something you definitely don’t want to do. Although it’s really unlikely that using R will cause something disastrous to happen, we use the Warning icon to alert you if something is bound to lead to confusion.

The Technical Stuff icon indicates technical information you can merrily skip over. We do our best to make this information as interesting and relevant as possible, but if you’re short on time or you just want the information you absolutely need to know, you can move on by.

Beyond the Book

R For Dummies includes the following goodies online for easy download:

Cheat Sheet: You can find the Cheat Sheet for this book here: www.dummies.com/cheatsheet/r

Extras: We provide a few extra articles here: www.dummies.com/extras/r

Example code: We provide the example code for the book here: www.dummies.com/extras/r

If we have updates to the content of the book, look for them here: www.dummies.com/extras/r

Where to Go from Here

There’s only one way to learn R: Use it! In this book, we try to make you familiar with the usage of R, but you’ll have to sit down at your PC and start playing around with it yourself. Crack the book open so the pages don’t flip by themselves, and start hitting the keyboard!

Part I

Getting Started with R Programming

Visit www.dummies.com for great Dummies content online.

In this part …

Introducing R programming concepts.

Creating your first script.

Making clear, legible code.

Chapter 1

Introducing R: The Big Picture

In This Chapter

Discovering the benefits of R

Identifying some programming concepts that make R special

With an estimated worldwide user base of more than 2 million people, the R language has rapidly grown and extended since its origin as an academic demonstration language in the 1990s.

Some people would argue — and we think they’re right — that R is much more than a statistical programming language. It’s also

A very powerful tool for all kinds of data processing and manipulation

A community of programmers, users, academics, and practitioners

A tool that makes all kinds of publication-quality graphics and data visualizations

A collection of freely distributed add-on packages

A versatile toolbox for extensive automation of your work

In this chapter, we fill you in on the benefits of R, as well as its unique features and quirks.

You can download R at www.r-project.org. This website also provides more information on R and links to the online manuals, mailing lists, conferences, and publications.

Tracing the history of R

Ross Ihaka and Robert Gentleman developed R as a free software environment for their teaching classes when they were colleagues at the University of Auckland in New Zealand. Because they were both familiar with S, a programming language for statistics, it seemed natural to use similar syntax in their own work. After Ihaka and Gentleman announced their software on the S-news mailing list, several people became interested and started to collaborate with them, notably Martin Mächler.

Currently, a group of 21 people has rights to modify the central archive of source code (http://www.r-project.org/contributors.html). This group is referred to as the R Core Team. In addition, many other people have contributed new code and bug fixes to the project.

Here are some milestone dates in the development of R:

Early 1990s: The development of R began.

August 1993: The software was announced on the S-news mailing list. Since then, a set of active R mailing lists has been created. The web page at www.r-project.org/mail.html provides descriptions of these lists and instructions for subscribing. (For more information, turn to “It provides an engaged community,” later in this chapter.)

June 1995: After some persuasive arguments by Martin Mächler (among others) to make the code available as “free software,” the code was made available under the Free Software Foundation’s GNU General Public License (GPL), Version 2.

Mid-1997: The initial R Development Core Team was formed (although, at the time, it was simply known as the core group).

February 2000: The first version of R, version 1.0.0, was released.

October 2004: Release of R version 2.0.0.

April 2013: Release of R version 3.0.0.

April 2015: Release of R-3.2.0 (the version used in this book).

Ross Ihaka wrote a comprehensive overview of the development of R. The web page http://cran.r-project.org/doc/html/interface98-paper/paper.html provides a fascinating history.

Recognizing the Benefits of Using R

Of the many attractive benefits of R, a few stand out: It’s actively maintained, it has good connectivity to various types of data and other systems, and it’s versatile enough to solve problems in many domains. Possibly best of all, it’s available for free, in more than one sense of the word.

It comes as free, open-source code

R is available under an open-source license, which means that anyone can download and modify the code. This freedom is often referred to as “free as in speech.” R is also available free of charge — a second kind of freedom, sometimes referred to as “free as in beer.” In practical terms, this means that you can download and use R free of charge.

As a result of this freedom, many excellent programmers have contributed improvements and fixes to the R code. For this reason, R is very stable and reliable.

Any freedom also has associated obligations. In the case of R, these obligations are described in the conditions of the license under which it is released: GNU General Public License (GPL), Version 2. The full text of the license is available at www.r-project.org/COPYING. It’s important to stress that the GPL does not pertain to your usage of R. There are no obligations for using the software — the obligations just apply to redistribution. In short, if you change and redistribute the R source code, you have to make those changes available for anybody else to use.

It runs anywhere

The R Core Team has put a lot of effort into making R available for different types of hardware and software. This means that R is available for Windows, Unix systems (such as Linux), and the Mac.

It supports extensions

R itself is a powerful language that performs a wide variety of functions, such as data manipulation, statistical modeling, and graphics. One really big advantage of R, however, is its extensibility. Developers can easily write their own software and distribute it in the form of add-on packages. Because of the relative ease of creating and using these packages, literally thousands of packages exist. In fact, many new (and not-so-new) statistical methods are published with an R package attached.
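A short sketch of how add-on packages are typically checked for, installed, and loaded follows. The package name `ggplot2` is used only as an illustration (the book covers it in Chapter 18), and the install step is left commented out because it needs Internet access.

```r
# Several packages, such as stats and utils, are attached when R starts
attached <- search()
attached

# requireNamespace() reports whether a package is installed, without attaching it
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats

# Installing and loading an add-on package (illustrative; needs Internet access):
# install.packages("ggplot2")
# library(ggplot2)
```

Installation is needed only once per R installation, whereas `library()` is called in each session that uses the package.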

It provides an engaged community

The R user base keeps growing. Many people who use R eventually start helping new users and advocating the use of R in their workplaces and professional circles. Sometimes they also become active on

The R mailing lists (http://www.r-project.org/mail.html)

Question-and-answer (Q&A) websites, such as

StackOverflow, a programming Q&A website (www.stackoverflow.com/questions/tagged/r)

CrossValidated, a statistics Q&A website (http://stats.stackexchange.com/questions/tagged/r)

In addition to these mailing lists and Q&A websites, R users may

Blog actively (www.r-bloggers.com).

Participate in social networks such as Twitter (www.twitter.com/search/rstats).

Attend regional and international R conferences.

See Chapter 11 for more information on R communities.

It connects with other languages

As more and more people moved to R for their analyses, they started trying to incorporate R in their previous workflows. This led to a whole set of packages for linking R to file systems, databases, and other applications. Many of these packages have since been incorporated into the base installation of R.

For example, the R package foreign (http://cran.r-project.org/web/packages/foreign/index.html) forms part of the recommended packages of R and enables you to read data from the statistical packages SPSS, SAS, Stata, and others (see Chapter 12).
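As a small, self-contained illustration of the foreign package, the sketch below writes a made-up data frame to a temporary Stata file and reads it back. The data and the variable names are invented; in practice you would point `read.dta()`, or a sibling such as `read.spss()`, at a file produced by the other statistical package.

```r
library(foreign)  # a recommended package, normally installed alongside R

# Made-up data, written to a temporary Stata file so the example is self-contained
scores <- data.frame(id = 1:3, score = c(2.5, 3.5, 4.5))
stata_file <- tempfile(fileext = ".dta")
write.dta(scores, stata_file)

# Read the Stata file back into an R data frame
scores_back <- read.dta(stata_file)
scores_back
```

The round trip is only there to keep the example runnable without an external file.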

Several add-on packages exist to connect R to database systems, such as

RODBC, to read from databases using the Open Database Connectivity protocol (ODBC) (http://cran.r-project.org/web/packages/RODBC/index.html)

ROracle, to read Oracle databases (http://cran.r-project.org/web/packages/ROracle/index.html)

Initially, most of R was based on Fortran and C. Code from these two languages could easily be called from within R. As the community grew, C++, Java, Python, and other popular programming languages became more and more connected with R.

As more data analysts started using R, the developers of commercial data software could no longer ignore the new kid on the block. Many of the big commercial packages have add-ons to connect with R. Notably, both IBM’s SPSS and SAS Institute’s SAS allow you to move data and graphics between R and these packages, and also to call R functions directly from within them.

Other third-party developers also have contributed to better connectivity between different data analysis tools. For example, Statconn developed RExcel, an Excel add-on that allows users to work with R from within Excel (http://www.statconn.com/products.html).

Looking At Some of the Unique Features of R

R is more than just a domain-specific programming language aimed at data analysis. It has some unique features that make it very powerful, the most important one arguably being the notion of vectors. These vectors allow you to perform sometimes complex operations on a set of values in a single command.

Performing multiple calculations with vectors

R is a vector-based language. You can think of a vector as a row or column of numbers or text. The list of numbers {1,2,3,4,5}, for example, could be a vector. Unlike most other programming languages, R allows you to apply functions to the whole vector in a single operation without the need for an explicit loop.

It is time to illustrate vectors with some real R code. First, assign the values 1:5 to a vector called x:

> x <- 1:5
> x
[1] 1 2 3 4 5

Next, add the value 2 to each element in the vector x:

> x + 2
[1] 3 4 5 6 7

You can also add one vector to another. To add the values 6:10 element-wise to x, you do the following:

> x + 6:10
[1]  7  9 11 13 15

To do this in most other programming languages would require an explicit loop to run through each value of x. However, R is designed to perform many operations in a single step. This functionality is one of the features that make R so useful, and so powerful, for data analysis.
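For comparison, here is a sketch of what the explicit loop might look like in R itself, next to the vectorized form. The variable names `y`, `loop_result`, and `vector_result` are introduced here only for illustration.

```r
x <- 1:5
y <- 6:10

# Explicit loop: visit each position and add the two values
loop_result <- integer(length(x))
for (i in seq_along(x)) {
  loop_result[i] <- x[i] + y[i]
}

# Vectorized form: one expression does the same work
vector_result <- x + y

# Both approaches give the same answer
identical(loop_result, vector_result)
```

Both versions produce 7, 9, 11, 13, 15; the vectorized form is simply shorter and easier to read.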

We introduce the concept of vectors in Chapter 2 and expand on vectors and vectorization in much more depth in Chapter 4.

Processing more than just statistics

R was developed by statisticians to make statistical data analysis easier. This heritage continues, making R a very powerful tool for performing virtually any statistical computation.

As R started to expand away from its origins in statistics, many people who would describe themselves as programmers rather than statisticians have become involved with R. The result is that R is now eminently suitable for a wide variety of nonstatistical tasks, including data processing, graphical visualization, and analysis of all sorts. R is being used in the fields of finance, natural language processing, genetics, biology, and market research, to name just a few.

R is Turing complete, which means that you can use R alone to program anything you want. (Not every task is easy to program in R, though.)

In this book, we assume that you want to find out about R programming, not statistics, although we provide an introduction to statistics with R in Part IV.

Running code without a compiler

R is an interpreted language, which means that, unlike compiled languages such as C and Java, you don’t need a compiler to first create a program from your code before you can use it. R interprets the code you provide directly and converts it into lower-level calls to pre-compiled code and functions.

In practice, it means that you simply write your code and send it to R, and the code runs, which makes the development cycle easy. This ease of development comes at the cost of speed of code execution, however. The downside of an interpreted language is that the code usually runs slower than the equivalent compiled code.

If you have experience in other languages, be aware that R is not C or Java. Although you can use R as a procedural language such as C or an object-oriented language such as Java, R is mostly based on the functional programming paradigm. As we discuss later in this book, especially in Part III, this characteristic requires a bit of a different mindset. Forget what you know about other languages, and prepare for something completely different.
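
As a small taste of that functional style, the sketch below treats a function as an ordinary value and hands it to another function. The name square is made up for this example; sapply() is a standard R function that applies a function to each element of a vector.

```r
# A function is an object like any other: store it in a variable...
square <- function(v) v^2

# ...and pass it to another function, which applies it to every element
squares <- sapply(1:5, square)
squares   # 1 4 9 16 25

# A function without a name (an anonymous function) works the same way
sapply(1:5, function(v) v + 2)   # 3 4 5 6 7
```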

Chapter 2

Exploring R

In This Chapter

Looking at your R editing options

Starting R

Writing your first R script

Finding your way around the R environment

In order to start working in R, you need two things. First, you need a tool to easily write and edit code (an editor). You also need an interface, so you can send that code to R. Which tools you use depends to some extent on your operating system. The basic R install gives you these options:

Windows: A basic user interface called RGui.

Mac OS X: A basic user interface called R.app.

Linux: There is no specific interface on Linux, but you can use any code editor (like Vim or Emacs) to edit your R code. R itself opens by default in a terminal window.

At a practical level, this difference between operating systems and interfaces doesn’t matter very much. R is a programming language, and you can be sure that R interprets your code identically across operating systems.

Still, we want to show you how to use a standard R interface, so in this chapter we briefly illustrate how to use R with the Windows RGui. Our advice also works on the Mac R.app.

Fortunately, there is an alternative, third-party interface called RStudio that provides a consistent user interface regardless of operating system. RStudio increasingly is the standard editing tool for R, so we also illustrate how to use RStudio.

In this chapter, after opening an R console, you flex your R muscles and write some scripts. You do some calculations, create some numeric and text objects, take a look at the built-in help, and save your work.

Working with a Code Editor

R is many things: a programming language, a statistical processing environment, a way to solve problems, and a collection of helpful tools to make your life easier. The one thing that R is not is an application, which means that you have the freedom of selecting your own editing tools to interact with R.

In this section we discuss the Windows R interface, RGui (short for R graphical user interface). This interface also includes a very basic editor for your code. Since this standard editor is so, well, basic, we also introduce you to RStudio. RStudio offers a richer editing environment than RGui and many handy shortcuts for common tasks in R.

Alternatives to the standard R editors

Among the many freedoms that R offers you is the freedom to choose your own code editor and development environment, so you don’t have to use the standard R editors or RStudio.

These are powerful full-featured editors and development environments:

Eclipse StatET (www.walware.de/goto/statet): Eclipse, another powerful integrated development environment, has an R add-in called StatET. If you’ve done software development on large projects, you may find Eclipse useful. Eclipse requires you to install Java on your computer.

Emacs Speaks Statistics (http://ess.r-project.org): Emacs, a powerful text and code editor, is widely used in the Linux world and also is available for Windows. It has a statistics add-in called Emacs Speaks Statistics (ESS), which is famous for having keyboard shortcuts for just about everything you could possibly do and for its very loyal fan base. If you’re a programmer coming from the Linux world, this editor may be a good choice for you.

Tinn-R (http://nbcgib.uesc.br/lec/software/editores/tinn-r/en): This editor, developed specifically for working with R, is available only for Windows. It has some nice features for setting up collections of R scripts in projects. Tinn-R is easier to install and use than either Eclipse or Emacs, but it isn’t as fully featured.

A couple of interfaces are designed as tools for special purposes:

Rcommander (http://www.rcommander.com/): Rcommander provides a simple GUI for data analysis in R and contains a variety of plugins for different tasks.

Rattle (http://rattle.togaware.com/): Rattle is a GUI designed for typical data mining tasks.

Exploring RGui

As part of the process of downloading and installing R, you get the standard graphical user interface (GUI), called RGui. RGui gives you some tools to manage your R environment — most important, a console window. The console is where you type instructions and generally get R to do useful things for you.

Seeing the naked R console

The standard installation process creates useful menu shortcuts (although this may not be true if you use Linux, because there is no standard GUI for Linux). In the menu system, look for a folder called R, and then find an icon called R followed by a version number (for example, R 3.2.0, as shown in Figure 2-1).

Figure 2-1: Shortcut icons for RGui (R x64) and RStudio.

When you open RGui for the first time, you see the R Console screen (shown in Figure 2-2), which lists some basic information such as your version of R and the licensing conditions.

Figure 2-2: A brand-new session in RGui.

Below all this information is the R prompt, denoted by a > symbol. The prompt indicates where you type your commands to R; you see a blinking cursor to the right of the prompt.

We explore the R console in more depth in “Navigating the Environment,” later in this chapter.

Issuing a simple command

Use the console to issue a very simple command to R. Type the following to calculate the sum of some numbers, directly after the prompt:

> 24 + 7 + 11

R responds immediately to your command, calculates and displays the total in the console:

> 24 + 7 + 11
[1] 42

The answer is 42. R gives you one other piece of information: The [1] preceding 42 indicates that the value 42 is the first element in your answer. It is, in fact, the only element in your answer! One of the clever things about R is that it can deal with calculating many values at the same time, which is called vector operations. We talk about vectors later in this chapter — for now, all you need to know is that R can handle more than one value at a time.
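
If you want to check for yourself that the answer really is a vector with one element, a quick sketch like the following does it. The variable name answer is used only for this illustration.

```r
answer <- 24 + 7 + 11

length(answer)   # 1: the answer is a vector with a single element
answer[1]        # 42: the element that the [1] label refers to
```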

Closing the console

To quit your R session, type the following code in the console, after the command prompt (>):

> quit()

R asks you a question to make sure that you meant to quit, as shown in Figure 2-3. Click No, because you have nothing to save. This action closes your R session (as well as RGui, if you’ve been using RGui as your code editor). In fact, saving a workspace image rarely is useful.

Figure 2-3: R asks you a simple question.

Dressing up with RStudio

RStudio is a code editor and development environment with some very nice features that make code development in R easy and fun:

Code highlighting that gives different colors to keywords and variables, making it easier to read

Automatic bracket and parenthesis matching

Code completion, so you don’t have to type out all commands in full

Easy access to R Help, with some nice features for exploring functions and parameters of functions

Easy exploration of variables and values

Because RStudio is available free of charge for Linux, Windows, and Apple OS X, we think it’s a good option to use with R. In fact, we like RStudio so much that we use it to illustrate the examples in this book. Throughout the book, you find some tips and tricks on how things can be done in RStudio. If you decide to use a different code editor, you can still use all the code examples and you’ll get identical results.

To open RStudio, click the RStudio icon in your menu system or on your desktop. (You can find installation instructions in this book’s appendix.)

Once RStudio starts, choose File⇒New⇒R Script to open a new script file.

Your screen should look like Figure 2-4. You have four work areas (also called panes):

Source: The top-left corner of the screen contains a text editor that lets you work with source script files. Here, you can enter multiple lines of code, save your script file to disk, and perform other tasks on your script. This code editor works a bit like every other text editor you’ve ever seen, but it’s smart. It recognizes and highlights various elements of your code, for example (using different colors for different elements), and it also helps you find matching brackets in your scripts.

Console: In the bottom-left corner, you find the console. The console in RStudio can be used in the same way as the console in RGui (refer to “Seeing the naked R console,” earlier in this chapter). This is where you do all the interactive work with R.

Environment and History: The top-right corner is a handy overview of your environment, where you can inspect the variables you created in your session, as well as their values. (We discuss the environment in more detail later in this chapter.) This is also the area where you can see a history of the commands you’ve issued in R.

Files, Plots, Packages, Help, and Viewer: In the bottom-right corner, you have access to several tools:

Files: This is where you can browse the folders and files on your computer.

Plots: This is where R displays your plots (charts or graphs). We discuss plots in Part V.

Packages: You can view a list of all installed packages.

A package is a self-contained set of code that adds functionality to R, similar to the way that add-ins add functionality to Microsoft Excel.

Help: This is where you can browse R's built-in Help system.

Viewer: This is where RStudio displays previews of some advanced features, such as dynamic web pages and presentations that you can create with R and add-on packages.

Figure 2-4: RStudio’s four work areas (panes).

Starting Your First R Session

By now, you probably are itching to get started on some real code. In this section, you get to do exactly that. Get ready to get your hands dirty!

Saying hello to the world

Programming books typically start with a very simple program. Often, this first program creates the message "Hello world!". In R, the hello world program consists of one line of code.

Start a new R session, type the following in your console, and press Enter:

> print("Hello world!")

R responds immediately with this output:

[1] "Hello world!"

As we explain in the introduction to this book, we collapse input and output into a single block of code, like this:

> print("Hello world!")
[1] "Hello world!"

Doing simple math

Type the following in your console to calculate the sum of five numbers:

> 1 + 2 + 3 + 4 + 5
[1] 15

The answer is 15, which you can easily verify for yourself. You may think that there’s an easier way to calculate this value, though — and you’d be right. We explain how in the following section.

Using vectors

A vector is the simplest type of data structure in R. The R manual defines a vector as “a single entity consisting of a collection of things”. A collection of numbers, for example, is a numeric vector — the first five integer numbers form a numeric vector of length 5.

To construct a vector, type into the console:

> c(1, 2, 3, 4, 5)
[1] 1 2 3 4 5

In constructing your vector, you have successfully used a function in R. In programming languages, a function is a piece of code that takes some inputs and does something specific with them. In constructing a vector, you tell the c() function to construct a vector with the first five integers. The entries inside the parentheses are referred to as arguments.

You also can construct a vector by using operators. An operator is a symbol you stick between two values to make a calculation. The symbols +, -, *, and / are all operators, and they have the same meaning they do in mathematics. Thus, 1+2 in R returns the value 3, just as you’d expect.
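
A few one-liners illustrate that the arithmetic operators follow the usual rules of mathematics, including the order in which operations are carried out. This is only a sketch to try in the console; the numbers are arbitrary.

```r
1 + 2 * 3      # 7: multiplication is done before addition
(1 + 2) * 3    # 9: parentheses change the order
7 / 2          # 3.5: division is not rounded to a whole number
10 - 4 - 3     # 3: evaluated from left to right
```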

One very handy operator is called sequence, and it looks like a colon (:). Type the following in your console:

> 1:5
[1] 1 2 3 4 5

That’s more like it. With three keystrokes, you’ve generated a vector with the values 1 through 5. To calculate the sum of this vector, type into your console:

> sum(1:5)
[1] 15

While quite basic, this example shows you that using vectors allows you to do complex operations with a small amount of code. As vectors are the smallest possible unit of data in R, you get to work with vectors extensively in later chapters.
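
The same pattern scales to longer vectors and to other summary functions. The following sketch uses sum(), mean(), length(), and range(), all of which are standard R functions; the particular numbers are chosen only for illustration.

```r
sum(1:100)          # 5050
mean(1:5)           # 3
length(c(1:5, 10))  # 6: c() also accepts whole vectors as arguments
range(1:5)          # 1 5: the smallest and largest value
```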

Storing and calculating values

Using R as a calculator is very interesting but perhaps not all that useful. A much more useful capability is storing values and then doing calculations on these stored values. Try this:

> x <- 1:5
> x
[1] 1 2 3 4 5

In these two lines of code, you first assign the sequence 1:5 to an object called x. Then you ask R to print the value of x by typing x in the console and pressing Enter.

In R, the assignment operator is <-, which you type in the console by using two keystrokes: the less-than symbol (<) followed by a hyphen (-). The combination of these two symbols represents assignment. It's good practice to always surround the <- with spaces. This makes your code much easier to read and understand.

In addition to retrieving the value of a variable, you can do calculations on that value. Create a second variable called y, and assign it the value 10. Then add the values of x and y, as follows:

> y <- 10
> x + y
[1] 11 12 13 14 15
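
Notice that y holds a single value while x holds five. R reuses the shorter vector for every element of the longer one, a behavior usually called recycling. The sketch below writes the repetition out by hand to show that the result is the same; rep() is a standard R function that repeats a value.

```r
x <- 1:5
y <- 10

x + y                        # 11 12 13 14 15: y is reused for every element of x
x + c(10, 10, 10, 10, 10)    # same result, with the repetition written out

# rep() builds the repeated vector for you
identical(x + y, x + rep(y, length(x)))   # TRUE
```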

The values of the two variables themselves don’t change unless you assign a new value to either of them. You can check this by typing the following:

> x
[1] 1 2 3 4 5
> y
[1] 10

Now create a new variable z, assign it the value of x + y, and print its value:

> z <- x + y
> z
[1] 11 12 13 14 15

Variables also can take on text values. You can assign the value "Hello" to a variable called h, for example, by presenting the text to R inside quotation marks, like this:

> h <- "Hello"
> h
[1] "Hello"

You must enter text or character values to R inside quotation marks — either single or double. R accepts both. So both h <- "Hello" and h <- 'Hello' are examples of valid R syntax. For consistency, we use double quotes throughout this book.
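
A short sketch shows that the two quoting styles produce the same value, and hints at one situation in which having both is convenient. The variable name quote_inside and its text are made up for this example.

```r
identical("Hello", 'Hello')   # TRUE: both forms create the same value

# Single quotes on the outside let you use double quotes inside the text
quote_inside <- 'She said "Hello"'
cat(quote_inside, "\n")       # displays the text exactly as typed
```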

In “Using vectors,” earlier in this chapter, you use the c() function to combine numeric values into vectors. This technique also works for text:

> hw <- c("Hello", "world!")
> hw
[1] "Hello"  "world!"

You use the paste() function to concatenate multiple text elements. By default, paste() puts a space between the different elements, like this:

> paste("Hello", "world!")
[1] "Hello world!"
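
If the default space is not what you want, paste() has two arguments that control how the pieces are joined: sep sets the text placed between the arguments, and collapse joins the elements of a single vector into one string. The sketch below also uses paste0(), a standard shortcut for paste() with no separator.

```r
hw <- c("Hello", "world!")

paste("Hello", "world!", sep = "")    # "Helloworld!": no separator
paste("Hello", "world!", sep = ", ")  # "Hello, world!"
paste0("Hello", "world!")             # "Helloworld!": same as sep = ""
paste(hw, collapse = " ")             # "Hello world!": joins a vector into one string
```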

Talking back to the user

You can write R scripts that have some interaction with a user. To ask the user questions, you can use the readline() function. In the following code snippet, you read a value from the keyboard and assign it to the variable yourname:

> h <- "Hello"
> yourname <- readline("What is your name? ")
What is your name? Andrie
> paste(h, yourname)
[1] "Hello Andrie"

This code seems to be a bit cumbersome, however. Clearly, it would be much better to send these three lines of code simultaneously to R and get them evaluated in one go. In the next section, we show you how.
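
One thing to keep in mind is that readline() can wait for an answer only in an interactive session; when R runs without a user at the keyboard, it returns an empty string immediately. The following sketch wraps the question in a hypothetical helper called ask_name() (not part of R or of this book's code) that falls back to a default in that case.

```r
# Hypothetical helper: ask for a name, but fall back to a default value
ask_name <- function(default = "stranger") {
  if (!interactive()) {
    return(default)   # nobody is there to answer, so don't ask
  }
  answer <- readline("What is your name? ")
  if (nzchar(answer)) answer else default   # empty answer: use the default
}

paste("Hello", ask_name())
```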

Sourcing a Script

Until now, you’ve worked directly in the R console and issued individual commands in an interactive style of coding. In other words, you issue a command, R responds, you issue the next command, R responds, and so on.

In this section, you kick it up a notch and tell R to perform several commands one after the other without waiting for additional instructions. Because the R function to run an entire script is source(), R users refer to this process as sourcing a script.

To prepare your script to be sourced, you first write the entire script in an editor window. In RStudio, for example, the editor window is in the top-left corner of the screen (refer to Figure 2-4). Whenever you press Enter in the editor window, the cursor moves to the next line, as in any text editor.

To create a new script in RStudio, begin by opening the editor window (choose File ⇒ New File ⇒ R Script). Type the following lines of code in the editor window. Notice that the last line contains a small addition to the code you saw earlier: the print() function.

h <- "Hello"
yourname <- readline("What is your name?")
print(paste(h, yourname))

Remember to type the print() function as part of your script. Sourced scripts behave differently from interactive code in printing results. In interactive mode, a result is printed without needing to use a print() function. But when you source a script, output is by default printed only if you have an explicit print() function.
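
You can see this difference for yourself with a small experiment. The sketch below writes a two-line script to a temporary file (a stand-in for a script of your own) and sources it; only the line wrapped in print() produces output.

```r
# Write a tiny script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c(
  'paste("Hello", "world!")',         # value is computed but never shown
  'print(paste("Hello", "again!"))'   # explicit print(), so this line is shown
), script)

source(script)   # prints only: [1] "Hello again!"
```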

You can type multiple lines of code into the source editor without having each line evaluated by R. Then, when you’re ready, you can send the instructions to R — in other words, source the script.

When you use RGui or RStudio, you can do this in one of three ways:

Send an individual line of code from the editor to the console.

Click the line of code you want to run, and then press Ctrl+R in RGui. In RStudio, you can press Ctrl+Enter or click the Run button.

Send a block of highlighted code to the console.

Select the block of code you want to run, and then press Ctrl+R (in RGui) or Ctrl+Enter (in RStudio).

Send the entire script to the console (which is called sourcing a script).

In RGui, click anywhere in your script window, and then choose Edit ⇒ Run all. In RStudio, click anywhere in the source editor, and press Ctrl+Shift+S or click the Source button.

These keyboard shortcuts are defined only in RGui or RStudio. If you use a different source editor, you may have different options.

Now you can send the entire script to the R console. To do this, click the Source button in the top-right corner of the editor window or choose Edit⇒Source. The script starts, reaches the point where it asks for input, and then waits for you to enter your name in the console window. Your screen should now look like Figure 2-5. Notice that the Environment pane now lists the two objects you created: h and yourname.

Figure 2-5: Sending a script to the console in RStudio.

When you click the Source button, source('~/.active-rstudio-document') appears in the console. What RStudio actually does here is save your script in a temporary file and then use the R function source() to call that script in the console. Remember this function; you’ll meet it again.

Echoing your work

If you click on the little arrow next to the Source button in RStudio, you see two different source options, as shown in Figure 2-6. By clicking the Source button before, you used the option without echo. This means that R runs the complete script at once but doesn't show the individual commands in the console; you see only the output that the script prints explicitly.

Figure 2-6: Sourcing your code with or without echo in RStudio

If you click the second option, R again runs the complete script in one go, but this time it shows every individual line in the console. The two options differ only in the output you see. You can safely try out both options to compare.

You can also use the echo option outside RStudio by calling the source() function with the argument echo set to TRUE. We explain functions and arguments in Chapter 3, and in far more detail in Chapter 8.

Whether you source with or without echo doesn't make any difference regarding the results of your code. You can use the echo option if you want to source a long script and keep track of which part of the script R is currently carrying out.
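
To compare the two modes from the console, you can call source() yourself. This sketch again uses a temporary file as a stand-in for your own script; the script contents are chosen only for illustration.

```r
# A tiny script saved to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:5", "x + 2"), script)

source(script)                # without echo: runs silently
source(script, echo = TRUE)   # with echo: shows each command and its result
```

In both cases the script creates the same object x; only what appears in the console differs.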

Finding help on functions

We discuss R’s built-in help system in Chapter 11, but for now, to get help on any function, type ? followed by the function name in the console. To get help with the paste() function, for example, type the following:

> ?paste

This code opens a Help window. In RStudio, this Help window is in the bottom-right corner of your screen by default. In other editors, the Help window sometimes appears as a local web page in your default web browser.

You also can type help, but remember to use parentheses around your search term:

> help(paste)
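
A couple of related standard functions are worth trying alongside the Help page itself. The sketch below is not an exhaustive tour, just three calls that work for paste() and for most other functions.

```r
help("paste")    # same Help page as ?paste; the name may be quoted
args(paste)      # prints the function's arguments without opening Help
example(paste)   # runs the examples from the paste() Help page
```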

Navigating the Environment

So far in this chapter, you’ve created several variables. These form part of what R calls the global environment. The global environment refers to all the variables and functions (collectively called objects) that you create during the session, as well as any packages that are loaded.

Often, you want to remind yourself of all the variables you’ve created in the environment. To do this, use the ls() function to list the objects in the environment. In the console, type the following:

> ls()
[1] "h"        "hw"       "x"        "y"        "yourname" "z"

R tells you the names of all the variables that you created.
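
When the list grows long, it can help to narrow it down or to ask about one name directly. The sketch below uses the pattern argument of ls() and the exists() function, both standard R; the name not_made_yet is hypothetical and assumed not to exist in your session.

```r
x <- 1:5
y <- 10
yourname <- "Andrie"

ls(pattern = "^y")       # only the names that start with y
exists("x")              # TRUE: an object called x exists
exists("not_made_yet")   # FALSE, assuming you never created this object
```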

One very nice feature of RStudio lets you examine the contents of the environment at any time without typing any R commands. By default, the top-right window in RStudio has two tabs: Environment and History. Click the Environment tab to see the variables in your global environment, as well as their values. For example, in Figure 2-5 you see that the global environment contains an object called h with the value "Hello".

Manipulating the content of the environment

If you decide that you don’t need some variables anymore, you can remove them. Suppose that the object z is simply the sum of two other variables and no longer needed. To remove it permanently, use the rm() function and then use the ls() function to display the contents of the environment, as follows:

> rm(z)
> ls()
[1] "h"        "hw"       "x"        "y"        "yourname"

Notice that the object z is no longer there.
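
The rm() function can also remove several objects in one call, either by listing them or by passing their names as a character vector through the list argument. The object names in this sketch are throwaway examples.

```r
a <- 1
b <- 2
c_value <- 3               # hypothetical throwaway objects

rm(a, b)                   # remove several objects at once
rm(list = c("c_value"))    # or give the names as a character vector
exists("c_value")          # FALSE

# rm(list = ls())          # removes everything in the environment; use with care
```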

Saving your work

You have several options for saving your work:

You can save individual variables with the save() function.

You can save the entire environment with the save.image() function.

You can save your R script file, using the appropriate save menu command in your code editor.
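
As a quick preview of the first option, the sketch below saves one object, removes it, and brings it back with load(), the standard counterpart of save(). It writes to a temporary file, which is an assumption made for this illustration so that nothing is left behind in your working directory.

```r
yourname <- "Andrie"
path <- tempfile(fileext = ".rda")   # assumed location, chosen for this sketch

save(yourname, file = path)   # write the object to disk
rm(yourname)                  # remove it from the environment
load(path)                    # restore it under its original name
yourname                      # "Andrie"
```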

Suppose you want to save the value of yourname. To do that, follow these steps:

Find out which working directory R will use to save your file by typing the following:

> getwd()
[1] "c:/users/andrie"

The default working directory should be your user folder. The exact name and path of this folder depend on your operating system. (In Chapter 12, you get more familiar with the working directory.)

If you use the Windows operating system, the path is displayed with slashes instead of backslashes. In R, similar to many other programming languages, the backslash character has a special meaning. The backslash indicates an escape sequence, indicating that the character following the backslash means something special. For example, \t indicates a tab, rather than the letter t. (You can read more about escape sequences in Chapter 12