A deep dive into the programming language of choice for statistics and data
With R All-in-One For Dummies, you get five mini-books in one, offering a complete and thorough resource on the R programming language and a road map for making sense of the sea of data we're all swimming in. Maybe you're pursuing a career in data science, maybe you're looking to infuse a little statistics know-how into your existing career, or maybe you're just R-curious. This book has your back.
Along with providing an overview of coding in R and how to work with the language, this book delves into the types of projects and applications R programmers tend to tackle most. You'll find coverage of statistical analysis, machine learning, and data management with R.
* Grasp the basics of the R programming language and write your first lines of code
* Understand how R programmers use code to analyze data and perform statistical analysis
* Use R to create data visualizations and machine learning programs
* Work through sample projects to hone your R coding skills
This is an excellent all-in-one resource for beginning coders who'd like to move into the data space by learning more about R.
Page count: 784
Publication year: 2023
R All-in-One For Dummies®
Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com
Copyright © 2023 by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: WHILE THE PUBLISHER AND AUTHORS HAVE USED THEIR BEST EFFORTS IN PREPARING THIS WORK, THEY MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES REPRESENTATIVES, WRITTEN SALES MATERIALS OR PROMOTIONAL STATEMENTS FOR THIS WORK. THE FACT THAT AN ORGANIZATION, WEBSITE, OR PRODUCT IS REFERRED TO IN THIS WORK AS A CITATION AND/OR POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE PUBLISHER AND AUTHORS ENDORSE THE INFORMATION OR SERVICES THE ORGANIZATION, WEBSITE, OR PRODUCT MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING PROFESSIONAL SERVICES. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR YOUR SITUATION. YOU SHOULD CONSULT WITH A SPECIALIST WHERE APPROPRIATE. FURTHER, READERS SHOULD BE AWARE THAT WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ. NEITHER THE PUBLISHER NOR AUTHORS SHALL BE LIABLE FOR ANY LOSS OF PROFIT OR ANY OTHER COMMERCIAL DAMAGES, INCLUDING BUT NOT LIMITED TO SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR OTHER DAMAGES.
For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit https://hub.wiley.com/community/support/dummies.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2022950749
ISBN: 978-1-119-98369-9 (pbk); 978-1-119-98370-5 (ebk); 978-1-119-98371-2 (ebk)
Cover
Title Page
Copyright
Introduction
About This All-in-One
What You Can Safely Skip
Foolish Assumptions
Icons Used in This Book
Beyond This Book
Where to Go from Here
Book 1: Introducing R
Chapter 1: R: What It Does and How It Does It
The Statistical (and Related) Ideas You Just Have to Know
Getting R
Getting RStudio
A Session with R
R Functions
User-Defined Functions
Comments
R Structures
for Loops and if Statements
Chapter 2: Working with Packages, Importing, and Exporting
Installing Packages
Examining Data
R Formulas
More Packages
Exploring the tidyverse
Importing and Exporting
Book 2: Describing Data
Chapter 1: Getting Graphic
Finding Patterns
Doing the Basics: Base R Graphics, That Is
Kicking It Up a Notch to ggplot2
Putting a Bow On It
Chapter 2: Finding Your Center
Means: The Lure of Averages
Calculating the Mean
The Average in R: mean()
Medians: Caught in the Middle
The Median in R: median()
Statistics à la Mode
The Mode in R
Chapter 3: Deviating from the Average
Measuring Variation
Back to the Roots: Standard Deviation
Standard Deviation in R
Chapter 4: Meeting Standards and Standings
Catching Some Zs
Standard Scores in R
Where Do You Stand?
Summarizing
Chapter 5: Summarizing It All
How Many?
The High and the Low
Living in the Moments
Tuning in the Frequency
Summarizing a Data Frame
Chapter 6: What’s Normal?
Hitting the Curve
Working with Normal Distributions
Meeting a Distinguished Member of the Family
Book 3: Analyzing Data
Chapter 1: The Confidence Game: Estimation
Understanding Sampling Distributions
An EXTREMELY Important Idea: The Central Limit Theorem
Confidence: It Has Its Limits!
Fit to a t
Chapter 2: One-Sample Hypothesis Testing
Hypotheses, Tests, and Errors
Hypothesis Tests and Sampling Distributions
Catching Some Z’s Again
Z Testing in R
t for One
t Testing in R
Working with t-Distributions
Visualizing t-Distributions
Testing a Variance
Working with Chi-Square Distributions
Visualizing Chi-Square Distributions
Chapter 3: Two-Sample Hypothesis Testing
Hypotheses Built for Two
Sampling Distributions Revisited
t for Two
Like Peas in a Pod: Equal Variances
t-Testing in R
A Matched Set: Hypothesis Testing for Paired Samples
Paired Sample t-testing in R
Testing Two Variances
Working with F Distributions
Visualizing F Distributions
Chapter 4: Testing More than Two Samples
Testing More than Two
ANOVA in R
Another Kind of Hypothesis, Another Kind of Test
Getting Trendy
Trend Analysis in R
Chapter 5: More Complicated Testing
Cracking the Combinations
Two-Way ANOVA in R
Two Kinds of Variables … at Once
After the Analysis
Multivariate Analysis of Variance
Chapter 6: Regression: Linear, Multiple, and the General Linear Model
The Plot of Scatter
Graphing Lines
Regression: What a Line!
Linear Regression in R
Juggling Many Relationships at Once: Multiple Regression
ANOVA: Another Look
Analysis of Covariance: The Final Component of the GLM
But Wait — There’s More
Chapter 7: Correlation: The Rise and Fall of Relationships
Understanding Correlation
Correlation and Regression
Testing Hypotheses about Correlation
Correlation in R
Multiple Correlation
Partial Correlation
Partial Correlation in R
Semipartial Correlation
Semipartial Correlation in R
Chapter 8: Curvilinear Regression: When Relationships Get Complicated
What Is a Logarithm?
What Is e?
Power Regression
Exponential Regression
Logarithmic Regression
Polynomial Regression: A Higher Power
Which Model Should You Use?
Chapter 9: In Due Time
A Time Series and Its Components
Forecasting: A Moving Experience
Forecasting: Another Way
Working with Real Data
Chapter 10: Non-Parametric Statistics
Independent Samples
Matched Samples
Correlation: Spearman’s rS
Correlation: Kendall’s Tau
A Heads-Up
Chapter 11: Introducing Probability
What Is Probability?
Compound Events
Conditional Probability
Large Sample Spaces
R Functions for Counting Rules
Random Variables: Discrete and Continuous
Probability Distributions and Density Functions
The Binomial Distribution
The Binomial and Negative Binomial in R
Hypothesis Testing with the Binomial Distribution
More on Hypothesis Testing: R versus Tradition
Chapter 12: Probability Meets Regression: Logistic Regression
Getting the Data
Doing the Analysis
Visualizing the Results
Book 4: Learning from Data
Chapter 1: Tools and Data for Machine Learning Projects
The UCI (University of California-Irvine) ML Repository
Introducing the Rattle package
Using Rattle with iris
Chapter 2: Decisions, Decisions, Decisions
Decision Tree Components
Decision Trees in R
Decision Trees in Rattle
Project: A More Complex Decision Tree
Suggested Project: Titanic
Chapter 3: Into the Forest, Randomly
Growing a Random Forest
Random Forests in R
Project: Identifying Glass
Suggested Project: Identifying Mushrooms
Chapter 4: Support Your Local Vector
Some Data to Work With
Separability: It’s Usually Nonlinear
Support Vector Machines in R
Project: House Parties
Chapter 5: K-Means Clustering
How It Works
K-Means Clustering in R
Project: Glass Clusters
Chapter 6: Neural Networks
Networks in the Nervous System
Artificial Neural Networks
Neural Networks in R
Project: Banknotes
Suggested Projects: Rattling Around
Chapter 7: Exploring Marketing
Analyzing Retail Data
Enter Machine Learning
Suggested Project: Another Data Set
Chapter 8: From the City That Never Sleeps
Examining the Data Set
Warming Up
Quick Suggested Project: Airline Names
Suggested Project: Departure Delays
Quick Suggested Project: Analyze Weekday Differences
Suggested Project: Delay and Weather
Book 5: Harnessing R: Some Projects to Keep You Busy
Chapter 1: Working with a Browser
Getting Your Shine On
Creating Your First shiny Project
Working with ggplot
Another shiny Project
Suggested Project
Chapter 2: Dashboards — How Dashing!
The shinydashboard Package
Exploring Dashboard Layouts
Working with the Sidebar
Interacting with Graphics
Index
About the Author
Connect with Dummies
End User License Agreement
Introduction
In this book, I’ve brought together all the information you need to hit the ground running with R. It’s heavy on statistics, of course, because R’s creators built this language to analyze data.
So it’s necessary that you learn the foundations of statistics. Let me tell you at the outset: This All-in-One is not a cookbook. I’ve never taught statistics that way and I never will. Before I show you how to use R to work with a statistical concept, I give you a strong grounding in what that concept is all about.
In fact, Books 2 and 3 of this 5-book compendium are something like an introductory statistics text that happens to use R as a way of explaining statistical ideas.
Book 4 follows that path by teaching the ideas behind machine learning before you learn how to use R to implement them. Book 5 offers a set of projects that let you exercise your newly minted R skill set.
Want some more details? Read on.
About This All-in-One
The volume you’re holding (or the e-book you’re viewing) consists of five books that cover much of the length and breadth of R.
As I said earlier in this introduction, R is a language that deals with statistics. Accordingly, Book 1 introduces you to the fundamental concepts of statistics that you just have to know in order to progress with R.
You then learn about R and RStudio, a widely used development environment for working with R. I begin by describing the rudiments of R code, and I discuss R functions and structures.
R truly comes alive when you use its specialized packages, which you learn about early on.
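To make “the rudiments of R code” a little more concrete, here is a minimal base R sketch (not taken from the chapters themselves) of a vector, a user-defined function, and a for loop containing an if statement. The names `scores`, `square_it`, and `high_count` and the data values are hypothetical, chosen only for illustration.

```r
# A numeric vector: R's most basic data structure (hypothetical data)
scores <- c(85, 92, 78, 95, 88)

# A hypothetical user-defined function that squares its input
square_it <- function(x) {
  x^2
}

# Arithmetic in R is vectorized, so this squares every element at once
squared <- square_it(scores)
print(squared)

# A for loop with an if statement: count the scores of 90 or above
high_count <- 0
for (s in scores) {
  if (s >= 90) {
    high_count <- high_count + 1
  }
}
print(high_count)  # prints [1] 2
```

The loop is there to show the syntax. Because R is vectorized, the same count could also be written as `sum(scores >= 90)`.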
Part of working with statistics is to summarize data in meaningful ways. In Book 2, you find out how to do just that.
Most people know about averages and how to compute them. But that’s not the whole story. In Book 2, I tell you about additional descriptive statistics that fill in the gaps, and I show you how to use R to calculate and work with those statistics. You also learn to create graphics that visualize the data descriptions and analyses you encounter in Books 2 and 3.
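As a small illustration of the kind of calculation Book 2 covers, this sketch applies base R's `mean()`, `median()`, `var()`, `sd()`, `range()`, and `summary()` to a hypothetical vector `x`. The data are made up. Note that `var()` and `sd()` treat the data as a sample and divide by n - 1.

```r
# Hypothetical sample of eight measurements
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

mean(x)     # the average: 5
median(x)   # middle value: 4.5 (mean of the 4th and 5th sorted values)
var(x)      # sample variance, dividing by n - 1: 32/7, about 4.571
sd(x)       # sample standard deviation (square root of var): about 2.138
range(x)    # lowest and highest values: 2 9
summary(x)  # minimum, quartiles, median, mean, and maximum in one call
```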
Book 3 addresses the fundamental aim of statistical analysis: to go beyond the data and help you make decisions. Usually, the data are measurements of a sample taken from a large population. The goal is to use these data to figure out what’s going on in the population.
This opens a wide range of questions: What does an average mean? What does the difference between two averages mean? Are two things associated? These are only a few of the questions I address in Book 3, and you learn to use the R tools that help you answer them.
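Here is a rough sketch of how base R approaches two of those questions, using hypothetical data: `t.test()` compares two sample means, and `cor()` and `cor.test()` measure and test an association. The variable names and values are invented for illustration, and the comments describe R's defaults rather than any particular analysis from Book 3.

```r
# Two hypothetical independent samples
group_a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.2)
group_b <- c(6.2, 6.5, 5.9, 6.8, 6.1, 6.4, 6.6)

# "What does the difference between two averages mean?"
# Two-sample t-test; by default R runs the Welch version (var.equal = FALSE)
diff_test <- t.test(group_a, group_b)
print(diff_test$p.value)  # a small p-value suggests the population means differ

# "Are two things associated?" Pearson correlation and its significance test
hours  <- c(1, 2, 3, 4, 5, 6, 7)
scores <- c(52, 55, 61, 64, 70, 71, 78)
print(cor(hours, scores))  # correlation coefficient r, between -1 and 1
assoc_test <- cor.test(hours, scores)
print(assoc_test$p.value)
```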
Creating effective machine learning models takes experience. Accordingly, in Book 4 you gain experience by completing machine learning projects. In addition to the projects you complete along with me, I suggest additional projects for you to try on your own.
I begin by telling you about the University of California-Irvine Machine Learning Repository, which provides the data sets for most of the projects you encounter in Book 4.
To give you a gentle on-ramp into the field, I show you the Rattle package for creating machine learning applications. It’s a friendly interface to R’s machine learning functionality. I like Rattle a lot, and I think you will, too. You use it to learn about and work with decision trees, random forests, support vector machines, k-means clustering, and neural networks.
You also work with fairly large data sets — not the terabytes and petabytes data scientists work with, but large enough to get you started. In one project, you analyze a data set of more than 500,000 airline flights. In another, you complete a customer segmentation analysis of over 300,000 customers of an online retailer.
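Rattle itself is a point-and-click interface, so it doesn't lend itself to a short code listing. As a stand-in, here is a minimal sketch that runs k-means clustering directly with base R's `kmeans()` on the built-in `iris` data (the data set the book pairs with Rattle). The choice of three clusters, `nstart = 20`, and the seed value are illustrative assumptions, not settings from the book.

```r
# iris ships with base R (datasets package); keep only the four measurement columns
measurements <- iris[, 1:4]

# k-means starts from random centers, so fix the seed for reproducibility
set.seed(42)
km <- kmeans(measurements, centers = 3, nstart = 20)

# Compare the discovered clusters with the known species labels
print(table(cluster = km$cluster, species = iris$Species))
```

Because k-means starts from random centers, the cluster numbers (1, 2, 3) are arbitrary labels. The cross-tabulation shows how well each cluster lines up with a species, not which number "belongs" to which species.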
As its title suggests, Book 5 is also organized around projects.
In these projects, you create applications that respond to users. I show you the shiny package for working with web browsers and the shinydashboard package for creating dashboards.
All this is a little far afield from R’s original mission in life, but you get an idea of R’s potential to expand in new directions.
After you’ve worked with R for a while, maybe you can discover some of those new directions!
What You Can Safely Skip
Any reference book throws a lot of information at you, and this one is no exception. I intended it all to be useful, but I didn’t aim it all at the same level. So if you’re not deeply into the subject matter, you can avoid paragraphs marked with the Technical Stuff icon, and you can also skip the sidebars.
Foolish Assumptions
I’m assuming that you
* Know how to work with Windows or the Mac. I don’t go through the details of pointing, clicking, selecting, and so forth.
* Can install R and RStudio (I show you how in Book 1) and follow along with the examples. I use the Windows version of RStudio, but you should have no problem if you’re working on a Mac.
Icons Used in This Book
As is the case in all For Dummies books, icons help guide you through your journey. Each one is a little picture in the margin that lets you know something special about the paragraph it’s next to.
Tip: This icon points out a hint or a shortcut that helps you in your work and makes you an all-around better person.
Remember: This one points out timeless wisdom to take with you as you continue on the path to enlightenment.
Warning: Pay attention to this icon. It’s a reminder to avoid something that might gum up the works for you.
Technical Stuff: As I mention in “What You Can Safely Skip,” this icon indicates material you can blow past if it’s just too technical. (I’ve kept this content to a minimum.)
Beyond This Book
In addition to what you’re reading right now, this book comes with a free, access-anywhere Cheat Sheet that will help you quickly use the tools I discuss. To find this Cheat Sheet, visit www.dummies.com and search for R All-in-One For Dummies Cheat Sheet in the Search box.
If you’ve read any of my earlier books, welcome back!
Where to Go from Here
Time to hit the books! You can start from anywhere, but here are a couple of hints. Want to introduce yourself to R and packages? Book 1 is for you. Has it been a while (or maybe never?) since your last statistics course? Hit Book 2. For anything else, find it in the table of contents or in the index and go for it.
If you prefer to read from cover to cover, just turn the page…
Book 1: Introducing R
Chapter 1: R: What It Does and How It Does It
The Statistical (and Related) Ideas You Just Have to Know
Getting R
Getting RStudio
A Session with R
R Functions
User-Defined Functions
Comments
R Structures
for Loops and if Statements
Chapter 2: Working with Packages, Importing, and Exporting
Installing Packages
Examining Data
R Formulas
More Packages
Exploring the tidyverse
Importing and Exporting