Statistical Analysis with R Essentials For Dummies - Joseph Schmuller - E-Book

Description

The easy way to get started coding and analyzing data in the R programming language

Statistical Analysis with R Essentials For Dummies is your reference to all the core concepts about R—the widely used, open-source programming language and data analysis tool. This no-nonsense book gets right to the point, eliminating review material, wordy explanations, and fluff. Understand all you need to know about the foundations of R, swiftly and clearly. Perfect for a brush-up on the basics or as an everyday desk reference on the job, this is the reliable little book you can always turn to for answers.

  • Get a quick and thorough intro to the basic concepts of coding for data analysis in R
  • Review what you've already learned or pick up essential new skills
  • Perform statistical analysis for school, business, and beyond with R programming
  • Keep this concise reference book handy for jogging your memory as you work

This book is to the point, focusing on the key topics readers need to know about this popular programming language. Great for supplementing classroom learning, reviewing for a certification, or staying knowledgeable on the job.

Page count: 193

Year of publication: 2024


Statistical Analysis with R Essentials For Dummies®

To view this book's Cheat Sheet, simply go to www.dummies.com and search for “Statistical Analysis with R Essentials For Dummies Cheat Sheet” in the Search box.

Table of Contents

Cover

Title Page

Copyright

Introduction

About This Book

Foolish Assumptions

Icons Used in This Book

Where to Go from Here

Chapter 1: Data, Statistics, and Decisions

The Statistical (and Related) Notions You Just Have to Know

Inferential Statistics: Testing Hypotheses

Chapter 2: Introducing R

Downloading R and RStudio

A Session with R

R Functions

User-Defined Functions

R Structures

for Loops and if Statements

Chapter 3: Digging Deeper Into R

Packages

More on Packages

R Formulas

Reading and Writing

Chapter 4: Finding Your Center

Means: The Lure of Averages

The Average in R: mean()

Medians: Caught in the Middle

The Median in R: median()

Statistics à la Mode

The Mode in R

Chapter 5: Deviating from the Average

Measuring Variation

Back to the Roots: Standard Deviation

Standard Deviation in R

Conditions, Conditions, Conditions …

Chapter 6: Standards, Standings, and Summaries

Catching Some Zs

Standard Scores in R

Where Do You Stand?

Creating Summaries

How Many?

The High and the Low

Summarizing a Data Frame

Chapter 7: What’s Normal?

Hitting the Curve

Distributions in R

A Distinguished Member of the Family

Chapter 8: The Confidence Game: Estimation

Understanding Sampling Distributions

An EXTREMELY Important Idea: The Central Limit Theorem

Confidence: It Has its Limits!

Fit to a t

Chapter 9: One-Sample Hypothesis Testing

Hypotheses, Tests, and Errors

Hypothesis Tests and Sampling Distributions

Catching Some Z's Again

Z Testing in R

t for One

t Testing in R

Working with t-Distributions

Chapter 10: Two-Sample Hypothesis Testing

Hypotheses Built for Two

Sampling Distributions Revisited

t for Two

Like Peas in a Pod: Equal Variances

t-Testing in R

A Matched Set: Hypothesis Testing for Paired Samples

Paired Sample t-testing in R

Chapter 11: Testing More Than Two Samples

Testing More Than Two

ANOVA in R

Another Kind of Hypothesis, Another Kind of Test

Getting Trendy

Trend Analysis in R

Chapter 12: Linear Regression

The Plot of Scatter

Regression: What a Line!

Testing Hypotheses about Regression

Linear Regression in R

Making Predictions

Chapter 13: Correlation: The Rise and Fall of Relationships

Understanding Correlation

Correlation and Regression

Testing Hypotheses About Correlation

Analyzing Correlation in R

Chapter 14: Ten Valuable Online Resources

R-bloggers

Posit

Quick-R

Stack Overflow

R Manuals

R Documentation

RDocumentation

YOU CANanalytics

Geocomputation with R

The R Journal

Index

About the Author

Connect with Dummies

End User License Agreement

List of Tables

Chapter 5

TABLE 5-1 The First Group of Heights and Their Deviations

TABLE 5-2 The Second Group of Heights and Their Deviations

TABLE 5-3 The Second Group of Heights and Their Squared Deviations

Chapter 10

TABLE 10-1 Sample Statistics from the FarKlempt Machine Study

TABLE 10-2 Data for the Weight-Loss Example

Chapter 11

TABLE 11-1 Data from Three Training Methods

TABLE 11-2 Data for the Weight-Loss Example

Chapter 12

TABLE 12-1 Aptitude Scores and Performance Scores for 16 FarMisht Consultants

TABLE 12-2 Aptitude Scores, Performance Scores, and Predicted Performance Scores...

Chapter 13

TABLE 13-1 Aptitude Scores and Performance Scores for 16 FarMisht Consultants

List of Illustrations

Chapter 1

FIGURE 1-1: The relationship between populations, samples, parameters, and stat...

Chapter 2

FIGURE 2-1: RStudio, immediately after you install it.

FIGURE 2-2: The RStudio Packages tab.

FIGURE 2-3: The RStudio Help tab.

FIGURE 2-4: RStudio, after you click icon in the upper-right corner of the Cons...

FIGURE 2-5: The RStudio Environment tab, after creating the vector x.

FIGURE 2-6: RStudio after creating and working with a vector.

FIGURE 2-7: The Quit R Session dialog box.

Chapter 3

FIGURE 3-1: The Help tab, showing information about the MASS package.

FIGURE 3-2: The anorexia data frame in the MASS package.

FIGURE 3-3: The Install Packages dialog box.

FIGURE 3-4: The Packages tab after installing DataEditR and putting it in the l...

FIGURE 3-5: The anorexia data frame, exported to an Excel spreadsheet.

FIGURE 3-6: The anorexia data frame as a tab-delimited text file.

Chapter 7

FIGURE 7-1: The bell curve.

FIGURE 7-2: The normal distribution of IQ, divided into standard deviations.

Chapter 8

FIGURE 8-1: Creating the sampling distribution of the mean.

FIGURE 8-2: The sampling distribution of the mean, partitioned into standard er...

FIGURE 8-3: The sampling distribution of the mean for the battery example.

FIGURE 8-4: The 95 percent confidence limits on the battery sampling distributi...

FIGURE 8-5: Some members of the t-distribution family.

Chapter 9

FIGURE 9-1: H₀ and H₁ each correspond to a sampling distribution.

Chapter 10

FIGURE 10-1: Creating the sampling distribution of the difference between means...

FIGURE 10-2: The sampling distribution of the difference between means, accordi...

FIGURE 10-3: The sampling distribution of the difference between means, along w...

Chapter 12

FIGURE 12-1: Aptitude and Performance at FarMisht Consulting.

FIGURE 12-2: The deviations in a scatterplot.

Chapter 13

FIGURE 13-1: Scatterplot of 16 FarMisht consultants, including the regression l...

FIGURE 13-2: One point in the scatterplot and its associated distances.

Statistical Analysis with R Essentials For Dummies®

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Copyright © 2024 by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit https://hub.wiley.com/community/support/dummies.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2024933673

ISBN 978-1-394-26342-4 (pbk); ISBN 978-1-394-26344-8 (ebk); ISBN 978-1-394-26343-1 (ebk)

Introduction

As the title indicates, this book covers the essentials of statistics and R. Although it’s designed to get you up and running in a hurry, and to quickly answer your questions, it’s not just a cookbook. Before I tell you about one of R’s features, I give you the statistical foundation it’s based on. My goal is that you understand that feature when you use it — and that you use it effectively.

In the proper context, R can be a great tool for learning statistics and for refreshing what you already know. I’ve tried to supply that context in this book.

About This Book

Although the development of statistics concepts proceeds in a logical way, I organized this book so you can open it up in any chapter and start reading. The idea is for you to quickly find what you’re looking for and use it immediately — whether it’s a statistical concept or an R feature.

On the other hand, cover-to-cover is okay if you’re so inclined. If you’re a statistics newbie and you have to use R to analyze your data, I recommend you begin at the beginning.

One caveat: I don’t cover R graphics. Although graphics are a key feature of R, I confined this book to statistics concepts and how R implements them.

Foolish Assumptions

I’m assuming:

  • You know how to work with Windows or the Mac. I don’t go through the details of pointing, clicking, selecting, and so forth.
  • You’ll be able to install R and RStudio (I show you how in Chapter 2), and follow along with the examples. I use the Windows version of RStudio, but you should have no problem if you’re working on a Mac.

Icons Used in This Book

Icons appear all over For Dummies books, and this one is no exception. Each one is a little picture in the margin that lets you know something special about the paragraph it’s next to.

This icon points out a hint or a shortcut that helps you in your work and makes you a finer, kinder, and more insightful human being.

This one points out timeless wisdom to take with you on your continuing quest for knowledge.

Pay attention to this icon. It’s a reminder to avoid something that might gum up the works for you.

Where to Go from Here

You can start the book anywhere, but here are a couple of hints. Want to learn the foundations of statistics? Turn the page. Introduce yourself to R? That’s Chapter 2. For anything else, find it in the Table of Contents or in the Index and go for it.

Chapter 1

Data, Statistics, and Decisions

IN THIS CHAPTER

  • Introducing statistical concepts
  • Generalizing from samples to populations
  • Testing hypotheses
  • Looking at two types of errors

Statistics, first and foremost, is about decision-making. Statisticians look at data and wonder what the numbers are saying.

R helps you crunch the data and compute the numbers. As a bonus, R can also help you comprehend statistical concepts.

Developed specifically for statistical analysis, R is a computer language that implements many of the analytical tools statisticians have developed for decision-making. I wrote this book to show how to use these tools in your work.

The Statistical (and Related) Notions You Just Have to Know

The analytical tools that R provides are based on the statistical concepts covered in the remainder of this chapter. These concepts, in turn, are based on common sense.

Samples and populations

If you watch TV on election night, you know that one of the main events is the prediction of the outcome immediately after the polls close (and before all the votes are counted).

The idea is to talk to a sample of voters right after they vote. If they’re truthful about how they marked their ballots, and if the sample is representative of the population of voters, analysts can use the sample data to draw conclusions about the population.

That, in a nutshell, is what statistics is all about — using the data from samples to draw conclusions about populations.

Here’s another example. Imagine that your job is to find the average height of 10-year-old children in the United States. Because you probably wouldn’t have the time or the resources to measure every child, you’d measure the heights of a representative sample. Then you’d average those heights and use that average as the estimate of the population average.
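The height example can be sketched in R. This is only an illustration under stated assumptions: the population below is simulated, and its mean of 55 inches and standard deviation of 3 inches are made-up values, not real measurements. The names `population` and `heights_sample` are hypothetical.

```r
# Hypothetical population: heights (in inches) of 100,000 ten-year-olds.
# The mean (55) and standard deviation (3) are made-up illustration values.
set.seed(42)                              # make the random draws reproducible
population <- rnorm(100000, mean = 55, sd = 3)

# Measure a random sample of 200 children instead of the whole population
heights_sample <- sample(population, size = 200)

sample_mean <- mean(heights_sample)       # computed from the sample
population_mean <- mean(population)       # normally unknown in practice

sample_mean
population_mean
```

With a representative sample, the two averages should land close together, which is what makes the sample average a reasonable estimate of the population average.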

Estimating the population average is one kind of inference that statisticians make from sample data. I discuss inference in more detail in the upcoming section “Inferential Statistics: Testing Hypotheses.”

Here’s some important terminology: Properties of a population (like the population average) are called parameters, and properties of a sample (like the sample average) are called statistics. If your only concern is the sample properties (like the heights of the children in your sample), the statistics you calculate are descriptive. If you’re concerned about estimating the population properties, your statistics are inferential.

Now for an important convention about notation: Statisticians use Greek letters (μ, σ, ρ) to stand for parameters, and English letters (x̄, s, r) to stand for statistics. Figure 1-1 summarizes the relationship between populations and samples, and between parameters and statistics.

FIGURE 1-1: The relationship between populations, samples, parameters, and statistics.

Variables: Dependent and independent

A variable is something that can take on more than one value — like your age, the value of the dollar against another currency, or the number of games your favorite sports team wins. Something that can have only one value is a constant. Scientists tell us that the speed of light is a constant, and we use the constant π to calculate the area of a circle.

Statisticians work with independent variables and dependent variables. In any study or experiment, you’ll find both kinds. Statisticians assess the relationship between them.

A dependent variable is what a researcher measures. In an experiment, an independent variable is what a researcher manipulates. In some contexts, a researcher can’t manipulate an independent variable. Instead, he notes naturally occurring values of the independent variable and how they affect a dependent variable.

In general, the objective is to find out whether changes in a dependent variable are associated with changes in an independent variable.

In examples that appear throughout this book, I show you how to use R to calculate characteristics of groups of scores, or to compare groups of scores. Whenever I show you a group of scores, I'm talking about the values of a dependent variable.

Types of data

When you do statistical work, you can run into four kinds of data. And when you work with a variable, the way you work with it depends on what kind of data it is:

The first kind is nominal data. If a set of numbers happens to be nominal data, the numbers are labels — their values don’t signify anything.

The next kind is ordinal data. In this data-type, the numbers are more than just labels. The order of the numbers is important. If I ask you to rank ten foods from the one you like best (one), to the one you like least (ten), we’d have a set of ordinal data.

But the difference between your third-favorite food and your fourth-favorite food might not be the same as the difference between your ninth-favorite and your tenth-favorite. This type of data lacks equal intervals and equal differences.

The third kind of data, interval, gives us equal differences. The Fahrenheit scale of temperature is a good example. The difference between 30° and 40° is the same as the difference between 90° and 100°. Each degree is an interval.

On the Fahrenheit scale, a temperature of 80° is not twice as hot as 40°. For ratio statements (“twice as much as,” “half as much as”) to make sense, “zero” has to mean the complete absence of the thing you’re measuring. A temperature of 0°F doesn’t mean the complete absence of heat — it’s just an arbitrary point on the Fahrenheit scale. (The same holds true for Celsius.)

The fourth kind of data, ratio, provides a meaningful zero point. On the Kelvin scale of temperature, zero means “absolute zero,” where all molecular motion (the basis of heat) stops. So 200 K is twice as hot as 100 K. Another example is length. Eight inches is twice as long as four inches. “Zero inches” means “a complete absence of length.”
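A short R sketch shows why ratio statements fail on the Fahrenheit scale. The helper `fahrenheit_to_kelvin()` is a hypothetical function written for this illustration, not something built into R; it uses the standard conversion formula.

```r
# Convert degrees Fahrenheit to kelvins (hypothetical helper, not a built-in)
fahrenheit_to_kelvin <- function(deg_f) {
  (deg_f - 32) * 5 / 9 + 273.15
}

# Ratio of the raw Fahrenheit numbers
ratio_fahrenheit <- 80 / 40

# Ratio of the same two temperatures on a scale with a true zero
ratio_kelvin <- fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)

ratio_fahrenheit   # 2
ratio_kelvin       # roughly 1.08
```

The Fahrenheit numbers suggest "twice as hot," but on the Kelvin scale, where zero is meaningful, 80°F is only about 8 percent hotter than 40°F.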

An independent variable or a dependent variable can be nominal, ordinal, interval, or ratio. The analytical tools you use depend on the type of data you work with.
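One way the four data types might be represented in R is sketched below. The variable names and values are made up for illustration. Note the assumption built into the sketch: R's factors distinguish nominal from ordinal data, but a plain numeric vector does not record whether it holds interval or ratio data, so that distinction stays with the analyst.

```r
# Nominal: the numbers are only labels (hypothetical jersey numbers)
jersey <- factor(c(10, 23, 7, 23))

# Ordinal: order matters, but the spacing between ranks is unknown
rank_labels <- c("first", "second", "third")
food_rank <- factor(c("second", "first", "third"),
                    levels = rank_labels, ordered = TRUE)

# Interval and ratio data are ordinary numeric vectors in R
temp_f <- c(30, 40, 90, 100)     # interval: equal differences, arbitrary zero
length_in <- c(4, 8)             # ratio: meaningful zero

food_rank[1] > food_rank[2]      # TRUE: "second" ranks after "first"
is.ordered(jersey)               # FALSE: nominal labels have no order
```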

A little probability

When statisticians make decisions, they use probability to express their confidence about those decisions. They can never be absolutely certain about what they decide. They can only tell you how probable their conclusions are.

What do we mean by probability? In my experience, the best way to understand probability is with examples.

If you toss a coin, what’s the probability that it turns up heads? If the coin is fair, you might figure that you have a 50-50 chance of heads and a 50-50 chance of tails. And you’d be right. In terms of the kinds of numbers associated with probability, that’s ½.

Think about rolling a fair die (one member of a pair of dice). What’s the probability that you roll a 4? Well, a die has six faces and one of them is 4, so that’s ⅙.

Still another example: Select one card at random from a standard deck of 52 cards. What’s the probability that it’s a diamond? A deck of cards has four suits, so that’s ¼.
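The three examples above can be checked in R by counting outcomes. This is a minimal sketch; the vectors `coin`, `die`, and `suits` are hypothetical stand-ins for the physical objects.

```r
# List every possible outcome, then count the ones that match the event
coin <- c("heads", "tails")
die <- 1:6
suits <- rep(c("clubs", "diamonds", "hearts", "spades"), each = 13)

p_heads <- sum(coin == "heads") / length(coin)          # 1 of 2 outcomes
p_four <- sum(die == 4) / length(die)                   # 1 of 6 outcomes
p_diamond <- sum(suits == "diamonds") / length(suits)   # 13 of 52 outcomes

p_heads
p_four
p_diamond
```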

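In general, the formula for the probability that a particular event occurs is

$$\Pr(\text{event}) = \frac{\text{number of ways the event can occur}}{\text{total number of possible outcomes}}$$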

At the beginning of this section, I say that statisticians express their confidence about their conclusions in terms of probability, which is why I brought all this up in the first place. This line of thinking leads to conditional probability: the probability that an event occurs given that some other event occurs. Suppose that I roll a die, look at it (so that you don’t see it), and tell you that I rolled an odd number. What’s the probability that I’ve rolled a 5? Ordinarily, the probability of a 5 is ⅙, but “I rolled an odd number” narrows it down. That piece of information eliminates the three even numbers (2, 4, 6) as possibilities. Only the three odd numbers (1, 3, 5) are possible, so the probability is ⅓.
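The die example can be sketched in R by shrinking the set of possible outcomes to the ones consistent with the given information. The variable names are hypothetical.

```r
die <- 1:6

# Unconditional probability of rolling a 5
p_five <- mean(die == 5)                    # 1 of 6 faces

# Given "I rolled an odd number", only the odd faces remain possible
odd_faces <- die[die %% 2 == 1]             # 1, 3, 5
p_five_given_odd <- mean(odd_faces == 5)    # 1 of 3 faces

p_five
p_five_given_odd
```

Taking the mean of a logical vector gives the proportion of TRUE values, which here is the count of matching outcomes divided by the count of possible outcomes.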

What’s the big deal about conditional probability? What role does it play in statistical analysis? Read on.

Inferential Statistics: Testing Hypotheses

Before a statistician does a study, he draws up a tentative explanation — a hypothesis that tells why the data might come out a certain way. After gathering all the data, the statistician has to decide whether or not to reject the hypothesis.

That decision is the answer to a conditional probability question — what’s the probability of obtaining the data, given that this hypothesis is correct? Statisticians have tools that calculate the probability. If the probability turns out to be low, the statistician rejects the hypothesis.

Back to coin-tossing for an example: Imagine that you’re interested in whether a particular coin is fair — whether it has an equal chance of heads or tails on any toss. Let’s start with “The coin is fair” as the hypothesis.

To test the hypothesis, you’d toss the coin a number of times — let’s say, a hundred. These 100 tosses are the sample data. If the coin is fair (as per the hypothesis), you’d expect 50 heads and 50 tails.

If it’s 99 heads and 1 tail, you’d surely reject the fair-coin hypothesis: The conditional probability of 99 heads and 1 tail given a fair coin is very low. Of course, the coin could still be fair and you could, quite by chance, get a 99-1 split, right? Sure. You never really know. You have to gather the sample data (the 100 toss-results) and then decide. Your decision might be right, or it might not.
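To put a number on "very low," here is a hedged sketch using two functions from R's built-in stats package: `dbinom()` for the probability of exactly 99 heads, and `binom.test()` for an exact test of the fair-coin hypothesis. The choice of an exact binomial test is an assumption made for this illustration and is not necessarily the method the later chapters use.

```r
# Probability of exactly 99 heads in 100 tosses, given a fair coin
p_99_heads <- dbinom(99, size = 100, prob = 0.5)

# Exact binomial test of "the coin is fair" (two-sided by default)
fair_coin_test <- binom.test(x = 99, n = 100, p = 0.5)

p_99_heads                 # vanishingly small
fair_coin_test$p.value     # also vanishingly small
```

Both values are far smaller than any conventional cutoff, so the sample data give strong grounds for rejecting the fair-coin hypothesis, even though a fair coin could in principle produce this result.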

Null and alternative hypotheses

Think again about that coin-tossing study I just mentioned. The sample data are the results from the 100 tosses. I said that we can start with the hypothesis that the coin is fair. This starting point is called the null hypothesis