23,99 €
There's nothing random about it--this is the book on statistical analysis with Excel Stunned by statistics? Exhausted by Excel? Relax! Statistical Analysis with Excel For Dummies, 4th Edition shows you how to use the world's most popular spreadsheet program to crunch numbers and interpret statistics--even if you've never taken a statistics or advanced math course. In no time, you'll learn to use Excel to create and translate statistics in everyday life, understand common statistical terms, and improve your classroom or professional skills. Statistics has a reputation for being a challenging, math-intensive pursuit--but it doesn't have to make your palms sweat. Using a minimum of equations and assuming no prior knowledge of statistics or Excel, this hands-on guide cuts through the jargon and shows you how to make sense of formulas and functions, charts and PivotTables, samples and normal distributions, probabilities and related distributions, trends and correlations, and much more. * Use Excel's tools to analyze and understand data * Apply statistical analysis to predict trends and make decisions * Interpret sales figures, gambling odds, and sports stats * Develop a grading curve or medical correlations Forget the mumbo jumbo! This guide shows you that statistical analysis with Excel can be easy, fun, and useful!
Sie lesen das E-Book in den Legimi-Apps auf:
Seitenzahl: 585
Veröffentlichungsjahr: 2016
Statistical Analysis with Excel® For Dummies®, 4th Edition
Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com
Copyright © 2016 by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. Excel is a registered trademark of Microsoft Corporation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.
For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit https://hub.wiley.com/community/support/dummies.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2016943716
ISBN: 978-1-119-27115-4; 978-1-119-27116-1 (ebk); 978-1-119-27117-8 (ebk)
Table of Contents
Cover
Introduction
About This Book
What You Can Safely Skip
Foolish Assumptions
How This Book Is Organized
Icons Used in This Book
Where to Go from Here
Part 1: Getting Started with Statistical Analysis with Excel: A Marriage Made in Heaven
Chapter 1: Evaluating Data in the Real World
The Statistical (and Related) Notions You Just Have to Know
Inferential Statistics: Testing Hypotheses
What’s New in Excel 2016?
What’s Old in Excel 2016?
Knowing the Fundamentals
What’s New in This Edition?
Chapter 2: Understanding Excel’s Statistical Capabilities
Getting Started
Setting Up for Statistics
Accessing Commonly Used Functions
Part 2: Describing Data
Chapter 3: Show and Tell: Graphing Data
Why Use Graphs?
Some Fundamentals
Excel’s Graphics (Chartics?) Capabilities
Becoming a Columnist
Slicing the Pie
Drawing the Line
Adding a Spark
Passing the Bar
The Plot Thickens
Finding Another Use for the Scatter Chart
Tasting the Bubbly
Taking Stock
Scratching the Surface
On the Radar
Growing a Treemap and Bursting Some Sun
Building a Histogram
Ordering Columns: Pareto
Of Boxes and Whiskers
3D Maps
Chapter 4: Finding Your Center
Means: The Lore of Averages
Medians: Caught in the Middle
Statistics à la Mode
Chapter 5: Deviating from the Average
Measuring Variation
Back to the Roots: Standard Deviation
Related Functions
Chapter 6: Meeting Standards and Standings
Catching Some Zs
Where Do You Stand?
Chapter 7: Summarizing It All
Counting Out
The Long and Short of It
Getting Esoteric
Tuning In the Frequency
Can You Give Me a Description?
Be Quick About It!
Instant Statistics
Chapter 8: What’s Normal?
Hitting the Curve
A Distinguished Member of the Family
Graphing a Standard Normal Distribution
Part 3: Drawing Conclusions from Data
Chapter 9: The Confidence Game: Estimation
Understanding Sampling Distributions
An EXTREMELY Important Idea: The Central Limit Theorem
The Limits of Confidence
Fit to a t
Chapter 10: One-Sample Hypothesis Testing
Hypotheses, Tests, and Errors
Hypothesis Tests and Sampling Distributions
Catching Some Z’s Again
t for One
Visualizing a t-Distribution
Testing a Variance
Visualizing a Chi-Square Distribution
Chapter 11: Two-Sample Hypothesis Testing
Hypotheses Built for Two
Revisited
t for Two
A Matched Set: Hypothesis Testing for Paired Samples
Testing Two Variances
Visualizing the F-Distribution
Chapter 12: Testing More Than Two Samples
Testing More Than Two
Another Kind of Hypothesis, Another Kind of Test
Chapter 13: Slightly More Complicated Testing
Cracking the Combinations
Cracking the Combinations Again
Two Kinds of Variables … at Once
Using Excel with a Mixed Design
Graphing the Results
After the ANOVA
Chapter 14: Regression: Linear and Multiple
The Plot of Scatter
Graphing Lines
Regression: What a Line!
Worksheet Functions for Regression
Data Analysis Tool: Regression
Juggling Many Relationships at Once: Multiple Regression
Excel Tools for Multiple Regression
Chapter 15: Correlation: The Rise and Fall of Relationships
Scatterplots Again
Understanding Correlation
Correlation and Regression
Testing Hypotheses About Correlation
Worksheet Functions for Correlation
Data Analysis Tool: Correlation
Data Analysis Tool: Covariance
Testing Hypotheses About Correlation
Chapter 16: It’s About Time
A Series and Its Components
A Moving Experience
How To Be a Smoothie, Exponentially
One-Click Forecasting!
Chapter 17: Non-Parametric Statistics
Independent Samples
Matched Samples
Correlation: Spearman’s r
S
A Heads-Up
Part 4: Probability
Chapter 18: Introducing Probability
What Is Probability?
Compound Events
Conditional Probability
Large Sample Spaces
Worksheet Functions
Random Variables: Discrete and Continuous
Probability Distributions and Density Functions
The Binomial Distribution
Worksheet Functions
Hypothesis Testing with the Binomial Distribution
The Hypergeometric Distribution
Chapter 19: More on Probability
Discovering Beta
Poisson
Working with Gamma
Exponential
Chapter 20: A Career in Modeling
Modeling a Distribution
A Simulating Discussion
Part 5: The Part of Tens
Chapter 21: Ten Statistical and Graphical Tips and Traps
Significant Doesn’t Always Mean Important
Trying to Not Reject a Null Hypothesis Has a Number of Implications
Regression Isn’t Always Linear
Extrapolating Beyond a Sample Scatterplot Is a Bad Idea
Examine the Variability Around a Regression Line
A Sample Can Be Too Large
Consumers: Know Your Axes
Graphing a Categorical Variable as Though It’s a Quantitative Variable Is Just Wrong
Whenever Appropriate, Include Variability in Your Graph
Be Careful When Relating Statistics Textbook Concepts to Excel
Chapter 22: Ten Things (Twelve, Actually) That Just Didn’t Fit in Any Other Chapter
Graphing the Standard Error of the Mean
Probabilities and Distributions
Drawing Samples
Testing Independence: The True Use of CHISQ.TEST
Logarithmica Esoterica
Sorting Data
Appendix A: When Your Worksheet Is a Database
Introducing Excel Databases
Counting and Retrieving
Arithmetic
Statistics
Pivot Tables
Appendix B: The Analysis of Covariance
Covariance: A Closer Look
Why You Analyze Covariance
How You Analyze Covariance
ANCOVA in Excel
And One More Thing
About the Author
Advertisement Page
Connect with Dummies
End User License Agreement
Cover
Table of Contents
Begin Reading
iii
iv
vii
viii
ix
x
xi
xii
xiii
xiv
xv
xvi
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
351
352
353
354
355
356
357
358
359
360
361
363
364
365
366
367
368
369
370
371
372
373
374
375
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
487
488
489
490
491
492
493
494
495
496
497
498
499
500
529
530
531
532
533
534
535
536
What? Yet another statistics book? Well … this is a statistics book, all right, but in my humble (and thoroughly biased) opinion, it’s not just another statistics book.
What? Yet another Excel book? Same thoroughly biased opinion — it’s not just another Excel book. What? Yet another edition of a book that’s not just another statistics book and not just another Excel book? Well … yes. You got me there.
So here’s the story — for the previous three editions and for this one. Many statistics books teach you the concepts but don’t give you a way to apply them. That often leads to a lack of understanding. With Excel, you have a ready-made package for applying statistics concepts.
Looking at it from the opposite direction, many Excel books show you Excel’s capabilities but don’t tell you about the concepts behind them. Before I tell you about an Excel statistical tool, I give you the statistical foundation it’s based on. That way, you understand the tool when you use it — and you use it more effectively.
I didn’t want to write a book that’s just “select this menu” and “click this button.” Some of that is necessary, of course, in any book that shows you how to use a software package. My goal was to go way beyond that.
I also didn’t want to write a statistics “cookbook” — when-faced-with-problem-#310-use-statistical-procedure-#214. My goal was to go way beyond that, too.
Bottom line: This book isn’t just about statistics or just about Excel — it sits firmly at the intersection of the two. In the course of telling you about statistics, I cover every Excel statistical feature. (Well … almost. I left one out. I left it out of the first three editions, too. It’s called “Fourier Analysis.” All the necessary math to understand it would take a whole book, and you might never use this tool, anyway.)
Although statistics involves a logical progression of concepts, I organized this book so you can open it up in any chapter and start reading. The idea is for you to find what you’re looking for in a hurry and use it immediately — whether it’s a statistical concept or an Excel tool.
On the other hand, cover to cover is okay if you’re so inclined. If you’re a statistics newbie and you have to use Excel for statistical analysis, I recommend you begin at the beginning — even if you know Excel pretty well.
Any reference book throws a lot of information at you, and this one is no exception. I intend it all to be useful, but I don’t aim it all at the same level. So if you’re not deeply into the subject matter, you can avoid paragraphs marked with the Technical Stuff icon.
Every so often, you’ll run into sidebars. They provide information that elaborates on a topic, but they’re not part of the main path. If you’re in a hurry, you can breeze past them.
Because I wrote this book so you can open it up anywhere and start using it, step-by-step instructions appear throughout. Many of the procedures I describe have steps in common. After you go through some of the procedures, you can probably skip the first few steps when you come to a procedure you haven’t been through before.
This is not an introductory book on Excel or on Windows, so I’m assuming:
You know how to work with Windows. I don’t spell out the details of pointing, clicking, selecting, and so forth.
You have Excel 2016 installed on your Windows computer or on your Mac and you can work along with the examples. I don’t walk you through the steps of Excel installation.
You’ve worked with Excel, and you understand the essentials of worksheets and formulas.
If you don’t know much about Excel, consider looking into Greg Harvey’s excellent Excel books in the For Dummies series.
I’ve organized this book into five parts and four appendixes (including two that you can find on this book’s companion website at www.statisticalanalysiswexcel4e).
In Part 1, I provide a general introduction to statistics and to Excel’s statistical capabilities. I discuss important statistical concepts and describe useful Excel techniques. If it’s a long time since your last course in statistics or if you’ve never had a statistics course at all, start here. If you haven’t worked with Excel’s built-in functions (of any kind), definitely start here.
Part of statistics is to take sets of numbers and summarize them in meaningful ways. Here’s where you find out how to do that. We all know about averages and how to compute them. But that’s not the whole story. In this part, I tell you about additional statistics that fill in the gaps, and I show you how to use Excel to work with those statistics. I also introduce Excel graphics in this part.
Part 3 addresses the fundamental aim of statistical analysis: to go beyond the data and help decision-makers make decisions. Usually, the data are measurements of a sample taken from a large population. The goal is to use these data to figure out what’s going on in the population.
This opens a wide range of questions: What does an average mean? What does the difference between two averages mean? Are two things associated? These are only a few of the questions I address in Part 3, and I discuss the Excel functions and tools that help you answer them.
Probability is the basis for statistical analysis and decision-making. In Part 4, I tell you all about it. I show you how to apply probability, particularly in the area of modeling. Excel provides a rich set of built-in capabilities that help you understand and apply probability. Here’s where you find them.
Part 5 meets two objectives. First, I get to stand on the soapbox and rant about statistical peeves and about helpful hints. The peeves and hints total up to ten. Also, I discuss ten (okay, 12) Excel things I couldn’t fit into any other chapter. They come from all over the world of statistics. If it’s Excel and statistical, and if you can’t find it anywhere else in the book, you’ll find it here.
As I said in the first three editions — pretty handy, this Part of Tens.
In addition to performing calculations, Excel serves another purpose: recordkeeping. Although it’s not a dedicated database, Excel does offer some database functions. Some of them are statistical in nature. I introduce Excel database functions in Appendix A, along with pivot tables that allow you to turn your database inside out and look at your data in different ways.
The Analysis of Covariance (ANCOVA) is a statistical technique that combines two other techniques: analysis of variance and regression analysis. If you know how two variables are related, you can use that knowledge in some nifty ways, and this is one of the ways. The kicker is that Excel doesn’t have a built-in tool for ANCOVA — but I show you how to use what Excel does have so you can get the job done.
This appendix is all about importing data into Excel — from the web, from databases, from text, and from PDF documents.
Excel is terrific for managing, manipulating, and analyzing data. It’s also a great tool for helping people understand statistical concepts. This appendix covers some ways for using Excel to do just that.
As is the case with all For Dummies books, icons appear all over the place. Each one is a little picture in the margin that lets you know something special about the paragraph it’s next to.
This icon points out a hint or a shortcut that can help you in your work and make you an all-around better human being.
This one points out timeless wisdom to take with you long after you finish this book, young Jedi.
Pay attention to this icon. It’s a reminder to avoid something that might gum up the works for you.
As I mention earlier, in the section “What You Can Safely Skip,” this icon indicates material you can blow right past if statistics and Excel aren’t your passion.
You can start the book anywhere, but here are a few hints. Want to learn the foundations of statistics? Turn the page. Introduce yourself to Excel’s statistical features? That’s Chapter 2. Want to start with graphics? Hit Chapter 3. For anything else, find it in the table of contents or in the index and go for it.
In addition to what you’re reading right now, this book also comes with a free, access-anywhere Cheat Sheet that will help you quickly use the tools I discuss. To get this Cheat Sheet, visit www.dummies.com and search for “Statistical Analysis with Excel For Dummies Cheat Sheet” in the Search box. And don’t forget to check out the bonus content on this book’s companion website at www.dummies.com/go/statisticalanalysiswexcel4e.
Part 1
IN THIS PART …
Find out about Excel’s statistical capabilities
Explore how to work with populations and samples
Test your hypotheses
Understand errors in decision making
Determine independent and dependent variables
Chapter 1
IN THIS CHAPTER
Introducing statistical concepts
Generalizing from samples to populations
Getting into probability
Making decisions
New and old features in Excel 2016
Understanding important Excel fundamentals
The field of statistics is all about decision-making — decision-making based on groups of numbers. Statisticians constantly ask questions: What do the numbers tell us? What are the trends? What predictions can we make? What conclusions can we draw?
To answer these questions, statisticians have developed an impressive array of analytical tools. These tools help us to make sense of the mountains of data that are out there waiting for us to delve into, and to understand the numbers we generate in the course of our own work.
In advance of doing a study, a statistician draws up a tentative explanation — a hypothesis — as to why the data might come out a certain way. After the study is complete and the sample data are all tabulated, he or she faces the essential decision a statistician has to make: whether or not to reject the hypothesis.
That decision is wrapped in a conditional probability question — what’s the probability of obtaining the data, given that this hypothesis is correct? Statistical analysis provides tools to calculate the probability. If the probability turns out to be low, the statistician rejects the hypothesis.
Suppose you’re interested in whether or not a particular coin is fair — whether it has an equal chance of coming up heads or tails. To study this issue, you’d take the coin and toss it a number of times — say, 100. These 100 tosses make up your sample data. Starting from the hypothesis that the coin is fair, you’d expect that the data in your sample of 100 tosses would show around 50 heads and 50 tails.
If it turns out to be 99 heads and 1 tail, you’d undoubtedly reject the fair coin hypothesis. Why? The conditional probability of getting 99 heads and 1 tail given a fair coin is very low. Wait a second. The coin could still be fair and you just happened to get a 99-1 split, right? Absolutely. In fact, you never really know. You have to gather the sample data (the results from 100 tosses) and make a decision. Your decision might be right, or it might not.
Juries face this dilemma all the time. They have to decide among competing hypotheses that explain the evidence in a trial. (Think of the evidence as data.) One hypothesis is that the defendant is guilty. The other is that the defendant is not guilty. Jury members have to consider the evidence and, in effect, answer a conditional probability question: What’s the probability of the evidence given that the defendant is not guilty? The answer to this question determines the verdict.
Consider once again the coin-tossing study I mention in the preceding section. The sample data are the results from the 100 tosses. Before tossing the coin, you might start with the hypothesis that the coin is a fair one so that you expect an equal number of heads and tails. This starting point is called the null hypothesis. The statistical notation for the null hypothesis is H0. According to this hypothesis, any heads-tails split in the data is consistent with a fair coin. Think of it as the idea that nothing in the results of the study is out of the ordinary.
An alternative hypothesis is possible — that the coin isn’t a fair one, and it’s loaded to produce an unequal number of heads and tails. This hypothesis says that any heads-tails split is consistent with an unfair coin. The alternative hypothesis is called, believe it or not, the alternative hypothesis. The statistical notation for the alternative hypothesis is H1.
With the hypotheses in place, toss the coin 100 times and note the number of heads and tails. If the results are something like 90 heads and 10 tails, it’s a good idea to reject H0. If the results are around 50 heads and 50 tails, don’t reject H0.Similar ideas apply to the reading-speed example I give earlier, in the section “Samples and populations.” One sample of children receives reading instruction under a new method designed to increase reading speed, and the other learns via a traditional method. Measure the children’s reading speeds before and after instruction, and tabulate the improvement for each child. The null hypothesis, H0, is that one method isn’t different from the other. If the improvements are greater with the new method than with the traditional method — so much greater that it’s unlikely that the methods aren’t different from one another — reject H0. If they’re not greater, don’t reject H0.
Notice that I didn’t say “accept H0.” The way the logic works, you never accept a hypothesis. You either reject H0 or don’t reject H0.
Here’s a real-world example to help you understand this idea. When a defendant goes on trial, he or she is presumed innocent until proven guilty. Think of innocent as H0. The prosecutor’s job is to convince the jury to reject H0. If the jurors reject, the verdict is guilty. If they don’t reject, the verdict is not guilty. The verdict is never innocent. That would be like accepting H0.
Back to the coin-tossing example. Remember I said “around 50 heads and 50 tails” is what you could expect from 100 tosses of a fair coin. What does around mean? Also, I said if it’s 90-10, reject H0. What about 85-15? 80-20? 70-30? Exactly how much different from 50-50 does the split have to be for you reject H0? In the reading-speed example, how much greater does the improvement have to be to reject H0?
I won’t answer these questions now. Statisticians have formulated decision rules for situations like this, and you explore those rules throughout the book.
Whenever you evaluate the data from a study and decide to reject H0 or to not reject H0, you can never be absolutely sure. You never really know what the true state of the world is. In the context of the coin-tossing example, that means you never know for certain if the coin is fair or not. All you can do is make a decision based on the sample data you gather. If you want to be certain about the coin, you’d have to have the data for the entire population of tosses — which means you’d have to keep tossing the coin until the end of time.
Because you’re never certain about your decisions, it’s possible to make an error regardless of what you decide. As I mention earlier in this chapter, the coin could be fair and you just happen to get 99 heads in 100 tosses. That’s not likely, and that’s why you reject H0. It’s also possible that the coin is biased, yet you just happen to toss 50 heads in 100 tosses. Again, that’s not likely and you don’t reject H0 in that case.
Although not likely, those errors are possible. They lurk in every study that involves inferential statistics. Statisticians have named them Type I and Type II.
If you reject H0 and you shouldn’t, that’s a Type I error. In the coin example, that’s rejecting the hypothesis that the coin is fair, when in reality it is a fair coin.
If you don’t reject H0 and you should have, that’s a Type II error. That happens if you don’t reject the hypothesis that the coin is fair and in reality it’s biased.
How do you know if you’ve made either type of error? You don’t — at least not right after you make your decision to reject or not reject H0. (If it’s possible to know, you wouldn’t make the error in the first place!) All you can do is gather more data and see if the additional data are consistent with your decision.
If you think of H0 as a tendency to maintain the status quo and not interpret anything as being out of the ordinary (no matter how it looks), a Type II error means you missed out on something big. Looked at in that way, Type II errors form the basis of many historical ironies.
Here’s what I mean: In the 1950s, a particular TV show gave talented young entertainers a few minutes to perform on stage and a chance to compete for a prize. The audience voted to determine the winner. The producers held auditions around the country to find people for the show. Many years after the show went off the air, the producer was interviewed. The interviewer asked him if he had ever turned down anyone at an audition whom he shouldn’t have.
“Well,” said the producer, “once a young singer auditioned for us and he seemed really odd.”
“In what way?” asked the interviewer.
“In a couple of ways,” said the producer. “He sang really loud, gyrated his body and his legs when he played the guitar, and he had these long sideburns. We figured this kid would never make it in show business, so we thanked him for showing up, but we sent him on his way.”
“Wait a minute — are you telling me you turned down …?”
“That’s right. We actually said no … to Elvis Presley!”
Now that’s a Type II error.
Microsoft has made a few changes to Excel’s Ribbon (the tabbed band across the top), reflecting changes in Excel. The most obvious addition is the light bulb, at the top to the right of Add-ins. It’s labeled “Tell me what you want to do.” This is called the Tell Me box, and it’s a new way to connect to Excel Help. Type a phrase like Insert a chart into the Tell Me box, and Excel opens a menu whose choices include icons that you click to insert charts and to find help with inserting charts. Figure 1-2 shows this capability.
FIGURE 1-2: The interface in Excel 2016, showing the Tell Me box.
Sadly, this feature is not part of Excel 2016 for the Mac. This is the case for a number of other capabilities, too (like a couple I mention in the next paragraph). Overall, however, Mac users will find greater consistency across platforms than in previous editions.
Figure 1-2 shows the Insert tab, which incorporates a couple of changes in the Charts area. One addition is a set of Statistical Charts (which are not in the Mac version). Another is 3D Map, the new and improved Power View (which first appeared in Excel 2013 and will not be appearing in a Mac near you). I discuss these features in Chapter 3.
Each tab on the Ribbon presents groups of icon-labeled command buttons separated into categories. When you’re trying to figure out the capability a particular button activates, you can move the cursor to the button (without clicking) and helpful information pops up.
Clicking a button typically opens a whole category of possibilities. Buttons that do this are called category buttons.
Microsoft has developed shorthand for describing a mouse-click on a command button on the Ribbon, and I use that shorthand throughout this book. The shorthand is
Tab | Command Button
To indicate clicking on the Insert tab’s Recommended Charts category button, for example, I write
Insert | Recommended Charts
When I click that button (with some data-containing cells selected), the Insert Chart dialog box, shown in Figure 1-3, appears.
FIGURE 1-3: Clicking Insert | Recommended Charts opens this box.
Notice that its Recommended Charts tab is open. Clicking the All Charts tab (which is not in the Mac version) changes the box to what you see in Figure 1-4, a gallery of all possible Excel charts.
FIGURE 1-4: The All Charts tab in the Insert Chart dialog box.
Chart is Excel’s name for graph.
Incidentally, the All Charts tab shows five of the six charts new in Excel 2016: Waterfall, Treemap, Sunburst, Histogram, and Box & Whisker. (Pareto, the sixth new chart, is buried a bit deeper.) The last three are called “statistical charts. I cover statistical charts (and others) in Chapter 3.
To find the bulk of Excel’s statistical functionality, select
Formulas | More Functions | Statistical
This is an extension of the shorthand. It means, “Select the Formulas tab, click the More Functions button, and then select the Statistical Functions choice from the pop-up menu that opens.” Figure 1-5 shows what I mean.
FIGURE 1-5: Accessing the Statistical Functions menu.
In Chapter 2, I show you how to make the Statistical Functions menu more accessible.
In the 2010 version, Microsoft changed the way Excel names its functions. The objective was to make a function’s purpose as obvious as possible from its name. Excel also changed some of the programming behind these functions to make them more accurate.
Excel 2016 continues this naming style, and maintains the older statistical functions (pre-2010 vintage, and one – FORECAST – from 2013) for compatibility with older versions of Excel. So if you’re creating a spreadsheet for users of older Excel versions, use the older functions.
You won’t find them on the Statistical Functions menu. They have their own menu. To find it, select Formulas | More Functions | Compatibility.
I provide Table 1-1 to help you transition from older Excel versions. The table lists the old functions, their replacements, and the chapter in which I discuss the new function.
TABLE 1-1 Older Excel Statistical Functions, Their Replacements, and the Chapter That Deals with the New Function
Old Function
New Function
Chapter
BETADIST
BETA.DIST
19
BETAINV
BETA.INV
19
BINOMDIST
BINOM.DIST
18
CRITBINOM
BINOM.INV
18
CHIDIST
CHISQ.DIST.RT
10
CHIINV
CHISQ.INV.RT
10
CHITEST
CHISQ.TEST
20
CONFIDENCE
CONFIDENCE.NORM
9
COVAR
COVARIANCE.P
15
EXPONDIST
EXPON.DIST
19
FDIST
F.DIST.RT
11
FINV
F.INV.RT
11
FTEST
F.TEST
11
FORECAST
FORECAST.LINEAR, FORECAST.ETS, FORECAST.ETS.CONFINT, FORECAST.ETS.SEASONALITY, FORECAST.ETS.STAT
16
GAMMADIST
GAMMA.DIST
19
GAMMAINV
GAMMA.INV
19
HYPGEOMDIST
HYPGEOM.DIST
18
LOGNORMDIST
LOGNORM.DIST
22
LOGINV
LOGNORM.INV
22
MODE
MODE.SNGL, MODE.MULT
4
NEGBINOMDIST
NEGBINOM.DIST
18
NORMDIST
NORM.DIST
8
NORMINV
NORM.INV
8
NORMSDIST
NORM.S.DIST
8
NORMSINV
NORM.S.INV
8
PERCENTILE
PERCENTILE.INC
6
PERCENTRANK
PERCENTRANK.INC
6
POISSON
POISSON.DIST
19
QUARTILE
QUARTILE.INC
6
RANK
RANK.EQ
6
STDEVP
STDEV.P
5
STDEV
STDEV.S
5
TDIST
T.DIST.2T
10
TDIST
T.DIST.RT
10
TINV
T.INV.2T
9
TTEST
T.TEST
11
VARP
VAR.P
5
VAR
VAR.S
5
WEIBULL
WEIBULL.DIST
22
ZTEST
Z.TEST
10
The table shows that the FORECAST function has morphed into five functions in Excel 2016: FORECAST.LINEAR, FORECAST.ETS, FORECAST.ETS.CONFINT, FORECAST.ETS.SEASONALITY, and FORECAST.ETS.STAT. Along with Excel’s new one-click forecasting capability, I cover these functions in Chapter 16.
The most important addition in Excel 2016 is on the Macintosh side: After a long absence, the Analysis ToolPak returns to Excel 2016 for the Mac. Available in all Windows versions of Excel, the Analysis ToolPak is a free add-in that supplies analytic tools often found in dedicated statistical software packages. In previous Mac versions, intrepid users accessed a similar set of tools by downloading a third-party application that did not integrate with Excel in the same way as the Analysis ToolPak.
Mac users are a hearty lot, however, and they’ll be happy with this change in Excel 2016. (Have I done enough … Apple-polishing? Sorry.)
I cover the Analysis ToolPak in Chapter 2.
Although I’m assuming you’re not new to Excel, I think it’s wise to take a little time and space to discuss a few fundamental Excel principles that figure prominently in statistical work. Knowing these fundamentals helps you work efficiently with Excel formulas.
The first fundamental feature is autofill, Excel’s capability for repeating a calculation throughout a worksheet. Insert a formula into a cell, and you can drag that formula into adjoining cells.
Figure 1-6 is a worksheet of expenditures for R&D in science and engineering at colleges and universities for the years shown. The data, taken from a U.S. National Science Foundation report, are in millions of dollars. Column H holds the total for each field, and Row 11 holds the total for each year. (More about column I in a moment.)
FIGURE 1-6: Expenditures for R&D in science and engineering.
I started with column H blank and with row 11 blank. How did I get the totals into column H and row 11?
If I want to create a formula to calculate the first row total (for Physical Sciences), one way (among several) is to enter
into cell H2. (A formula always begins with an equal sign: =.) Press Enter and the total appears in H2.
Now, to put that formula into cells H3 through H10, the trick is to position the cursor on the lower-right corner of H2 until a plus sign (+) appears, hold down the left mouse button, and drag the mouse through the cells. That plus sign is called the cell’s fill handle.
When you finish dragging, release the mouse button and the row totals appear. This saves huge amounts of time because you don’t have to reenter the formula eight times.
Same thing with the column totals. One way to create the formula that sums up the numbers in the first column (1990) is to enter
=D2 + D3 + D4 + D5 + D6 + D7 + D8 + D9 + D10
into cell D11. Position the cursor on D11’s fill handle, drag through row 11 and release in column H, and you autofill the totals into E11 through H11.
Dragging isn’t the only way to do it. Another way is to select the array of cells you want to autofill (including the one that contains the formula) and click
Home | Fill
Where’s Fill? On the Home tab, in the Editing area, you see a down arrow. That’s Fill. Clicking Fill opens the Fill pop-up menu (see Figure 1-7). Select Down and you accomplish the same thing as dragging and dropping.
FIGURE 1-7: The Fill pop-up menu.
Still another way is to select Series from the Fill pop-up menu. Doing this opens the Series dialog box (see Figure 1-8). In this dialog box, select the AutoFill radio button and click OK, and you’re all set. This method takes one more step, but the Series dialog box is a bit more compatible with earlier versions of Excel.
FIGURE 1-8: The Series dialog box.
I bring this up because statistical analysis often involves repeating a formula from cell to cell. The formulas are usually more complex than the ones in this section, and you might have to repeat them many times, so it pays to know how to autofill.
A quick way to autofill is to click in the first cell in the series, move the cursor to that cell’s lower-right corner until the autofill handle appears, and double-click. This works in both PC and Mac.
Another important fundamental principle is the way Excel references worksheet cells. Consider again the worksheet in Figure 1-6. Each autofilled formula is slightly different from the original. This, remember, is the formula in cell H2:
After autofill, the formula in H3 is
and the formula in H4 is — well, you get the picture.
This is perfectly appropriate. You want the total in each row, so Excel adjusts the formula accordingly as it automatically inserts it into each cell. This is called relative referencing — the reference (the cell label) gets adjusted relative to where it is in the worksheet. Here, the formula directs Excel to total up the numbers in the cells in the four columns immediately to the left.
Now for another possibility. Suppose you want to know each row total’s proportion of the grand total (the number in H11). That should be straightforward, right? Create a formula for I2, and then autofill cells I3 through I10.
Similar to the earlier example, you start by entering this formula into I2:
=H2/H11
Press Enter and the proportion appears in I2. Position the cursor on the fill handle, drag through column I, release in I10, and — D’oh! Figure 1-9 shows the unhappy result — the extremely ugly #/DIV0! in I3 through I10. What’s the story?
FIGURE 1-9: Whoops! Incorrect autofill!
The story is this: Unless you tell it not to, Excel uses relative referencing when you autofill. So the formula inserted into I3 is not
=H3/H11
Instead, it’s
=H3/H12
Why does H11 become H12? Relative referencing assumes that the formula means “Divide the number in the cell by whatever number is nine cells south of here in the same column.” Because H12 has nothing in it, the formula is telling Excel to divide by zero, which is a no-no.
The idea is to tell Excel to divide all the numbers by the number in H11, not by “whatever number is nine cells south of here.” To do this, you work with absolute referencing. You show absolute referencing by adding $ signs to the cell ID. The correct formula for I2 is
This line tells Excel to not adjust the column and to not adjust the row when you autofill. Figure 1-10 shows the worksheet with the proportions, and you can see the correct formula in the formula bar (the area above the worksheet and below the Ribbon).
FIGURE 1-10: Autofill, based on absolute referencing.
To convert a relative reference into absolute reference format, select the cell address (or addresses) you want to convert and press the F4 key. F4 is a toggle that switches among relative reference (H11, for example), absolute reference for both the row and column in the address ($H$11), absolute reference for the row-part only (H$11), and absolute reference for the column-part only ($H11).
In Excel 2016 for the Mac, toggle a relative reference into an absolute reference by holding down the fn key when you press F4. Another Mac shortcut for this is Command + T.
One prominent new feature in this edition is my emphasis on graphs of distributions. In my experience, graphing a distribution helps you understand it. Because some distributions (t, Chi-Square, and F) form the basis of inferential statistics and other distributions (Poisson) are important in modeling, I felt it important to emphasize their visualization. These visualizations appear in Chapters 8, 10, 11, and 19.
Speaking of visualization, I cover some existing chart types for the first time in this edition: Bubble, Stock, Surface, and Radar. They’re in Chapter 3, along with the new charts I mention earlier.
In the previous edition, I added an online appendix on an analysis-of-variance design — mixed-model ANOVA — that doesn’t appear in the first two editions. In this edition, the material appears in Chapter 13. Because this is a widely used design, I thought it wise to include it in a chapter rather than in an online appendix.
The mixed-model ANOVA combines a Between-Groups variable and a Repeated Measure. If you have no idea what the preceding sentence means, read Chapter 12. Anyway, Excel doesn’t have a tool for working with this design, but in Chapter 13 I show you an Excel-based workaround that enables you to compute this analysis.
Chapter 16 is completely new. As I point out earlier, Excel has expanded its forecasting capabilities. Five new worksheet functions replace the old FORECAST function, and Excel has added one-click forecasting from historical data. This merits an entirely new chapter on time series.
Chapter 17 is also completely new. Its subject matter — nonparametric statistics — is an important branch of statistics. This is another area that has no dedicated Excel tools. After I discuss each subtopic, I show you how to apply Excel.
In the third edition, the section “For Mac Users” appears in many of the chapters. The absence of the Analysis ToolPak in Excel 2011 for the Mac (and the need for a third-party app to fill the void) necessitated this strategy. With the return of the Analysis ToolPak to Excel 2016 for the Mac, those sections are no longer necessary.