Introductory Statistics and Analytics - Peter C. Bruce - E-Book

Peter C. Bruce

Description

Concise, thoroughly class-tested primer that features basic statistical concepts in the context of analytics, resampling, and the bootstrap. A uniquely developed presentation of key statistical topics, Introductory Statistics and Analytics: A Resampling Perspective provides an accessible approach to statistical analytics, resampling, and the bootstrap for readers with various levels of exposure to basic probability and statistics. Originally class-tested at one of the first online learning companies in the discipline, www.statistics.com, the book primarily focuses on applications of statistical concepts developed via resampling, with a background discussion of mathematical theory. This approach stresses statistical literacy and understanding, demonstrates the fundamental basis for statistical inference, and demystifies traditional formulas. The book begins with illustrations that have the essential statistical topics interwoven throughout before moving on to demonstrate the proper design of studies.
Meeting all of the Guidelines for Assessment and Instruction in Statistics Education (GAISE) requirements for an introductory statistics course, Introductory Statistics and Analytics: A Resampling Perspective also includes:

* Over 300 "Try It Yourself" exercises and intermittent practice questions, which challenge readers at multiple levels to investigate and explore key statistical concepts
* Numerous interactive links designed to provide solutions to exercises and further information on crucial concepts
* Linkages that connect statistics to the rapidly growing field of data science
* Multiple discussions of various software systems, such as Microsoft Office Excel®, StatCrunch, and R, to develop and analyze data
* Areas of concern and/or contrasting points of view indicated through the use of "Caution" icons

Introductory Statistics and Analytics: A Resampling Perspective is an excellent primary textbook for courses in preliminary statistics as well as a supplement for courses in upper-level statistics and related fields, such as biostatistics and econometrics. The book is also a general reference for readers interested in revisiting the value of statistics.

Page count: 487

Year of publication: 2015




Table of Contents

Cover

Title Page

Copyright

Preface

Book Website

Acknowledgments

Stan Blank

Michelle Everson

Robert Hayden

Introduction

If You Can't Measure It, You Can't Manage It

Phantom Protection from Vitamin E

Statistician, Heal Thyself

Identifying Terrorists in Airports

Looking Ahead in the Book

Resampling

Big Data and Statisticians

Chapter 1: Designing and Carrying Out a Statistical Study

1.1 A Small Example

1.2 Is Chance Responsible? The Foundation of Hypothesis Testing

1.3 A Major Example

1.4 Designing an Experiment

1.5 What to Measure—Central Location

1.6 What to Measure—Variability

1.7 What to Measure—Distance (Nearness)

1.8 Test Statistic

1.9 The Data

1.10 Variables and Their Flavors

1.11 Examining and Displaying the Data

1.12 Are We Sure We Made a Difference?

Appendix: Historical Note

1.13 Exercises

Chapter 2: Statistical Inference

2.1 Repeating the Experiment

2.2 How Many Reshuffles?

2.3 How Odd is Odd?

2.4 Statistical and Practical Significance

2.5 When to Use Hypothesis Tests

2.6 Exercises

Chapter 3: Displaying and Exploring Data

3.1 Bar Charts

3.2 Pie Charts

3.3 Misuse of Graphs

3.4 Indexing

3.5 Exercises

Chapter 4: Probability

4.1 Mendel's Peas

4.2 Simple Probability

4.3 Random Variables and their Probability Distributions

4.4 The Normal Distribution

4.5 Exercises

Chapter 5: Relationship Between Two Categorical Variables

5.1 Two-Way Tables

5.2 Comparing Proportions

5.3 More Probability

5.4 From Conditional Probabilities to Bayesian Estimates

5.5 Independence

5.6 Exploratory Data Analysis (EDA)

5.7 Exercises

Chapter 6: Surveys and Sampling

6.1 Simple Random Samples

6.2 Margin of Error: Sampling Distribution for a Proportion

6.3 Sampling Distribution for a Mean

6.4 A Shortcut—The Bootstrap

6.5 Beyond Simple Random Sampling

6.6 Absolute Versus Relative Sample Size

6.7 Exercises

Chapter 7: Confidence Intervals

7.1 Point Estimates

7.2 Interval Estimates (Confidence Intervals)

7.3 Confidence Interval for a Mean

7.4 Formula-Based Counterparts to the Bootstrap

7.5 Standard Error

7.6 Confidence Intervals for a Single Proportion

7.7 Confidence Interval for a Difference in Means

7.8 Confidence Interval for a Difference in Proportions

7.9 Recapping

Appendix A: More on the Bootstrap

Resampling Procedure—Parametric Bootstrap

Formulas and the Parametric Bootstrap

Appendix B: Alternative Populations

Appendix C: Binomial Formula Procedure

7.10 Exercises

Chapter 8: Hypothesis Tests

8.1 Review of Terminology

8.2 A–B Tests: The Two Sample Comparison

8.3 Comparing Two Means

8.4 Comparing Two Proportions

8.5 Formula-Based Alternative: t-Test for Means

8.6 The Null and Alternative Hypotheses

8.7 Paired Comparisons

Appendix A: Confidence Intervals Versus Hypothesis Tests

Confidence Interval

Relationship Between the Hypothesis Test and the Confidence Interval

Comment

Appendix B: Formula-Based Variations of Two-Sample Tests

Z-Test With Known Population Variance

Pooled Versus Separate Variances

Formula-Based Alternative: Z-Test for Proportions

8.8 Exercises

Chapter 9: Hypothesis Testing—2

9.1 A Single Proportion

9.2 A Single Mean

9.3 More Than Two Categories or Samples

9.4 Continuous Data

9.5 Goodness-of-Fit

Appendix: Normal Approximation; Hypothesis Test of a Single Proportion

Confidence Interval for a Mean

9.6 Exercises

Chapter 10: Correlation

10.1 Example: Delta Wire

10.2 Example: Cotton Dust and Lung Disease

10.3 The Vector Product and Sum Test

10.4 Correlation Coefficient

10.5 Other Forms of Association

10.6 Correlation is not Causation

10.7 Exercises

Chapter 11: Regression

11.1 Finding the Regression Line by Eye

11.2 Finding the Regression Line by Minimizing Residuals

11.3 Linear Relationships

11.4 Inference for Regression

11.5 Exercises

Chapter 12: Analysis of Variance—ANOVA

12.1 Comparing More Than Two Groups: ANOVA

12.2 The Problem of Multiple Inference

12.3 A Single Test

12.4 Components of Variance

12.5 Two-Way ANOVA

12.6 Factorial Design

12.7 Exercises

Chapter 13: Multiple Regression

13.1 Regression as Explanation

13.2 Simple Linear Regression—Explore the Data First

13.3 More Independent Variables

13.4 Model Assessment and Inference

13.5 Assumptions

13.6 Interaction, Again

13.7 Regression for Prediction

13.8 Exercises

Index

End User License Agreement


List of Illustrations

Figure 1.1

Figure 1.2

Figure 1.3

Figure 1.4

Figure 1.5

Figure 1.6

Figure 1.7

Figure 1.8

Figure 1.9

Figure 2.1

Figure 2.2

Figure 2.3

Figure 3.1

Figure 3.2

Figure 3.3

Figure 3.4

Figure 3.5

Figure 3.6

Figure 3.7

Figure 3.8

Figure 3.10

Figure 4.2

Figure 4.3

Figure 4.4

Figure 4.5

Figure 4.6

Figure 5.1

Figure 5.2

Figure 6.1

Figure 6.2

Figure 6.3

Figure 6.4

Figure 6.5

Figure 6.6

Figure 6.7

Figure 7.1

Figure 7.2

Figure 7.3

Figure 7.4

Figure 7.5

Figure 7.6

Figure 7.7

Figure 7.8

Figure 7.9

Figure 7.10

Figure 7.11

Figure 8.1

Figure 8.2

Figure 8.3

Figure 8.4

Figure 8.5

Figure 8.6

Figure 8.7

Figure 8.8

Figure 8.9

Figure 8.10

Figure 8.11

Figure 8.12

Figure 8.13

Figure 9.1

Figure 9.2

Figure 9.3

Figure 9.4

Figure 9.5

Figure 10.1

Figure 10.2

Figure 10.3

Figure 10.4

Figure 10.5

Figure 10.6

Figure 10.7

Figure 10.8

Figure 10.9

Figure 10.10

Figure 11.1

Figure 11.2

Figure 11.3

Figure 11.4

Figure 11.5

Figure 11.6

Figure 11.7

Figure 11.8

Figure 11.9

Figure 11.10

Figure 11.11

Figure 11.12

Figure 11.13

Figure 12.1

Figure 12.2

Figure 12.3

Figure 12.4

Figure 12.5

Figure 12.6

Figure 12.7

Figure 12.8

Figure 13.1

Figure 13.2

Figure 13.3

Figure 13.4

Figure 13.5

Figure 13.6

Figure 13.7

Figure 13.8

Figure 13.9

Figure 13.10

Figure 13.11

Figure 13.12

Figure 13.13

Figure 13.14

List of Tables

Table 1.1

Table 1.2

Table 1.3

Table 1.4

Table 1.5

Table 1.6

Table 1.8

Table 1.9

Table 1.10

Table 1.11

Table 1.12

Table 1.13

Table 1.14

Table 2.1

Table 2.2

Table 3.1

Table 3.2

Table 3.3

Table 3.4

Table 3.5

Table 3.6

Table 3.7

Table 4.1

Table 4.2

Table 5.1

Table 5.2

Table 5.3

Table 5.4

Table 5.5

Table 5.6

Table 5.7

Table 5.8

Table 5.9

Table 5.10

Table 5.11

Table 5.12

Table 5.13

Table 5.14

Table 5.15

Table 5.16

Table 5.17

Table 5.18

Table 6.1

Table 7.1

Table 7.2

Table 8.1

Table 8.2

Table 8.3

Table 8.4

Table 8.5

Table 9.1

Table 9.2

Table 9.3

Table 9.4

Table 9.5

Table 10.1

Table 10.2

Table 12.1

Table 12.2

Table 12.3

Table 12.4

Table 12.5

Table 12.6

Table 13.1

Table 13.2

Table 13.3

Introductory Statistics and Analytics

A Resampling Perspective

Peter C. Bruce

Institute for Statistics Education

Statistics.com

Arlington, VA

Copyright © 2015 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data is available.

ISBN: 978-1-118-88135-4

Preface

This book was developed by Statistics.com to meet the needs of its introductory students, based on experience in teaching introductory statistics online since 2003. The field of statistics education has been in ferment for several decades. With this book, which continues to evolve, we attempt to capture three important strands of recent thinking:

Connection with the field of data science, an amalgam of traditional statistics, newer machine learning techniques, database methodology, and computer programming to serve the needs of large organizations seeking to extract value from “big data.”

Guidelines for the introductory statistics course, developed in 2005 by a group of noted statistics educators with funding from the American Statistical Association. These Guidelines for Assessment and Instruction in Statistics Education (GAISE) call for the use of real data with active learning, stress statistical literacy and understanding over memorization of formulas, and require the use of software to develop concepts and analyze data.

The use of resampling/simulation methods to develop the underpinnings of statistical inference (the most difficult topic in an introductory course) in a transparent and understandable manner.
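To make the resampling idea in this third strand concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the book's own Resampling Stats or Box Sampler procedure: the hypothetical `permutation_test` reshuffles two groups to estimate how often chance alone produces a difference in means at least as large as the observed one, and the hypothetical `bootstrap_mean_interval` resamples one sample with replacement to form a simple percentile interval for its mean. The function names, parameters, the one-sided comparison, and the example data are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_reshuffles=1000, seed=None):
    """Hypothetical helper: reshuffle the pooled data to estimate how often
    chance alone gives a difference in means (a minus b) at least as large
    as the observed one. Returns (observed difference, one-sided p-value)."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_reshuffles):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            as_extreme += 1
    return observed, as_extreme / n_reshuffles


def bootstrap_mean_interval(sample, n_resamples=1000, confidence=0.90, seed=None):
    """Hypothetical helper: percentile bootstrap interval for the mean.
    Resample with replacement, sort the resample means, and keep the
    middle `confidence` share of them."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = means[round(tail * n_resamples)]
    upper = means[round((1 - tail) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical illustrative data, not taken from the book.
    treatment = [12.1, 14.3, 13.8, 15.0, 12.9]
    control = [11.0, 12.2, 11.8, 12.5, 11.4]
    obs, p = permutation_test(treatment, control, n_reshuffles=5000, seed=1)
    print(f"observed difference = {obs:.2f}, estimated p-value = {p:.3f}")
    low, high = bootstrap_mean_interval(treatment, seed=1)
    print(f"90% bootstrap interval for the treatment mean: ({low:.2f}, {high:.2f})")
```

Both functions rely only on repeated random reshuffling or resampling, so no distributional formula is needed; the trade-off is that the results are estimates whose precision depends on the number of repetitions.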

We start off with some examples of statistics in action (including two of statistics gone wrong) and then dive right in to look at the proper design of studies and account for the possible role of chance. All the standard topics of introductory statistics are here (probability, descriptive statistics, inference, sampling, correlation, etc.), but sometimes, they are introduced not as separate standalone topics but rather in the context of the situation in which they are needed.

Throughout the book, you will see “Try It Yourself” exercises. The answers to these exercises are found at the end of each chapter after the homework exercises.

Book Website

Data sets, Excel worksheets, software information, and instructor resources are available at the book website: www.introductorystatistics.com

Peter C. Bruce

Acknowledgments

Stan Blank

The programmer for Resampling Stats, Stan has participated actively in many sessions of Statistics.com courses based on this work and has contributed both to the presentation of regression and to the clarification and improvement of sections that deal with computational matters.

Michelle Everson

Michelle Everson, editor (2013) of the Journal of Statistics Education, has taught many sessions of the introductory sequence at Statistics.com and is responsible for the material on decomposition in the ANOVA chapter. Her active participation in the statistics education community has been an asset as we have strived to improve and perfect this text.

Robert Hayden

Robert Hayden has taught early sessions of this course and has written course materials that served as the seed from which this text grew. He was instrumental in getting this project launched.

In the beginning, Julian Simon, an early resampling pioneer, first kindled my interest in statistics with his permutation and bootstrap approach to statistics, his Resampling Stats software (first released in the late 1970s), and his statistics text on the same subject. Simon, described as an “iconoclastic polymath” by Peter Hall in his “Prehistory of the Bootstrap,” (Statistical Science, 2003, vol. 18, #2), is the intellectual forefather of this work.

Our Advisory Board (Chris Malone, William Peterson, and Jeff Witmer, all active in GAISE and the statistics education community in general) reviewed the overall concept and outline of this text and offered valuable advice.

Thanks go also to George Cobb, who encouraged me to proceed with this project and reinforced my inclination to embed resampling and simulation more thoroughly than is found in typical college textbooks.

Meena Badade also teaches using this text. She has been very helpful in bringing errors and points requiring clarification to my attention, and she helped to add the sections dealing with standard statistical formulas.

Kuber Deokar, Instructional Operations Supervisor at Statistics.com, and Valerie Troiano, the Registrar at Statistics.com, diligently and carefully shepherded the use of earlier versions of this text in courses at Statistics.com.

The National Science Foundation provided support for the Urn Sampler project, which evolved into the Box Sampler software used both in this course and for its early web versions. Nitin Patel, at Cytel Software Corporation, provided invaluable support and design assistance for this work. Marvin Zelen, an early advocate of urn-sampling models for instruction, shared illustrations that sharpened and clarified my thinking.

Many students at The Institute for Statistics Education at Statistics.com have helped me clarify confusing points and refine this book over the years.

Finally, many thanks to Stephen Quigley and the team at Wiley, who encouraged me and moved quickly on this project to bring it to fruition.

Introduction

As of the writing of this book, the fields of statistics and data science are evolving rapidly to meet the changing needs of business, government, and research organizations. It is an oversimplification, but still useful, to think of two distinct communities as you proceed through the book:

The traditional academic and medical research communities that typically conduct extended research projects adhering to rigorous regulatory or publication standards, and

Business and large organizations that use statistical methods to extract value from their data, often on the fly. Reliability and value are more important than academic rigor to this data science community.

If You Can't Measure It, You Can't Manage It

You may be familiar with this phrase or its cousin: if you can't measure it, you can't fix it. The two come up frequently in the context of Total Quality Management or Continuous Improvement programs in organizations. The flip side of these expressions is the fact that if you do measure something and make the measurements available to decision-makers, the something that you measure is likely to change.
