Using Statistics in the Social and Health Sciences with SPSS® and Excel®

Martin Lee Abbott

Description

Provides a step-by-step approach to statistical procedures to analyze data and conduct research, with detailed sections in each chapter explaining SPSS® and Excel® applications.

This book identifies connections between statistical applications and research design using cases, examples, and discussion of specific topics from the social and health sciences. Researched and class-tested to ensure an accessible presentation, the book combines clear, step-by-step explanations for the novice and professional alike to understand the fundamental statistical practices for organizing, analyzing, and drawing conclusions from research data in their field.

The book begins with an introduction to descriptive and inferential statistics and then acquaints readers with important features of statistical applications (SPSS and Excel) that support statistical analysis and decision making. Subsequent chapters treat the procedures commonly employed when working with data across various fields of social science research. Individual chapters are devoted to specific statistical procedures, each ending with lab application exercises that pose research questions, examine the questions through their application in SPSS and Excel, and conclude with a brief research report that outlines key findings drawn from the results. Real-world examples and data from social and health sciences research are used throughout the book, allowing readers to reinforce their comprehension of the material.

Using Statistics in the Social and Health Sciences with SPSS® and Excel® includes:

* Use of straightforward procedures and examples that help students focus on understanding of analysis and interpretation of findings
* Inclusion of a data lab section in each chapter that provides relevant, clear examples
* Introduction to advanced statistical procedures in chapter sections (e.g., regression diagnostics) and separate chapters (e.g., multiple linear regression) for greater relevance to real-world research needs

Emphasizing applied statistical analyses, this book can serve as the primary text in undergraduate and graduate university courses within departments of sociology, psychology, urban studies, health sciences, and public health, as well as other related departments. It will also be useful to statistics practitioners through extended sections using SPSS® and Excel® for analyzing data.




Table of Contents

COVER

TITLE PAGE

COPYRIGHT

DEDICATION

PREFACE

ACKNOWLEDGMENTS

CHAPTER 1: INTRODUCTION

Big Data Analysis

Visual Data Analysis

Importance of Statistics for the Social and Health Sciences and Medicine

Historical Notes: Early Use of Statistics

Approach of the Book

Cases from Current Research

Research Design

Focus on Interpretation

CHAPTER 2: DESCRIPTIVE STATISTICS: CENTRAL TENDENCY

What is the Whole Truth? Research Applications (Spuriousness)

Descriptive and Inferential Statistics

The Nature of Data: Scales of Measurement

Descriptive Statistics: Central Tendency

Using SPSS® and Excel to Understand Central Tendency

Distributions

Describing the Normal Distribution: Numerical Methods

Descriptive Statistics: Using Graphical Methods

Terms and Concepts

Data Lab and Examples (with Solutions)

Data Lab: Solutions

CHAPTER 3: DESCRIPTIVE STATISTICS: VARIABILITY

Range

Percentile

Scores Based on Percentiles

Using SPSS® and Excel to Identify Percentiles

Standard Deviation and Variance

Calculating the Variance and Standard Deviation

Population SD and Inferential SD

Obtaining SD from Excel and SPSS®

Terms and Concepts

Data Lab and Examples (with Solutions)

Data Lab: Solutions

CHAPTER 4: THE NORMAL DISTRIBUTION

The Nature of the Normal Curve

The Standard Normal Score: Z Score

The Z Score Table of Values

Navigating the Z Score Distribution

Calculating Percentiles

Creating Rules for Locating Z Scores

Calculating Z Scores

Working with Raw Score Distributions

Using SPSS® to Create Z Scores and Percentiles

Using Excel to Create Z Scores

Using Excel and SPSS® for Distribution Descriptions

Terms and Concepts

Data Lab and Examples (with Solutions)

Data Lab: Solutions

CHAPTER 5: PROBABILITY AND THE Z DISTRIBUTION

The Nature of Probability

Elements of Probability

Combinations and Permutations

Conditional Probability: Using Bayes' Theorem

Z Score Distribution and Probability

Using SPSS® and Excel to Transform Scores

Using the Attributes of the Normal Curve to Calculate Probability

“Exact” Probability

From Sample Values to Sample Distributions

Terms and Concepts

Data Lab and Examples (with Solutions)

Data Lab: Solutions

CHAPTER 6: RESEARCH DESIGN AND INFERENTIAL STATISTICS

Research Design

Experiment

Non-Experimental or Post Facto Research Designs

Inferential Statistics

Z Test

The Hypothesis Test

Statistical Significance

Practical Significance: Effect Size

Z Test Elements

Using SPSS® and Excel for the Z Test

Terms and Concepts

Data Lab and Examples (with Solutions)

Data Lab: Solutions

CHAPTER 7: THE T TEST FOR SINGLE SAMPLES

Introduction

Z Versus T: Making Accommodations

Research Design

Parameter Estimation

The T Test

The T Test: A Research Example

Interpreting the Results of the T Test for a Single Mean

The T Distribution

The Hypothesis Test for the Single Sample T Test

Type I and Type II Errors

Effect Size

Effect Size for the Single Sample T Test

Power, Effect Size, and Beta

One- and Two-Tailed Tests

Point and Interval Estimates

Using SPSS® and Excel with the Single Sample T Test

Terms and Concepts

Data Lab and Examples (with Solutions)

Data Lab: Solutions

CHAPTER 8: INDEPENDENT SAMPLE T TEST

A Lot of Ts

Research Design

Experimental Designs and the Independent T Test

Dependent Sample Designs

Between and Within Research Designs

Using Different T Tests

Independent T Test: The Procedure

Creating the Sampling Distribution of Differences

The Nature of the Sampling Distribution of Differences

Calculating the Estimated Standard Error of Difference with Equal Sample Size

Using Unequal Sample Sizes

The Independent T Ratio

Independent T Test Example

Hypothesis Test Elements for the Example

Before–After Convention with the Independent T Test

Confidence Intervals for the Independent T Test

Effect Size

The Assumptions for the Independent T Test

SPSS® Explore for Checking the Normal Distribution Assumption

Excel Procedures for Checking the Equal Variance Assumption

SPSS® Procedure for Checking the Equal Variance Assumption

Using SPSS® and Excel with the Independent T Test

SPSS® Procedures for the Independent T Test

Excel Procedures for the Independent T Test

Effect Size for the Independent T Test Example

Parting Comments

Nonparametric Statistics: The Mann–Whitney U Test

Terms and Concepts

Data Lab and Examples (With Solutions)

Data Lab: Solutions

Graphics in the Data Summary

CHAPTER 9: ANALYSIS OF VARIANCE

A Hypothetical Example of ANOVA

The Nature of ANOVA

The Components of Variance

The Process of ANOVA

Calculating ANOVA

Effect Size

Post Hoc Analyses

Assumptions of ANOVA

Additional Considerations with ANOVA

The Hypothesis Test: Interpreting ANOVA Results

Are the Assumptions Met?

Using SPSS® and Excel with One-Way ANOVA

The Need for Diagnostics

Non-Parametric ANOVA Tests: The Kruskal–Wallis Test

Terms and Concepts

Data Lab and Examples (With Solutions)

Data Lab: Solutions

CHAPTER 10: FACTORIAL ANOVA

Extensions of ANOVA

ANCOVA

MANOVA

MANCOVA

Factorial ANOVA

Interaction Effects

Simple Effects

2XANOVA: An Example

Calculating Factorial ANOVA

The Hypotheses Test: Interpreting Factorial ANOVA Results

Effect Size for 2XANOVA: Partial η²

Discussing the Results

Using SPSS® to Analyze 2XANOVA

Summary Chart for 2XANOVA Procedures

Terms and Concepts

Data Lab and Examples (With Solutions)

Data Lab: Solutions

CHAPTER 11: CORRELATION

The Nature of Correlation

The Correlation Design

Pearson's Correlation Coefficient

Plotting the Correlation: The Scattergram

Using SPSS® to Create Scattergrams

Using Excel to Create Scattergrams

Calculating Pearson's r

The Z Score Method

The Computation Method

The Hypothesis Test for Pearson's r

Effect Size: the Coefficient of Determination

Diagnostics: Correlation Problems

Correlation Using SPSS® and Excel

Nonparametric Statistics: Spearman's Rank Order Correlation (rs)

Terms and Concepts

Data Lab and Examples (with Solutions)

Data Lab: Solutions

CHAPTER 12: BIVARIATE REGRESSION

The Nature of Regression

The Regression Line

Calculating Regression

Effect Size of Regression

The Z Score Formula for Regression

Testing the Regression Hypotheses

The Standard Error of Estimate

Confidence Interval

Explaining Variance Through Regression

A Numerical Example of Partitioning the Variation

Using Excel and SPSS® with Bivariate Regression

The SPSS® Regression Output

The Excel Regression Output

Complete Example of Bivariate Linear Regression

Assumptions of Bivariate Regression

The Omnibus Test Results

Effect Size

The Model Summary

The Regression Equation and Individual Predictor Test of Significance

Advanced Regression Procedures

Detecting Problems in Bivariate Linear Regression

Terms and Concepts

Data Lab and Examples (with Solutions)

Data Lab: Solutions

CHAPTER 13: INTRODUCTION TO MULTIPLE LINEAR REGRESSION

The Elements of Multiple Linear Regression

Same Process as Bivariate Regression

Some Differences between Bivariate Linear Regression and Multiple Linear Regression

Stuff not Covered

Assumptions of Multiple Linear Regression

Analyzing Residuals to Check MLR Assumptions

Diagnostics for MLR: Cleaning and Checking Data

Extreme Scores

Distance Statistics

Influence Statistics

MLR Extended Example Data

Assumptions Met?

Analyzing Residuals: Are Assumptions Met?

Interpreting the SPSS® Findings for MLR

Entering Predictors Together as a Block

Entering Predictors Separately

Additional Entry Methods for MLR Analyses

Example Study Conclusion

Terms and Concepts

Data Lab and Example (with Solution)

Data Lab: Solution

CHAPTER 14: CHI-SQUARE AND CONTINGENCY TABLE ANALYSIS

Contingency Tables

The Chi-square Procedure and Research Design

Chi-square Design One: Goodness of Fit

A Hypothetical Example: Goodness of Fit

Effect Size: Goodness of Fit

Chi-square Design Two: The Test of Independence

A Hypothetical Example: Test of Independence

Special 2 × 2 Chi-square

Effect Size in 2 × 2 Tables: PHI

Cramer's V: Effect Size for the Chi-square Test of Independence

Repeated Measures Chi-square: McNemar Test

Using SPSS® and Excel with Chi-square

Using SPSS® for the Chi-square Test of Independence

Using Excel for Chi-square Analyses

Terms and Concepts

Data Lab and Examples (with Solutions)

Data Lab: Solutions

CHAPTER 15: REPEATED MEASURES PROCEDURES: Tdep AND ANOVAWS

Independent and Dependent Samples in Research Designs

Using Different T Tests

The Dependent T Test Calculation: The “Long” Formula

Example: The Long Formula

The Dependent T Test Calculation: The “Difference” Formula

Tdep and Power

Conducting The Tdep Analysis Using SPSS®

Conducting The Tdep Analysis Using Excel

Within-Subject ANOVA (ANOVAws)

Experimental Designs

Post Facto Designs

Within-Subject Example

Using SPSS® for Within-Subject Data

The SPSS® Procedure

The SPSS® Output

Nonparametric Statistics

Terms and Concepts

APPENDIX A: SPSS® BASICS

Using SPSS®

General Features

Management Functions

Additional Management Functions

APPENDIX B: EXCEL BASICS

Data Management

The Excel Menus

Using Statistical Functions

Data Analysis Procedures

Missing Values and “0” Values in Excel Analyses

Using Excel with “Real Data”

APPENDIX C: STATISTICAL TABLES

REFERENCES

INDEX

END USER LICENSE AGREEMENT



List of Illustrations

CHAPTER 1: INTRODUCTION

Figure 1.1 William Playfair's pie chart.

Figure 1.2 John Snow's map showing deaths in the London cholera epidemic of 1854.

Figure 1.3 Florence Nightingale's polar chart comparing battlefield and nonbattlefield deaths.

CHAPTER 2: DESCRIPTIVE STATISTICS: CENTRAL TENDENCY

Figure 2.1 The possible spurious relationship between ice cream consumption and crime.

Figure 2.2 The BRFSS GENHLTH variable values.

Figure 2.3 Graph of bimodal distribution.

Figure 2.4 Descriptive frequencies menus in SPSS®.

Figure 2.5 Frequencies submenus in SPSS®.

Figure 2.6 SPSS® frequency output.

Figure 2.7 SPSS® Descriptive – Descriptives output.

Figure 2.8 Excel spreadsheet showing GENHLTH data.

Figure 2.9 Excel database showing “Data Analysis” submenu.

Figure 2.10 “Descriptive Statistics” drop box for calculating central tendency.

Figure 2.11 Descriptive statistics results worksheet.

Figure 2.12 Example of the normal distribution of values.

Figure 2.13 Illustration of a positively skewed distribution.

Figure 2.14

Figure 2.15 Excel output showing the Histogram specification and the data columns.

Figure 2.16 Excel output showing the Histogram specification and the data columns.

Figure 2.17 Excel chart output from the “INSERT” menu ribbon.

Figure 2.18 SPSS® procedure for creating the histogram.

Figure 2.19 SPSS® procedure for specifying the features for the histogram.

Figure 2.20 SPSS® histogram in the output file.

Figure 2.21 SPSS® histogram showing the distribution of JS scores.

Figure 2.22 SPSS® (Frequency) central tendency results for AP scores.

Figure 2.23 Excel (Frequency) central tendency results for AP scores.

Figure 2.24 SPSS® histogram of AP scores.

Figure 2.25 Excel column chart of AP scores.

CHAPTER 3: DESCRIPTIVE STATISTICS: VARIABILITY

Figure 3.1 The characteristics of the range.

Figure 3.2 The uneven scale of percentile scores.

Figure 3.3 Specifying a percentile with SPSS® “Frequencies: Statistics” functions.

Figure 3.4 SPSS® output for percentile calculation.

Figure 3.5 Using the Excel functions to create percentiles.

Figure 3.6 Using the specification window for PERCENTILE.EXC.

Figure 3.7 The components of the SD.

Figure 3.8 The Excel descriptive statistics output for AP scores.

Figure 3.9 Using the Excel functions to calculate the “actual” SD.

Figure 3.10 Using the “Descriptives” menus in SPSS®.

Figure 3.11 The Descriptive Statistics output from SPSS®.

Figure 3.12 The AP score histogram from SPSS®.

Figure 3.13

Figure 3.14

Figure 3.15

Figure 3.16 The SPSS® histogram output for Problem 2.

CHAPTER 4: THE NORMAL DISTRIBUTION

Figure 4.1 The normal curve with known properties.

Figure 4.2 The location of z = (−)1.96.

Figure 4.3 The (partial) Z Score Table of Values.

Figure 4.4 Using the Z Score Table to identify the percent of the distribution below a z score.

Figure 4.5 Using the Z Score Table of Values to identify the percent of the distribution below z = −1.96.

Figure 4.6 Identifying the area between z scores.

Figure 4.7 Identifying the tabled values of z scores of −1.96 and −1.35.

Figure 4.8 Subtracting the areas to identify the given area of the distribution.

Figure 4.9 Visualizing the area between two z scores.

Figure 4.10 An example AP raw score distribution.

Figure 4.11 The histogram of AP score values (from Figure 3.12).

Figure 4.12 Using SPSS® to create z scores.

Figure 4.13 Creating a z score variable using SPSS® descriptive menu.

Figure 4.14 Using the SPSS® Frequencies menu to locate percentiles.

Figure 4.15 Using the SPSS® Frequencies output to locate percentiles.

Figure 4.16 The Function Argument STANDARDIZE with AP score data.

Figure 4.17 Entering formulas directly in Excel using the enter formula (“=”) key.

Figure 4.18 Entering formulas by dragging a formula to other values in a spreadsheet.

Figure 4.19 Using the Excel NORM.S.DIST function.

Figure 4.20 Using the Excel NORM.DIST function.

Figure 4.21 Using SPSS® to create z scores.

Figure 4.22 Using the Excel Standardize function to create z scores (using SD population).

CHAPTER 5: PROBABILITY AND THE Z DISTRIBUTION

Figure 5.1 Specifying combinations using Excel “COMBIN” in the “Math & Trig” formulas.

Figure 5.2 Specifying permutations using Excel “PERMUT” from “Statistical” functions.

Figure 5.3 Visualizing the transformation of a percentile to a z score.

Figure 5.4 Visualizing the 67th percentile of the raw score test distribution.

Figure 5.5 Using the Compute Variable menu in SPSS® to create z scores.

Figure 5.6 SPSS® data file with two z score variables.

Figure 5.7 Visualizing the probabilities as a preliminary solution.

Figure 5.8 Visualizing the middle 90% area of the test scores.

Figure 5.9 Visualizing the excluded 5% of the test score values.

Figure 5.10 Using the Excel NORM.DIST function for exact probabilities (probability density function).

Figure 5.11 Using the Excel dragging capability to calculate probability density values.

Figure 5.12 Excel NORM.DIST example for calculating exact probabilities.

Figure 5.13 Estimating an exact probability using the z score table.

Figure 5.14 Using NORM.DIST to identify the exact probability for 10.

CHAPTER 6: RESEARCH DESIGN AND INFERENTIAL STATISTICS

Figure 6.1 The process of social research.

Figure 6.2 The nature of experimental designs.

Figure 6.3 The sampling process for inferential statistics.

Figure 6.4 Creating a sampling distribution of means.

Figure 6.5 The nature of the sampling distribution.

Figure 6.6 Using the sampling distribution to “locate” the sample mean.

Figure 6.7 Using the sampling distribution and the standard error of the mean.

Figure 6.8 Using the sampling distribution to make a statistical decision.

Figure 6.9 The Excel Z test formula specification menu.

CHAPTER 7: THE T TEST FOR SINGLE SAMPLES

Figure 7.1 The “one-shot case study.”

Figure 7.2 Using the sampling distribution with estimated population values.

Figure 7.3 Estimating the population standard deviation.

Figure 7.4 The T Test elements used to compare a sample mean to all possible means from a population.

Figure 7.5 Understanding the concept of degrees of freedom.

Figure 7.6 Excel descriptive statistics for QL.

Figure 7.7 SPSS® descriptive statistics for QL.

Figure 7.8 Transforming the sample mean value to a value on the sampling distribution.

Figure 7.9 The nature of the T distribution.

Figure 7.10 The T test exclusion areas.

Figure 7.11 Visualizing beta and nonbeta areas.

Figure 7.12 The two-tailed test.

Figure 7.13 One- and two-tailed exclusion values.

Figure 7.14 Confidence interval values for the QL example.

Figure 7.15 Comparing CI0.95 and CI0.99.

Figure 7.16 Selecting the single sample t test in SPSS®.

Figure 7.17 Specifying the single sample T test in SPSS®.

Figure 7.18 The SPSS® output tables with t test results.

Figure 7.19 The SPSS® output tables with t test results.

Figure 7.20 The T.DIST.2T function in Excel.

Figure 7.21 The Excel descriptive data for the sample group Overall scores.

Figure 7.22 The SPSS® results of the single sample t test.

CHAPTER 8: INDEPENDENT SAMPLE T TEST

Figure 8.1 The research design with randomness and two comparison groups.

Figure 8.2 Using dependent sample measures in experimental designs.

Figure 8.3 Using matched groups in experimental designs.

Figure 8.4 Example of experimental research design using T test with two groups.

Figure 8.5 The post facto comparison for independent t test.

Figure 8.6 The independent sample T test process.

Figure 8.7 All possible pairs of samples.

Figure 8.8 The sampling distribution of differences created from pairs of samples.

Figure 8.9 Symbols in the distribution of differences.

Figure 8.10 The statistical decision for the FOP example question.

Figure 8.11 Accessing the SPSS® Explore procedure.

Figure 8.12 Specifying the variables to check the normality assumption.

Figure 8.13 Explore output for assessing normal distribution of sample groups.

Figure 8.14 The Excel equal variance test menu.

Figure 8.15 The Excel output for the two-sample variance test.

Figure 8.16 The F distribution and exclusion area.

Figure 8.17 Specifying the variables of interest for equality of variances.

Figure 8.18 Specifying Levene's test for equality of variances.

Figure 8.19 The SPSS® output assessing equality of variance.

Figure 8.20 The SPSS® menus for the independent sample T test.

Figure 8.21 The Independent-Samples T Test callout window for specifying the analysis.

Figure 8.22 Specifying group values for the independent T test.

Figure 8.23 The SPSS® output for the independent T test.

Figure 8.24 The Excel specifications for the independent T test.

Figure 8.25 The Excel callout window for locating data.

Figure 8.26 The Excel output for the independent T test with equal variances.

Figure 8.27 The SPSS® options for the Mann–Whitney U Test.

Figure 8.28 The SPSS® specification for the Mann–Whitney U Test.

Figure 8.29 The SPSS® output for the Mann–Whitney U Test.

Figure 8.30 Comparison of training groups on patient satisfaction.

Figure 8.31 Comparison of training groups on patient satisfaction (SPSS®).

CHAPTER 9: ANALYSIS OF VARIANCE

Figure 9.1 The four groups in the noise-learning experiment.

Figure 9.2 The paired comparisons in the experiment with four groups.

Figure 9.3 The components of variance in ANOVA.

Figure 9.4 Different ANOVA possibilities.

Figure 9.5 ANOVA possibilities of groups with different within variances.

Figure 9.6 Venn diagram showing effect size.

Figure 9.7 Post hoc test possibilities.

Figure 9.8 The post hoc summary for the example.

Figure 9.9 The descriptive output (Excel) to test the normal distribution assumption.

Figure 9.10 SPSS® descriptive output for normal distribution assumption.

Figure 9.11 SPSS® graphs for FR groups.

Figure 9.12 The SPSS® means procedure.

Figure 9.13 Three indicators of health by 6, 7, or 8 hours sleep.

Figure 9.14 The SPSS® menu options for accessing the one-way ANOVA.

Figure 9.15 The one-way ANOVA specification windows.

Figure 9.16 The post hoc choices from SPSS® one-way ANOVA.

Figure 9.17 Options for SPSS one-way ANOVA.

Figure 9.18 The descriptives report in SPSS® one-way ANOVA.

Figure 9.19 The Levene's test results in SPSS® one-way ANOVA.

Figure 9.20 The SPSS® one-way ANOVA summary table.

Figure 9.21 The Tukey post hoc results in the SPSS® one-way ANOVA procedure.

Figure 9.22 The single factor ANOVA menu option in Excel.

Figure 9.23 The Excel single factor ANOVA output.

Figure 9.24 The specification window for the Kruskal–Wallis test.

Figure 9.25 The output for the Kruskal–Wallis test.

Figure 9.26 The data to check for normal distribution.

Figure 9.27 The ANOVA results for Problem 2.

Figure 9.28 The post hoc analysis for the ANOVA result.

CHAPTER 10: FACTORIAL ANOVA

Figure 10.1 Main effects analyses in 2XANOVA.

Figure 10.2 The interaction effect of sex and noise conditions.

Figure 10.3 Ordinal interaction patterns compared to no interaction.

Figure 10.4 The interaction graph for the 2XANOVA example (sex by sleep).

Figure 10.5 The SPSS® data file for the 2XANOVA example.

Figure 10.6 The SPSS® 2XANOVA menus.

Figure 10.7 The SPSS® menus for specifying the 2XANOVA procedure.

Figure 10.8 The Plots window that specifies the results graph.

Figure 10.9 The “Post Hoc” window specifying the Tukey analysis for sleep.

Figure 10.10 The choices in the “Univariate: Options” window.

Figure 10.11 The SPSS® 2XANOVA summary table.

Figure 10.12 The SPSS® simple effects table for levels of schools on subject areas.

Figure 10.13 The simple effects graph for the 2XANOVA example (sleep by sex).

Figure 10.14 The simple effects analysis for sleep levels within sex categories.

Figure 10.15 The 2XANOVA procedure chart.

Figure 10.16 The simple effects analyses for condition.

Figure 10.17 Simple effects plot for condition.

Figure 10.18 Simple effects analyses for provider.

Figure 10.19 Simple effects plot for provider.

CHAPTER 11: CORRELATION

Figure 11.1 Examples of Pearson's r values.

Figure 11.2 The scattergram between health and income.

Figure 11.3 Correlation patterns in scattergrams.

Figure 11.4 Strength of correlations in the scattergram.

Figure 11.5 The SPSS® graph menu for creating scattergrams.

Figure 11.6 The Scatter/Dot menus.

Figure 11.7 The scattergram specification window in SPSS®.

Figure 11.8 The Excel scattergram specification.

Figure 11.9 The Excel scattergram.

Figure 11.10 The Z score scattergram in Excel.

Figure 11.11 The effect size of correlation–explaining variance.

Figure 11.12 The effect size components produced by correlation.

Figure 11.13 The correlation problem of restricted range.

Figure 11.14 The correlation conditions of homoscedasticity and heteroscedasticity.

Figure 11.15 The SPSS® descriptive output for the study variables.

Figure 11.16 The Excel descriptive statistics.

Figure 11.17 The SPSS® histogram for HealthOP.

Figure 11.18 The Excel histogram for HealthOP.

Figure 11.19 The Excel scattergram between IncomeOP and HealthOP.

Figure 11.20 The SPSS® Correlation menu.

Figure 11.21 The Correlation specification window.

Figure 11.22 The SPSS® correlation matrix.

Figure 11.23 The “Correlation” window in the Excel Data – Data Analysis menu.

Figure 11.24 The “Correlation” specification window in Excel.

Figure 11.25 The Excel correlation matrix.

Figure 11.26 Funding for study hospitals.

Figure 11.27 The Spearman's Rho correlation between the study variables.

Figure 11.28 The Pearson's r correlation between the study variables.

Figure 11.29 Descriptive findings for study variables.

Figure 11.30 The histogram for community involvement.

Figure 11.31 The histogram for job opportunities.

Figure 11.32 The tests of normality for the study variables.

Figure 11.33 The scattergram for the study variables.

Figure 11.34 The correlation findings for the study variables.

CHAPTER 12: BIVARIATE REGRESSION

Figure 12.1 The regression line for job opportunities and community involvement.

Figure 12.2 The effect of correlation on prediction accuracy.

Figure 12.3 The lack of meaningful prediction with no significant correlation.

Figure 12.4 The scattergram between income class and healthy days.

Figure 12.5 The formula in pieces.

Figure 12.6 The completed regression formula.

Figure 12.7 Using the regression formula to predict a value of Y at X = 4.

Figure 12.8 The SPSS® Z score scattergram of income class and healthy days.

Figure 12.9 The elements of the Y variance.

Figure 12.10 The regression options in SPSS®.

Figure 12.11 The SPSS® regression specification windows.

Figure 12.12 The ANOVA table providing data for the omnibus test.

Figure 12.13 The SPSS® model summary results panel.

Figure 12.14 The SPSS® coefficients panel results for the bivariate regression.

Figure 12.15 The Excel regression specification window.

Figure 12.16 The Excel regression Statistics output.

Figure 12.19 The Excel predicted values and residuals for the study data.

Figure 12.20 The SPSS® descriptive summaries of the study variables.

Figure 12.21 The Excel descriptive summaries for the study variables.

Figure 12.22 The scattergram between recreational opportunities and community involvement.

Figure 12.23 The Curve Estimation procedure in SPSS®.

Figure 12.24 The Curve Estimation procedure for recreational opportunities and community involvement.

Figure 12.25 The curve estimation model summary.

Figure 12.26 The quadratic coefficients analysis for Curve Estimation.

Figure 12.27 The SPSS® omnibus test results.

Figure 12.28 The SPSS® model summary results for recreational opportunities–community involvement study.

Figure 12.29 The SPSS® coefficients output for the recreational opportunities–community involvement study.

Figure 12.30 The Excel regression output for the reading assessment – FR study.

Figure 12.31 The multiple correlation relationship in the fictitious study example.

Figure 12.32 Sk and Ku data for the two study variables.

Figure 12.33 Tests of normality are nonsignificant for the study variables.

Figure 12.34 The scattergram of the study variables.

Figure 12.35 Model summary comparisons for linear and quadratic equations.

Figure 12.36 The scattergram of linear and quadratic regression equations.

Figure 12.37 Correlation matrix indicating a significant correlation.

Figure 12.38 The model summary indicating the correlation and squared correlation (effect size).

Figure 12.39 The omnibus test result from the ANOVA result.

Figure 12.40 The coefficients findings summary.

CHAPTER 13: INTRODUCTION TO MULTIPLE LINEAR REGRESSION

Figure 13.1 The SPSS® report of diagnostic values for the study data.

Figure 13.2 The SPSS® specification menu for linear regression.

Figure 13.3 The SPSS® specification menu for diagnostic values.

Figure 13.4 The scattergram of the study variables.

Figure 13.5 The histogram for the dependent variable in the study.

Figure 13.6 The SPSS® descriptives report showing skewness and kurtosis findings.

Figure 13.7 The histogram of standardized residuals from the study.

Figure 13.8 The P–P plot of standardized residuals from the study.

Figure 13.9 The scatterplot between standardized residuals and predicted values.

Figure 13.10 The specification menus for SPSS® MLR.

Figure 13.11 The SPSS® MLR specification for assumptions and individual predictors.

Figure 13.12 The Partial Regression Plot showing Rank Opinion predicting Health Opinion.

Figure 13.13 The MLR omnibus test result.

Figure 13.14 The effect size summary for the MLR procedure.

Figure 13.15 The SPSS® output for individual predictors.

Figure 13.16 Isolating the effects of a predictor variable on an outcome variable through part or semipartial correlation.

Figure 13.17 The MLR entry method for the first predictor.

Figure 13.18 The omnibus test results for hierarchical MLR.

Figure 13.19 The hierarchical results for effect size.

Figure 13.20 The individual predictor summary for separate entry.

Figure 13.21 The specification menu for the Stepwise (and other) entry method(s).

Figure 13.22 The histogram for the outcome variable.

Figure 13.23 The findings indicating all study variables are normally distributed.

Figure 13.24 The residuals plot for normality assessment.

Figure 13.25 The scatterplot between standardized residuals and predicted values.

CHAPTER 14: CHI-SQUARE AND CONTINGENCY TABLE ANALYSIS

Figure 14.1 The chi-square series of distributions.

Figure 14.2 The SPSS® “Weight Cases” specification window.

Figure 14.3 The SPSS® “Weight Cases” specification window.

Figure 14.4 The SPSS® Crosstabs specification window.

Figure 14.5 The “Crosstabs: Statistics” menu in SPSS®.

Figure 14.6 The SPSS® “Crosstabs: Cell Display” menu.

Figure 14.7 The SPSS® crosstabs contingency table.

Figure 14.8 The SPSS® chi-square significance test output.

Figure 14.9 The SPSS® effect size measures.

Figure 14.10 The CHISQ.TEST function in Excel for chi-square analysis.

Figure 14.11 The Excel CHIDIST function to identify the chi-square probability.

Figure 14.12 The SPSS® variables used for the chi-square analysis.

Figure 14.13 The SPSS® chi-square findings.

Figure 14.14 The SPSS® effect size findings.

Figure 14.15 The SPSS® contingency table output for the crosstabs analysis.

Figure 14.16 The Excel CHISQ.DIST.RT results for the test of independence.

Figure 14.17 The Excel CHISQ.TEST function using observed and expected frequencies.

CHAPTER 15: REPEATED MEASURES PROCEDURES: Tdep AND ANOVAWS

Figure 15.1 The SPSS® Tdep specification window.

Figure 15.2 The SPSS® descriptive output for Tdep.

Figure 15.3 The SPSS® correlation output for Tdep.

Figure 15.4 The SPSS® Tdep test summary.

Figure 15.5 The Excel Tdep specification window.

Figure 15.6 The Tdep findings from Excel paired two sample test.

Figure 15.7 The mixed design that includes within-subject and between-group elements.

Figure 15.8 The SPSS® specification window for the ANOVAws procedure.

Figure 15.9 The SPSS® “Repeated Measures” window.

Figure 15.10 The “Contrasts” window for specifying repeated contrasts.

Figure 15.11 The “Options” menu for the ANOVAws procedure.

Figure 15.12 The “Descriptive Statistics” output.

Figure 15.13 The Mauchly's test of sphericity results.

Figure 15.14 The within-subject effects output for Time.

Figure 15.15 The effect size output for ANOVAws.

Figure 15.16 The post hoc output for the study.

Figure 15.17 The comparison plot for the time conditions.

List of Tables

CHAPTER 2: DESCRIPTIVE STATISTICS: CENTRAL TENDENCY

Table 2.1 Typical Ordinal Response Scale

Table 2.2 Perceived Distances in Ordinal Response Items

Table 2.3 Comparison of Interval and Ordinal Scales

Table 2.4 Aggregated School Percentages of Students Passing the Math Standard

Table 2.5 Adjusted School Percentages

Table 2.6 Math Achievement Percentages Demonstrating a Bimodal Distribution of Scores

Table 2.7 BRFSS Responses to the General Health Question

Table 2.8 GENHLTH Responses as Unordered and Ordered

Table 2.9 Frequency of GENHLTH Responses in Five Bins

Table 2.10 Job Satisfaction Ratings of Assembly Workers

Table 2.11 Exam Scores in an AP Class

CHAPTER 3: DESCRIPTIVE STATISTICS: VARIABILITY

Table 3.1 Using the Deviation Method to Calculate SD

Table 3.2 Using the Computation Method to Calculate SD

Table 3.3 Neighborhood Characteristics Ratings Sample

Table 3.4 Housing Survey Data: Quality of Life Index

CHAPTER 4: THE NORMAL DISTRIBUTION

Table 4.1 Neighborhood Characteristics Ratings Sample

Table 4.2 Housing Survey Data – Quality of Life Index

CHAPTER 5: PROBABILITY AND THE Z DISTRIBUTION

Table 5.1 Soccer Injuries by Incidence

Table 5.2 Elements of the Bayes Theorem Problem

Table 5.3 Health Score Ratings of Baristas

Table 5.4 Conditional Probability Elements for OCD

CHAPTER 6: RESEARCH DESIGN AND INFERENTIAL STATISTICS

Table 6.1 Experimental and Control Groups

Table 6.2 Quasi-Experimental Design

Table 6.3 Population and Sample Symbols for Inferential Statistics

CHAPTER 7: THE T TEST FOR SINGLE SAMPLES

Table 7.1 Population and Sample Symbols for Inferential Statistics

Table 7.2 Quality of Life (Hypothetical) Sample Data

Table 7.3 Exclusion Values for the Z Distribution

Table 7.4 Exclusion Values for T Distribution (df = 9)

Table 7.5 The STAR Classroom Observation Protocol™ Data

CHAPTER 8: INDEPENDENT SAMPLE T TEST

Table 8.1 New Entries to the List of Population and Sample Symbols

Table 8.2 Procedure Information and Fear of Procedure Scores

Table 8.3 Minutes Pacing for Low and High Perceived Stress Assessments

Table 8.4 Mann–Whitney U Test Data

Table 8.5 Practitioner Training and Patient Satisfaction

CHAPTER 9: ANALYSIS OF VARIANCE

Table 9.1 Hypothetical Experiment Data

Table 9.2 Hypothetical Experiment Data with Squared Values

Table 9.3 The ANOVA Results Table

Table 9.4 The ANOVA Results Table with Calculated MS Values

Table 9.5 The Final ANOVA Results Table

Table 9.6 The F Table of Values Exclusion Areas

Table 9.7 Example of Values from a Studentized Range Table

Table 9.8 The Group Means

Table 9.9 Matrix of Group Means

Table 9.10 The Data for One-Way ANOVA Example

Table 9.11 The ANOVA Example Database with Calculation Values

Table 9.12 The Completed ANOVA Summary Table for the Extended Example

Table 9.13 The Group Mean Difference Matrix

Table 9.14 Hypothetical Data for Kruskal–Wallis Test

Table 9.15 The Paired Comparison Results

Table 9.16 Blood Sugar: Pay Method Data

Table 9.17 Post Hoc Analysis for Problem 1

CHAPTER 10: FACTORIAL ANOVA

Table 10.1 The Data for the Sleep–Sex Impact on Health

Table 10.2 Data Summaries for 2XANOVA Calculations

Table 10.3 The 2XANOVA Summary Table

Table 10.4 The Completed 2XANOVA Summary Table

Table 10.5 Patient Satisfaction Ratings of Providers and Medical Conditions

Table 10.6 Manual Calculations for Problem 1

Table 10.7 ANOVA Table for Problem 1

CHAPTER 11: CORRELATION

Table 11.1 Measures of Correlation

Table 11.2 Data for Correlation Example

Table 11.3 The Data Table Showing Z Scores

Table 11.4 Hypothetical Correlation of Hospital Rankings by Patient Satisfaction Ranking

Table 11.5 Ranking an Interval Variable

Table 11.6 Test Taking Minutes and Test Score Data

Table 11.7 The Job Opportunity–Community Involvement Study Data

CHAPTER 12: BIVARIATE REGRESSION

Table 12.1 The Fictitious Data on Income Class and Healthy Days

Table 12.2 The Calculated Sums of the Fictitious Data

Table 12.3 The Calculations for the Components of Variance

Table 12.4 The Stress-Absence Data

Table 12.5 The Job Opportunity and Community Involvement Data

Table 12.6 Manual Calculations for Problem 1

CHAPTER 13: INTRODUCTION TO MULTIPLE LINEAR REGRESSION

Table 13.1 The Diagnostic Study Values

Table 13.2 The Change Values Resulting from Adding Predictors

Table 13.3 The Study Variables and Labels

Table 13.4 Problem 1 MLR Data

Table 13.5 The Unique Contribution to Outcome Variance by Predictor Variables

CHAPTER 14: CHI-SQUARE AND CONTINGENCY TABLE ANALYSIS

Table 14.1 The Hypothetical Treatment Differences Data

Table 14.2 The Reporting Data for the Hypothetical Study

Table 14.3 Observed Frequencies for the Test of Independence

Table 14.4 The Expected Frequencies for the Study

Table 14.5 The Calculated Chi-Square for the Test of Independence

Table 14.6 The Percentages of the Study Data for Interpretation of Findings

Table 14.7 The 2 × 2 Chi-Square Contingency Table

Table 14.8 The Example Data in a 2 × 2 Table

Table 14.9 Using the General Chi-Square Formula on the 2 × 2 Table

Table 14.10 The Dependent Sample Chi-Square for the Resolve Study

Table 14.11 Changing the Dependent Sample Chi-Square Categories

Table 14.12 The Chi-Square Values for the Hypothetical Problem

Table 14.13 The Database for the Chi-Square Example

Table 14.14 The Example Data

Table 14.15 The Contingency Table with Expected Frequencies

CHAPTER 15: REPEATED MEASURES PROCEDURES: Tdep AND ANOVAWS

Table 15.1 Independent and Dependent Sample in Experimental and Post Facto Designs

Table 15.2 The Study Data

Table 15.3 The Calculated Elements of the Study Data

Table 15.4 The Difference Procedure for Calculating Tdep

Table 15.5 The Tind Comparison with Tdep

Table 15.6 The Experimental Design

Table 15.7 Data for Within-Subject Study with Three Categories

Table 15.8 The Within-Subject Example Data

Using Statistics in the Social and Health Sciences with SPSS® and EXCEL®

 

 

Martin Lee Abbott

 

 

 

Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Names: Abbott, Martin, 1949-

Title: Using statistics in the social and health sciences with SPSS® and Excel®/ Martin Lee Abbott.

Description: Hoboken, New Jersey : John Wiley & Sons, Inc., [2017] | In the title, both SPSS and Excel are accompanied by the trademark symbol. | Includes bibliographical references and index.

Identifiers: LCCN 2016009168| ISBN 9781119121046 (cloth) | ISBN 9781119121060 (epub) | ISBN 9781119121053 (epdf)

Subjects: LCSH: Mathematical statistics--Data processing. | Multivariate analysis--Data processing. | Social sciences--Statistical methods. | Medical sciences--Statistical methods. | Microsoft Excel (Computer file) | SPSS (Computer file)

Classification: LCC QA276.45.M53 A23 2017 | DDC 005.5/5--dc23 LC record available at https://lccn.loc.gov/2016009168

To my longsuffering, wonderful wife Kathy;

-and-

To those seeking to understand the nature of social systems so that, like Florence Nightingale, they might better understand God's character.

PREFACE

The study of statistics is gaining recognition in a great many fields. In particular, researchers in the social and health sciences note its importance for problem solving and its practical importance in their areas. Statistics has always been important, for example, among those hoping to enter careers in medicine but more so now due to the increasing emphasis on “Scientific Inquiry & Reasoning Skills” as preparation for the Medical College Admission Test (MCAT). Sociology, always relying on statistics and research for its core emphases, is now included in the MCAT as well.

This book focuses squarely on the procedures important to an essential understanding of statistics and how it is used in the real world for problem solving. Moreover, my discussion in the book repeatedly ties statistical methodology with research design (see the “companion” volume my colleague and I wrote to emphasize research and design skills in social science; Abbott and McKinney, 2013).

I emphasize applied statistical analyses and as such will use examples throughout the book drawn from my own research as well as from national databases like the General Social Survey (GSS) and the Behavioral Risk Factor Surveillance System (BRFSS). Using data from these sources allows students the opportunity to see how statistical procedures apply to research in their fields as well as to examine “real data.” A central feature of the book is my discussion and use of SPSS® and Microsoft Excel® to analyze data for problem solving.

Throughout my teaching and research career, I have developed an approach to helping students understand difficult statistical concepts in a new way. I find that the great majority of students are visual learners, so I developed diagrams and figures over the years that help create a conceptual picture of the statistical procedures that are often problematic to students (like sampling distributions!).

Another reason for writing this book was to give students a way to understand statistical computing without having to rely on comprehensive and expensive statistical software programs. Since most students have access to Microsoft Excel, I developed a step-by-step approach to using the powerful statistical procedures in Excel to analyze data and conduct research in each of the statistical topics I cover in the book.1

I also wanted to make those comprehensive statistical programs more approachable to statistics students, so I have included a “hands-on” guide to SPSS in parallel with the Excel examples. In some cases, SPSS provides the only means to perform a statistical procedure, but in most cases, both Excel and SPSS can be used.

Here are some of the features of the book:

1. Emphasis on the interpretation of findings.

2. Use of clear examples from my existing and former research projects and large databases to illustrate statistical procedures. “Real-world” data can be cumbersome, so I introduce straightforward procedures and examples in order to help students focus more on interpretation of findings.

3. Inclusion of a data lab section in each chapter that provides relevant, clear examples.

4. Introduction to advanced statistical procedures in chapter sections (e.g., regression diagnostics) and separate chapters (e.g., multiple linear regression) for greater relevance to real-world research needs.

5. Strengthening of the connection between statistical application and research designs.

6. Inclusion of detailed sections in each chapter explaining applications from Excel and SPSS.

I use SPSS2 (versions 22 and 23) screenshots of menus and tables by permission from the IBM® Company. IBM, the IBM logo, ibm.com, and SPSS are trademarks or registered trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “IBM Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml. Microsoft Excel references and screenshots in this book are used with permission from Microsoft. I use Microsoft Excel® 2013 in this book.3

I use GSS (2014) data and codebook for examples in this book.4 The BRFSS Survey Questionnaire and Data are used with permission from the CDC.5

1 One limitation to teaching statistics procedures with Excel is that the data analysis features are different depending on whether the user is a “Mac” user or a “PC” user. I am using the PC version, which features a “Data Analysis” suite of statistical tools. This feature may no longer be included in the Mac version of Excel.

2 SPSS screen reprints throughout the book are used courtesy of International Business Machines Corporation, © International Business Machines Corporation. SPSS was acquired by IBM in October 2009.

3 Excel references and screenshots in this book are used with permission from Microsoft®.

4 Smith, Tom W., Peter Marsden, Michael Hout, and Jibum Kim. General Social Surveys, 1972–2012 [machine-readable data file]/Principal Investigator, Tom W. Smith; Coprincipal Investigator, Peter V. Marsden; Coprincipal Investigator, Michael Hout; Sponsored by National Science Foundation. NORC ed. Chicago: National Opinion Research Center [producer]; Storrs, CT: The Roper Center for Public Opinion Research, University of Connecticut [distributor], 2013. 1 data file (57,061 logical records) + 1 codebook (3432 pp.). (National Data Program for the Social Sciences, No. 21).

5 Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Questionnaire. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2013 and Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Data. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2013.

ACKNOWLEDGMENTS

I wish to thank my daughter Kristin Hovaguimian for her outstanding work on the Index to this book (and all the others!) – not an easy task with a book of this nature.

I thank my wife Kathleen Abbott for her dedication and amazing contributions to the editing process.

I thank my son Matthew Abbott for the inspiration he has always provided in matters statistical and philosophical.

Thank you Jon Gurstelle and the team at Wiley for your continuing support of this project.

CHAPTER 1: INTRODUCTION

The world suddenly has become awash in data! A great many popular books have been written recently that extol “big data” and the information derived for decision makers. These data are considered “big” because a certain “catalog” of data may be so large that traditional ways of managing and analyzing such information cannot easily accommodate it. The data originate from you and me whenever we use certain social media, or make purchases online, or have information derived from us through radio frequency identification (RFID) readers attached to clothing and cars, even implanted in animals, and so on. The result is a massive avalanche of information that exists for business leaders, decision makers, and researchers to use for predicting related behaviors and attitudes.

Big Data Analysis

Decision makers are trying to figure out how to manage and use the information available. Typical computer software used for statistical decision making handles far fewer cases than big data sources now make available. A traditional approach to this issue is “data mining,” in which a number of techniques, including statistics, are used to discover patterns in a large set of data.

Researchers may be overjoyed with the availability of such rich data, but it provides both opportunities and challenges. On the opportunity side, never before have such large amounts of information been available to help researchers and policy makers understand widespread public thinking and behavior. On the challenge side, however, are several difficult questions:

How are such data to be examined?

Do current social science methods and processes provide guidance to examining data sets that surpass historical data-gathering capacity?

Are big data representative?

Do data sets so large obviate the need for probability-based research analyses?

Do decision makers understand how to use social science methodology to assist in their analyses of emerging data?

Will the decisions emerging from big data be used ethically, within the context of social science research guidelines?

Will effect size considerations overshadow questions of significance testing?

Social scientists can rely on existing statistical methods to manage and analyze big data, but the way in which the analyses are used for decision making will change. One trend is that prediction may be hailed as a more prominent method for understanding the data than traditional hypothesis testing. We will have more to say about this distinction later in the book, but it is important at this point to see that researchers will need to adapt statistical approaches for analyzing big data.

Visual Data Analysis

Another emerging trend for understanding and managing the swell of data is the use of visuals. Of course, visual descriptions of data have been used for centuries. It is commonly acknowledged that the first “pie chart” was published by Playfair (1801). Playfair's example in Figure 1.1 compares the dynamics of nations over time.

Figure 1.1 William Playfair's pie chart.

Source: https://commons.wikimedia.org/wiki/File:Playfair_piecharts.jpg. Public domain.

Figure 1.1 compares nations using size, color, and orientation over time. This method of presenting information has been useful for revealing patterns in data that are not readily observable from numerical analysis alone.

As with numerical methods, however, there are opportunities and challenges in the use of visual analyses:

Can visual means be used to convey complex meaning?

Are there “rules” that will help to ensure a standard way of creating, analyzing, and interpreting such visual information?

Will visual analyses become divorced from numerical analysis so that observers have no way of objectively confirming the meaning of the images?

Several visual data analysis software programs have appeared over the last several years. Simply running an online search will yield several possibilities, including many that offer free (initial) programs for cataloging and presenting data from the user. I offer one very important caveat (see the final bullet point above), which is that it is important to perform visual data analysis in concert with numerical analysis. As we will see later in the book, it is easy to intentionally or unintentionally mislead readers with visual presentations when these are divorced from the numerical statistical means that address the “significance” and “meaningfulness” of the visual data.

Importance of Statistics for the Social and Health Sciences and Medicine

The presence of so much rich information presents meaningful opportunities for understanding many of the processes that affect the social world. While much of the time big data analyses are used for understanding business dynamics and economic trends, it is also important to focus on those data patterns that can affect the social sphere beyond these indicators: social and psychological behavior and attitudes, changes in understanding health and medicine, and educational progress. These social indicators have been the subject of a great deal of analyses over the decades and now may make significant advances depending on how big data are analyzed and managed. On a related note, the social sciences (especially sociology and psychology) are now areas included in the new Medical College Admission Test (MCAT), which also includes greater emphasis upon “Scientific Inquiry & Reasoning Skills.” The material we will learn from this book will help to support study in these areas for aspiring health and medical professionals.

In this book, I intend to focus on how to use and analyze data of all sizes and shapes. While we will be limited in our ability to dive into the world of big data fully, we can study the basics of how to recognize, generate, interpret, and critique analyses of data for decision making. One of the first lessons is that data can be understood both numerically and visually. When we describe information, we are attempting to see and convey underlying meaning in the numbers and visual expressions. If I have a collection of data, I cannot recognize its meaning by simply looking at it. However, if I apply certain numerical and visual methods to organize the data, I can see what patterns lie below the surface.
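
The book itself carries out this two-sided description in SPSS and Excel; purely as an illustration of the idea (a numerical summary alongside a visual display), here is a minimal Python sketch. The exam scores and the text histogram are hypothetical assumptions, not data or procedures from the book.

```python
# A minimal sketch, assuming a small set of hypothetical exam scores:
# describe the same data numerically (central tendency, variability)
# and visually (a crude histogram), as the book does with Excel and SPSS.
import statistics
import collections

scores = [72, 85, 91, 68, 85, 77, 95, 85, 60, 74, 88, 79]  # hypothetical values

# Numerical description
print("mean   =", round(statistics.mean(scores), 2))
print("median =", statistics.median(scores))
print("mode   =", statistics.mode(scores))
print("sd     =", round(statistics.stdev(scores), 2))  # inferential (n - 1) SD

# Visual description: counts of scores in 10-point bins
bins = collections.Counter((s // 10) * 10 for s in scores)
for lower in sorted(bins):
    print(f"{lower:>3}-{lower + 9:<3} | {'*' * bins[lower]}")
```

Even in this tiny example, the numerical summary and the binned display answer different questions about the same scores, which is why the chapters that follow always pair the two.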

Historical Notes: Early Use of Statistics

Statistics as a field has had a long and colorful history. Students will recognize some prominent names as the field developed its mathematical identity: Pearson, Fisher, Bayes, Laplace, and others. But it is important to note that some of the earliest statistical studies were based in solving social and political problems.

One of the earliest such studies was developed by John Graunt, who compiled information from Bills of Mortality to detect, among other things, the impact and origins of deaths by plague. Parish records documented christenings, weddings, and burials at the time, so Graunt's study tracked the number of deaths in the parishes as a way to understand the dynamics of the plague. His broader goal was to predict the population of London using extant data from the parish records.

Another early use of statistics was Dr John Snow's map showing deaths in the houses of London's Soho District during the 1854 cholera epidemic, as popularized by Johnson's book, The Ghost Map