An Introduction to Categorical Data Analysis - Alan Agresti - E-Book

Description

A valuable new edition of a standard reference

The use of statistical methods for categorical data has increased dramatically, particularly for applications in the biomedical and social sciences. An Introduction to Categorical Data Analysis, Third Edition summarizes these methods and shows readers how to apply them using software. Readers will find a unified generalized linear models approach that connects logistic regression and loglinear models for discrete data with normal regression for continuous data. New in this edition:

* Illustrations of the use of R software to perform all the analyses in the book
* A new chapter on alternative methods for categorical data, including smoothing and regularization methods (such as the lasso), classification methods such as linear discriminant analysis and classification trees, and cluster analysis
* New sections in many chapters introducing the Bayesian approach for the methods of that chapter
* More than 70 analyses of data sets to illustrate application of the methods, and about 200 exercises, many containing other data sets
* An appendix showing how to use SAS, Stata, and SPSS, and an appendix with short solutions to most odd-numbered exercises

Written in an applied, nontechnical style, this book illustrates the methods using a wide variety of real data, including medical clinical trials, environmental questions, drug use by teenagers, horseshoe crab mating, basketball shooting, correlates of happiness, and much more. An Introduction to Categorical Data Analysis, Third Edition is an invaluable tool for statisticians and biostatisticians as well as methodologists in the social and behavioral sciences, medicine and public health, marketing, education, and the biological and agricultural sciences.

Page count: 787

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by Walter A. Shewhart and Samuel S. Wilks

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay

Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane, Jozef L. Teugels

The Wiley Series in Probability and Statistics is well established and authoritative. It covers many topics of current research interest in both pure and applied statistics and probability theory. Written by leading statisticians and institutions, the titles span both state-of-the-art developments in the field and classical methods.

Reflecting the wide range of current research in statistics, the series encompasses applied, methodological and theoretical statistics, ranging from applications and new techniques made possible by advances in computerized practice to rigorous treatment of theoretical approaches.

This series provides essential and invaluable reading for all statisticians, whether in academia, industry, government, or research.

A complete list of titles in this series can be found at http://www.wiley.com/go/wsps

AN INTRODUCTION TO CATEGORICAL DATA ANALYSIS

Third Edition

Alan Agresti

University of Florida, Florida, United States

This third edition first published 2019

© 2019 John Wiley & Sons, Inc.

Edition History

John Wiley & Sons, Inc. (1e, 1996); John Wiley & Sons, Inc. (2e, 2007)

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Alan Agresti to be identified as the author of this work has been asserted in accordance with law.

Registered Office

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office

111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data

Names: Agresti, Alan, author.

Title: An introduction to categorical data analysis / Alan Agresti.

Description: Third edition. | Hoboken, NJ : John Wiley & Sons, 2019. | Series: Wiley series in probability and statistics | Includes bibliographical references and index. |

Identifiers: LCCN 2018026887 (print) | LCCN 2018036674 (ebook) | ISBN 9781119405276 (Adobe PDF) | ISBN 9781119405283 (ePub) | ISBN 9781119405269 (hardcover)

Subjects: LCSH: Multivariate analysis.

Classification: LCC QA278 (ebook) | LCC QA278 .A355 2019 (print) | DDC 519.5/35--dc23

LC record available at https://lccn.loc.gov/2018026887

Cover Design: Wiley

Cover Image: © iStock.com/Anna_Zubkova

CONTENTS

Preface

About the Companion Website

Chapter 1 Introduction

1.1 Categorical Response Data

1.2 Probability Distributions for Categorical Data

1.3 Statistical Inference for a Proportion

1.4 Statistical Inference for Discrete Data

1.5 Bayesian Inference for Proportions *

1.6 Using R Software for Statistical Inference about Proportions *

Exercises

Notes

Chapter 2 Analyzing Contingency Tables

2.1 Probability Structure for Contingency Tables

2.2 Comparing Proportions in 2 × 2 Contingency Tables

2.3 The Odds Ratio

2.4 Chi-Squared Tests of Independence

2.5 Testing Independence for Ordinal Variables

2.6 Exact Frequentist and Bayesian Inference *

2.7 Association in Three-Way Tables

Exercises

Notes

Chapter 3 Generalized Linear Models

3.1 Components of a Generalized Linear Model

3.2 Generalized Linear Models for Binary Data

3.3 Generalized Linear Models for Counts and Rates

3.4 Statistical Inference and Model Checking

3.5 Fitting Generalized Linear Models

Exercises

Notes

Chapter 4 Logistic Regression

4.1 The Logistic Regression Model

4.2 Statistical Inference for Logistic Regression

4.3 Logistic Regression with Categorical Predictors

4.4 Multiple Logistic Regression

4.5 Summarizing Effects in Logistic Regression

4.6 Summarizing Predictive Power: Classification Tables, ROC Curves, and Multiple Correlation

Exercises

Notes

Chapter 5 Building and Applying Logistic Regression Models

5.1 Strategies in Model Selection

5.2 Model Checking

5.3 Infinite Estimates in Logistic Regression

5.4 Bayesian Inference, Penalized Likelihood, and Conditional Likelihood for Logistic Regression *

5.5 Alternative Link Functions: Linear Probability and Probit Models *

5.6 Sample Size and Power for Logistic Regression *

Exercises

Notes

Chapter 6 Multicategory Logit Models

6.1 Baseline-Category Logit Models for Nominal Responses

6.2 Cumulative Logit Models for Ordinal Responses

6.3 Cumulative Link Models: Model Checking and Extensions *

6.4 Paired-Category Logit Modeling of Ordinal Responses *

Exercises

Notes

Chapter 7 Loglinear Models for Contingency Tables and Counts

7.1 Loglinear Models for Counts in Contingency Tables

7.2 Statistical Inference for Loglinear Models

7.3 The Loglinear – Logistic Model Connection

7.4 Independence Graphs and Collapsibility

7.5 Modeling Ordinal Associations in Contingency Tables

7.6 Loglinear Modeling of Count Response Variables *

Exercises

Notes

Chapter 8 Models for Matched Pairs

8.1 Comparing Dependent Proportions for Binary Matched Pairs

8.2 Marginal Models and Subject-Specific Models for Matched Pairs

8.3 Comparing Proportions for Nominal Matched-Pairs Responses

8.4 Comparing Proportions for Ordinal Matched-Pairs Responses

8.5 Analyzing Rater Agreement *

8.6 Bradley–Terry Model for Paired Preferences *

Exercises

Notes

Chapter 9 Marginal Modeling of Correlated, Clustered Responses

9.1 Marginal Models Versus Subject-Specific Models

9.2 Marginal Modeling: The Generalized Estimating Equations (GEE) Approach

9.3 Marginal Modeling for Clustered Multinomial Responses

9.4 Transitional Modeling, Given the Past

9.5 Dealing with Missing Data *

Exercises

Notes

Chapter 10 Random Effects: Generalized Linear Mixed Models

10.1 Random Effects Modeling of Clustered Categorical Data

10.2 Examples: Random Effects Models for Binary Data

10.3 Extensions to Multinomial Responses and Multiple Random Effect Terms

10.4 Multilevel (Hierarchical) Models

10.5 Latent Class Models *

Exercises

Notes

Chapter 11 Classification and Smoothing *

11.1 Classification: Linear Discriminant Analysis

11.2 Classification: Tree-Based Prediction

11.3 Cluster Analysis for Categorical Responses

11.4 Smoothing: Generalized Additive Models

11.5 Regularization for High-Dimensional Categorical Data (Large p)

Exercises

Notes

Chapter 12 A Historical Tour of Categorical Data Analysis *

Appendix: Software for Categorical Data Analysis

A.1 R for Categorical Data Analysis

A.2 SAS for Categorical Data Analysis

A.3 Stata for Categorical Data Analysis

A.4 SPSS for Categorical Data Analysis

Brief Solutions to Odd-Numbered Exercises

Bibliography

Examples Index

Subject Index

End User License Agreement

PREFACE

In recent years, the use of specialized statistical methods for categorical data has increased dramatically, particularly for applications in the biomedical and social sciences. Partly this reflects the development during the past few decades of sophisticated methods for analyzing categorical data. It also reflects the increasing methodological sophistication of scientists and applied statisticians, most of whom now realize that it is unnecessary and often inappropriate to use methods for continuous data with categorical responses.

This third edition of the book is a substantial revision of the second edition. The most important change is showing how to conduct all the analyses using R software. As in the first two editions, the main focus is presenting the most important methods for analyzing categorical data. The book summarizes methods that have long played a prominent role, such as chi-squared tests, but gives special emphasis to modeling techniques, in particular to logistic regression.

The presentation in this book has a low technical level and does not require familiarity with advanced mathematics such as calculus or matrix algebra. Readers should possess a background that includes material from a two-semester statistical methods sequence for undergraduate or graduate nonstatistics majors. This background should include estimation and significance testing and exposure to regression modeling.

This book is designed for students taking an introductory course in categorical data analysis, but I also have written it for applied statisticians and practicing scientists involved in data analyses. I hope that the book will be helpful to analysts dealing with categorical response data in the social, behavioral, and biomedical sciences, as well as in public health, marketing, education, biological and agricultural sciences, and industrial quality control.

The basics of categorical data analysis are covered in Chapters 1 to 7. Chapter 2 surveys standard descriptive and inferential methods for contingency tables, such as odds ratios, tests of independence, and conditional versus marginal associations. I feel that an understanding of methods is enhanced, however, by viewing them in the context of statistical models. Thus, the rest of the text focuses on the modeling of categorical responses. I prefer to teach categorical data methods by unifying their models with ordinary regression models. Chapter 3 does this under the umbrella of generalized linear models. That chapter introduces generalized linear models for binary data and count data. Chapters 4 and 5 discuss the most important such model for binary data, logistic regression. Chapter 6 introduces logistic regression models for multicategory responses, both nominal and ordinal. Chapter 7 discusses loglinear models for contingency tables and other types of count data.

I believe that logistic regression models deserve more attention than loglinear models, because applications more commonly focus on the relationship between a categorical response variable and some explanatory variables (which logistic regression models do) than on the association structure among several response variables (which loglinear models do). Thus, I have given main attention to logistic regression in these chapters and in later chapters that discuss extensions of this model.

Chapter 8 presents methods for matched-pairs data. Chapters 9 and 10 extend the matched-pairs methods to apply to clustered, correlated observations. Chapter 9 does this with marginal models, emphasizing the generalized estimating equations (GEE) approach, whereas Chapter 10 uses random effects to model more fully the dependence. Chapter 11 is a new chapter, presenting classification and smoothing methods. That chapter also introduces regularization methods that are increasingly important with the advent of data sets having large numbers of explanatory variables. Chapter 12 provides a historical perspective of the development of the methods. The text concludes with an appendix showing the use of R, SAS, Stata, and SPSS software for conducting nearly all methods presented in this book. Many of the chapters now also show how to use the Bayesian approach to conduct the analyses.

The material in Chapters 1 to 7 forms the heart of an introductory course in categorical data analysis. Sections that can be skipped if desired, to provide more time for other topics, include Sections 1.5, 2.5–2.7, 3.3 and 3.5, 5.4–5.6, 6.3–6.4, and 7.4–7.6. Instructors can choose sections from Chapters 8 to 12 to supplement the topics of primary importance. Sections and subsections labeled with an asterisk can be skipped for those wanting a briefer survey of the methods.

This book has a lower technical level than my book Categorical Data Analysis (3rd edition, Wiley 2013). I hope that it will appeal to readers who prefer a more applied focus than that book provides. For instance, this book does not attempt to derive likelihood equations, prove asymptotic distributions, or cite current research work.

Most methods for categorical data analysis require extensive computations. For the most part, I have avoided details about complex calculations, feeling that statistical software should relieve this drudgery. The text shows how to use R to obtain all the analyses presented. The Appendix discusses the use of SAS, Stata, and SPSS. The full data sets analyzed in the book are available at the text website www.stat.ufl.edu/~aa/cat/data. That website also lists typos and errors of which I have become aware since publication. The data files are also available at https://github.com/alanagresti/categorical-data.

Brief solutions to odd-numbered exercises appear at the end of the text. An instructor's manual will be included on the companion website for this edition: www.wiley.com/go/Agresti/CDA_3e. The aforementioned data sets will also be available on the companion website. Additional exercises are available there and at www.stat.ufl.edu/~aa/cat/Extra_Exercises, some taken from the 2nd edition to create space for new material in this edition and some being slightly more technical.

I owe very special thanks to Brian Marx for his many suggestions about the text over the past twenty years. He has been incredibly generous with his time in providing feedback based on teaching courses based on the book. I also thank those individuals who commented on parts of the manuscript or who made suggestions about examples or material to cover or provided other help such as noticing errors. Travis Gerke, Anna Gottard, and Keramat Nourijelyani gave me several helpful comments. Thanks also to Alessandra Brazzale, Debora Giovannelli, David Groggel, Stacey Handcock, Maria Kateri, Bernhard Klingenberg, Ioannis Kosmidis, Mohammad Mansournia, Trevelyan McKinley, Changsoon Park, Tom Piazza, Brett Presnell, Ori Rosen, Ralph Scherer, Claudia Tarantola, Anestis Touloumis, Thomas Yee, Jin Wang, and Sherry Wang. I also owe thanks to those who helped with the first two editions, especially Patricia Altham, James Booth, Jane Brockmann, Brian Caffo, Brent Coull, Al DeMaris, Anna Gottard, Harry Khamis, Svend Kreiner, Carla Rampichini, Stephen Stigler, and Larry Winner. Thanks to those who helped with material for my more advanced text (Categorical Data Analysis) that I extracted here, especially Bernhard Klingenberg, Yongyi Min, and Brian Caffo. Many thanks also to the staff at Wiley for their usual high-quality help.

A truly special by-product for me of writing books about categorical data analysis has been invitations to teach short courses based on them and spend research visits at many institutions around the world. With grateful thanks I dedicate this book to my hosts over the years. In particular, I thank my hosts in Italy (Adelchi Azzalini, Elena Beccalli, Rino Bellocco, Matilde Bini, Giovanna Boccuzzo, Alessandra Brazzale, Silvia Cagnone, Paula Cerchiello, Andrea Cerioli, Monica Chiogna, Guido Consonni, Adriano Decarli, Mauro Gasparini, Alessandra Giovagnoli, Sabrina Giordano, Paolo Giudici, Anna Gottard, Alessandra Guglielmi, Maria Iannario, Gianfranco Lovison, Claudio Lupi, Monia Lupparelli, Maura Mezzetti, Antonietta Mira, Roberta Paroli, Domenico Piccolo, Irene Poli, Alessandra Salvan, Nicola Sartori, Bruno Scarpa, Elena Stanghellini, Claudia Tarantola, Cristiano Varin, Roberta Varriale, Laura Ventura, Diego Zappa), the UK (Phil Brown, Bianca De Stavola, Brian Francis, Byron Jones, Gillian Lancaster, Irini Moustaki, Chris Skinner, Briony Teather), Austria (Regina Dittrich, Gilg Seeber, Helga Wagner), Belgium (Hermann Callaert, Geert Molenberghs), France (Antoine De Falguerolles, Jean-Yves Mary, Agnes Rogel), Germany (Maria Kateri, Gerhard Tutz), Greece (Maria Kateri, Ioannis Ntzoufras), the Netherlands (Ivo Molenaar, Marijte van Duijn, Peter van der Heijden), Norway (Petter Laake), Portugal (Francisco Carvalho, Adelaide Freitas, Pedro Oliveira, Carlos Daniel Paulino), Slovenia (Janez Stare), Spain (Elias Moreno), Sweden (Juni Palmgren, Elisabeth Svensson, Dietrich van Rosen), Switzerland (Anthony Davison, Paul Embrechts), Brazil (Clarice Demetrio, Bent Jörgensen, Francisco Louzada, Denise Santos), Chile (Guido Del Pino), Colombia (Marta Lucia Corrales Bossio, Leonardo Trujillo), Turkey (Aylin Alin), Mexico (Guillermina Eslava), Australia (Chris Lloyd), China (I-Ming Liu, Chongqi Zhang), Japan (Ritei Shibata), and New Zealand (Nye John, I-Ming Liu). 
Finally, thanks to my wife, Jacki Levine, for putting up with my travel schedule in these visits around the world!

ALAN AGRESTI

Gainesville, Florida and Brookline, Massachusetts

March 2018

ABOUT THE COMPANION WEBSITE

This book comes with a companion website of other material, including all data sets analyzed in the book and some extra exercises.

www.wiley.com/go/Agresti/CDA_3e