Foundations of Linear and Generalized Linear Models - Alan Agresti - E-Book

Foundations of Linear and Generalized Linear Models E-Book

Alan Agresti

Description

A valuable overview of the most important ideas and results in statistical modeling

Written by a highly experienced author, Foundations of Linear and Generalized Linear Models is a clear and comprehensive guide to the key concepts and results of linear statistical models. The book presents a broad, in-depth overview of the most commonly used statistical models by discussing the theory underlying the models, R software applications, and examples with crafted models to elucidate key ideas and promote practical model building.

The book begins by illustrating the fundamentals of linear models, such as how model fitting projects the data onto a model vector subspace and how orthogonal decompositions of the data yield information about the effects of explanatory variables. Subsequently, the book covers the most popular generalized linear models, which include binomial and multinomial logistic regression for categorical data, and Poisson and negative binomial loglinear models for count data.

Focusing on the theoretical underpinnings of these models, Foundations of Linear and Generalized Linear Models also features:

* An introduction to quasi-likelihood methods that require weaker distributional assumptions, such as generalized estimating equation methods
* An overview of linear mixed models and generalized linear mixed models with random effects for clustered correlated data, Bayesian modeling, and extensions to handle problematic cases such as high dimensional problems
* Numerous examples that use R software for all text data analyses
* More than 400 exercises for readers to practice and extend the theory, methods, and data analysis
* A supplementary website with datasets for the examples and exercises

An invaluable textbook for upper-undergraduate and graduate-level students in statistics and biostatistics courses, Foundations of Linear and Generalized Linear Models is also an excellent reference for practicing statisticians and biostatisticians, as well as anyone who is interested in learning about the most important statistical models for analyzing data.

Number of pages: 853

WILEY SERIES IN PROBABILITY AND STATISTICS

ESTABLISHED BY WALTER A. SHEWHART AND SAMUEL S. WILKS

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg

Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane, Jozef L. Teugels

A complete list of the titles in this series appears at the end of this volume.

Foundations of Linear and Generalized Linear Models

ALAN AGRESTI

Distinguished Professor Emeritus University of Florida Gainesville, FL

Visiting Professor Harvard University Cambridge, MA

Copyright © 2015 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data

Agresti, Alan, author. Foundations of linear and generalized linear models / Alan Agresti. pages cm. – (Wiley series in probability and statistics) Includes bibliographical references and index. ISBN 978-1-118-73003-4 (hardback) 1. Mathematical analysis–Foundations. 2. Linear models (Statistics) I. Title. QA299.8.A37 2015 003′.74–dc23

2014036543

To my statistician friends in Europe

CONTENTS

Preface

Purpose of this Book

Use as a Textbook

Acknowledgments

Chapter 1: Introduction to Linear and Generalized Linear Models

1.1 Components of a Generalized Linear Model

1.2 Quantitative/Qualitative Explanatory Variables and Interpreting Effects

1.3 Model Matrices and Model Vector Spaces

1.4 Identifiability and Estimability

1.5 Example: Using Software to Fit a GLM

Chapter Notes

Exercises

Notes

Chapter 2: Linear Models: Least Squares Theory

2.1 Least Squares Model Fitting

2.2 Projections of Data Onto Model Spaces

2.3 Linear Model Examples: Projections and SS Decompositions

2.4 Summarizing Variability in a Linear Model

2.5 Residuals, Leverage, and Influence

2.6 Example: Summarizing the Fit of a Linear Model

2.7 Optimality of Least Squares and Generalized Least Squares

Chapter Notes

Exercises

Notes

Chapter 3: Normal Linear Models: Statistical Inference

3.1 Distribution Theory for Normal Variates

3.2 Significance Tests for Normal Linear Models

3.3 Confidence Intervals and Prediction Intervals for Normal Linear Models

3.4 Example: Normal Linear Model Inference

3.5 Multiple Comparisons: Bonferroni, Tukey, and FDR Methods

Chapter Notes

Exercises

Notes

Chapter 4: Generalized Linear Models: Model Fitting and Inference

4.1 Exponential Dispersion Family Distributions for a GLM

4.2 Likelihood and Asymptotic Distributions for GLMs

4.3 Likelihood-Ratio/Wald/Score Methods of Inference for GLM Parameters

4.4 Deviance of a GLM, Model Comparison, and Model Checking

4.5 Fitting Generalized Linear Models

4.6 Selecting Explanatory Variables for a GLM

4.7 Example: Building a GLM

Appendix: GLM Analogs of Orthogonality Results for Linear Models

Chapter Notes

Exercises

Notes

Chapter 5: Models for Binary Data

5.1 Link Functions for Binary Data

5.2 Logistic Regression: Properties and Interpretations

5.3 Inference About Parameters of Logistic Regression Models

5.4 Logistic Regression Model Fitting

5.5 Deviance and Goodness of Fit for Binary GLMs

5.6 Probit and Complementary Log–Log Models

5.7 Examples: Binary Data Modeling

Chapter Notes

Exercises

Notes

Chapter 6: Multinomial Response Models

6.1 Nominal Responses: Baseline-Category Logit Models

6.2 Ordinal Responses: Cumulative Logit and Probit Models

6.3 Examples: Nominal and Ordinal Responses

Chapter Notes

Exercises

Notes

Chapter 7: Models for Count Data

7.1 Poisson GLMs for Counts and Rates

7.2 Poisson/Multinomial Models for Contingency Tables

7.3 Negative Binomial GLMs

7.4 Models for Zero-Inflated Data

7.5 Example: Modeling Count Data

Chapter Notes

Exercises

Notes

Chapter 8: Quasi-Likelihood Methods

8.1 Variance Inflation for Overdispersed Poisson and Binomial GLMs

8.2 Beta-Binomial Models and Quasi-Likelihood Alternatives

8.3 Quasi-Likelihood and Model Misspecification

Chapter Notes

Exercises

Notes

Chapter 9: Modeling Correlated Responses

9.1 Marginal Models and Models with Random Effects

9.2 Normal Linear Mixed Models

9.3 Fitting and Prediction for Normal Linear Mixed Models

9.4 Binomial and Poisson GLMMs

9.5 GLMM Fitting, Inference, and Prediction

9.6 Marginal Modeling and Generalized Estimating Equations

9.7 Example: Modeling Correlated Survey Responses

Chapter Notes

Exercises

Notes

Chapter 10: Bayesian Linear and Generalized Linear Modeling

10.1 The Bayesian Approach to Statistical Inference

10.2 Bayesian Linear Models

10.3 Bayesian Generalized Linear Models

10.4 Empirical Bayes and Hierarchical Bayes Modeling

Chapter Notes

Exercises

Notes

Chapter 11: Extensions of Generalized Linear Models

11.1 Robust Regression and Regularization Methods for Fitting Models

11.2 Modeling With Large p

11.3 Smoothing, Generalized Additive Models, and Other GLM Extensions

Chapter Notes

Exercises

Notes

Appendix A: Supplemental Data Analysis Exercises

Notes

Appendix B: Solution Outlines for Selected Exercises

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Chapter 8

Chapter 9

Chapter 10

Chapter 11

References

Author Index

Example Index

Subject Index

Wiley Series

End User License Agreement

Preface

PURPOSE OF THIS BOOK

Why yet another book on linear models? Over the years, a multitude of books have already been written about this well-traveled topic, many of which provide more comprehensive presentations of linear modeling than this one attempts. My book is intended to present an overview of the key ideas and foundational results of linear and generalized linear models. I believe this overview approach will be useful for students who lack the time in their program for a more detailed study of the topic. This situation is increasingly common in Statistics and Biostatistics departments. As courses are added on recent influential developments (such as “big data,” statistical learning, Monte Carlo methods, and application areas such as genetics and finance), programs struggle to keep room in their curriculum for courses that have traditionally been at the core of the field. Many departments no longer devote an entire year or more to courses about linear modeling.

Books such as those by Dobson and Barnett (2008), Fox (2008), and Madsen and Thyregod (2011) present fine overviews of both linear and generalized linear models. By contrast, my book has more emphasis on the theoretical foundations—showing how linear model fitting projects the data onto a model vector subspace and how orthogonal decompositions of the data yield information about effects, deriving likelihood equations and likelihood-based inference, and providing extensive references for historical developments and new methodology. In doing so, my book has less emphasis than some other books on practical issues of data analysis, such as model selection and checking. However, each chapter contains at least one section that applies the models presented in that chapter to a dataset, using R software. The book is not intended to be a primer on R software or on the myriad details relevant to statistical practice, however, so these examples are relatively simple ones that merely convey the basic concepts and spirit of model building.

The presentation of linear models for continuous responses in Chapters 1–3 has a geometrical rather than an algebraic emphasis. More comprehensive books on linear models that use a geometrical approach are the ones by Christensen (2011) and by Seber and Lee (2003). The presentation of generalized linear models in Chapters 4–9 includes several sections that focus on discrete data. Some of this significantly abbreviates material from my book, Categorical Data Analysis (3rd ed., John Wiley & Sons, 2013). Broader overviews of generalized linear modeling include the classic book by McCullagh and Nelder (1989) and the more recent book by Aitkin et al. (2009). An excellent book on statistical modeling in an even more general sense is by Davison (2003).

USE AS A TEXTBOOK

This book can serve as a textbook for a one-semester or two-quarter course on linear and generalized linear models. It is intended for graduate students in the first or second year of Statistics and Biostatistics programs. It also can serve programs with a heavy focus on statistical modeling, such as econometrics and operations research. The book also should be useful to students in the social, biological, and environmental sciences who choose Statistics as their minor area of concentration.

As a prerequisite, the reader should be familiar with basic theory of statistics, such as presented by Casella and Berger (2001). Although not mandatory, it will be helpful if readers have at least some background in applied statistical modeling, including linear regression and ANOVA. I also assume some linear algebra background. In this book, I recall and briefly review fundamental statistical theory and matrix algebra results where they are used. This contrasts with the approach in many books on linear models of having several chapters on matrix algebra and distribution theory before presenting the main results on linear models. Readers wanting to improve their knowledge of matrix algebra can find on the Web (e.g., with a Google search of “review of matrix algebra”) overviews that provide more than enough background for reading this book. Also helpful as background for Chapters 1–3 on linear models are online lectures, such as the MIT linear algebra lectures by G. Strang at http://ocw.mit.edu/courses/mathematics on topics such as vector spaces, column space and null space, independence and a basis, inverses, orthogonality, projections and least squares, eigenvalues and eigenvectors, and symmetric and idempotent matrices. By not including separate chapters on matrix algebra and distribution theory, I hope instructors will be able to cover most of the book in a single semester or in a pair of quarters.

Each chapter contains exercises for students to practice and extend the theory and methods and also to help assimilate the material by analyzing data. Complete data files for the text examples and exercises are available at the text website, http://www.stat.ufl.edu/~aa/glm/data/. Appendix A contains supplementary data analysis exercises that are not tied to any particular chapter. Appendix B contains solution outlines and hints for some of the exercises.

I emphasize that this book is not intended to be a complete overview of linear and generalized linear modeling. Some important classes of models are beyond its scope; examples are transition (e.g., Markov) models and survival (time-to-event) models. I intend merely for the book to be an overview of the foundations of this subject—that is, core material that should be part of the background of any statistical scientist. I invite readers to use it as a stepping stone to reading more specialized books that focus on recent advances and extensions of the models presented here.

ACKNOWLEDGMENTS

This book evolved from a one-semester course that I was invited to develop and teach as a visiting professor for the Statistics Department at Harvard University in the fall terms of 2011–2014. That course covers most of the material in Chapters 1–9. My grateful thanks to Xiao-Li Meng (then chair of the department) for inviting me to teach this course, and likewise thanks to Dave Harrington for extending this invitation through 2014. (The book's front cover, showing the Zakim bridge in Boston, reflects the Boston-area origins of this book.) Special thanks to Dave Hoaglin, who besides being a noted statistician and highly published book author, has wonderful editing skills. Dave gave me detailed and helpful comments and suggestions for my working versions of all the chapters, both for the statistical issues and the expository presentation. He also found many errors that otherwise would have found their way into print!

Thanks also to David Hitchcock, who kindly read the entire manuscript and made numerous helpful suggestions, as did Maria Kateri and Thomas Kneib for a few chapters. Hani Doss kindly shared his fine course notes on linear models (Doss 2010) when I was organizing my own thoughts about how to present the foundations of linear models in only two chapters. Thanks to Regina Dittrich for checking the R code and pointing out errors. I owe thanks also to several friends and colleagues who provided comments or datasets or other help, including Pat Altham, Alessandra Brazzale, Jane Brockmann, Phil Brown, Brian Caffo, Leena Choi, Guido Consonni, Brent Coull, Anthony Davison, Kimberly Dibble, Anna Gottard, Ralitza Gueorguieva, Alessandra Guglielmi, Jarrod Hadfield, Rebecca Hale, Don Hedeker, Georg Heinze, Jon Hennessy, Harry Khamis, Eunhee Kim, Joseph Lang, Ramon Littell, I-Ming Liu, Brian Marx, Clint Moore, Bhramar Mukherjee, Dan Nettleton, Keramat Nourijelyani, Donald Pierce, Penelope Pooler, Euijung Ryu, Michael Schemper, Cristiano Varin, Larry Winner, and Lo-Hua Yuan. James Booth, Gianfranco Lovison, and Brett Presnell have generously shared materials over the years dealing with generalized linear models. Alex Blocker, Jon Bischof, Jon Hennessy, and Guillaume Basse were outstanding and very helpful teaching assistants for my Harvard Statistics 244 course, and Jon Hennessy contributed solutions to many exercises from which I extracted material at the end of this book. Thanks to students in that course for their comments about the manuscript. Finally, thanks to my wife Jacki Levine for encouraging me to spend the terms visiting Harvard and for support of all kinds, including helpful advice in the early planning stages of this book.

ALAN AGRESTI

Brookline, Massachusetts, and Gainesville, Florida

June 2014

CHAPTER 1: Introduction to Linear and Generalized Linear Models

This is a book about linear models and generalized linear models. As the names suggest, the linear model is a special case of the generalized linear model. In this first chapter, we define generalized linear models, and in doing so we also introduce the linear model.
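As a brief sketch of the definition this chapter builds toward, the three components of a generalized linear model can be written as follows. The notation here is the standard one and is an assumption; the symbols the book adopts in Section 1.1 may differ.

```latex
\begin{align*}
\text{Random component:} \quad & y_1, \dots, y_n \ \text{independent, with } \mu_i = E(y_i), \\
\text{Linear predictor:} \quad & \eta_i = \sum_{j=1}^{p} \beta_j x_{ij}, \\
\text{Link function:} \quad & g(\mu_i) = \eta_i .
\end{align*}
```

With the identity link, $g(\mu_i) = \mu_i$, the model reduces to $\mu_i = \sum_{j} \beta_j x_{ij}$, which is the ordinary linear model.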

Chapters 2 and 3 focus on the linear model. Chapter 2 introduces the least squares method for fitting the model, and Chapter 3 presents statistical inference under the assumption of a normal distribution for the response variable. Chapter 4 presents analogous model-fitting and inferential results for the generalized linear model. This generalization enables us to model non-normal responses, such as categorical data and count data.
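The least squares fit mentioned above can be sketched for the simplest case, a straight line with one explanatory variable. This is a minimal illustration in Python rather than the R code used in the book; the data values and the function name `least_squares_line` are made up for this example. It checks two facts that follow from viewing the fit as a projection: the residuals are orthogonal to each column of the model matrix, and the total sum of squares splits into a regression part and a residual part.

```python
# Simple linear regression by least squares, in pure Python.
# The data and the helper name are hypothetical, chosen only for illustration.

def least_squares_line(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Projection property: the residual vector is orthogonal to both columns
# of the model matrix (the column of ones and the column of x values).
dot_const = sum(residuals)
dot_x = sum(r * xi for r, xi in zip(residuals, x))

# Orthogonal decomposition of the variability: TSS = SSR + SSE.
y_bar = sum(y) / len(y)
tss = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum(r ** 2 for r in residuals)

print("intercept, slope:", b0, b1)
print("residual dot products:", dot_const, dot_x)
print("TSS, SSR + SSE:", tss, ssr + sse)
```

Both dot products should be zero up to rounding error, and the two printed sums of squares should agree.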

The remainder of the book presents the most important generalized linear models. Chapter 5 focuses on models that assume a binomial distribution for the response variable. These apply to binary data, such as “success” and “failure” for possible outcomes in a medical trial or “favor” and “oppose” for possible responses in a sample survey. Chapter 6 extends the models to multicategory responses, assuming a multinomial distribution. Chapter 7 introduces models that assume a Poisson or negative binomial distribution for the response variable. These apply to count data, such as observations in a health survey on the number of respondent visits in the past year to a doctor. Chapter 8 presents ways of weakening distributional assumptions in generalized linear models, introducing quasi-likelihood methods that merely focus on the mean and variance of the response distribution. Chapters 1–8 assume independent observations. Chapter 9 generalizes the models further to permit correlated observations, such as in handling multivariate responses. Chapters 1–9 use the traditional frequentist approach to statistical inference, assuming probability distributions for the response variables but treating model parameters as fixed, unknown values. Chapter 10 presents the Bayesian approach for linear models and generalized linear models, which treats the model parameters as random variables having their own distributions. The final chapter introduces extensions of the models that handle more complex situations, such as high-dimensional settings in which models have enormous numbers of parameters.
