Handbook of Regression Analysis - Samprit Chatterjee - E-Book

Samprit Chatterjee

Description

A Comprehensive Account for Data Analysts of the Methods and Applications of Regression Analysis. Written by two established experts in the field, the purpose of the Handbook of Regression Analysis is to provide a practical, one-stop reference on regression analysis. The focus is on the tools that both practitioners and researchers use in real life. It is intended to be a comprehensive collection of the theory, methods, and applications of regression methods, but it has been deliberately written at an accessible level. The handbook provides a quick and convenient reference or "refresher" on ideas and methods that are useful for the effective analysis of data and its resulting interpretations. Students can use the book as an introduction to and/or summary of key concepts in regression and related course work (including linear, binary logistic, multinomial logistic, count, and nonlinear regression models). Theory underlying the methodology is presented when it advances conceptual understanding and is always supplemented by hands-on examples. References are supplied for readers wanting more detailed material on the topics discussed in the book. R code and data for all of the analyses described in the book are available via an author-maintained website.

Page count: 341

Year of publication: 2013

Contents

Cover

Half Title page

Title page

Copyright page

Dedication

Preface

Part One: The Multiple Linear Regression Model

Chapter One: Multiple Linear Regression

1.1 Introduction

1.2 Concepts and Background Material

1.3 Methodology

1.4 Example – Estimating Home Prices

1.5 Summary

Key Terms

Chapter Two: Model Building

2.1 Introduction

2.2 Concepts and Background Material

2.3 Methodology

2.4 Indicator Variables and Modeling Interactions

2.5 Summary

Key Terms

Part Two: Addressing Violations of Assumptions

Chapter Three: Diagnostics for Unusual Observations

3.1 Introduction

3.2 Concepts and Background Material

3.3 Methodology

3.4 Example – Estimating Home Prices (continued)

3.5 Summary

Key Terms

Chapter Four: Transformations and Linearizable Models

4.1 Introduction

4.2 Concepts and Background Material: The Log-Log Model

4.3 Concepts and Background Material: Semilog Models

4.4 Example – Predicting Movie Grosses After One Week

4.5 Summary

Key Terms

Chapter Five: Time Series Data and Autocorrelation

5.1 Introduction

5.2 Concepts and Background Material

5.3 Methodology: Identifying Autocorrelation

5.4 Methodology: Addressing Autocorrelation

5.5 Summary

Key Terms

Part Three: Categorical Predictors

Chapter Six: Analysis of Variance

6.1 Introduction

6.2 Concepts and Background Material

6.3 Methodology

6.4 Example – DVD Sales of Movies

6.5 Higher-Way ANOVA

6.6 Summary

Key Terms

Chapter Seven: Analysis of Covariance

7.1 Introduction

7.2 Methodology

7.3 Example – International Grosses of Movies

7.4 Summary

Key Terms

Part Four: Other Regression Models

Chapter Eight: Logistic Regression

8.1 Introduction

8.2 Concepts and Background Material

8.3 Methodology

8.4 Example – Smoking and Mortality

8.5 Example – Modeling Bankruptcy

8.6 Summary

Key Terms

Chapter Nine: Multinomial Regression

9.1 Introduction

9.2 Concepts and Background Material

9.3 Methodology

9.4 Example – City Bond Ratings

9.5 Summary

Key Terms

Chapter Ten: Count Regression

10.1 Introduction

10.2 Concepts and Background Material

10.3 Methodology

10.4 Overdispersion and Negative Binomial Regression

10.5 Example – Unprovoked Shark Attacks in Florida

10.6 Other Count Regression Models

10.7 Poisson Regression and Weighted Least Squares

10.8 Summary

Key Terms

Chapter Eleven: Nonlinear Regression

11.1 Introduction

11.2 Concepts and Background Material

11.3 Methodology

11.4 Example – Michaelis-Menten Enzyme Kinetics

11.5 Summary

Key Terms

Bibliography

Index

Handbook of Regression Analysis

Copyright © 2013 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data is available.

ISBN: 978-0-470-88716-5

Dedicated to everyone who labors in the field of statistics, whether they are students, teachers, researchers, or data analysts.

PREFACE

How to Use This Book

This book is designed to be a practical guide to regression modeling. There is little theory here, and methodology appears in the service of the ultimate goal of analyzing real data using appropriate regression tools. As such, the target audience of the book includes anyone who is faced with regression data [that is, data where there is a response variable that is being modeled as a function of other variable(s)], and whose goal is to learn as much as possible from that data.

The book can be used as a text for an applied regression course (indeed, much of it is based on handouts that have been given to students in such a course), but that is not its primary purpose; rather, it is aimed much more broadly as a source of practical advice on how to address the problems that come up when dealing with regression data. While a text is usually organized in a way that makes the chapters interdependent, successively building on each other, that is not the case here. Indeed, we encourage readers to dip into different chapters for practical advice on specific topics as needed. The pace of the book is faster than might typically be the case for a text. The coverage, while at an applied level, does not shy away from sophisticated concepts. It is distinct from, for example, Chatterjee and Hadi (2012), while also having less theoretical focus than texts such as Greene (2011), Montgomery et al. (2012), or Sen and Srivastava (1990).

This, however, is not a cookbook that presents a mechanical approach to doing regression analysis. Data analysis is perhaps an art, and certainly a craft; we believe that the goal of any data analysis book should be to help analysts develop the skills and experience necessary to adjust to the inevitable twists and turns that come up when analyzing real data.

We assume that the reader possesses a nodding acquaintance with regression analysis. The reader should be familiar with the basic terminology and should have been exposed to basic regression techniques and concepts, at least at the level of simple (one-predictor) linear regression. We also assume that the user has access to a computer with an adequate regression package. The material presented here is not tied to any particular software. Almost all of the analyses described here can be performed by most standard packages, although the ease of doing this could vary. All of the analyses presented here were done using the free package R (R Development Core Team, 2011), which is available for many different operating system platforms (see http://www.R-project.org/ for more information). Code for the output and figures in the book can be found at its associated web site at http://people.stern.nyu.edu/jsimonof/RegressionHandbook/.

Each chapter of the book is laid out in a similar way, with most having at least four sections of specific types. First is an introduction, where the general issues that will be discussed in that chapter are presented. A section on concepts and background material follows, where a discussion of the relationship of the chapter’s material to the broader study of regression data is the focus. This section also provides any theoretical background for the material that is necessary. Sections on methodology follow, where the specific tools used in the chapter are discussed. This is where relevant algorithmic details are likely to appear. Finally, each chapter includes at least one analysis of real data using the methods discussed in the chapter (as well as appropriate material from earlier chapters), including both methodological and graphical analyses.

The book begins with discussion of the multiple regression model. Many regression textbooks start with discussion of simple regression before moving on to multiple regression. This is quite reasonable from a pedagogical point of view, since simple regression has the great advantage of being easy to understand graphically, but from a practical point of view simple regression is rarely the primary tool in analysis of real data. For that reason, we start with multiple regression, and note the simplifications that come from the special case of a single predictor. Chapter 1 describes the basics of the multiple regression model, including the assumptions being made, and both estimation and inference tools, while also giving an introduction to the use of residual plots to check assumptions.

Since it is unlikely that the first model examined will ultimately be the final preferred model, Chapter 2 focuses on the very important areas of model building and model selection. This includes addressing the issue of collinearity, as well as the use of both hypothesis tests and information measures to help choose among candidate models.

Chapters 3 through 5 study common violations of regression assumptions, and methods available to address those model violations. Chapter 3 focuses on unusual observations (outliers and leverage points), while Chapter 4 describes how transformations (especially the log transformation) can often address both nonlinearity and nonconstant variance violations. Chapter 5 is an introduction to time series regression, and the problems caused by autocorrelation. Time series analysis is a vast area of statistical methodology, so our goal in this chapter is only to provide a good practical introduction to that area in the context of regression analysis.

Chapters 6 and 7 focus on the situation where there are categorical variables among the predictors. Chapter 6 treats analysis of variance (ANOVA) models, which include only categorical predictors, while Chapter 7 looks at analysis of covariance (ANCOVA) models, which include both numerical and categorical predictors. The examination of interaction effects is a fundamental aspect of these models, as are questions related to simultaneous comparison of many groups to each other. Data of this type often exhibit nonconstant variance related to the different subgroups in the population, and the appropriate tool to address this issue, weighted least squares, is also a focus here.

Chapters 8 through 10 examine the situation where the nature of the response variable is such that Gaussian-based least squares regression is no longer appropriate. Chapter 8 focuses on logistic regression, designed for binary response data and based on the binomial random variable. While there are many parallels between logistic regression analysis and least squares regression analysis, there are also issues that come up in logistic regression that require special care. Chapter 9 uses the multinomial random variable to generalize the models of Chapter 8 to allow for multiple categories in the response variable, outlining models designed for response variables that either do or do not have ordered categories. Chapter 10 focuses on response data in the form of counts, where distributions like the Poisson and negative binomial play a central role. The connection between all these models through the generalized linear model framework is also exploited in this chapter.

The final chapter focuses on situations where linearity does not hold, and a nonlinear relationship is necessary. Although these models are based on least squares, from both an algorithmic and inferential point of view there are strong connections with the models of Chapters 8 through 10, which we highlight.

This Handbook can be used in several different ways. First, a reader may use the book to find information on a specific topic. An analyst might want additional information on, for example, logistic regression or autocorrelation. The chapters on these (and other) topics provide the reader with this subject matter information. As noted above, the chapters also include at least one analysis of a data set, a clarification of computer output, and reference to sources where additional material can be found. The chapters in the book are to a large extent self-contained and can be consulted independently of other chapters.

The book can also be used as a template for what we view as a reasonable approach to data analysis in general. This is based on the cyclical paradigm of model formulation, model fitting, model evaluation, and model updating leading back to model (re)formulation. Statistical significance of test statistics does not necessarily mean that an adequate model has been obtained. Further analysis needs to be performed before the fitted model can be regarded as an acceptable description of the data, and this book concentrates on this important aspect of regression methodology. Detection of deficiencies of fit is based on both testing and graphical methods, and both approaches are highlighted here.

This preface is intended to indicate ways in which the Handbook can be used. Our hope is that it will be a useful guide for data analysts, and will help contribute to effective analyses. We would like to thank our students and colleagues for their encouragement and support. We hope we have provided them with a book of which they would approve. We would like to thank Steve Quigley, Jackie Palmieri, and Amy Hendrickson for their help in bringing this manuscript to print. We would also like to thank our families for their love and support.

SAMPRIT CHATTERJEE
Brooksville, Maine

JEFFREY S. SIMONOFF
New York, New York

August, 2012

PART ONE

The Multiple Linear Regression Model

CHAPTER ONE

Multiple Linear Regression

1.1 Introduction

1.2 Concepts and Background Material

1.2.1 The Linear Regression Model

1.2.2 Estimation Using Least Squares

1.2.3 Assumptions

1.3 Methodology

1.3.1 Interpreting Regression Coefficients

1.3.2 Measuring the Strength of the Regression Relationship

1.3.3 Hypothesis Tests and Confidence Intervals for β

1.3.4 Fitted Values and Predictions

1.3.5 Checking Assumptions Using Residual Plots

1.4 Example – Estimating Home Prices

1.5 Summary

1.1 Introduction

This is a book about regression modeling, but when we refer to regression models, what do we mean? The regression framework can be characterized in the following way:

1. We have one particular variable that we are interested in understanding or modeling, such as sales of a particular product, sale price of a home, or voting preference of a particular voter. This variable is called the target, response, or dependent variable, and is usually represented by y.
2. We have a set of p other variables that we think might be useful in predicting or modeling the target variable (the price of the product, the competitor’s price, and so on; or the lot size, number of bedrooms, number of bathrooms of the home, and so on; or the gender, age, income, party membership of the voter, and so on). These are called the predicting, or independent variables, and are usually represented by x_1, x_2, etc.
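
As a rough illustration of this setup, the sketch below fits a linear model for one response y on p = 2 predictors by least squares, the estimation method named in Section 1.2.2. It is written in Python with NumPy rather than the R used for the book's own analyses. The function name, variable names, coefficient values, and simulated data are all hypothetical and are not taken from the book.

```python
import numpy as np


def fit_least_squares(X, y):
    """Least squares estimates for y ~ b0 + b1*x1 + ... + bp*xp.

    X is an (n, p) array of predictor values, y an (n,) response.
    Returns an array of length p + 1: intercept first, then slopes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model includes an intercept.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Hypothetical, simulated data (not from the book): a home-price-like
# response with two predictors, lot size and number of bedrooms.
rng = np.random.default_rng(0)
n = 50
lot_size = rng.uniform(2, 10, n)
bedrooms = rng.integers(1, 6, n).astype(float)
X = np.column_stack([lot_size, bedrooms])

# Assumed "true" coefficients used only to generate the example data.
true_beta = np.array([100.0, 12.0, 25.0])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 5.0, n)

beta_hat = fit_least_squares(X, y)
print("intercept and slope estimates:", beta_hat)
```

With noise in the simulated response, the estimates should land near, but not exactly on, the assumed coefficients; with a noiseless response they are recovered up to rounding error.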
