Praise for the First Edition
"The attention to detail is impressive. The book is very well written and the author is extremely careful with his descriptions . . . the examples are wonderful." --The American Statistician
Fully revised to reflect the latest methodologies and emerging applications, Applied Regression Modeling, Second Edition continues to highlight the benefits of statistical methods, specifically regression analysis and modeling, for understanding, analyzing, and interpreting multivariate data in business, science, and social science applications.
The author utilizes a bounty of real-life examples, case studies, illustrations, and graphics to introduce readers to the world of regression analysis using various software packages, including R, SPSS, Minitab, SAS, JMP, and S-PLUS. In a clear and careful writing style, the book introduces modeling extensions that illustrate more advanced regression techniques, including logistic regression, Poisson regression, discrete choice models, multilevel models, and Bayesian modeling.
In addition, the Second Edition features clarification and expansion of challenging topics, such as:
* Transformations, indicator variables, and interaction
* Testing model assumptions
* Nonconstant variance
* Autocorrelation
* Variable selection methods
* Model building and graphical interpretation
Throughout the book, datasets and examples have been updated and additional problems are included at the end of each chapter, allowing readers to test their comprehension of the presented material. In addition, a related website features the book's datasets, presentation slides, detailed statistical software instructions, and learning resources including additional problems and instructional videos.
With an intuitive approach that is not heavy on mathematical detail, Applied Regression Modeling, Second Edition is an excellent book for courses on statistical regression analysis at the upper-undergraduate and graduate level. The book also serves as a valuable resource for professionals and researchers who utilize statistical methods for decision-making in their everyday work.
Page count: 675
Year of publication: 2013
CONTENTS
Cover
Title page
Copyright page
Dedication
Preface
Acknowledgments
Introduction
I.1 Statistics in practice
I.2 Learning statistics
Chapter 1: Foundations
1.1 Identifying and summarizing data
1.2 Population distributions
1.3 Selecting individuals at random—probability
1.4 Random sampling
1.5 Interval estimation
1.6 Hypothesis testing
1.7 Random errors and prediction
1.8 Chapter summary
Problems
Chapter 2: Simple linear regression
2.1 Probability model for X and Y
2.2 Least squares criterion
2.3 Model evaluation
2.4 Model assumptions
2.5 Model interpretation
2.6 Estimation and prediction
2.7 Chapter summary
Problems
Chapter 3: Multiple linear regression
3.1 Probability model for (X1, X2, …) and Y
3.2 Least squares criterion
3.3 Model evaluation
3.4 Model assumptions
3.5 Model interpretation
3.6 Estimation and prediction
3.7 Chapter summary
Problems
Chapter 4: Regression model building I
4.1 Transformations
4.2 Interactions
4.3 Qualitative predictors
4.4 Chapter summary
Problems
Chapter 5: Regression model building II
5.1 Influential points
5.2 Regression pitfalls
5.3 Model building guidelines
5.4 Model selection
5.5 Model interpretation using graphics
5.6 Chapter summary
Problems
Chapter 6: Case studies
6.1 Home prices
6.2 Vehicle fuel efficiency
6.3 Pharmaceutical patches
Chapter 7: Extensions
7.1 Generalized linear models
7.2 Discrete choice models
7.3 Multilevel models
7.4 Bayesian modeling
Appendix A: Computer software help
Problems
Appendix B: Critical values for t-distributions
Appendix C: Notation and formulas
C.1 Univariate data
C.2 Simple linear regression
C.3 Multiple linear regression
Appendix D: Mathematics refresher
D.1 The natural logarithm and exponential functions
D.2 Rounding and accuracy
Appendix E: Answers for selected problems
References
Glossary
Index
Copyright © 2012 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Pardoe, Iain, 1970-
Applied regression modeling [electronic resource] / Iain Pardoe. — 2nd ed.
1 online resource. Includes index.
Description based on print version record and CIP data provided by publisher; resource not viewed.
ISBN 978-1-118-34502-3 (pdf) — ISBN 978-1-118-34503-0 (mobi) — ISBN 978-1-118-34504-7 (epub) — ISBN 978-1-118-09728-1 (hardback) (print)
1. Regression analysis. 2. Statistics. I. Title.
QA278.2 519.5'36—dc23 2012006617
To Tanya, Bethany, and Sierra
PREFACE
The first edition of this book was developed from class notes written for an applied regression course taken primarily by undergraduate business majors in their junior year at the University of Oregon. Since the regression methods and techniques covered in the book have broad application in many fields, not just business, this second edition widens its scope to reflect this. Details of the major changes for the second edition are included below.
The book is suitable for any undergraduate statistics course in which regression analysis is the main focus. A recommended prerequisite is an introductory probability and statistics course. It would also be suitable for use in an applied regression course for non-statistics major graduate students, including MBAs, and for vocational, professional, or other non-degree courses. Mathematical details have deliberately been kept to a minimum, and the book does not contain any calculus. Instead, emphasis is placed on applying regression analysis to data using statistical software, and understanding and interpreting results.
Chapter 1 reviews essential introductory statistics material, while Chapter 2 covers simple linear regression. Chapter 3 introduces multiple linear regression, while Chapters 4 and 5 provide guidance on building regression models, including transforming variables, using interactions, incorporating qualitative information, and using regression diagnostics. Each of these chapters includes homework problems, mostly based on analyzing real datasets provided with the book. Chapter 6 contains three in-depth case studies, while Chapter 7 introduces extensions to linear regression and outlines some related topics. The appendices contain a list of statistical software packages that can be used to carry out all the analyses covered in the book (each with detailed instructions available from the book website), a table of critical values for the t-distribution, notation and formulas used throughout the book, a glossary of important terms, a short mathematics refresher, and brief answers to selected homework problems.
The first five chapters of the book have been used successfully in quarter-length courses at a number of institutions. An alternative approach for a quarter-length course would be to skip some of the material in Chapters 4 and 5 and substitute one or more of the case studies in Chapter 6, or briefly introduce some of the topics in Chapter 7. A semester-length course could comfortably cover all the material in the book.
The website for the book, which can be found at www.iainpardoe.com/arm2e/, contains supplementary material designed to help both the instructor teaching from this book and the student learning from it. There you’ll find all the datasets used for examples and homework problems in formats suitable for most statistical software packages, as well as detailed instructions for using the major packages, including SPSS, Minitab, SAS, JMP, Data Desk, EViews, Stata, Statistica, R, and S-PLUS. There is also some information on using the Microsoft Excel spreadsheet package for some of the analyses covered in the book (dedicated statistical software is necessary to carry out all of the analyses). The website also includes information on obtaining a solutions manual containing complete answers to all the homework problems, as well as further ideas for organizing class time around the material in the book.
The book contains the following stylistic conventions:
When displaying calculated values, the general approach is to be as accurate as possible when it matters (such as in intermediate calculations for problems with many steps), but to round appropriately when convenient or when reporting final results for real-world questions. Displayed results from statistical software use the default rounding employed in R throughout.
In the author’s experience, many students find some traditional approaches to notation and terminology a barrier to learning and understanding. Thus, some traditions have been altered to improve ease of understanding. These include: using familiar Roman letters in place of unfamiliar Greek letters [e.g., E(Y) rather than μ and b rather than β]; replacing the nonintuitive Ȳ for the sample mean of Y with mY; using NH and AH for null hypothesis and alternative hypothesis, respectively, rather than the usual H0 and Ha.
Major changes for the second edition
The first edition of this book was used in the regression analysis course run by Statistics.com from 2008 to 2012. The lively discussion boards provided an invaluable source of suggestions for changes to the book. This edition clarifies and expands on concepts that students found challenging and addresses every question posed in those discussions.
The foundational material on interval estimation has been rewritten to clarify the mathematics.
There is new material on testing model assumptions, transformations, indicator variables, nonconstant variance, autocorrelation, power and sample size, model building, and model selection.
As far as possible, I’ve replaced outdated data examples with more recent data, and also used more appropriate data examples for particular topics (e.g., autocorrelation). In total, about 40% of the data files have been replaced.
Most of the data examples now use descriptive names for variables rather than generic letters such as Y and X.
As in the first edition, this edition uses mathematics to explain methods and techniques only where necessary, and formulas are used within the text only when they are instructive. However, this edition also includes additional formulas in optional sections to aid those students who can benefit from more mathematical detail.
I’ve added many more end-of-chapter problems. In total, the number of problems has increased by nearly 25%.
I’ve updated and added new references, nearly doubling the total number of references.
I’ve added a third case study to Chapter 6.
The first edition included detailed computer software instructions for five major software packages (SPSS, Minitab, SAS Analyst, R/S-PLUS, and Excel) in an appendix. This appendix has been dropped from this edition; instead, instructions for newer software versions and other packages (e.g., JMP and Stata) are now just updated on the book website.
IAIN PARDOE
Nelson, British Columbia
April 2012
ACKNOWLEDGMENTS
I am grateful to a number of people who helped to make this book a reality. Dennis Cook and Sandy Weisberg first gave me the textbook-writing bug when they approached me to work with them on their classic applied regression book (Cook and Weisberg, 1999), and Dennis subsequently motivated me to transform my teaching class notes into my own applied regression book. People who provided data for examples used throughout the book include: Victoria Whitman for the house price examples; Wolfgang Jank for the autocorrelation example on beverage sales; Craig Allen for the case study on pharmaceutical patches; Cathy Durham for the Poisson regression example in the chapter on extensions. The multilevel and Bayesian modeling sections of the chapter on extensions are based on work by Andrew Gelman and Hal Stern. Gary A. Simon and a variety of anonymous reviewers provided extremely useful feedback on the first edition of the book, as did many of my students at the University of Oregon and Statistics.com. Finally, I’d like to thank my editor at Wiley, Steve Quigley, who encouraged me to prepare this second edition.
I.P.
INTRODUCTION
Statistics is used in many fields of application since it provides an effective way to analyze quantitative information. Some examples include:
A pharmaceutical company is developing a new drug for treating a particular disease more effectively. How might statistics help you decide whether the drug will be safe and effective if brought to market?
Clinical trials involve large-scale statistical studies of people—usually both patients with the disease and healthy volunteers—who are assessed for their response to the drug. To determine that the drug is both safe and effective requires careful statistical analysis of the trial results, which can involve controlling for the personal characteristics of the people (e.g., age, gender, health history) and possible placebo effects, comparisons with alternative treatments, and so on.
A manufacturing firm is not getting paid by its customers in a timely manner—this costs the firm money in lost interest. You’ve collected recent data for the customer accounts on amount owed, number of days since the customer was billed, and size of the customer (small, medium, large). How might statistics help you improve the on-time payment rate?
You can use statistics to find out whether there is an association between the amount owed and the number of days and/or size. For example, there may be a positive association between amount owed and number of days for small and medium-sized customers but not for large-sized customers—thus it may be more profitable to focus collection efforts on small and medium-sized customers billed some time ago, rather than on large-sized customers or customers billed more recently.
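For example, a regression model with an interaction between number of days and customer size could capture such a pattern. The following is a minimal sketch in R; the file name accounts.csv and the variable names Owed, Days, and Size are hypothetical, not taken from the book's datasets:
    # Hypothetical illustration: does the Owed-Days association differ by customer size?
    accounts <- read.csv("accounts.csv")     # assumed columns: Owed, Days, Size
    accounts$Size <- factor(accounts$Size)   # small, medium, large
    fit <- lm(Owed ~ Days * Size, data = accounts)  # interaction lets the Days slope vary by Size
    summary(fit)                             # estimates, t-tests, and R-squared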
A firm makes scientific instruments and has been invited to make a sealed bid on a large government contract. You have cost estimates for preparing the bid and fulfilling the contract, as well as historical information on similar previous contracts on which the firm has bid (some successful, others not). How might statistics help you decide how to price the bid?
You can use statistics to model the association between the success/failure of past bids and variables such as bid cost, contract cost, bid price, and so on. If your model proves useful for predicting bid success, you could use it to set a maximum price at which the bid is likely to be successful.
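One way to model a binary success/failure outcome is logistic regression, an extension introduced in Chapter 7. A minimal sketch in R, with hypothetical file and variable names:
    # Hypothetical illustration: logistic regression for past sealed-bid outcomes
    bids <- read.csv("bids.csv")   # assumed columns: Success (1 = won, 0 = lost), BidPrice, BidCost, ContractCost
    fit <- glm(Success ~ BidPrice + BidCost + ContractCost, family = binomial, data = bids)
    summary(fit)
    # Estimated probability of winning a candidate bid (made-up values)
    predict(fit, newdata = data.frame(BidPrice = 250, BidCost = 30, ContractCost = 180), type = "response")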
As an auditor, you’d like to determine the number of price errors in all of a company’s invoices—this will help you detect whether there might be systematic fraud at the company. It is too time-consuming and costly to examine all of the company’s invoices, so how might statistics help you determine an upper bound for the proportion of invoices with errors?
Statistics allows you to make inferences about a population from a relatively small random sample of that population. In this case, you could take a sample of 100 invoices, say, to find a proportion, p, such that you could be 95% confident that the population error rate is less than that quantity p.
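For instance, if 3 of the 100 sampled invoices contained errors (a made-up count), a one-sided 95% upper confidence bound for the population error proportion could be obtained in R as follows:
    # Hypothetical illustration: upper 95% confidence bound for the invoice error rate
    errors <- 3    # assumed number of sampled invoices with price errors
    n <- 100       # sample size
    binom.test(errors, n, alternative = "less", conf.level = 0.95)$conf.int
    # the upper endpoint of this interval plays the role of the quantity p above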
A firm manufactures automobile parts and the factory manager wants to get a better understanding of overhead costs. You believe two variables in particular might contribute to cost variation: machine hours used per month and separate production runs per month. How might statistics help you to quantify this information?
You can use statistics to build a multiple linear regression model that estimates an equation relating the variables to one another. Among other things you can use the model to determine how much cost variation can be attributed to the two cost drivers, their individual effects on cost, and predicted costs for particular values of the cost drivers.
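A minimal sketch of such a model in R, where the file name overhead.csv and the variable names Overhead, MachHrs, and Runs are hypothetical:
    # Hypothetical illustration: overhead cost as a function of two cost drivers
    overhead <- read.csv("overhead.csv")   # assumed columns: Overhead, MachHrs, Runs
    fit <- lm(Overhead ~ MachHrs + Runs, data = overhead)
    summary(fit)   # individual effects of each driver and proportion of cost variation explained
    # Predicted cost for a month with 1500 machine hours and 40 production runs (made-up values)
    predict(fit, newdata = data.frame(MachHrs = 1500, Runs = 40), interval = "prediction")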
You work for a computer chip manufacturing firm and are responsible for forecasting future sales. How might statistics be used to improve the accuracy of your forecasts?
Statistics can be used to fit a number of different forecasting models to a time series of sales figures. Some models might just use past sales values and extrapolate into the future, while others might control for external variables such as economic indices. You can use statistics to assess the fit of the various models, and then use the best-fitting model, or perhaps an average of the few best-fitting models, to base your forecasts on.
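A minimal sketch in R of two such models, one that simply extrapolates a time trend and one that also controls for an external economic index; the file and variable names are hypothetical:
    # Hypothetical illustration: compare a trend-only model with one using an external index
    sales <- read.csv("sales.csv")   # assumed columns: Sales, Period (1, 2, ...), EconIndex
    trend_only <- lm(Sales ~ Period, data = sales)
    with_index <- lm(Sales ~ Period + EconIndex, data = sales)
    AIC(trend_only, with_index)      # lower AIC suggests a better-fitting model
    # Forecast the next period with the chosen model (next index value assumed known)
    predict(with_index, newdata = data.frame(Period = max(sales$Period) + 1, EconIndex = 102.5))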
As a financial analyst, you review a variety of financial data, such as price/earnings ratios and dividend yields, to guide investment recommendations. How might statistics be used to help you make buy, sell, or hold recommendations for individual stocks?
By comparing statistical information for an individual stock with information about stock market sector averages, you can draw conclusions about whether the stock is overvalued or undervalued. Statistics is used for both “technical analysis” (which considers the trading patterns of stocks) and “quantitative analysis” (which studies economic or company-specific data that might be expected to affect the price or perceived value of a stock).
You are a brand manager for a retailer and wish to gain a better understanding of the association between promotional activities and sales. How might statistics be used to help you obtain this information and use it to establish future marketing strategies for your brand?
Electronic scanners at retail checkout counters and online retailer records can provide sales data and statistical summaries on promotional activities such as discount pricing and the use of in-store displays or e-commerce websites. Statistics can be used to model these data to discover which product features appeal to particular market segments and to predict market share for different marketing strategies.
As a production manager for a manufacturer, you wish to improve the overall quality of your product by deciding when to make adjustments to the production process, for example, increasing or decreasing the speed of a machine. How might statistics be used to help you make those decisions?
Statistical quality control charts can be used to monitor the output of the production process. Samples from previous runs can be used to determine when the process is “in control.” Ongoing samples allow you to monitor when the process goes out of control, so that you can make the adjustments necessary to bring it back into control.
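For example, a simple control chart for individual measurements places limits three standard deviations either side of the process mean, both estimated from in-control samples. A minimal sketch in R with a hypothetical data file and made-up values:
    # Hypothetical illustration: three-sigma control limits from an in-control baseline
    baseline <- read.csv("baseline.csv")$Measure   # assumed column of in-control measurements
    center <- mean(baseline)
    ucl <- center + 3 * sd(baseline)   # upper control limit
    lcl <- center - 3 * sd(baseline)   # lower control limit
    new_obs <- c(10.2, 10.5, 11.9)     # made-up ongoing measurements
    new_obs < lcl | new_obs > ucl      # TRUE flags observations signaling an out-of-control process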
As an economist, one of your responsibilities is providing forecasts about some aspect of the economy, for example, the inflation rate. How might statistics be used to estimate those forecasts optimally?
Statistical information on various economic indicators can be entered into computerized forecasting models (also determined using statistical methods) to predict inflation rates. Examples of such indicators include the producer price index, the unemployment rate, and manufacturing capacity utilization.
As general manager of a baseball team with limited financial resources, you’d like to obtain strong, yet undervalued players. How might statistics help you to do this?
A wealth of statistical information on baseball player performance is available, and objective analysis of these data can reveal information on those players most likely to add value to the team (in terms of winning games) relative to a player’s cost. This field of statistics even has its own name, sabermetrics.
What is this book about?
This book is about the application of statistical methods, primarily regression analysis and modeling, to enhance decision-making. Regression analysis is by far the most used statistical methodology in real-world applications. Furthermore, many other statistical techniques are variants or extensions of regression analysis, so once you have a firm foundation in this methodology, you can approach these other techniques without too much additional difficulty. In this book we show you how to apply and interpret regression models, rather than deriving results and formulas (there is no calculus in the book).
Why are non-math major students required to study statistics?
In many aspects of modern life, we have to make decisions based on incomplete information (e.g., health, climate, economics, business). This book will help you to understand, analyze, and interpret such data in order to make informed decisions in the face of uncertainty. Statistical theory allows a rigorous, quantifiable appraisal of this uncertainty.
How is the book organized?
Chapter 1 reviews the essential details of an introductory statistics course necessary for use in later chapters. Chapter 2 covers the simple linear regression model for analyzing the linear association between two variables (a “response” and a “predictor”). Chapter 3 extends the methods of Chapter 2 to multiple linear regression where there can be more than one predictor variable. Chapters 4 and 5 provide guidance on building regression models, including transforming variables, using interactions, incorporating qualitative information, and diagnosing problems. Chapter 6 contains three case studies that apply the linear regression modeling techniques considered in this book to examples on real estate prices, vehicle fuel efficiency, and pharmaceutical patches. Chapter 7 introduces some extensions to the multiple linear regression model and outlines some related topics. The appendices contain a list of statistical software that can be used to carry out all the analyses covered in the book, a t-table for use in calculating confidence intervals and conducting hypothesis tests, notation and formulas used throughout the book, a glossary of important terms, a short mathematics refresher, and brief answers to selected problems.
What else do you need?
The preferred calculation method for understanding the material and completing the problems is to use statistical software rather than a statistical calculator. It may be possible to apply many of the methods discussed using spreadsheet software (such as Microsoft Excel), although some of the graphical methods may be difficult to implement and statistical software will generally be easier to use. Although a statistical calculator is not recommended for use with this book, a traditional calculator capable of basic arithmetic (including taking logarithmic and exponential transformations) will be invaluable.
What other resources are recommended?
Good supplementary textbooks (some at a more advanced level) include Dielman (2004), Draper and Smith (1998), Kutner et al. (2004), Mendenhall and Sincich (2011), Ryan (2008), and Weisberg (2005).
CHAPTER 1: FOUNDATIONS
This chapter provides a brief refresher of the main statistical ideas that will be a useful foundation for the main focus of this book, regression analysis, covered in subsequent chapters. For more detailed discussion of this material, consult a good introductory statistics textbook such as Freedman et al. (2007) or Moore et al. (2011). To simplify matters at this stage, we consider univariate data, that is, datasets consisting of measurements of just a single variable on a sample of observations. By contrast, regression analysis concerns multivariate data where there are two or more variables measured on a sample of observations. Nevertheless, the statistical ideas for univariate data carry over readily to this more complex situation, so it helps to start as simply as possible and make things more complicated only as needed.
One way to think about statistics is as a collection of methods for using data to understand a problem quantitatively—we saw many examples of this in the introduction. This book is concerned primarily with analyzing data to obtain information that can be used to help make decisions in real-world contexts.
The process of framing a problem in such a way that it will be amenable to quantitative analysis is clearly an important step in the decision-making process, but this lies outside the scope of this book. Similarly, while data collection is also a necessary task—often the most time-consuming part of any analysis—we assume from this point on that we have already obtained some data relevant to the problem at hand. We will return to the issue of the manner in which these data have been collected—namely, whether the sample data can be considered to be representative of some larger population that we wish to make statistical inferences for—in Section 1.3.
For now, we consider identifying and summarizing the data at hand. For example, suppose that we have moved to a new city and wish to buy a home. In deciding on a suitable home, we would probably consider a variety of factors, such as size, location, amenities, and price. For the sake of illustration we focus on price and, in particular, see if we can understand the way in which sale prices vary in a specific housing market. This example will run through the rest of the chapter, and, while no one would probably ever obsess over this problem to this degree in real life, it provides a useful, intuitive application for the statistical ideas that we use in the rest of the book in more complex problems.
The particular sample in the HOMES1 data file is random because the 30 homes have been selected randomly somehow from the population of all single-family homes in this housing market. For example, consider a list of homes currently for sale, which are considered to be representative of this population. A random number generator—commonly available in spreadsheet or statistical software—can be used to pick out 30 of these. Alternative selection methods may or may not lead to a random sample. For example, picking the first 30 homes on the list would not lead to a random sample if the list were ordered by the size of the sale price.
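For example, in R the random selection could be done as follows; the list length of 250 and the seed are arbitrary choices for illustration:
    # Sketch: pick 30 homes at random from a list of 250 currently for sale
    set.seed(1)                          # arbitrary seed, for reproducibility only
    selected <- sort(sample(1:250, 30))  # positions on the list of the 30 sampled homes
    selected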
We can simply list small datasets such as this. The values of Price in this case are:
However, even for these data, it can be helpful to summarize the numbers with a small number of sample statistics (such as the sample mean and standard deviation), or with a graph that can effectively convey the manner in which the numbers vary. A particularly effective graph is a stem-and-leaf plot, which places the numbers along the vertical axis of the plot, with numbers that are close together in magnitude next to one another on the plot. For example, a stem-and-leaf plot for the 30 sample prices looks like the following:
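In R, the sample statistics and the stem-and-leaf plot can be obtained directly; a minimal sketch, assuming the HOMES1 data have been saved as a comma-separated file with a Price column:
    # Sketch: summary statistics and stem-and-leaf plot for the 30 sale prices
    homes1 <- read.csv("HOMES1.csv")   # file name and format assumed
    mean(homes1$Price)                 # sample mean
    sd(homes1$Price)                   # sample standard deviation
    stem(homes1$Price)                 # stem-and-leaf plot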
Continue reading in the full edition!