Applied Logistic Regression - David W. Hosmer - E-Book

Description

A new edition of the definitive guide to logistic regression modeling for health science and other applications. This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariates. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-the-art techniques for building, interpreting, and assessing the performance of LR models. New and updated features include:

* A chapter on the analysis of correlated outcome data
* A wealth of additional material for topics ranging from Bayesian methods to assessing model fit
* Rich data sets from real-world studies that demonstrate each method under discussion
* Detailed examples and interpretation of the presented results, as well as exercises throughout

Applied Logistic Regression, Third Edition is a must-have guide for professionals and researchers who need to model nominal or ordinal scaled outcome variables in public health, medicine, and the social sciences, as well as in a wide range of other fields and disciplines.

Page count: 929




Table of Contents

Series

Title Page

Copyright

Dedication

Preface to the Third Edition

Chapter 1: Introduction to the Logistic Regression Model

1.1 Introduction

1.2 Fitting the Logistic Regression Model

1.3 Testing for the Significance of the Coefficients

1.4 Confidence Interval Estimation

1.5 Other Estimation Methods

1.6 Data Sets Used in Examples and Exercises

Exercises

Chapter 2: The Multiple Logistic Regression Model

2.1 Introduction

2.2 The Multiple Logistic Regression Model

2.3 Fitting the Multiple Logistic Regression Model

2.4 Testing for the Significance of the Model

2.5 Confidence Interval Estimation

2.6 Other Estimation Methods

Exercises

Chapter 3: Interpretation of the Fitted Logistic Regression Model

3.1 Introduction

3.2 Dichotomous Independent Variable

3.3 Polychotomous Independent Variable

3.4 Continuous Independent Variable

3.5 Multivariable Models

3.6 Presentation and Interpretation of the Fitted Values

3.7 A Comparison of Logistic Regression and Stratified Analysis for 2 × 2 Tables

Exercises

Chapter 4: Model-Building Strategies and Methods for Logistic Regression

4.1 Introduction

4.2 Purposeful Selection of Covariates

4.3 Other Methods for Selecting Covariates

4.4 Numerical Problems

Exercises

Chapter 5: Assessing the Fit of the Model

5.1 Introduction

5.2 Summary Measures of Goodness of Fit

5.3 Logistic Regression Diagnostics

5.4 Assessment of Fit Via External Validation

5.5 Interpretation and Presentation of the Results from a Fitted Logistic Regression Model

Exercises

Chapter 6: Application of Logistic Regression with Different Sampling Models

6.1 Introduction

6.2 Cohort Studies

6.3 Case-Control Studies

6.4 Fitting Logistic Regression Models to Data From Complex Sample Surveys

Exercises

Chapter 7: Logistic Regression for Matched Case-Control Studies

7.1 Introduction

7.2 Methods for Assessment of Fit in a 1–M Matched Study

7.3 An Example Using the Logistic Regression Model in a 1–1 Matched Study

7.4 An Example Using the Logistic Regression Model in a 1–M Matched Study

Exercises

Chapter 8: Logistic Regression Models for Multinomial and Ordinal Outcomes

8.1 The Multinomial Logistic Regression Model

8.2 Ordinal Logistic Regression Models

Exercises

Chapter 9: Logistic Regression Models for the Analysis of Correlated Data

9.1 Introduction

9.2 Logistic Regression Models for the Analysis of Correlated Data

9.3 Estimation Methods for Correlated Data Logistic Regression Models

9.4 Interpretation of Coefficients From Logistic Regression Models for the Analysis of Correlated Data

9.5 An Example of Logistic Regression Modeling with Correlated Data

9.6 Assessment of Model Fit

Exercises

Chapter 10: Special Topics

10.1 Introduction

10.2 Application of Propensity Score Methods in Logistic Regression Modeling

10.3 Exact Methods for Logistic Regression Models

10.4 Missing Data

10.5 Sample Size Issues When Fitting Logistic Regression Models

10.6 Bayesian Methods for Logistic Regression

10.7 Other Link Functions for Binary Regression Models

10.8 Mediation

10.9 More About Statistical Interaction

Exercises

References

Index

Copyright © 2013 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data Is Available

Hosmer, David W.

Applied Logistic Regression / David W. Hosmer, Jr., Stanley Lemeshow, Rodney X. Sturdivant. - 3rd ed.

Includes bibliographic references and index.

ISBN 978-0-470-58247-3 (cloth)

To our wives, Trina, Elaine, and Mandy,

and our sons, daughters,

and grandchildren

Preface to the Third Edition

This third edition of Applied Logistic Regression comes 12 years after the 2000 publication of the second edition. During this interval there has been considerable effort researching statistical aspects of the logistic regression model—particularly when the outcomes are correlated. At the same time, capabilities of computer software packages to fit models grew impressively to the point where they now provide access to nearly every aspect of model development a researcher might need. As is well-recognized in the statistical community, the inherent danger of this easy-to-use software is that investigators have at their disposal powerful computational tools, about which they may have only limited understanding. It is our hope that this third edition will help bridge the gap between the outstanding theoretical developments and the need to apply these methods to diverse fields of inquiry.

As was the case in the first two editions, the primary objective of the third edition is to provide an introduction to the underlying theory of the logistic regression model, with a major focus on the application, using real data sets, of the available methods to explore the relationship between a categorical outcome variable and a set of covariates. The materials in this book have evolved over the past 12 years as a result of our teaching and consulting experiences. We have used this book to teach parts of graduate-level survey courses, quarter- or semester-long courses, as well as focused short courses to working professionals. We assume that students have a solid foundation in linear regression methodology and contingency table analysis. The positive feedback we have received from students and professionals taking courses based on this book, or using it for self-study or reference, provides us with some assurance that the approach we used in the first two editions worked reasonably well; therefore, we have followed that approach in this new edition.

The approach we take is to develop the logistic regression model from a regression analysis point of view. This is accomplished by approaching logistic regression in a manner analogous to what would be considered good statistical practice for linear regression. This differs from the approach used by other authors who have begun their discussion from a contingency table point of view. While the contingency table approach may facilitate the interpretation of the results, we believe that it obscures the regression aspects of the analysis. Thus, discussion of the interpretation of the model is deferred until the regression approach to the analysis is firmly established.

To a large extent, there are no major differences between the many software packages that include logistic regression modeling. When a particular approach is available in a limited number of packages, it will be noted in this text. In general, analyses in this book have been performed using STATA [Stata Corp. (2011)]. This easy-to-use package combines excellent graphics and analysis routines; is fast; is compatible across Macintosh, Windows and UNIX platforms; and interacts well with Microsoft Word. Other major statistical packages employed at various points during the preparation of this text include SAS [SAS Institute Inc. (2009)], OpenBUGS [Lunn et al. (2009)] and R [R Development Core Team (2010)]. For all intents and purposes the results produced were the same regardless of which package we used. Reported numeric results have been rounded from figures obtained from computer output and thus may differ slightly from those that would be obtained in a replication of our analyses or from calculations based on the reported results. When features or capabilities of the programs differed in an important way, we noted them by the names given rather than by their bibliographic citation.

We feel that this new edition benefits greatly from the addition of a number of key topics. These include the following:

1. An expanded presentation of numerous new techniques for model-building, including methods for determining the scale of continuous covariates and assessing model performance.
2. An expanded presentation of regression modeling of complex sample survey data.
3. An expanded development of the use of logistic regression modeling in matched studies, as well as with multinomial and ordinal scaled responses.
4. A new chapter dealing with models and methods for correlated categorical response data.
5. A new chapter developing a number of important applications either missing or expanded from the previous editions. These include propensity score methods, exact methods for logistic regression, sample size issues, Bayesian logistic regression, and other link functions for binary outcome regression models. This chapter concludes with sections dealing with the epidemiologic concepts of mediation and additive interaction.

As was the case for the second edition, all of the data sets used in the text are available at a web site at John Wiley & Sons, Inc. http://wiley.mpstechnologies.com/wiley/BOBContent/searchLPBobContent.do

In addition, the data may also be found, by permission of John Wiley & Sons Inc., in the archive of statistical data sets maintained at the University of Massachusetts at http://www.umass.edu/statdata/statdata in the logistic regression section.

We would like to express our sincere thanks and appreciation to our colleagues, students, and staff at all of the institutions we have been fortunate to have been affiliated with since the first edition was conceived more than 25 years ago. This includes not only our primary university affiliations but also the locations where we spent extended sabbatical leaves and special research assignments. For this edition we would like to offer special thanks to Sharon Schwartz and Melanie Wall from Columbia University who took the lead in writing the two final sections of the book dealing with mediation and additive interaction. We benefited greatly from their expertise in applying these methods in epidemiologic settings. We greatly appreciate the efforts of Danielle Sullivan, a PhD candidate in biostatistics at Ohio State, for assisting in the preparation of the index for this book. Colleagues in the Division of Biostatistics and the Division of Epidemiology at Ohio State were helpful in their review of selected sections of the book. These include Bo Lu for his insights on propensity score methods and David Murray, Sigrún Alba Jóhannesdóttir, and Morten Schmidt for their thoughts concerning the sections on mediation analysis and additive interaction. Data sets form the basis for the way we present our materials and these are often hard to come by. We are very grateful to Karla Zadnik, Donald O. Mutti, Loraine T. Sinnott, and Lisa A. Jones-Jordan from The Ohio State University College of Optometry as well as to the Collaborative Longitudinal Evaluation of Ethnicity and Refractive Error (CLEERE) Study Group for making the myopia data available to us. We would also like to acknowledge Cynthia A. Fontanella from the College of Social Work at Ohio State for making both the Adolescent Placement and the Polypharmacy data sets available to us. 
A special thank you to Gary Phillips from the Center for Biostatistics at OSU for helping us identify these valuable data sets (that he was the first one to analyze) as well as for his assistance with some programming issues with Stata. We thank Gordon Fitzgerald of the Center for Outcomes Research (COR) at the University of Massachusetts / Worcester for his help in obtaining the small subset of data used in this text from the Global Longitudinal Study of Osteoporosis in Women (GLOW) Study's main data set. In addition, we thank him for his many helpful comments on the use of propensity scores in logistic regression modeling. We thank Turner Osler for providing us with the small subset of data obtained from a large data set he abstracted from the National Burn Repository 2007 Report, that we used for the burn injury analyses. In many instances the data sets we used were modified from the original data sets in ways to allow us to illustrate important modeling techniques. As such, we issue a general disclaimer here, and do so again throughout the text, that results presented in this text do not apply to the original data.

Before we began this revision, numerous individuals reviewed our proposal anonymously and made many helpful suggestions. They confirmed that what we planned to include in this book would be of use to them in their research and teaching. We thank these individuals and, for the most part, addressed their comments. Many of these reviewers suggested that we include computer code to run logistic regression in a variety of packages, especially R. We decided not to do this for two reasons: first, we are not statistical computing specialists and did not want to spend time responding to email queries about our code; second, the capabilities of computer packages change so rapidly that whatever we included would likely be out of date before the book was even published. We refer readers interested in code specific to various packages to a web site maintained by Academic Technology Services (ATS) at UCLA, where a variety of statistical packages are used to replicate the analyses for the examples in the second edition of this text as well as in numerous other statistical texts. The link to this web site is http://www.ats.ucla.edu/stat/.

Finally, we would like to thank Steve Quigley, Susanne Steitz-Filler, Sari Friedman and the production staff at John Wiley & Sons Inc. for their help in bringing this project to completion.

David W. Hosmer, Jr.

Stanley Lemeshow

Rodney X. Sturdivant1

Stowe, Vermont

Columbus, Ohio

West Point, New York

January 2013

1 The views expressed in this book are those of the author and do not reflect the official policy or position of the Department of the Army, Department of Defense, or the U.S. Government.

Chapter 1: Introduction to the Logistic Regression Model

1.1 Introduction

Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response variable and one or more explanatory variables. Quite often the outcome variable is discrete, taking on two or more possible values. The logistic regression model is the most frequently used regression model for the analysis of these data.

Before beginning a thorough study of the logistic regression model it is important to understand that the goal of an analysis using this model is the same as that of any other regression model used in statistics, that is, to find the best fitting and most parsimonious, clinically interpretable model to describe the relationship between an outcome (dependent or response) variable and a set of independent (predictor or explanatory) variables. The independent variables are often called covariates. The most common example of modeling, and one assumed to be familiar to the readers of this text, is the usual linear regression model where the outcome variable is assumed to be continuous.

What distinguishes a logistic regression model from the linear regression model is that the outcome variable in logistic regression is binary or dichotomous. This difference between logistic and linear regression is reflected both in the form of the model and its assumptions. Once this difference is accounted for, the methods employed in an analysis using logistic regression follow, more or less, the same general principles used in linear regression. Thus, the techniques used in linear regression analysis motivate our approach to logistic regression. We illustrate both the similarities and differences between logistic regression and linear regression with an example.

Example 1: Table 1.1 lists the age in years (AGE), and presence or absence of evidence of significant coronary heart disease (CHD) for 100 subjects in a hypothetical study of risk factors for heart disease. The table also contains an identifier variable (ID) and an age group variable (AGEGRP). The outcome variable is CHD, which is coded with a value of “0” to indicate that CHD is absent, or “1” to indicate that it is present in the individual. In general, any two values could be used, but we have found it most convenient to use zero and one. We refer to this data set as the CHDAGE data.

Table 1.1 Age, Age Group, and Coronary Heart Disease (CHD) Status of 100 Subjects

It is of interest to explore the relationship between AGE and the presence or absence of CHD in this group. Had our outcome variable been continuous rather than binary, we probably would begin by forming a scatterplot of the outcome versus the independent variable. We would use this scatterplot to provide an impression of the nature and strength of any relationship between the outcome and the independent variable. A scatterplot of the data in Table 1.1 is given in Figure 1.1.

Figure 1.1 Scatterplot of presence or absence of coronary heart disease (CHD) by AGE for 100 subjects.

In this scatterplot, all points fall on one of two parallel lines representing the absence of CHD (y = 0) or the presence of CHD (y = 1). There is some tendency for the individuals with no evidence of CHD to be younger than those with evidence of CHD. While this plot does depict the dichotomous nature of the outcome variable quite clearly, it does not provide a clear picture of the nature of the relationship between CHD and AGE.

The main problem with Figure 1.1 is that the variability in CHD at all ages is large. This makes it difficult to see any functional relationship between AGE and CHD. One common method of removing some variation, while still maintaining the structure of the relationship between the outcome and the independent variable, is to create intervals for the independent variable and compute the mean of the outcome variable within each group. We use this strategy by grouping age into the categories (AGEGRP) defined in Table 1.1. Table 1.2 contains, for each age group, the frequency of occurrence of each outcome, as well as the percent with CHD present.
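The grouping-and-averaging strategy just described can be sketched in a few lines of Python. This is a minimal illustration of the technique only; the (age, outcome) pairs and the interval boundaries below are made up for the example and are not the CHDAGE data.

```python
# Group a binary outcome by intervals of a continuous covariate and compute
# the proportion of "successes" (outcome == 1) within each interval -- the
# same smoothing strategy used to build Table 1.2 from grouped ages.
from collections import defaultdict

def group_proportions(pairs, breaks):
    """Return {upper_bound: (n, proportion with outcome == 1)}.

    `breaks` are the right-closed upper bounds of the intervals,
    given in ascending order; each x is placed in the first interval
    whose upper bound it does not exceed.
    """
    counts = defaultdict(lambda: [0, 0])  # upper_bound -> [n, n_successes]
    for x, y in pairs:
        for upper in breaks:
            if x <= upper:
                counts[upper][0] += 1
                counts[upper][1] += y
                break
    return {u: (n, n1 / n) for u, (n, n1) in sorted(counts.items())}

# Illustrative (age, CHD) pairs -- NOT the book's 100-subject data set.
data = [(24, 0), (28, 0), (33, 0), (37, 1), (42, 0),
        (46, 1), (51, 1), (55, 1), (61, 1), (63, 1)]
print(group_proportions(data, breaks=[29, 39, 49, 69]))
# → {29: (2, 0.0), 39: (2, 0.5), 49: (2, 0.5), 69: (4, 1.0)}
```

As in Table 1.2, the within-interval proportions rise with age even though the raw 0/1 outcomes are too noisy to show the trend directly.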

Table 1.2 Frequency Table of Age Group by CHD

By examining this table, a clearer picture of the relationship begins to emerge. It shows that as age increases, the proportion (mean) of individuals with evidence of CHD increases. Figure 1.2 presents a plot of the percent of individuals with CHD versus the midpoint of each age interval. This plot provides considerable insight into the relationship between CHD and AGE in this study, but the functional form for this relationship needs to be described. The plot in this figure is similar to what one might obtain if this same process of grouping and averaging were performed in a linear regression. We note two important differences.

Figure 1.2 Plot of the percentage of subjects with CHD in each AGE group.

The first difference concerns the nature of the relationship between the outcome and independent variables. In any regression problem the key quantity is the mean value of the outcome variable, given the value of the independent variable. This quantity is called the conditional mean and is expressed as “E(Y|x)”, where Y denotes the outcome variable and x denotes a specific value of the independent variable. The quantity E(Y|x) is read “the expected value of Y, given the value x”. In linear regression we assume that this mean may be expressed as an equation linear in x (or some transformation of x or Y), such as

E(Y|x) = β₀ + β₁x.

This expression implies that it is possible for E(Y|x) to take on any value as x ranges between −∞ and +∞.

The column labeled “Mean” in Table 1.2 provides an estimate of E(Y|x). We assume, for purposes of exposition, that the estimated values plotted in Figure 1.2 are close enough to the true values of E(Y|x) to provide a reasonable assessment of the functional relationship between CHD and AGE. With a dichotomous outcome variable, the conditional mean must be greater than or equal to zero and less than or equal to one (i.e., 0 ≤ E(Y|x) ≤ 1). This can be seen in Figure 1.2. In addition, the plot shows that this mean approaches zero and one “gradually”. The change in E(Y|x) per unit change in x becomes progressively smaller as the conditional mean gets closer to zero or one. The curve is said to be S-shaped and resembles a plot of the cumulative distribution of a continuous random variable. Thus, it should not seem surprising that some well-known cumulative distributions have been used to provide a model for E(Y|x) in the case when Y is dichotomous. The model we use is based on the logistic distribution.
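The S-shaped conditional mean can be sketched numerically with the logistic function. The coefficients below (β₀ = −5, β₁ = 0.1) are hypothetical values chosen only to make the curve visible over a plausible age range; they are not estimates fitted to the CHDAGE data.

```python
# The logistic function keeps the conditional mean strictly between 0 and 1
# and approaches those limits gradually, unlike the linear form b0 + b1*x.
import math

def logistic_mean(x, b0, b1):
    """pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

b0, b1 = -5.0, 0.1  # hypothetical intercept and slope, for illustration only
for age in (20, 40, 60, 80):
    print(age, round(logistic_mean(age, b0, b1), 3))
# → 20 0.047, 40 0.269, 60 0.731, 80 0.953
```

Note how each 20-year step changes the mean by less as the curve nears 0 or 1, the “gradual” approach described above.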
