Applied Longitudinal Analysis - Garrett M. Fitzmaurice - E-Book

Applied Longitudinal Analysis E-Book

Garrett M. Fitzmaurice

124,99 €

Description

Praise for the First Edition

". . . [this book] should be on the shelf of everyone interested in . . . longitudinal data analysis." --Journal of the American Statistical Association

Features newly developed topics and applications of the analysis of longitudinal data

Applied Longitudinal Analysis, Second Edition presents modern methods for analyzing data from longitudinal studies and now features the latest state-of-the-art techniques. The book emphasizes practical, rather than theoretical, aspects of methods for the analysis of diverse types of longitudinal data that can be applied across various fields of study, from the health and medical sciences to the social and behavioral sciences. The authors incorporate their extensive academic and research experience along with various updates that have been made in response to reader feedback.

The Second Edition features six newly added chapters that explore topics currently evolving in the field, including:

* Fixed effects and mixed effects models
* Marginal models and generalized estimating equations
* Approximate methods for generalized linear mixed effects models
* Multiple imputation and inverse probability weighted methods
* Smoothing methods for longitudinal data
* Sample size and power

Each chapter presents methods in the setting of applications to data sets drawn from the health sciences. New problem sets have been added to many chapters, and a related website features sample programs and computer output using SAS, Stata, and R, as well as data sets and supplemental slides to facilitate a complete understanding of the material.

With its strong emphasis on multidisciplinary applications and the interpretation of results, Applied Longitudinal Analysis, Second Edition is an excellent book for courses on statistics in the health and medical sciences at the upper-undergraduate and graduate levels. The book also serves as a valuable reference for researchers and professionals in the medical, public health, and pharmaceutical fields as well as those in social and behavioral sciences who would like to learn more about analyzing longitudinal data.


Number of pages: 1367

Year of publication: 2012




Contents

Cover

Half Title page

Title page

Copyright page

Dedication

Preface

Preface to First Edition

Acknowledgments

Part I: Introduction to Longitudinal and Clustered Data

Chapter 1: Longitudinal and Clustered Data

1.1 Introduction

1.2 Longitudinal and Clustered Data

1.3 Examples

1.4 Regression Models for Correlated Responses

1.5 Organization of the Book

1.6 Further Reading

Chapter 2: Longitudinal Data: Basic Concepts

2.1 Introduction

2.2 Objectives of Longitudinal Analysis

2.3 Defining Features of Longitudinal Data

2.4 Example: Treatment of Lead-Exposed Children Trial

2.5 Sources of Correlation in Longitudinal Data

2.6 Further Reading

Part II: Linear Models for Longitudinal Continuous Data

Chapter 3: Overview of Linear Models for Longitudinal Data

3.1 Introduction

3.2 Notation and Distributional Assumptions

3.3 Simple Descriptive Methods of Analysis

3.4 Modeling the Mean

3.5 Modeling the Covariance

3.6 Historical Approaches

3.7 Further Reading

Chapter 4: Estimation and Statistical Inference

4.1 Introduction

4.2 Estimation: Maximum Likelihood

4.3 Missing Data Issues

4.4 Statistical Inference

4.5 Restricted Maximum Likelihood (REML) Estimation

4.6 Further Reading

Chapter 5: Modeling the Mean: Analyzing Response Profiles

5.1 Introduction

5.2 Hypotheses Concerning Response Profiles

5.3 General Linear Model Formulation

5.4 Case Study

5.5 One-Degree-of-Freedom Tests for Group by Time Interaction

5.6 Adjustment for Baseline Response

5.7 Alternative Methods of Adjusting for Baseline Response

5.8 Strengths and Weaknesses of Analyzing Response Profiles

5.9 Computing: Analyzing Response Profiles Using PROC MIXED in SAS

5.10 Further Reading

Chapter 6: Modeling the Mean: Parametric Curves

6.1 Introduction

6.2 Polynomial Trends in Time

6.3 Linear Splines

6.4 General Linear Model Formulation

6.5 Case Studies

6.6 Computing: Fitting Parametric Curves Using PROC MIXED in SAS

6.7 Further Reading

Chapter 7: Modeling the Covariance

7.1 Introduction

7.2 Implications of Correlation among Longitudinal Data

7.3 Unstructured Covariance

7.4 Covariance Pattern Models

7.5 Choice among Covariance Pattern Models

7.6 Case Study

7.7 Discussion: Strengths and Weaknesses of Covariance Pattern Models

7.8 Computing: Fitting Covariance Pattern Models Using PROC MIXED in SAS

7.9 Further Reading

Chapter 8: Linear Mixed Effects Models

8.1 Introduction

8.2 Linear Mixed Effects Models

8.3 Random Effects Covariance Structure

8.4 Two-Stage Random Effects Formulation

8.5 Choice among Random Effects Covariance Models

8.6 Prediction of Random Effects

8.7 Prediction and Shrinkage

8.8 Case Studies

8.9 Computing: Fitting Linear Mixed Effects Models Using PROC MIXED in SAS

8.10 Further Reading

Chapter 9: Fixed Effects versus Random Effects Models

9.1 Introduction

9.2 Linear Fixed Effects Models

9.3 Fixed Effects versus Random Effects: Bias-Variance Trade-off

9.4 Resolving the Dilemma of Choosing Between Fixed and Random Effects Models

9.5 Longitudinal and Cross-sectional Information

9.6 Case Study

9.7 Computing: Fitting Linear Fixed Effects Models Using PROC GLM in SAS

9.8 Computing: Decomposition of Between-Subject and Within-Subject Effects Using PROC MIXED in SAS

9.9 Further Reading

Chapter 10: Residual Analyses and Diagnostics

10.1 Introduction

10.2 Residuals

10.3 Transformed Residuals

10.4 Aggregating Residuals

10.5 Semi-Variogram

10.6 Case Study

10.7 Summary

10.8 Further Reading

Part III: Generalized Linear Models for Longitudinal Data

Chapter 11: Review of Generalized Linear Models

11.1 Introduction

11.2 Salient Features of Generalized Linear Models

11.3 Illustrative Examples

11.4 Ordinal Regression Models

11.5 Overdispersion

11.6 Computing: Fitting Generalized Linear Models Using PROC GENMOD in SAS

11.7 Overview of Generalized Linear Models

11.8 Further Reading

Chapter 12: Marginal Models: Introduction and Overview

12.1 Introduction

12.2 Marginal Models for Longitudinal Data

12.3 Illustrative Examples of Marginal Models

12.4 Distributional Assumptions for Marginal Models

12.5 Further Reading

Chapter 13: Marginal Models: Generalized Estimating Equations (GEE)

13.1 Introduction

13.2 Estimation of Marginal Models: Generalized Estimating Equations

13.3 Residual Analyses and Diagnostics

13.4 Case Studies

13.5 Marginal Models and Time-Varying Covariates

13.6 Computing: Generalized Estimating Equations Using PROC GENMOD in SAS

13.7 Further Reading

Chapter 14: Generalized Linear Mixed Effects Models

14.1 Introduction

14.2 Incorporating Random Effects in Generalized Linear Models

14.3 Interpretation of Regression Parameters

14.4 Overdispersion

14.5 Estimation and Inference

14.6 A Note on Conditional Maximum Likelihood

14.7 Case Studies

14.8 Computing: Fitting Generalized Linear Mixed Models Using PROC GLIMMIX in SAS

14.9 Further Reading

Chapter 15: Generalized Linear Mixed Effects Models: Approximate Methods of Estimation

15.1 Introduction

15.2 Penalized Quasi-Likelihood

15.3 Marginal Quasi-Likelihood

15.4 Cautionary Remarks on the Use of PQL and MQL

15.5 Case Studies

15.6 Computing: Fitting GLMMs Using PROC GLIMMIX in SAS

15.7 Basis of PQL and MQL Approximations

15.8 Further Reading

Chapter 16: Contrasting Marginal and Mixed Effects Models

16.1 Introduction

16.2 Linear Models: A Special Case

16.3 Generalized Linear Models

16.4 Simple Numerical Illustration

16.5 Case Study

16.6 Conclusion

16.7 Further Reading

Part IV: Missing Data and Dropout

Chapter 17: Missing Data and Dropout: Overview of Concepts and Methods

17.1 Introduction

17.2 Hierarchy of Missing Data Mechanisms

17.3 Implications for Longitudinal Analysis

17.4 Dropout

17.5 Common Approaches for Handling Dropout

17.6 Bias of Last Value Carried Forward Imputation

17.7 Further Reading

Chapter 18: Missing Data and Dropout: Multiple Imputation and Weighting Methods

18.1 Introduction

18.2 Multiple Imputation

18.3 Inverse Probability Weighted Methods

18.4 Case Studies

18.5 “Sandwich” Variance Estimator Adjusting for Estimation of Weights

18.6 Computing: Multiple Imputation Using PROC MI in SAS

18.7 Computing: Inverse Probability Weighted (IPW) Methods in SAS

18.8 Further Reading

Part V: Advanced Topics for Longitudinal and Clustered Data

Chapter 19: Smoothing Longitudinal Data: Semiparametric Regression Models

19.1 Introduction

19.2 Penalized Splines for a Univariate Response

19.3 Case Study

19.4 Penalized Splines for Longitudinal Data

19.5 Case Study

19.6 Fitting Smooth Curves to Individual Longitudinal Data

19.7 Case Study

19.8 Computing: Fitting Smooth Curves Using PROC MIXED in SAS

19.9 Further Reading

Chapter 20: Sample Size and Power

20.1 Introduction

20.2 Sample Size for a Univariate Continuous Response

20.3 Sample Size for a Longitudinal Continuous Response

20.4 Sample Size for a Longitudinal Binary Response

20.5 Summary

20.6 Computing: Sample Size Calculation Using Pseudo-Data

20.7 Further Reading

Chapter 21: Repeated Measures and Related Designs

21.1 Introduction

21.2 Repeated Measures Designs

21.3 Multiple Source Data

21.4 Case Study 1: Repeated Measures Experiment

21.5 Case Study 2: Multiple Source Data

21.6 Summary

21.7 Further Reading

Chapter 22: Multilevel Models

22.1 Introduction

22.2 Multilevel Data

22.3 Multilevel Linear Models

22.4 Multilevel Generalized Linear Models

22.5 Summary

22.6 Further Reading

Appendix A Gentle Introduction to Vectors and Matrices

Appendix B Properties of Expectations and Variances

Appendix C Critical Points for a 50:50 Mixture of Chi-Squared Distributions

References

Index

Applied Longitudinal Analysis

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Harvey Goldstein, Iain M. Johnstone, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg

Editors Emeriti: Vic Barnett, J. Stuart Hunter, Joseph B. Kadane, Jozef L. Teugels

The Wiley Series in Probability and Statistics is well established and authoritative. It covers many topics of current research interest in both pure and applied statistics and probability theory. Written by leading statisticians and institutions, the titles span both state-of-the-art developments in the field and classical methods.

Reflecting the wide range of current research in statistics, the series encompasses applied, methodological and theoretical statistics, ranging from applications and new techniques made possible by advances in computerized practice to rigorous treatment of theoretical approaches.

This series provides essential and invaluable reading for all statisticians, whether in academia, industry, government, or research.

† ABRAHAM and LEDOLTER • Statistical Methods for Forecasting

AGRESTI • Analysis of Ordinal Categorical Data, Second Edition

AGRESTI • An Introduction to Categorical Data Analysis, Second Edition

AGRESTI • Categorical Data Analysis, Second Edition

ALTMAN, GILL, and McDONALD • Numerical Issues in Statistical Computing for the Social Scientist

AMARATUNGA and CABRERA • Exploration and Analysis of DNA Microarray and Protein Array Data

ANDĚL • Mathematics of Chance

ANDERSON • An Introduction to Multivariate Statistical Analysis, Third Edition

* ANDERSON • The Statistical Analysis of Time Series

ANDERSON, AUQUIER, HAUCK, OAKES, VANDAELE, and WEISBERG • Statistical Methods for Comparative Studies

ANDERSON and LOYNES • The Teaching of Practical Statistics

ARMITAGE and DAVID (editors) • Advances in Biometry

ARNOLD, BALAKRISHNAN, and NAGARAJA • Records

* ARTHANARI and DODGE • Mathematical Programming in Statistics

* BAILEY • The Elements of Stochastic Processes with Applications to the Natural Sciences

BALAKRISHNAN and KOUTRAS • Runs and Scans with Applications

BALAKRISHNAN and NG • Precedence-Type Tests and Applications

BARNETT • Comparative Statistical Inference, Third Edition

BARNETT • Environmental Statistics

BARNETT and LEWIS • Outliers in Statistical Data, Third Edition

BARTOSZYNSKI and NIEWIADOMSKA-BUGAJ • Probability and Statistical Inference

BASILEVSKY • Statistical Factor Analysis and Related Methods: Theory and Applications

BASU and RIGDON • Statistical Methods for the Reliability of Repairable Systems

BATES and WATTS • Nonlinear Regression Analysis and Its Applications

BECHHOFER, SANTNER, and GOLDSMAN • Design and Analysis of Experiments for Statistical Selection, Screening, and Multiple Comparisons

BELSLEY • Conditioning Diagnostics: Collinearity and Weak Data in Regression

† BELSLEY, KUH, and WELSCH • Regression Diagnostics: Identifying Influential Data and Sources of Collinearity

BENDAT and PIERSOL • Random Data: Analysis and Measurement Procedures, Fourth Edition

BERRY, CHALONER, and GEWEKE • Bayesian Analysis in Statistics and Econometrics: Essays in Honor of Arnold Zellner

BERNARDO and SMITH • Bayesian Theory

BHAT and MILLER • Elements of Applied Stochastic Processes, Third Edition

BHATTACHARYA and WAYMIRE • Stochastic Processes with Applications

BILLINGSLEY • Convergence of Probability Measures, Second Edition

BILLINGSLEY • Probability and Measure, Third Edition

BIRKES and DODGE • Alternative Methods of Regression

BISGAARD and KULAHCI • Time Series Analysis and Forecasting by Example

BISWAS, DATTA, FINE, and SEGAL • Statistical Advances in the Biomedical Sciences: Clinical Trials, Epidemiology, Survival Analysis, and Bioinformatics

BLISCHKE and MURTHY (editors) • Case Studies in Reliability and Maintenance

BLISCHKE and MURTHY • Reliability: Modeling, Prediction, and Optimization

BLOOMFIELD • Fourier Analysis of Time Series: An Introduction, Second Edition

BOLLEN • Structural Equations with Latent Variables

BOLLEN and CURRAN • Latent Curve Models: A Structural Equation Perspective

BOROVKOV • Ergodicity and Stability of Stochastic Processes

BOULEAU • Numerical Methods for Stochastic Processes

BOX • Bayesian Inference in Statistical Analysis

BOX • R. A. Fisher, the Life of a Scientist

BOX and DRAPER • Response Surfaces, Mixtures, and Ridge Analyses, Second Edition

* BOX and DRAPER • Evolutionary Operation: A Statistical Method for Process Improvement

BOX and FRIENDS • Improving Almost Anything, Revised Edition

BOX, HUNTER, and HUNTER • Statistics for Experimenters: Design, Innovation, and Discovery, Second Edition

BOX, JENKINS, and REINSEL • Time Series Analysis: Forecasting and Control, Fourth Edition

BOX, LUCEÑO, and PANIAGUA-QUIÑONES • Statistical Control by Monitoring and Adjustment, Second Edition

BRANDIMARTE • Numerical Methods in Finance: A MATLAB-Based Introduction

† BROWN and HOLLANDER • Statistics: A Biomedical Introduction

BRUNNER, DOMHOF, and LANGER • Nonparametric Analysis of Longitudinal Data in Factorial Experiments

BUCKLEW • Large Deviation Techniques in Decision, Simulation, and Estimation

CAIROLI and DALANG • Sequential Stochastic Optimization

CASTILLO, HADI, BALAKRISHNAN, and SARABIA • Extreme Value and Related Models with Applications in Engineering and Science

CHAN • Time Series: Applications to Finance with R and S-Plus®, Second Edition

CHARALAMBIDES • Combinatorial Methods in Discrete Distributions

CHATTERJEE and HADI • Regression Analysis by Example, Fourth Edition

CHATTERJEE and HADI • Sensitivity Analysis in Linear Regression

CHERNICK • Bootstrap Methods: A Guide for Practitioners and Researchers, Second Edition

CHERNICK and FRIIS • Introductory Biostatistics for the Health Sciences

CHILÈS and DELFINER • Geostatistics: Modeling Spatial Uncertainty

CHOW and LIU • Design and Analysis of Clinical Trials: Concepts and Methodologies, Second Edition

CLARKE • Linear Models: The Theory and Application of Analysis of Variance

CLARKE and DISNEY • Probability and Random Processes: A First Course with Applications, Second Edition

* COCHRAN and COX • Experimental Designs, Second Edition

COLLINS and LANZA • Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences

CONGDON • Applied Bayesian Modelling

CONGDON • Bayesian Models for Categorical Data

CONGDON • Bayesian Statistical Modelling

CONOVER • Practical Nonparametric Statistics, Third Edition

COOK • Regression Graphics

COOK and WEISBERG • Applied Regression Including Computing and Graphics

COOK and WEISBERG • An Introduction to Regression Graphics

CORNELL • Experiments with Mixtures, Designs, Models, and the Analysis of Mixture Data, Third Edition

COVER and THOMAS • Elements of Information Theory

COX • A Handbook of Introductory Statistical Methods

* COX • Planning of Experiments

CRESSIE • Statistics for Spatial Data, Revised Edition

CRESSIE and WIKLE • Statistics for Spatio-Temporal Data

CSÖRGÖ and HORVÁTH • Limit Theorems in Change Point Analysis

DANIEL • Applications of Statistics to Industrial Experimentation

DANIEL • Biostatistics: A Foundation for Analysis in the Health Sciences, Eighth Edition

* DANIEL • Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition

DASU and JOHNSON • Exploratory Data Mining and Data Cleaning

DAVID and NAGARAJA • Order Statistics, Third Edition

* DEGROOT, FIENBERG, and KADANE • Statistics and the Law

DEL CASTILLO • Statistical Process Adjustment for Quality Control

DEMARIS • Regression with Social Data: Modeling Continuous and Limited Response Variables

DEMIDENKO • Mixed Models: Theory and Applications

DENISON, HOLMES, MALLICK and SMITH • Bayesian Methods for Nonlinear Classification and Regression

DETTE and STUDDEN • The Theory of Canonical Moments with Applications in Statistics, Probability, and Analysis

DEY and MUKERJEE • Fractional Factorial Plans

DILLON and GOLDSTEIN • Multivariate Analysis: Methods and Applications

DODGE • Alternative Methods of Regression

* DODGE and ROMIG • Sampling Inspection Tables, Second Edition

* DOOB • Stochastic Processes

DOWDY, WEARDEN, and CHILKO • Statistics for Research, Third Edition

DRAPER and SMITH • Applied Regression Analysis, Third Edition

DRYDEN and MARDIA • Statistical Shape Analysis

DUDEWICZ and MISHRA • Modern Mathematical Statistics

DUNN and CLARK • Basic Statistics: A Primer for the Biomedical Sciences, Third Edition

DUPUIS and ELLIS • A Weak Convergence Approach to the Theory of Large Deviations

EDLER and KITSOS • Recent Advances in Quantitative Methods in Cancer and Human Health Risk Assessment

* ELANDT-JOHNSON and JOHNSON • Survival Models and Data Analysis

ENDERS • Applied Econometric Time Series

† ETHIER and KURTZ • Markov Processes: Characterization and Convergence

EVANS, HASTINGS, and PEACOCK • Statistical Distributions, Third Edition

FELLER • An Introduction to Probability Theory and Its Applications, Volume I, Third Edition, Revised; Volume II, Second Edition

FISHER and VAN BELLE • Biostatistics: A Methodology for the Health Sciences

FITZMAURICE, LAIRD, and WARE • Applied Longitudinal Analysis, Second Edition

* FLEISS • The Design and Analysis of Clinical Experiments

FLEISS • Statistical Methods for Rates and Proportions, Third Edition

† FLEMING and HARRINGTON • Counting Processes and Survival Analysis

FUJIKOSHI, ULYANOV, and SHIMIZU • Multivariate Statistics: High-Dimensional and Large-Sample Approximations

FULLER • Introduction to Statistical Time Series, Second Edition

† FULLER • Measurement Error Models

GALLANT • Nonlinear Statistical Models

GEISSER • Modes of Parametric Statistical Inference

GELMAN and MENG • Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives

GEWEKE • Contemporary Bayesian Econometrics and Statistics

GHOSH, MUKHOPADHYAY, and SEN • Sequential Estimation

GIESBRECHT and GUMPERTZ • Planning, Construction, and Statistical Analysis of Comparative Experiments

GIFI • Nonlinear Multivariate Analysis

GIVENS and HOETING • Computational Statistics

GLASSERMAN and YAO • Monotone Structure in Discrete-Event Systems

GNANADESIKAN • Methods for Statistical Data Analysis of Multivariate Observations, Second Edition

GOLDSTEIN and LEWIS • Assessment: Problems, Development, and Statistical Issues

GREENWOOD and NIKULIN • A Guide to Chi-Squared Testing

GROSS, SHORTLE, THOMPSON, and HARRIS • Fundamentals of Queueing Theory, Fourth Edition

GROSS, SHORTLE, THOMPSON, and HARRIS • Solutions Manual to Accompany Fundamentals of Queueing Theory, Fourth Edition

* HAHN and SHAPIRO • Statistical Models in Engineering

HAHN and MEEKER • Statistical Intervals: A Guide for Practitioners

HALD • A History of Probability and Statistics and their Applications Before 1750

HALD • A History of Mathematical Statistics from 1750 to 1930

† HAMPEL • Robust Statistics: The Approach Based on Influence Functions

HANNAN and DEISTLER • The Statistical Theory of Linear Systems

HARMAN and KULKARNI • An Elementary Introduction to Statistical Learning Theory

HARTUNG, KNAPP, and SINHA • Statistical Meta-Analysis with Applications

HEIBERGER • Computation for the Analysis of Designed Experiments

HEDAYAT and SINHA • Design and Inference in Finite Population Sampling

HEDEKER and GIBBONS • Longitudinal Data Analysis

HELLER • MACSYMA for Statisticians

HINKELMANN and KEMPTHORNE • Design and Analysis of Experiments, Volume 1: Introduction to Experimental Design, Second Edition

HINKELMANN and KEMPTHORNE • Design and Analysis of Experiments, Volume 2: Advanced Experimental Design

HOAGLIN, MOSTELLER, and TUKEY • Fundamentals of Exploratory Analysis of Variance

* HOAGLIN, MOSTELLER, and TUKEY • Exploring Data Tables, Trends and Shapes

* HOAGLIN, MOSTELLER, and TUKEY • Understanding Robust and Exploratory Data Analysis

HOCHBERG and TAMHANE • Multiple Comparison Procedures

HOCKING • Methods and Applications of Linear Models: Regression and the Analysis of Variance, Second Edition

HOEL • Introduction to Mathematical Statistics, Fifth Edition

HOGG and KLUGMAN • Loss Distributions

HOLLANDER and WOLFE • Nonparametric Statistical Methods, Second Edition

HOSMER and LEMESHOW • Applied Logistic Regression, Second Edition

HOSMER, LEMESHOW, and MAY • Applied Survival Analysis: Regression Modeling of Time-to-Event Data, Second Edition

HUBER • Data Analysis: What Can Be Learned From the Past 50 Years

† HUBER and RONCHETTI • Robust Statistics, Second Edition

HUBERTY • Applied Discriminant Analysis

HUBERTY • Applied Discriminant Analysis Second Edition

HUNT and KENNEDY • Financial Derivatives in Theory and Practice, Revised Edition

HURD and MIAMEE • Periodically Correlated Random Sequences: Spectral Theory and Practice

HUSKOVA, BERAN, and DUPAC • Collected Works of Jaroslav Hajek—with Commentary

HUZURBAZAR • Flowgraph Models for Multistate Time-to-Event Data

IMAN and CONOVER • A Modern Approach to Statistics

* JACKSON • A User’s Guide to Principal Components

JOHN • Statistical Methods in Engineering and Quality Assurance

JOHNSON • Multivariate Statistical Simulation

JOHNSON and BALAKRISHNAN • Advances in the Theory and Practice of Statistics: A Volume in Honor of Samuel Kotz

JOHNSON and BHATTACHARYYA • Statistics: Principles and Methods, Fifth Edition

JOHNSON and KOTZ • Distributions in Statistics

JOHNSON and KOTZ (editors) • Leading Personalities in Statistical Sciences: From the Seventeenth Century to the Present

JOHNSON, KOTZ, and BALAKRISHNAN • Continuous Univariate Distributions, Volume 1, Second Edition

JOHNSON, KOTZ, and BALAKRISHNAN • Continuous Univariate Distributions, Volume 2, Second Edition

JOHNSON, KOTZ, and BALAKRISHNAN • Discrete Multivariate Distributions

JOHNSON, KEMP, and KOTZ • Univariate Discrete Distributions, Third Edition

JUDGE, GRIFFITHS, HILL, LÜTKEPOHL, and LEE • The Theory and Practice of Econometrics, Second Edition

JUREČKOVÁ and SEN • Robust Statistical Procedures: Asymptotics and Interrelations

JUREK and MASON • Operator-Limit Distributions in Probability Theory

KADANE • Bayesian Methods and Ethics in a Clinical Trial Design

KADANE and SCHUM • A Probabilistic Analysis of the Sacco and Vanzetti Evidence

KALBFLEISCH and PRENTICE • The Statistical Analysis of Failure Time Data, Second Edition

KARIYA and KURATA • Generalized Least Squares

KASS and VOS • Geometrical Foundations of Asymptotic Inference

† KAUFMAN and ROUSSEEUW • Finding Groups in Data: An Introduction to Cluster Analysis

KEDEM and FOKIANOS • Regression Models for Time Series Analysis

KENDALL, BARDEN, CARNE, and LE • Shape and Shape Theory

KHURI • Advanced Calculus with Applications in Statistics, Second Edition

KHURI, MATHEW, and SINHA • Statistical Tests for Mixed Linear Models

KLEIBER and KOTZ • Statistical Size Distributions in Economics and Actuarial Sciences

KLEMELÄ • Smoothing of Multivariate Data: Density Estimation and Visualization

KLUGMAN, PANJER, and WILLMOT • Loss Models: From Data to Decisions, Third Edition

KLUGMAN, PANJER, and WILLMOT • Solutions Manual to Accompany Loss Models: From Data to Decisions, Third Edition

KOTZ, BALAKRISHNAN, and JOHNSON • Continuous Multivariate Distributions, Volume 1, Second Edition

KOVALENKO, KUZNETZOV, and PEGG • Mathematical Theory of Reliability of Time-Dependent Systems with Practical Applications

KOWALSKI and TU • Modern Applied U-Statistics

KRISHNAMOORTHY and MATHEW • Statistical Tolerance Regions: Theory, Applications, and Computation

KROESE, TAIMRE, and BOTEV • Handbook of Monte Carlo Methods

KROONENBERG • Applied Multiway Data Analysis

KVAM and VIDAKOVIC • Nonparametric Statistics with Applications to Science and Engineering

LACHIN • Biostatistical Methods: The Assessment of Relative Risks, Second Edition

LAD • Operational Subjective Statistical Methods: A Mathematical, Philosophical, and Historical Introduction

LAMPERTI • Probability: A Survey of the Mathematical Theory, Second Edition

LANGE, RYAN, BILLARD, BRILLINGER, CONQUEST, and GREENHOUSE • Case Studies in Biometry

LARSON • Introduction to Probability Theory and Statistical Inference, Third Edition

LAWLESS • Statistical Models and Methods for Lifetime Data, Second Edition

LAWSON • Statistical Methods in Spatial Epidemiology

LE • Applied Categorical Data Analysis

LE • Applied Survival Analysis

LEE and WANG • Statistical Methods for Survival Data Analysis, Third Edition

LEPAGE and BILLARD • Exploring the Limits of Bootstrap

LEYLAND and GOLDSTEIN (editors) • Multilevel Modelling of Health Statistics

LIAO • Statistical Group Comparison

LINDVALL • Lectures on the Coupling Method

LIN • Introductory Stochastic Analysis for Finance and Insurance

LINHART and ZUCCHINI • Model Selection

LITTLE and RUBIN • Statistical Analysis with Missing Data, Second Edition

LLOYD • The Statistical Analysis of Categorical Data

LOWEN and TEICH • Fractal-Based Point Processes

MAGNUS and NEUDECKER • Matrix Differential Calculus with Applications in Statistics and Econometrics, Revised Edition

MALLER and ZHOU • Survival Analysis with Long Term Survivors

MALLOWS • Design, Data, and Analysis by Some Friends of Cuthbert Daniel

MANN, SCHAFER, and SINGPURWALLA • Methods for Statistical Analysis of Reliability and Life Data

MANTON, WOODBURY, and TOLLEY • Statistical Applications Using Fuzzy Sets

MARCHETTE • Random Graphs for Statistical Pattern Recognition

MARDIA and JUPP • Directional Statistics

MASON, GUNST, and HESS • Statistical Design and Analysis of Experiments with Applications to Engineering and Science, Second Edition

McCULLOCH, SEARLE, and NEUHAUS • Generalized, Linear, and Mixed Models, Second Edition

McFADDEN • Management of Data in Clinical Trials, Second Edition

* McLACHLAN • Discriminant Analysis and Statistical Pattern Recognition

McLACHLAN, DO, and AMBROISE • Analyzing Microarray Gene Expression Data

McLACHLAN and KRISHNAN • The EM Algorithm and Extensions, Second Edition

McLACHLAN and PEEL • Finite Mixture Models

McNEIL • Epidemiological Research Methods

MEEKER and ESCOBAR • Statistical Methods for Reliability Data

MEERSCHAERT and SCHEFFLER • Limit Distributions for Sums of Independent Random Vectors: Heavy Tails in Theory and Practice

MICKEY, DUNN, and CLARK • Applied Statistics: Analysis of Variance and Regression, Third Edition

* MILLER • Survival Analysis, Second Edition

MONTGOMERY, JENNINGS, and KULAHCI • Introduction to Time Series Analysis and Forecasting

MONTGOMERY, PECK, and VINING • Introduction to Linear Regression Analysis, Fourth Edition

MORGENTHALER and TUKEY • Configural Polysampling: A Route to Practical Robustness

MUIRHEAD • Aspects of Multivariate Statistical Theory

MULLER and STOYAN • Comparison Methods for Stochastic Models and Risks

MURRAY • X-STAT 2.0 Statistical Experimentation, Design Data Analysis, and Nonlinear Optimization

MURTHY, XIE, and JIANG • Weibull Models

MYERS, MONTGOMERY, and ANDERSON-COOK • Response Surface Methodology: Process and Product Optimization Using Designed Experiments, Third Edition

MYERS, MONTGOMERY, VINING, and ROBINSON • Generalized Linear Models: With Applications in Engineering and the Sciences, Second Edition

* NELSON • Accelerated Testing, Statistical Models, Test Plans, and Data Analyses

† NELSON • Applied Life Data Analysis

NEWMAN • Biostatistical Methods in Epidemiology

OCHI • Applied Probability and Stochastic Processes in Engineering and Physical Sciences

OKABE, BOOTS, SUGIHARA, and CHIU • Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, Second Edition

OLIVER and SMITH • Influence Diagrams, Belief Nets and Decision Analysis

PALTA • Quantitative Methods in Population Health: Extensions of Ordinary Regressions

PANJER • Operational Risk: Modeling and Analytics

PANKRATZ • Forecasting with Dynamic Regression Models

PANKRATZ • Forecasting with Univariate Box-Jenkins Models: Concepts and Cases

* PARZEN • Modern Probability Theory and Its Applications

PEÑA, TIAO, and TSAY • A Course in Time Series Analysis

PIANTADOSI • Clinical Trials: A Methodologic Perspective

PORT • Theoretical Probability for Applications

POURAHMADI • Foundations of Time Series Analysis and Prediction Theory

POWELL • Approximate Dynamic Programming: Solving the Curses of Dimensionality

PRESS • Bayesian Statistics: Principles, Models, and Applications

PRESS • Subjective and Objective Bayesian Statistics, Second Edition

PRESS and TANUR • The Subjectivity of Scientists and the Bayesian Approach

PUKELSHEIM • Optimal Experimental Design

PURI, VILAPLANA, and WERTZ • New Perspectives in Theoretical and Applied Statistics

† PUTERMAN • Markov Decision Processes: Discrete Stochastic Dynamic Programming

QIU • Image Processing and Jump Regression Analysis

* RAO • Linear Statistical Inference and Its Applications, Second Edition

RAUSAND and HØYLAND • System Reliability Theory: Models, Statistical Methods, and Applications, Second Edition

RENCHER • Linear Models in Statistics

RENCHER • Methods of Multivariate Analysis, Second Edition

RENCHER • Multivariate Statistical Inference with Applications

* RIPLEY • Spatial Statistics

* RIPLEY • Stochastic Simulation

ROBINSON • Practical Strategies for Experimenting

ROHATGI and SALEH • An Introduction to Probability and Statistics, Second Edition

ROLSKI, SCHMIDLI, SCHMIDT, and TEUGELS • Stochastic Processes for Insurance and Finance

ROSENBERGER and LACHIN • Randomization in Clinical Trials: Theory and Practice

ROSS • Introduction to Probability and Statistics for Engineers and Scientists

ROSSI, ALLENBY, and McCULLOCH • Bayesian Statistics and Marketing

† ROUSSEEUW and LEROY • Robust Regression and Outlier Detection

* RUBIN • Multiple Imputation for Nonresponse in Surveys

RUBINSTEIN and KROESE • Simulation and the Monte Carlo Method, Second Edition

RUBINSTEIN and MELAMED • Modern Simulation and Modeling

RYAN • Modern Engineering Statistics

RYAN • Modern Experimental Design

RYAN • Modern Regression Methods, Second Edition

RYAN • Statistical Methods for Quality Improvement, Second Edition

SALEH • Theory of Preliminary Test and Stein-Type Estimation with Applications

* SCHEFFÉ • The Analysis of Variance

SCHIMEK • Smoothing and Regression: Approaches, Computation, and Application

SCHOTT • Matrix Analysis for Statistics, Second Edition

SCHOUTENS • Lévy Processes in Finance: Pricing Financial Derivatives

SCHUSS • Theory and Applications of Stochastic Differential Equations

SCOTT • Multivariate Density Estimation: Theory, Practice, and Visualization

† SEARLE • Linear Models for Unbalanced Data

† SEARLE • Matrix Algebra Useful for Statistics

† SEARLE, CASELLA, and McCULLOCH • Variance Components

SEARLE and WILLETT • Matrix Algebra for Applied Economics

SEBER • A Matrix Handbook For Statisticians

† SEBER • Multivariate Observations

SEBER and LEE • Linear Regression Analysis, Second Edition

† SEBER and WILD • Nonlinear Regression

SENNOTT • Stochastic Dynamic Programming and the Control of Queueing Systems

* SERFLING • Approximation Theorems of Mathematical Statistics

SHAFER and VOVK • Probability and Finance: It’s Only a Game!

SILVAPULLE and SEN • Constrained Statistical Inference: Inequality, Order, and Shape Restrictions

SMALL and McLEISH • Hilbert Space Methods in Probability and Statistical Inference

SRIVASTAVA • Methods of Multivariate Statistics

STAPLETON • Linear Statistical Models, Second Edition

STAPLETON • Models for Probability and Statistical Inference: Theory and Applications

STAUDTE and SHEATHER • Robust Estimation and Testing

STOYAN, KENDALL, and MECKE • Stochastic Geometry and Its Applications, Second Edition

STOYAN and STOYAN • Fractals, Random Shapes and Point Fields: Methods of Geometrical Statistics

STREET and BURGESS • The Construction of Optimal Stated Choice Experiments: Theory and Methods

STYAN • The Collected Papers of T. W. Anderson: 1943–1985

SUTTON, ABRAMS, JONES, SHELDON, and SONG • Methods for Meta-Analysis in Medical Research

TAKEZAWA • Introduction to Nonparametric Regression

TAMHANE • Statistical Analysis of Designed Experiments: Theory and Applications

TANAKA • Time Series Analysis: Nonstationary and Noninvertible Distribution Theory

THOMPSON • Empirical Model Building

THOMPSON • Sampling, Second Edition

THOMPSON • Simulation: A Modeler’s Approach

THOMPSON and SEBER • Adaptive Sampling

THOMPSON, WILLIAMS, and FINDLAY • Models for Investors in Real World Markets

TIAO, BISGAARD, HILL, PEÑA, and STIGLER (editors) • Box on Quality and Discovery: with Design, Control, and Robustness

TIERNEY • LISP-STAT: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics

TSAY • Analysis of Financial Time Series, Third Edition

UPTON and FINGLETON • Spatial Data Analysis by Example, Volume II: Categorical and Directional Data

† VAN BELLE • Statistical Rules of Thumb, Second Edition

VAN BELLE, FISHER, HEAGERTY, and LUMLEY • Biostatistics: A Methodology for the Health Sciences, Second Edition

VESTRUP • The Theory of Measures and Integration

VIDAKOVIC • Statistical Modeling by Wavelets

VINOD and REAGLE • Preparing for the Worst: Incorporating Downside Risk in Stock Market Investments

WALLER and GOTWAY • Applied Spatial Statistics for Public Health Data

WEERAHANDI • Generalized Inference in Repeated Measures: Exact Methods in MANOVA and Mixed Models

WEISBERG • Applied Linear Regression, Third Edition

WEISBERG • Bias and Causation: Models and Judgment for Valid Comparisons

WELSH • Aspects of Statistical Inference

WESTFALL and YOUNG • Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment

WHITTAKER • Graphical Models in Applied Multivariate Statistics

WINKER • Optimization Heuristics in Economics: Applications of Threshold Accepting

WONNACOTT and WONNACOTT • Econometrics, Second Edition

WOODING • Planning Pharmaceutical Clinical Trials: Basic Statistical Principles

WOODWORTH • Biostatistics: A Bayesian Introduction

WOOLSON and CLARKE • Statistical Methods for the Analysis of Biomedical Data, Second Edition

WU and HAMADA • Experiments: Planning, Analysis, and Parameter Design Optimization, Second Edition

WU and ZHANG • Nonparametric Regression Methods for Longitudinal Data Analysis

YANG • The Construction Theory of Denumerable Markov Processes

YOUNG, VALERO-MORA, and FRIENDLY • Visual Statistics: Seeing Data with Dynamic Interactive Graphics

ZACKS • Stage-Wise Adaptive Designs

ZELTERMAN • Discrete Distributions—Applications in the Health Sciences

* ZELLNER • An Introduction to Bayesian Inference in Econometrics

ZHOU, OBUCHOWSKI, and McCLISH • Statistical Methods in Diagnostic Medicine, Second Edition

* Now available in a lower priced paperback edition in the Wiley Classics Library.

† Now available in a lower priced paperback edition in the Wiley–Interscience Paperback Series.

Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representation or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Fitzmaurice, Garrett M., 1962- Applied longitudinal analysis / Garrett M. Fitzmaurice, Nan M. Laird, James H. Ware. — 2nd ed. p. cm. ISBN 978-0-470-38027-7 (hardback) 1. Longitudinal method. 2. Regression analysis. 3. Multivariate analysis. 4. Medical statistics. I. Laird, Nan M., 1943- II. Ware, James H., 1941- III. Title. QA278.F575 2011 519.5’3—dc22

2011012197

To Laura, Kieran, and Aidan      — G.M.F.

To Joel, Richard, and Lily      — N.M.L.

To Janice, Cameron, and Jake      — J.H.W.

Preface

The first edition of Applied Longitudinal Analysis was designed to serve as a textbook for a course on modern statistical methods for longitudinal data analysis, and subsequently, as a reference resource for students and researchers. The book was targeted at a broad audience: graduate students in statistics, statisticians working in the health sciences, pharmaceutical industry, and governmental health-related agencies, as well as researchers and graduate students from a variety of substantive fields. In the seven years that have elapsed since publication of the first edition, Applied Longitudinal Analysis has been used extensively in university classrooms throughout the United States and abroad. We are grateful to many colleagues, course instructors, students, and readers who have offered constructive suggestions on how the book could be improved. This feedback has been invaluable and helped shape the content of the second edition.

The feedback we received has encouraged us to retain the general structure and format of the first edition while taking the opportunity to introduce a number of new and important topics. Although there is much new material in this second edition, the principles that guided us in writing the first edition have not changed. Our primary goal is to present a rigorous and comprehensive description of modern statistical methods for the analysis of longitudinal data that is accessible to a wide range of readers. A strong emphasis is placed on the application of these methods to longitudinal data and the interpretation of results. Although the methods are presented in the setting of numerous applications to actual data sets drawn from studies in health-related fields, reflecting our own research interests in the health sciences, they apply equally to other areas of application, for example, education, psychology, and other branches of the behavioral and social sciences.

How does this edition differ from its predecessor? The major changes in this edition have resulted from the addition of six new chapters:

1. A chapter (Chapter 9) on “fixed effects models,” in which subject-specific effects are treated as fixed rather than random, has been added. This chapter complements the existing chapter on mixed effects models (Chapter 8) and includes a discussion of the relative advantages of these two classes of models.
2. In the first edition, a single chapter was devoted to marginal models and generalized estimating equations (GEE) that focused exclusively on binary and count data. We now devote two chapters (Chapters 12 and 13) to marginal models and GEE, with new material on models for ordinal responses, residual diagnostics, and issues that arise when modeling time-varying covariates.
3. A chapter (Chapter 15) on approximate methods for generalized linear mixed effects models discusses penalized quasi-likelihood (PQL) and marginal quasi-likelihood (MQL) methods. We highlight settings where these approximations are unlikely to be accurate and can yield biased estimates of effects.
4. A second chapter (Chapter 18) on missing data and dropout, focusing on multiple imputation and inverse probability weighting (IPW) methods, has been added. To give greater prominence to methods for accounting for missing data and dropout in longitudinal analyses, the two companion chapters (Chapters 17 and 18) now appear before the Advanced Topics part of the book.
5. A chapter (Chapter 19) on smoothing longitudinal data has been added to the Advanced Topics. This chapter focuses on the connection between penalized splines and linear mixed effects models.
6. A chapter on sample size and power (Chapter 20) has been added to the Advanced Topics. This chapter considers issues of sample size, power, number of repeated measurements, and study duration for longitudinal study designs.

In addition, the chapter on residual analyses and diagnostics (Chapter 10) has been revised to include material on recently developed model-checking techniques based on cumulative sums of residuals. The chapters that review generalized linear models (Chapter 11) and generalized linear mixed effects models (Chapter 14) have been updated to include new material on models for ordinal data and on methods for handling overdispersion. Finally, extra problem sets have been added to many of the chapters.

As in the first edition, the prerequisites for a course based on this book are an introductory course in statistics and a strong background in regression analysis. Some previous exposure to generalized linear models (e.g., logistic regression) would be helpful, although these models are reviewed in detail in the text. An understanding of matrix algebra or calculus is not assumed. Although we do not assume a high level of mathematical preparation, we have written this book for the motivated reader who is willing to consider mathematical ideas. The more technical or mathematical sections of the book are signposted with asterisks and may be omitted at first reading without loss of continuity.

The methods described in this book require the use of appropriate statistical software. As before, we include illustrative SAS commands for performing the analyses presented throughout the text at the end of many chapters, with basic descriptions of their usage. Because many of the analyses we discuss can be performed using alternative software packages (e.g., R, S-Plus, Stata, and SPSS), this book can be supplemented with any one of them. Readers are encouraged to perform and verify the results of analyses using statistical software of their choice. Programming statements and computer output for selected examples, prepared using SAS, Stata, and R, can be downloaded from the website: www.biostat.harvard.edu/~fitzmaur/ala2e. Because statistical software is constantly evolving, we will endeavor to update the website as new procedures become available in the major statistical software packages. The thirty-two real data sets used throughout the text and problem sets to illustrate the applications of longitudinal methods also can be downloaded from the website.

We hope this second edition of Applied Longitudinal Analysis provides a broader foundation in modern methods for the analysis of longitudinal data and will prove a worthy successor to the first edition. The original impetus for writing this book arose from teaching a graduate-level course on “Applied Longitudinal Analysis” at the Harvard School of Public Health. We are especially grateful to the students who have participated in the course since its inception almost twenty years ago; we have learned much from these extraordinary students. The collection of individuals who gave us useful feedback on the first edition is far too long to list. However, we would like to thank the many friends and colleagues who have helped us with this project. A special word of thanks to Amy Herring and Russell Localio. We thank Amy for her many helpful and constructive suggestions on how the book could be improved. We thank Russell for reading a draft of the new chapters and for providing invaluable feedback and suggestions that improved their content. Thanks also to Nick Horton, Stu Lipsitz, and Caitlin Ravichandran for their helpful suggestions and insightful comments on several chapters. Finally, we thank Steve Quigley and Susanne Steitz-Filler of Wiley, for their advice and encouragement during all stages of this project.

GARRETT M. FITZMAURICE NAN M. LAIRD JAMES H. WARE

Boston, Massachusetts May, 2011

Preface to First Edition

Our goal in writing this book is to provide a rigorous and systematic description of modern methods for analyzing data from longitudinal studies. In recent years there have been remarkable developments in methods for longitudinal analysis. Despite these important advances, the methods have been somewhat slow to move into the mainstream. Applied Longitudinal Analysis bridges the gap between theory and application by presenting a comprehensive account of these methods in a way that is accessible to a wide range of readers.

The impetus for this book arose from teaching a graduate-level course on “Applied Longitudinal Analysis” at the Harvard School of Public Health. As course instructors, we were frustrated by the lack of a suitable textbook that adequately covered modern statistical methods for longitudinal analysis at a level accessible to a broad audience of researchers and graduate students in the health and medical sciences. We envision this book as a textbook for such a course and, subsequently, as a reference resource for researchers and graduate students. It is also suitable for graduate students in statistics and for statisticians already working in the health sciences, governmental health-related agencies, and the pharmaceutical industry. It is intended to allow a diverse group of statisticians, researchers, and graduate students in substantive fields to master modern methods for longitudinal data analysis.

The scope of this book is broad, covering methods for the analysis of diverse types of longitudinal data arising in the health sciences. The methods are presented in the setting of numerous applications to real data sets. Our main emphasis is on the practical rather than the theoretical aspects of longitudinal analysis. Twenty-five real data sets, drawn from studies in health-related fields, are used throughout the text and problem sets to illustrate the applications of longitudinal methods. These data sets can be downloaded from the website for the book: www.biostat.harvard.edu/~fitzmaur/ala. Although the methods are applied to data sets drawn from the health sciences, they apply equally to other areas of application, for example, education, psychology, and other branches of the behavioral and social sciences.

Because longitudinal data are a special case of clustered data, albeit with a natural ordering of the measurements within a cluster, we include also a description of modern methods for analyzing clustered data, more broadly defined. Indeed, one of our goals is to demonstrate that methods for longitudinal analysis are, more or less, special cases of more general regression methods for clustered data. As a result a comprehensive understanding of longitudinal data analysis provides the basis for a broader understanding of methods for analyzing the wide range of clustered data that commonly arises in studies in the biomedical and health sciences.

The prerequisites for a course based on this book are an introductory course in statistics and a strong background in regression analysis. Some previous exposure to generalized linear models (e.g., logistic regression) would be helpful, although these models are reviewed in the text. An understanding of matrix algebra or calculus is not assumed; the reader will be gently introduced to only those aspects of vector and matrix notation necessary for understanding the matrix representation of regression models for longitudinal data. Because vectors and matrices are used to simplify notation, the reader is required to attain some basic facility with the addition and multiplication of vectors and matrices. Although we do not assume a high level of mathematical preparation, a willingness to read and consider mathematical ideas is required. More technical or mathematical sections of the book are marked with asterisks and may be omitted at first reading without loss of continuity.

To use the methods described in this book, appropriate statistical software is required. In general, the methods available via commercially available software lag behind the recent advances in statistical methods; longitudinal data analysis is not exceptional in this regard. Recently the introduction of new programs for analyzing multivariate and longitudinal data has made these methods far more accessible to practitioners and students. We use SAS, which is widely available, to perform the analyses presented throughout the text. Illustrative SAS commands are included at the end of many of the chapters, with basic descriptions of their usage. Programming statements and computer output for the examples, prepared using SAS, can be downloaded from the website: www.biostat.harvard.edu/~fitzmaur/ala. We selected SAS because all of the analyses we discuss can be performed using its procedures. Many of the methods can be carried out using alternative software packages (e.g., S-Plus and Stata) or special purpose programs (e.g., BMDP5-V) and this book can be supplemented with any one of them. Readers are encouraged to perform and verify the results of analyses using software of their choice. Because statistical software is constantly evolving, we anticipate that all of the methods we discuss will soon be available within most of the major statistical packages.

Throughout the text references have been kept to an absolute minimum. Instead, at the end of each chapter we include suggestions for further readings that provide more in-depth coverage of certain topics. We also include “bibliographic notes” that highlight key references in the mainstream statistical literature. Although many of our readers may find the latter references to be too technical, they are included to give due credit to those who have contributed to the statistical methods described in each chapter.

Finally, we would like to thank the many friends and colleagues who have helped us to write this book. A special word of thanks to Misha Salganik, for preparation of the diagrams and many helpful suggestions for improvement of graphical displays. We are especially grateful to Joe Hogan and Russell Localio, for reading a first draft and providing invaluable feedback, comments, and suggestions that improved the book. We would also like to thank Rino Bellocco, Brent Coull, Nick Horton, Sharon-Lise Normand, Misha Salganik, Judy Singer, S. V. Subramanian, and Florin Vaida, for their insightful comments on several chapters. We are grateful to the students who have participated in the course on “Applied Longitudinal Analysis” at the Harvard School of Public Health since its inception; they have provided the impetus and motivation for writing this book. We gratefully acknowledge support from grant GM 29745 from the National Institutes of Health. The first author gratefully acknowledges support from the Junior Faculty Sabbatical Program at the Harvard School of Public Health; the support provided by a sabbatical created a unique opportunity to begin writing this book. Last, but not least, we thank Steve Quigley and Susanne Steitz of Wiley, for their advice and encouragement during all stages of this project.

GARRETT M. FITZMAURICE NAN M. LAIRD JAMES H. WARE

Boston, Massachusetts March, 2004

Acknowledgments

Throughout this book we have used data sets drawn from published studies in health-related fields to exemplify important concepts in the analysis of longitudinal and clustered data. We are grateful to the following investigators for sharing their data with us: Graham Bentham, Doug Dockery, Brian Flay, Robert Greenberg, Keith Henry, Aviva Must, Elena Naumova, George Rhoads, Jan Schouten, Linda Van Matter, and Gwen Zahner.

We also thank the following publishers for permission to reproduce published data sets in print and electronic format: The American Statistical Association, Blackwell Publishing, Brooks/Cole (a division of Thomson Learning), CRC Press, Elsevier, Iowa State Press, Oxford University Press, and SAS Institute, Inc.

Finally, in all data sets used throughout this book, the original subject identification (ID) numbers have been deleted and replaced with new subject ID numbers, to ensure that the data sets cannot be linked to the original records.

Part I

Introduction to Longitudinal and Clustered Data

Chapter 1

Longitudinal and Clustered Data

1.1 INTRODUCTION

Research on statistical methods for the design and analysis of human investigations expanded explosively in the second half of the twentieth century. Beginning in the early 1950s, the U.S. government shifted a substantial part of its research support from military to biomedical research. The legislative foundation for the modern National Institutes of Health (NIH), the Public Health Service Act, was passed in 1944 and NIH grew rapidly throughout the 1950s and 1960s. During these “golden years” of NIH expansion, the entire NIH budget grew from $8 million in 1947 to more than $1 billion in 1966. The NIH sponsored many of the important epidemiologic studies and clinical trials of that period, including the influential Framingham Heart Study (Dawber et al., 1951; Dawber, 1980).

The typical focus of these early studies was morbidity and, especially, mortality. Investigators sought to identify the causes of early death and to evaluate the effectiveness of treatments for delaying death and morbidity. In the Framingham Heart Study, participants were seen at two-year intervals. Survival outcomes during successive two-year periods were treated as independent events and modeled using multiple logistic regression. The successful use of multiple logistic regression in this setting, and the recognition that it could be applied to case-control data, led to widespread use of this methodology beginning in the 1960s. The analysis of time-to-event data was revolutionized by the seminal 1972 paper of D. R. Cox, describing the proportional hazards model (Cox, 1972). This paper was followed by a rich and important body of work that established the conceptual basis and the computational tools for modern survival analysis.

Although the design of the Framingham Heart Study and other cohort studies called for periodic measurement of the patient characteristics thought to be determinants of chronic disease, interest in the levels and patterns of change of those characteristics over time was initially limited. As the research advanced, however, investigators began to ask questions about the behavior of these risk factors. In the Framingham Heart Study, for example, investigators began to ask whether blood pressure levels in childhood were predictive of hypertension in adult life. In the Coronary Artery Risk Development in Young Adults (CARDIA) Study, investigators sought to identify the determinants of the transition from normotensive or normocholesterolemic status in early adult life to hypertension and hypercholesterolemia in middle age (Friedman et al., 1988). In the treatment of arthritis, asthma, and other diseases that are not typically life-threatening, investigators began to study the effects of treatments on the level and change over time in measures of severity of disease. Similar questions were being posed in every disease setting. Investigators began to follow populations of all ages over time, both in observational studies and clinical trials, to understand the development and persistence of disease and to identify factors that alter the course of disease development.

This interest in the temporal patterns of change in human characteristics came at a period when advances in computing power made new and more computationally intensive approaches to statistical analysis available at the desktop. Thus, in the early 1980s, Laird and Ware proposed the use of the EM algorithm to fit a class of linear mixed effects models appropriate for the analysis of repeated measurements (Laird and Ware, 1982); Jennrich and Schluchter (1986) proposed a variety of alternative algorithms, including Fisher-scoring and Newton–Raphson algorithms. Later in the decade, Liang and Zeger introduced the generalized estimating equations in the biostatistical literature and proposed a family of generalized linear models for fitting repeated observations of binary and counted data (Liang and Zeger, 1986; Zeger and Liang, 1986). Many other investigators writing in the biomedical, educational, and psychometric literature contributed to the rapid development of methodology for the analysis of these “longitudinal” data. The past 30 years have seen considerable progress in the development of statistical methods for the analysis of longitudinal data. Despite these important advances, methods for the analysis of longitudinal data have been somewhat slow to move into the mainstream. This book bridges the gap between theory and application by presenting a comprehensive description of methods for the analysis of longitudinal data accessible to a broad range of readers.

1.2 LONGITUDINAL AND CLUSTERED DATA

The defining feature of longitudinal studies is that measurements of the same individuals are taken repeatedly through time, thereby allowing the direct study of change over time. The primary goal of a longitudinal study is to characterize the change in response over time and the factors that influence change. With repeated measures on individuals, one can capture within-individual change. Indeed, the assessment of within-subject changes in the response over time can only be achieved within a longitudinal study design. For example, in a cross-sectional study, where the response is measured at a single occasion, one can only obtain estimates of between-individual differences in the response. That is, a cross-sectional study may allow comparisons among sub-populations that happen to differ in age, but it does not provide any information about how individuals change during the corresponding period.

To highlight this important distinction between cross-sectional and longitudinal study designs, consider the following simple example. Body fatness in girls is thought to increase just before or around menarche, leveling off approximately 4 years after menarche. Suppose that investigators are interested in determining the increase in body fatness in girls after menarche. In a cross-sectional study design, investigators might obtain measurements of percent body fat on two separate groups of girls: a group of 10-year-old girls (a pre-menarcheal cohort) and a group of 15-year-old girls (a post-menarcheal cohort). In this cross-sectional study design, direct comparison of the average percent body fat in the two groups of girls can be made using a two-sample (unpaired) t-test. This comparison does not provide an estimate of the change in body fatness as girls age from 10 to 15 years. The effect of growth or aging, an inherently within-individual effect, simply cannot be estimated from a cross-sectional study that does not obtain measures of how individuals change with time. In a cross-sectional study the effect of aging is potentially confounded with possible cohort effects. Put in a slightly different way, there are many characteristics that differentiate girls in these two different age groups that could distort the relationship between age and body fatness. On the other hand, a longitudinal study that measures a single cohort of girls at both ages 10 and 15 can provide a valid estimate of the change in body fatness as girls age. In the longitudinal study the analysis is based on a paired t-test, using the difference or change in percent body fat within each girl as the outcome variable. This within-individual comparison provides a valid estimate of the change in body fatness as girls age from 10 to 15 years. Moreover, since each girl acts as her own control, changes in percent body fat throughout the duration of the study are estimated free of any between-individual variation in body fatness.
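The contrast between the two designs can be sketched with a small simulation. The Python code below is a minimal illustration rather than an analysis from the text: the function names, the sample size, the mean level of 20 percent body fat, the true change of 5 percentage points, the between-girl and within-girl standard deviations, and the optional `cohort_shift` are all hypothetical values chosen for illustration. It generates a longitudinal sample (the same girls measured at ages 10 and 15) and a cross-sectional sample (different girls at each age), then computes the paired and unpaired t statistics.

```python
import math
import random
import statistics


def paired_t(before, after):
    """Paired t statistic: mean within-individual change divided by its standard error."""
    diffs = [a - b for b, a in zip(before, after)]
    mean_change = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_change, se, mean_change / se


def unpaired_t(group1, group2):
    """Two-sample (pooled-variance) t statistic for two independent groups."""
    n1, n2 = len(group1), len(group2)
    diff = statistics.mean(group2) - statistics.mean(group1)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff, se, diff / se


def simulate(n=200, true_change=5.0, sd_between=4.0, sd_within=1.0,
             cohort_shift=0.0, seed=1):
    """Return (paired result, unpaired result) for hypothetical body fat data.

    All numeric defaults are illustrative assumptions, not values from the text.
    cohort_shift adds a fixed difference to the older cross-sectional group,
    mimicking a cohort effect that is unrelated to aging.
    """
    rng = random.Random(seed)

    def girl_level():
        # Each girl's own underlying level of percent body fat.
        return rng.gauss(20.0, sd_between)

    # Longitudinal design: the same n girls are measured at both ages.
    levels = [girl_level() for _ in range(n)]
    age10 = [u + rng.gauss(0, sd_within) for u in levels]
    age15 = [u + true_change + rng.gauss(0, sd_within) for u in levels]

    # Cross-sectional design: two separate groups of n girls.
    cs10 = [girl_level() + rng.gauss(0, sd_within) for _ in range(n)]
    cs15 = [girl_level() + true_change + cohort_shift + rng.gauss(0, sd_within)
            for _ in range(n)]

    return paired_t(age10, age15), unpaired_t(cs10, cs15)


if __name__ == "__main__":
    paired, unpaired = simulate()
    print("paired:   change=%.2f  se=%.3f" % paired[:2])
    print("unpaired: diff=%.2f    se=%.3f" % unpaired[:2])
```

Under these assumed settings the paired standard error should be markedly smaller than the unpaired one, because each girl's own level cancels when her two measurements are differenced. Setting `cohort_shift` to a nonzero value moves the cross-sectional estimate by that amount while leaving the longitudinal estimate untouched, which is one simple way to picture confounding by cohort effects.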

A distinctive feature of longitudinal data is that they are clustered. In longitudinal studies the clusters are composed of the repeated measurements obtained from a single individual at different occasions. Observations within a cluster will typically exhibit positive correlation, and this correlation must be accounted for in the analysis. Longitudinal data also have a temporal order; the first measurement within a cluster necessarily comes before the second measurement, and so on. The ordering of the repeated measures has important implications for analysis. There are, however, many studies in the health sciences that are not longitudinal in this sense but which give rise to data that are clustered or cluster-correlated. For example, clustered data commonly arise when intact groups are randomized to health interventions or when naturally occurring groups in the population are randomly sampled. An example of the former is group-randomized trials. In a group-randomized trial, also known as a cluster-randomized trial, groups of individuals, rather than each individual alone, are randomized to different treatments or health interventions. Data on the health outcomes of interest are obtained on all individuals within a group. Alternatively, clustered data can arise from random sampling of naturally occurring groups in the population. Families, households, hospital wards, medical practices, neighborhoods, and schools are all instances of naturally occurring clusters in the population that might be the primary sampling units in a study. Finally, clustered data can arise when data on the health outcome of interest are simultaneously obtained either from multiple raters or from different measurement instruments.

In all these examples of clustered data, we might reasonably expect that measurements on units within a cluster are more similar than the measurements on units in different clusters. The degree of clustering can be expressed in terms of correlation among the measurements on units within the same cluster. This correlation invalidates the crucial assumption of independence that is the cornerstone of so many standard statistical techniques. Instead, statistical models for clustered data must explicitly describe and account for this correlation. Because longitudinal data are a special case of clustered data, albeit with a natural ordering of the measurements within a cluster, this book includes a description of modern methods of analysis for clustered data, more broadly defined. Indeed, one of the goals of this book is to demonstrate that methods for the analysis of longitudinal data are, more or less, special cases of more general regression methods for clustered data. As a result a comprehensive understanding of methods for the analysis of longitudinal data provides the basis for a broader understanding of methods for analyzing the wide range of clustered data that commonly arises in studies in the biomedical and health sciences.
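
One way to make the "degree of clustering" concrete is to estimate the correlation among units within the same cluster. The sketch below uses the one-way analysis of variance estimator of the intraclass correlation for balanced clusters. It is offered as one common estimator, not necessarily the approach taken later in the book; the function name `anova_icc` and the data values are hypothetical.

```python
import statistics


def anova_icc(clusters):
    """One-way ANOVA estimate of the intraclass correlation.

    Assumes balanced data: every cluster has the same number of units.
    """
    k = len(clusters)      # number of clusters
    m = len(clusters[0])   # units per cluster
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")

    grand_mean = statistics.mean([x for c in clusters for x in c])
    cluster_means = [statistics.mean(c) for c in clusters]

    # Mean square between clusters and mean square within clusters.
    ms_between = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    ms_within = sum((x - cm) ** 2
                    for c, cm in zip(clusters, cluster_means)
                    for x in c) / (k * (m - 1))

    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


# Hypothetical measurements on three members of each of three families.
families = [[10.0, 11.0, 12.0], [20.0, 21.0, 22.0], [30.0, 31.0, 32.0]]

# The same nine values regrouped so that clusters no longer share a level.
regrouped = [[10.0, 21.0, 32.0], [20.0, 31.0, 12.0], [30.0, 11.0, 22.0]]

icc_families = anova_icc(families)
icc_regrouped = anova_icc(regrouped)
print(f"ICC, family clusters: {icc_families:.3f}")
print(f"ICC, regrouped:       {icc_regrouped:.3f}")
```

When members of a cluster resemble one another far more than members of different clusters, the estimate is close to 1. Regrouping the same values so that clusters no longer differ in level drives it down, and this estimator can even become negative.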

The examples described above consider only a single level of clustering, for example, repeated measurements on individuals. More recently investigators have developed methodology for the analysis of multilevel data, in which observations may be clustered at more than one level. For example, the data may consist of repeated measurements on patients clustered by clinic. Alternatively, the data may consist of observations on children nested within classrooms, nested within schools. Although the analysis of multilevel data is not the primary focus of this book, multilevel data are discussed in Chapter 22.

Interest in the analysis of longitudinal and multilevel data continues to grow. New and more flexible models have been developed and advances in computation, such as Markov chain Monte Carlo (MCMC) methods, have allowed greater flexibility in model specification. Moreover, improvements in statistical software packages, especially SAS, Stata, SPSS, R, and S-Plus, have made these models much more accessible for use in routine data analysis. Despite these advances, however, methods for the analysis of longitudinal data are not widely used and are seen to be accessible only to statisticians with specialized expertise.

We believe that the methodology for the analysis of longitudinal data can be much more widely understood and applied. It is our hope that this book will help make that possible. It provides a comprehensive introduction to methods for the analysis of longitudinal data, written for a reader with a basic knowledge of statistics and a strong background in regression analysis. The book does not require a high level of mathematical preparation but does assume a willingness to read and consider mathematical ideas.

1.3 EXAMPLES

To highlight some of the distinctive features of longitudinal and clustered data, we introduce four examples drawn from studies in the biomedical sciences. These four examples will be used later in the book to illustrate different analytic approaches. Additional examples, also drawn from studies in the biomedical and health sciences, will be introduced in later chapters of the book.

1.3.1 Treatment of Lead-Exposed Children (TLC) Trial

Exposure to lead can produce cognitive impairment, especially among young children and infants. A young child exposed to high levels of lead may experience various adverse health effects, including hyperactivity, hearing or memory loss, learning disabilities, and damage to the brain and nervous system. Although the use of lead as an additive in gasoline has been discontinued, at least in the United States, resulting in a dramatic reduction in airborne lead levels, a small percentage of children continue to be exposed to lead at levels that can produce impairment. Much of this exposure is due to deteriorating lead-based paint (e.g., chipping and peeling paint) in older homes. Lead was used as a pigment and drying agent in “alkyd” oil-based paint. While the United States government banned the use of lead-based paint in housing in 1978, many homes built before 1978 contain lead-based paint. When lead-based paint deteriorates, it becomes lead paint chips, which can be eaten by young children, and lead-contaminated paint dust, which can be ingested by young children during normal teething and hand-to-mouth behavior. The U.S. Centers for Disease Control and Prevention (CDC) has concluded that children with blood lead levels above 10 micrograms per deciliter (μg/dL) of whole blood are at risk of adverse health effects.

Lead poisoning in children is treatable in the sense that there are medical interventions, known as chelation treatments, that can help a child to excrete the lead that has been ingested. Until recently chelation treatment of children with high levels of blood lead was administered by injection and required hospitalization. A new chelating agent, succimer, enhances urinary excretion of lead and has the distinct advantage that it can be given orally, rather than by injection. In the 1990s the Treatment of Lead-Exposed Children (TLC) Trial Group conducted a placebo-controlled, randomized trial of succimer in children with confirmed blood lead levels of 20 to 44 μg/dL, levels well above the CDC’s threshold for concern about the adverse health effects of exposure to lead (Treatment of Lead-Exposed Children (TLC) Trial Group, 2000; Rogan et al., 2001). The children were aged 12 to 33 months at enrollment and lived in deteriorating inner city housing. The mean age of the children at randomization was 2 years and the mean blood lead level was 26 μg/dL. Children received up to three 26-day courses of succimer or placebo and were followed for 3 years.

Table 1.1 presents data on blood lead levels at baseline, week 1, week 4, and week 6 for 10 randomly selected children from the study. The mean blood lead levels at each measurement occasion for a random subset of 100 children, broken down by treatment group, are presented in Table 1.2