Biostatistical Methods

John M. Lachin
Description

Praise for the First Edition: ". . . an excellent textbook . . . an indispensable reference for biostatisticians and epidemiologists." --International Statistical Institute

A new edition of the definitive guide to classical and modern methods of biostatistics. Biostatistics consists of various quantitative techniques that are essential to the description and evaluation of relationships among biologic and medical phenomena. Biostatistical Methods: The Assessment of Relative Risks, Second Edition develops basic concepts and derives an expanded array of biostatistical methods through the application of both classical statistical tools and more modern likelihood-based theories. With its fluid and balanced presentation, the book guides readers through the important statistical methods for the assessment of absolute and relative risks in epidemiologic studies and clinical trials with categorical, count, and event-time data.

Presenting a broad scope of coverage and the latest research on the topic, the author begins with categorical data analysis methods for cross-sectional, prospective, and retrospective studies of binary, polychotomous, and ordinal data. Subsequent chapters present modern model-based approaches that include unconditional and conditional logistic regression; Poisson and negative binomial models for count data; and the analysis of event-time data, including the Cox proportional hazards model and its generalizations. The book now includes an introduction to mixed models with fixed and random effects as well as expanded methods for evaluation of sample size and power. Additional new topics featured in this Second Edition include:

* Establishing equivalence and non-inferiority
* Methods for the analysis of polychotomous and ordinal data, including matched data and the Kappa agreement index
* Multinomial logistic models for polychotomous data and proportional odds models for ordinal data
* Negative binomial models for count data as an alternative to the Poisson model
* GEE models for the analysis of longitudinal repeated measures and multivariate observations

Throughout the book, SAS is utilized to illustrate applications to numerous real-world examples and case studies. A related website features all the data used in examples and problem sets along with the author's SAS routines. Biostatistical Methods, Second Edition is an excellent book for biostatistics courses at the graduate level. It is also an invaluable reference for biostatisticians, applied statisticians, and epidemiologists.




Contents

Preface

Preface to First Edition

1 Biostatistics and Biomedical Science

1.1 Statistics and the Scientific Method

1.2 Biostatistics

1.3 Natural History of Disease Progression

1.4 Types of Biomedical Studies

1.5 Studies of Diabetic Nephropathy

2 Relative Risk Estimates and Tests for Independent Groups

2.1 Probability as a Measure of Risk

2.2 Measures of Differential or Relative Risk

2.3 Large Sample Distribution

2.4 Sampling Models: Likelihoods

2.5 Exact Inference

2.6 Large Sample Inferences

2.7 SAS PROC FREQ

2.8 Other Measures of Differential Risk

2.9 Polychotomous and Ordinal Data

2.10 Two Independent Groups with Polychotomous Response

2.11 Multiple Independent Groups

2.12 Problems

3 Sample Size, Power, and Efficiency

3.1 Estimation Precision

3.2 Power of Z-Tests

3.3 Test for Two Proportions

3.4 Power of Chi-Square Tests

3.5 SAS PROC POWER

3.6 Efficiency

3.7 Problems

4 Stratified-Adjusted Analysis for Independent Groups

4.1 Introduction

4.2 Mantel-Haenszel Test and Cochran’s Test

4.3 Stratified-Adjusted Estimators

4.4 Nature of Covariate Adjustment

4.5 Multivariate Tests of Hypotheses

4.6 Tests of Homogeneity

4.7 Efficient Tests of No Partial Association

4.8 Asymptotic Relative Efficiency of Competing Tests

4.9 Maximin-Efficient Robust Tests

4.10 Random Effects Model

4.11 Power and Sample Size for Tests of Association

4.12 Polychotomous and Ordinal Data

4.13 Problems

5 Case-Control and Matched Studies

5.1 Unmatched Case-Control (Retrospective) Sampling

5.2 Matching

5.3 Tests of Association for Matched Pairs

5.4 Measures of Association for Matched Pairs

5.5 Pair-Matched Retrospective Study

5.6 Power Function of McNemar’s Test

5.7 Stratified Analysis of Pair-Matched Tables

5.8 Multiple Matching: Mantel-Haenszel Analysis

5.9 Matched Polychotomous Data

5.10 Kappa Index of Agreement

5.11 Problems

6 Applications of Maximum Likelihood and Efficient Scores

6.1 Binomial

6.2 2×2 Table: Product Binomial (Unconditionally)

6.3 2×2 Table, Conditionally

6.4 Score-Based Estimate

6.5 Stratified Score Analysis of Independent 2×2 Tables

6.6 Matched Pairs

6.7 Iterative Maximum Likelihood

6.8 Problems

7 Logistic Regression Models

7.1 Unconditional Logistic Regression Model

7.2 Interpretation of the Logistic Regression Model

7.3 Tests of Significance

7.4 Interactions

7.5 Measures of the Strength of Association

7.6 Conditional Logistic Regression Model for Matched Sets

7.7 Models for Polychotomous or Ordinal Data

7.8 Random Effects and Mixed Models

7.9 Models for Multivariate or Repeated Measures

7.10 Problems

8 Analysis of Count Data

8.1 Event Rates and the Homogeneous Poisson Model

8.2 Overdispersed Poisson Model

8.3 Poisson Regression Model

8.4 Overdispersed and Robust Poisson Regression

8.5 Conditional Poisson Regression for Matched Sets

8.6 Negative Binomial Models

8.7 Power and Sample Size

8.8 Multiple Outcomes

8.9 Problems

9 Analysis of Event-Time Data

9.1 Introduction to Survival Analysis

9.2 Lifetable Construction

9.3 Family of Weighted Mantel-Haenszel Tests

9.4 Proportional Hazards Models

9.5 Evaluation of Sample Size and Power

9.6 Additional Models

9.7 Analysis of Recurrent Events

9.8 Problems

Appendix Statistical Theory

A.1 Introduction

A.2 Central Limit Theorem and the Law of Large Numbers

A.3 Delta Method

A.4 Slutsky’s Convergence Theorem

A.5 Least Squares Estimation

A.6 Maximum Likelihood Estimation and Efficient Scores

A.7 Tests of Significance

A.8 Explained Variation

A.9 Robust Inference

A.10 Generalized Linear Models and Quasi-likelihood

A.11 Generalized Estimating Equations (GEE)

References

Author Index

Subject Index

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Iain M. Johnstone, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg

Editors Emeriti: Vic Barnett, J. Stuart Hunter, Joseph B. Kadane, Jozef L. Teugels

A complete list of the titles in this series appears at the end of this volume.

Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750–8400, fax (978) 750–4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748–6011, fax (201) 748–6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762–2974, outside the United States at (317) 572–3993 or fax (317) 572–4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Lachin, John M., 1942–
  Biostatistical methods : the assessment of relative risks / John M. Lachin. -- 2nd ed.
    p. cm. — (Wiley series in probability and statistics ; 807)
  Summary: "This book focuses on the comparison, contrast, and assessment of risks on the basis of clinical investigations. It develops basic concepts as well as deriving biostatistical methods through both the application of classical mathematical statistical tools and more modern likelihood-based theories. The first half of the book presents methods for the analysis of single and multiple 2×2 tables for cross-sectional, prospective, and retrospective (case-control) sampling, with and without matching, using fixed and two-stage random effects models. The text then moves on to present a more modern likelihood- or model-based approach, which includes unconditional and conditional logistic regression; the analysis of count data and the Poisson regression model; the analysis of event time data, including the proportional hazards and multiplicative intensity models; and elements of categorical data analysis (expanded in this edition). SAS subroutines are both showcased in the text and embellished online by way of a dedicated author website. The book contains a technical, but accessible appendix that presents the core mathematical statistical theory used for the development of classical and modern statistical methods"—Provided by publisher.
  Includes bibliographical references and index.
  ISBN 978-0-470-50822-0 (hardback)
  1. Medical statistics. 2. Health risk assessment—Statistical methods. 3. Medicine—Research—Statistical methods. I. Title.
  RA409.L33 2010
  610.72—dc22
  2010018482

10 9 8 7 6 5 4 3 2 1

To my family

Preface

Ten years ago, almost to the day, I completed the first edition of this book. In the interim I and others have used the text as the basis for an M.S.- or Ph.D.-level course on biostatistical methods. My own three-hour course covered most of Chapters 1 to 8 and the Appendix. However, when my editor at John Wiley & Sons approached others who had also used the book for a course, I received a number of suggestions for expansion of the material, among the most prominent being the inclusion of methods for the analysis of polychotomous and ordinal data. Thus, in this second edition, throughout the text these methods are described. See the new Sections 2.9 to 2.11, 3.4, 3.5, 4.12, 5.8 to 5.10 and 7.7 in the table of contents. The evaluation of power and sample size for many of these methods has also been added to the text.

In addition, I have added a review of methods for the analysis of longitudinal repeated measures or multivariate observations, especially the family of models fit using generalized estimating equations (Sections 7.9 and A.11). I also now present an introduction to mixed models with fixed and random effects (Section 7.8).

Other additions include assessment of equivalence and noninferiority (Section 2.6.8), and sample size evaluation for such assessments (Sections 3.3.3 and 3.3.4), a discussion of adjustment for clinic effects in a multicenter study (Sections 7.6.4 and 7.8.2) and a description of negative binomial models for count data as an alternative to the Poisson model (Sections 8.6 and 8.7.2).

All of the methods are illustrated using SAS procedures from version 9.2. Readers are encouraged to review the SAS manuals that provide a more extensive review of the available options.

For the first edition, I established a website for the book that included all of the programs I used for all examples in the book, and all of the data sets used in the book.

This website has been updated to include the additional programs and data sets used herein. The book website is www.bsc.gwu.edu/jml/biostatmethods.

The first edition was replete with various typographical and computational errors and an errata was posted on the book website. As the book went through additional printings, I was able to correct most of these errors. I have been much more diligent in proofing the second edition but expect that I and others will find errors in this edition as well. Please check the website for an Erratum to this edition, and please bring any errors to my attention at lachin@gwu.edu.

I greatly appreciate the comments and corrections from those who read the first edition or used it for courses. I hope that this second edition will provide a book that is more useful for a broader range of curricula. While I hope that the book will be a useful technical reference, my basic objective for this edition, as for the first edition, has been to provide a graduate-level text that spans the classical and more modern spectrum of biostatistical methods. To that end I trust that the book will be useful for students, faculty, and the profession in general.

JOHN M. LACHIN

Rockville, Maryland

PREFACE TO FIRST EDITION

In 1993 to 1994 I led the effort to establish a graduate program in biostatistics at the George Washington University. The program, which I now direct, was launched in 1995 and is a joint initiative of the Department of Statistics, the Biostatistics Center (which I have directed since 1988) and the School of Public Health and Health Services. Biostatistics has long been a specialty of the statistics faculty, starting with Samuel Greenhouse, who joined the faculty in 1946. When Jerome Cornfield joined the faculty in 1972, he established a two-semester sequence in biostatistics (Statistics 225-6) as an elective for the graduate program in statistics (our 200 level being equivalent to the 600 level in other schools). Over the years these courses were taught by many faculty as a lecture course on current topics. With the establishment of the graduate program in biostatistics, however, these became pivotal courses in the graduate program and it was necessary that Statistics 225 be structured so as to provide students with a review of the foundations of biostatistics.

Thus I was faced with the question “what are the foundations of biostatistics?” In my opinion, biostatistics is set apart from other statistics specialties by its focus on the assessment of risks and relative risks through clinical research. Thus biostatistical methods are grounded in the analysis of binary and count data such as in 2×2 tables. For example, the Mantel-Haenszel procedure for stratified 2×2 tables forms the basis for many families of statistical procedures such as the Gρ family of modern statistical tests in the analysis of survival data. Further, all common medical study designs, such as the randomized clinical trial and the retrospective case-control study, are rooted in the desire to assess relative risks. Thus I developed Statistics 225, and later this text, around the principle of the assessment of relative risks in clinical investigations.

In doing so, I felt that it was important first to develop basic concepts and derive core biostatistical methods through the application of classical mathematical statistical tools, and then to show that these and comparable methods may also be developed through the application of more modern, likelihood-based theories. For example, the large sample distribution of the Mantel-Haenszel test can be derived using the large sample approximation to the hypergeometric and the Central Limit Theorem, and also as an efficient score test based on a hypergeometric likelihood.

Thus the first five chapters present methods for the analysis of single and multiple 2×2 tables for cross-sectional, prospective and retrospective (case-control) sampling, without and with matching. Both fixed and random effects (two-stage) models are employed. Then, starting in Chapter 6 and proceeding through Chapter 9, a more modern likelihood or model-based treatment is presented. These chapters broaden the scope of the book to include the unconditional and conditional logistic regression models in Chapter 7, the analysis of count data and the Poisson regression model in Chapter 8, and the analysis of event time data including the proportional hazards and multiplicative intensity models in Chapter 9. Core mathematical statistical tools employed in the text are presented in the Appendix. Following each chapter problems are presented that are intended to expose the student to the key mathematical statistical derivations of the methods presented in that chapter, and to illustrate their application and interpretation.

Although the text provides a valuable reference to the principal literature, it is not intended to be exhaustive. For this purpose, readers are referred to any of the excellent existing texts on the analysis of categorical data, generalized linear models and survival analysis. Rather, this manuscript was prepared as a textbook for advanced courses in biostatistics. Thus the course (and book) material was selected on the basis of its current importance in biostatistical practice and its relevance to current methodological research and more advanced methods. For example, Cornfield’s approximate procedure for confidence limits on the odds ratio, though brilliant, is no longer employed because we now have the ability to readily perform exact computations. Also, I felt it was more important that students be exposed to over-dispersion and the use of the information sandwich in model-based inference than to residual analysis in regression models. Thus each chapter must be viewed as one professor’s selection of relevant and insightful topics.

In my Statistics 225 course, I cover perhaps two-thirds of the material in this text. Chapter 9, on survival analysis, has been added for completeness, as has the section in the Appendix on quasi-likelihood and the family of generalized linear models. These topics are covered in detail in other courses. My detailed syllabus for Statistics 225, listing the specific sections covered and exercises assigned, is available at the Biostatistics Center web site (www.bsc.gwu.edu/jml/biostatmethods). Also, the data sets employed in the text and problems are available at this site or the web site of John Wiley and Sons, Inc. (www.wiley.com).

Although I was not trained as a mathematical statistician, during my career I have learned much from those with whom I have been blessed with the opportunity to collaborate (chronologically): Jerry Cornfield, Sam Greenhouse, Nathan Mantel, and Max Halperin, among the founding giants in biostatistics; and also Robert Smythe, L.J. Wei, Peter Thall, K.K. Gordon Lan and Zhaohai Li, among others, who are among the best of their generation. I have also learned much from my students, who have always sought to better understand the rationale for biostatistical methods and their application.

I especially acknowledge the collaboration of Zhaohai Li, who graciously agreed to teach Statistics 225 during the fall of 1998, while I was on sabbatical leave. His detailed reading of the draft of this text identified many areas of ambiguity and greatly improved the mathematical treatment. I also thank Costas Cristophi for typing my lecture notes, and Yvonne Sparling for a careful review of the final text and programming assistance. I also wish to thank my present and former statistical collaborators at the Biostatistics Center, who together have shared a common devotion to the pursuit of good science: Raymond Bain, Oliver Bautista, Patricia Cleary, Mary Foulkes, Sarah Fowler, Tavia Gordon, Shuping Lan, James Rochon, William Rosenberger, Larry Shaw, Elizabeth Thom, Desmond Thompson, Dante Verme, Joel Verter, Elizabeth Wright, and Naji Younes, among many.

Finally, I especially wish to thank the many scientists with whom I have had the opportunity to collaborate in the conduct of medical research over the past 30 years: Dr. Joseph Schachter, who directed the Research Center in Child Psychiatry where I worked during graduate training; Dr. Leslie Schoenfield, who directed the National Cooperative Gallstone Study; Dr. Edmund Lewis, who directed the Collaborative Study Group in the conduct of the Study of Plasmapheresis in Lupus Nephritis and the Study of Captopril in Diabetic Nephropathy; Dr. Thomas Garvey, who directed the preparation of the New Drug Application for treatment of gallstones with ursodiol; Dr. Peter Stacpoole, who directed the Study of Dichloroacetate in the Treatment of Lactic Acidosis; and especially Drs. Oscar Crofford, Saul Genuth and David Nathan, among many others, with whom I have collaborated since 1982 in the conduct of the Diabetes Control and Complications Trial, the study of the Epidemiology of Diabetes Interventions and Complications, and the Diabetes Prevention Program. The statistical responsibility for studies of such great import has provided the dominant motivation for me to continually improve my skills as a biostatistician.

JOHN M. LACHIN

Rockville, Maryland

1

Biostatistics and Biomedical Science

1.1 STATISTICS AND THE SCIENTIFIC METHOD

The aim of all biomedical research is the acquisition of new information so as to expand the body of knowledge that comprises the biomedical sciences. This body of knowledge consists of three broad components:

1. descriptions of phenomena in terms of observable characteristics of elements or events;
2. descriptions of associations among phenomena;
3. descriptions of causal relationships between phenomena.

The various sciences can be distinguished by the degree to which each contains knowledge of each of these three types. The hard sciences (e.g., physics and chemistry) contain large bodies of knowledge of the third kind — causal relationships. The soft sciences (e.g., the social sciences) principally contain large bodies of information of the first and second kind — phenomenological and associative.

None of these descriptions, however, are exact. To quote the philosopher and mathematician Jacob Bronowski (1973):

All information is imperfect. We have to treat it with humility…. Errors are inextricably bound up with the nature of human knowledge….

Thus, every science consists of shared information, all of which, to some extent, is uncertain.

When a scientific investigator adds to the body of scientific knowledge, the degree of uncertainty about each piece of information is described through statistical assessments of the probability that statements are either true or false. Thus, the language of science is statistics, for it is through the process of statistical analysis and interpretation that the investigator communicates the results to the scientific community. The syntax of this language is probability, because the laws of probability are used to assess the inherent uncertainty, errors, or precision of estimates of population parameters, and probabilistic statements are used as the basis for drawing conclusions.

The means by which the investigator attempts to control the degree of uncertainty in research conclusions is the application of the scientific method. In a nutshell, the scientific method is a set of strategies, based on common sense and statistics, that is intended to minimize the degree of uncertainty and maximize the degree of validity of the resulting knowledge. Therefore, the scientific method is deeply rooted in statistical principles.

When considered sound and likely to be free of error, such knowledge is termed scientifically valid. The designation of scientific validity, however, is purely subjective. The soundness or validity of any scientific result depends on the manner in which the observations were collected, that is, on the design and conduct of the study, as well as the manner in which the data were analyzed.

Therefore, in the effort to acquire scientifically valid information, one must consider the statistical aspects of all elements of a study: its design, execution, and analysis. To do so requires a firm understanding of the statistical basis for each type of study and for the analytic strategies commonly employed to assess a study’s objectives.

1.2 BIOSTATISTICS

Biostatistics is characterized principally by the application of statistical principles to the biological/biomedical sciences, in contrast to other areas of application of statistics, such as psychometrics and econometrics. Thus, biostatistics refers to the development of statistical methods for, and the application of statistical principles to, the study of biological and medical phenomena.

Biomedical research activities range from the study of cellular biology to clinical therapeutics. At the basic physical level it includes bench research, the study of genetic, biochemical, physiological, and biological processes, such as the study of genetic defects, metabolic pathways, kinetic models, and pharmacology. Although some studies in this realm involve investigation in animals and humans (in vivo), many of these investigations are conducted in “test tubes” (in vitro). The ultimate objective of these inquiries is to advance our understanding of the pathobiology or pathophysiology of human diseases and of the potential mechanisms for their treatment.

Clinical research refers to direct observation of the clinical features of populations. This includes epidemiology, which can be broadly defined as the study of the distribution and etiology of human disease. Some elements, such as infectious disease epidemiology, are strongly biologically based, whereas others are more heavily dependent on empirical observations within populations. The latter include such areas as occupational and environmental epidemiology, the study of the associations between occupational and environmental exposures with the risk of specific diseases. This type of epidemiology is often characterized as population based because it relies on the observation of natural samples from populations.

Ultimately, bench research or epidemiologic observation leads to advances in medical therapeutics: the development of new pharmaceuticals (drugs), devices, surgical procedures, or interventions. Such therapeutic advances are often assessed using a randomized, controlled clinical trial. Such studies evaluate the biological effectiveness of the new agent (biological efficacy), the clinical effectiveness of the therapy in practice (the intention-to-treat comparison), as well as the incidence of adverse effects.

The single feature that most sharply distinguishes clinical biomedical research from other forms of biological research is the propensity to assess the absolute and relative risks of various outcomes within populations. Absolute risk refers to the distribution of a disease, or risk factors for a disease, in a population. This risk may be expressed cross-sectionally as a simple probability, or it may be expressed longitudinally over time as a hazard function (or survival function) or an intensity process. Relative risk refers to a measure of the difference in risks among subsets of the population with specific characteristics, such as those exposed to a risk factor versus not exposed, or those randomly assigned to a new drug treatment versus a placebo control. The relative risk of an outcome is sometimes described as a difference in the absolute risks of the outcome, the ratio of the risks, or a ratio of the odds of the outcome.

Thus, a major part of biostatistics concerns the assessment of absolute and relative risks through epidemiological studies of various types and randomized clinical trials. This, in general, is the subject of the book. This entails the study of discrete outcomes, some of which are assessed over time. It also includes many major areas of statistics that are beyond the scope of any single book. For example, the analysis of longitudinal data is another of the various types of processes studied through biostatistics. In many studies, however, interest in a longitudinal quantitative or ordinal measure arises because of its fundamental relationship to an ultimate discrete outcome of interest. For example, longitudinal analysis of quantitative serum cholesterol levels in a population is of interest because of the strong relationship between serum lipids and the risk of cardiovascular disease. Thus, this text is devoted exclusively to the assessment of the risks of discrete characteristics or events in populations.

1.3 NATURAL HISTORY OF DISEASE PROGRESSION

Underlying virtually all clinical research is some model of our understanding of the natural history of the progression of the disease under investigation. As an example, consider the study of diabetic nephropathy (kidney disease) associated with type 1 or insulin-dependent diabetes mellitus, also known as juvenile diabetes. Diabetes is characterized by a state of metabolic dysfunction in which the subject is deficient in endogenous (self-produced) insulin. Thus, the patient must administer exogenous insulin by some imperfect mechanical device, such as by multiple daily injections or a continuous subcutaneous insulin infusion (CSII) device, also called a “pump”. Because of technological deficiencies with the way that insulin can be administered, it is difficult to maintain normal levels of blood glucose throughout the day, day after day. The resulting hyperglycemia leads to microvascular complications, the two most prevalent being diabetic retinopathy (disease of the retina in the eye) and diabetic nephropathy, and ultimately to cardiovascular disease.

Table 1.1 Stages of Progression of Diabetic Nephropathy

1. Normal: Albumin excretion rate (AER) ≤ 40 mg/24 h
2. Microalbuminuria: 40 < AER < 300 mg/24 h
3. Proteinuria (overt albuminuria): AER ≥ 300 mg/24 h
4. Renal insufficiency: Serum creatinine > 2 mg/dL
5. End-stage renal disease: Need for dialysis or renal transplant
6. Mortality

Diabetic nephropathy is known to progress through a well-characterized sequence of disease states, characterized in Table 1.1. The earliest sign of emergent kidney disease is the leakage of small amounts of protein (albumin) into urine. The amount or rate of albumin excretion can be measured from a timed urine collection in which all the urine voided over a fixed period of time is collected. From the measurement of the urine volume and the concentration of albumin in the serum and urine at specific intervals of time, it is possible to compute the albumin excretion rate (AER) expressed as the mg/24 h of albumin excreted into the urine by the kidneys.
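As a concrete illustration of that calculation, assuming a timed collection in which all voided urine is obtained, the excretion rate is simply the urinary albumin concentration times the collected volume, rescaled to 24 hours. The following SAS data step is a minimal sketch with hypothetical values:

* AER from a timed urine collection (hypothetical values);
data aer;
  conc_mg_l  = 15;     /* urinary albumin concentration, mg/L        */
  volume_l   = 1.8;    /* urine volume collected over the period, L  */
  hours      = 24;     /* duration of the timed collection, h        */
  aer_mg_24h = conc_mg_l * volume_l * (24 / hours);   /* 27 mg/24 h  */
  put aer_mg_24h=;
run;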

In the normal (nondiseased) subject, the AER is no greater than 40 mg/24 h - some would say no greater than 20 or 30 mg/24 h. The earliest sign of possible diabetic nephropathy is microalbuminuria, defined as an AER > 40 mg/24 h (but < 300 mg/24 h). As the disease progresses, the next landmark is the development of definite albuminuria, defined as an AER > 300 mg/24 h. This is often termed overt proteinuria because it is at this level of albumin (protein) excretion that a simple dipstick test for protein in urine will be positive. This is also the point at which nephropathy, and the biological processes that ultimately lead to destruction of the kidney, are considered well established.
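The cut points above and in Table 1.1 amount to a simple classification rule for the early stages of nephropathy. A minimal SAS sketch, with hypothetical AER values in the datalines:

* Classify hypothetical AER values (mg/24 h) by the Table 1.1 cut points;
data stages;
  input aer;
  length stage $ 20;
  if aer <= 40 then stage = 'Normal';
  else if aer < 300 then stage = 'Microalbuminuria';
  else stage = 'Proteinuria';
  datalines;
25
120
450
;
run;

proc print data=stages; run;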

To then chart the further loss of kidney function, a different measure is used: the glomerular filtration rate (GFR). The glomerulus is the cellular structure that serves as the body’s filtration system. As diabetic nephropathy progresses, fewer and fewer intact glomeruli remain, so that the rate of filtration declines, starting with the leakage of protein and other elements into the urine. The GFR is difficult to measure accurately. In practice, a measure of creatinine clearance, also from a timed urine collection, or a simple measure of the creatinine concentration in serum is used to monitor disease progression. Renal insufficiency is often declared when the serum creatinine exceeds 2 mg/dL. This is followed by end-stage renal disease (ESRD), at which point the patient requires frequent dialysis or renal transplantation to prolong survival. Ultimately the patient dies from the renal insufficiency or related causes if a suitable donor kidney is not available for transplantation.

Thus, the natural history of diabetic nephropathy is described by a collection of quantitative, ordinal, and qualitative assessments. In the early stages of the disease, a study might focus entirely on quantitative measures of AER. Later, during the middle stages of the disease, this becomes problematic. For example, patients with established proteinuria may be characterized over time using a measure of GFR, but the analysis will be complicated by informatively missing observations because some patients reached ESRD or died before the scheduled completion of follow-up.

However, a study that assesses the risk of discrete outcomes, such as the incidence or prevalence of proteinuria or renal insufficiency, is less complicated by such factors and is readily interpretable by physicians. For example, if a study shows that a new drug treatment reduces the mean AER by 10 mg/24 h less than that with placebo, it is difficult to establish the clinical significance of the result. On the other hand, if the same study demonstrated a relative risk of developing proteinuria of 0.65, a 35% risk reduction with drug treatment versus placebo, the clinical significance is readily apparent to most physicians.

Therefore, we shall focus on the description of the absolute and relative risks of discrete outcomes, historically the core of biostatistics.

1.4 TYPES OF BIOMEDICAL STUDIES

Biomedical research employs various types of study designs, some of which involve formal experimentation, others not, among other characteristics. In this section the characteristics and the roles of each type of study are described briefly.

Study designs can be distinguished by three principal characteristics:

1. Number of samples: single versus multiple samples.
2. Source of samples: natural versus experimental. An experimental sample is one to which a treatment or procedure has been applied by the investigator. This may or may not involve randomization as an experimental device to assign treatments to individual patients.
3. Time course of observation: prospective versus retrospective versus concurrent collection of measurements and observation of responses or outcome events.

Based on these characteristics, there are basically four types of designs for biomedical studies in humans: (1) the cross-sectional study, (2) the cohort study, (3) the case-control study, and (4) the randomized experiment. A more exhaustive classification was provided by Bailar et al. (1984), but these four are the principal types. Examples of each type of study are described subsequently.

The cross-sectional study is a study of a single natural sample with concurrent measurement of a variety of characteristics. In the review by Bailar et al. (1984), 39% of published studies were of this type. Some notable examples are the National Health and Nutritional Examination Survey (NHANES) of the relationship between health and nutrition, and the annual Health Interview Survey of the prevalence of various diseases in the general U.S. population. Such studies have provided important descriptions of the prevalence of disease in specified populations, of the co-occurrence of the disease and other factors (i.e., associations), and of the sensitivity and specificity of diagnostic procedures.

In a cohort study (25% of studies), one or more samples (cohorts) of individuals, either natural or experimental samples, are followed prospectively and subsequent status is evaluated.

A case-control study (5% of studies) employs multiple natural samples with retrospective measurements. A sample of cases with the disease is compared to a sample of controls without the disease with respect to the previous presence of, or exposure to, some factor.

An important characteristic of cohort and case-control studies is whether or not the study employs matching of pairs or sets of subjects with respect to selected covariate values. Matching is a strategy to remove bias in the comparison of groups by ensuring equality of distributions of the matching covariates employed. Matching, however, changes the sample frame or the sampling unit in the analysis from the individual subject in an unmatched study to the matched set in the matched study. Thus, matched studies require analytic procedures that are different from those more commonly applied to unmatched studies.

A randomized, controlled clinical trial or parallel-comparative trial (15% of studies) employs two or more parallel randomized cohorts, each of which receives only one treatment in the trial. Such studies provide a controlled assessment of a new drug, therapy, diagnostic procedure, or intervention procedure. Variations of this design include the multiple-period crossover design and the crossed factorial design. Since a clinical trial uses randomization to assign each subject to receive either the active treatment or a control (e.g., drug vs. placebo), the comparison of the groups is in expectation unbiased. However, a truly unbiased study also requires other conditions such as complete and unbiased follow-up assessments.

Each of the first three types is commonly referred to as an observational or epidemiological study, in contrast to a clinical trial. It is rare, some might say impossible, that a population-based observational study will identify a single necessary and sufficient cause for a biological effect, or a 1:1 causal relationship. Almost always, a risk factor is identified that has a biological effect that is associated with a change in the risk of an outcome. It is only after a preponderance of evidence is accumulated from many such studies that such a risk factor may be declared to be a causal agent. Such was the case with the relationship between smoking and lung cancer, and the criteria employed to declare smoking a causal agent are now widely accepted (U.S. Surgeon General, 1964, 1982).

The principal advantage of the randomized controlled trial (RCT), on the other hand, is that it can provide conclusions with respect to causal relationships because other intervening factors are controlled through randomization. Thus, the RCT provides an unbiased comparison of the effects of administering one treatment versus another on the outcome in the specified population of patients, and any differences observed can be confidently ascribed to the differences between the treatments. Therefore, the distinction between a relationship based on an observational study and one based on a randomized experiment rests on the degree to which an observed relationship might be explained by other variables or other mechanisms.

However, in no study is there an absolute guarantee that all possible influential variables are controlled, even in a randomized, controlled experiment. Also, as the extent of knowledge about the underlying natural history of a disease expands, it becomes increasingly important to account for the known or suspected risk factors in the assessment of the effects of treatments or exposures, especially in an observational cross-sectional, cohort, or case-control study. This entails the use of an appropriate statistical model for the simultaneous influence of multiple covariates on the absolute or relative risk of important outcomes or events.

Thus, the principal objective of this book is to describe methods for the assessment of risk relationships derived from each type of study, and to consider methods to adjust or control for other factors in these assessments.

1.5 STUDIES OF DIABETIC NEPHROPATHY

To illustrate the different types of studies, we close this chapter with a review of selected studies on various aspects of diabetic nephropathy.

Cross-sectional surveys such as the National Health Interview Survey (NHIS) and the National Health and Nutrition Examination Survey (NHANES) indicate that approximately 16 million people in the U.S. population had some form of diabetes mellitus (Harris et al., 1987) as of around 1980. The majority had what is termed type 2 or noninsulin-dependent diabetes mellitus. Approximately 10% or 1.6 million had the more severe form, termed type 1 or insulin-dependent diabetes mellitus, for which daily insulin injections or infusions are required to sustain life. Among the most important clinical features of type 1 diabetes are the development of complications related to micro- and macrovascular abnormalities, among the most severe being diabetic nephropathy (kidney disease), which ultimately leads to end-stage renal disease (ESRD) in about a third of patients. These and other national surveys indicate that approximately 35% of all ESRD in the United States is attributed to diabetes.

As an illustration of a longitudinal observational cohort study, Deckert et al. (1978) followed a cohort of 907 Danish subjects with type 1 diabetes for many years and reported the annual incidence (proportion) of new cases of proteinuria (overt albuminuria) to appear each year. They showed that the peak incidence or greatest risk occurs approximately 15 years after the onset of diabetes. Their study also showed that over a lifetime, approximately 30% of subjects develop nephropathy whereas approximately 70% do not, suggesting that there is some mechanism that protects patients from nephropathy, possibly of a genetic nature, possibly related to the lifetime exposure to hyperglycemia, or possibly related to some environmental exposure or characteristic.

Since the discovery of insulin in the 1920s, one of the principal issues of contention in the scientific community is what was often called the glucose hypothesis. This hypothesis asserts that the extent of exposure to elevated levels of blood glucose or hyperglycemia is the dominant determinant of the risk of diabetic nephropathy and other microvascular abnormalities or complications of type 1 diabetes. Among the first studies to suggest an association was a large observational study conducted by Pirart (1978a,b) in Belgium over the period 1947–1973. This study examined the association between the level of blood glucose and the prevalence (presence or absence) of nephropathy. The data were obtained from a retrospective examination of the clinical history of 4400 patients treated in a community hospital over a period of up to 25 years in some patients. The rather crude analysis consisted of figures that displayed the prevalence of nephropathy by year of diabetes duration for subgroups categorized as being in good, fair, or poor control of blood glucose levels. These figures suggest that as the mean level of hyperglycemia increases, the risk (prevalence) of nephropathy also increases. This type of study is clearly open to various types of sampling or selection biases. Nevertheless, the study provides evidence that hyperglycemia may be a strong risk factor, or is associated with the risk of diabetic nephropathy. Note that this study is not strictly a prospective cohort study because the cohort was identified later and the longitudinal observations were then obtained retrospectively.

In all of these studies, biochemical measures of renal function are used to assess the presence and extent of nephropathy. Ultimately, however, end-stage renal disease is characterized by the physiological destruction of the kidney, specifically the glomeruli, which are the cellular structures that actually perform the filtration of blood. However, the only way to determine the physical extent of glomerular damage is to conduct a morphologic evaluation of a tissue specimen obtained by a needle biopsy of the kidney. As an example of a case-control study, Chavers, Bilous, Ellis et al. (1989) conducted a retrospective study to determine whether there was an association between the presence of established nephropathy versus not (the cases vs. controls) and evidence of morphological (structural tissue) abnormalities in the kidneys (the risk factor or exposure). They showed that approximately 69% of patients with nephropathy showed morphological abnormalities versus 42% among those without nephropathy, for an odds ratio of 3.1, computed as (0.69/0.31) divided by (0.42/0.58). Other studies (cf. Steffes et al., 1989) show that the earliest stage of nephropathy, microalbuminuria (which they defined as an AER ≥ 20 mg/24 h), is highly predictive of progression to proteinuria, with a positive predictive value ranging from 83 to 100%. These findings established that proteinuria is indeed associated with glomerular destruction and that microalbuminuria is predictive of proteinuria. Thus, a treatment that reduces the risk of microalbuminuria can be expected to reduce the risk of progression to proteinuria, and one that reduces the risk of proteinuria will also reduce the extent of physiological damage to the kidneys.
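That odds ratio can be verified from the two reported proportions alone. A minimal SAS sketch of the arithmetic:

* Odds ratio from the reported proportions of morphological abnormalities;
data orcalc;
  p_case = 0.69;    /* proportion abnormal among cases (nephropathy)       */
  p_ctrl = 0.42;    /* proportion abnormal among controls (no nephropathy) */
  or = (p_case/(1 - p_case)) / (p_ctrl/(1 - p_ctrl));   /* approx. 3.1     */
  put or=;
run;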

The major question to be addressed, therefore, was whether the risk of albuminuria or nephropathy could be reduced by a treatment that consistently lowered the levels of blood glucose. By the 1980s, technological developments made an experiment (clinical trial) to test this hypothesis feasible. The level of blood glucose varies continuously over the 24-hour period, with peaks following meals and troughs before meals. It was discovered that the hemoglobin (red cells) in the blood become glycosylated when exposed to blood glucose. Thus, the percent of the total hemoglobin that has become glycosylated (the HbA1c%) provides an indirect measure of the mean level of hyperglycemia over the preceding four to six weeks, the half-life of the red blood cell. This made it possible to assess the average extent of hyperglycemia in individual patients. Other developments then made it possible for patients and their health care teams to control their blood sugar levels so as to lower the level of hyperglycemia, as reflected by the level of HbA1c. Devices for self-monitoring of blood glucose allowed patients to measure the current level of blood glucose (mg/dL) from a drop of blood obtained by a finger prick. Patients could then alter the amount of insulin administered to keep the level of blood glucose within a desirable range. Also, a variety of types of insulin were developed, some of which acted quickly and some over long periods of time, that could be administered using multiple daily insulin injections or a pump. The health care team could then try different algorithms to vary the amount of insulin administered in response to the current level of blood glucose.

Fig. 1.1 Cumulative incidence of microalbuminuria (AER > 40 mg/24 h) over nine years of follow-up in the DCCT primary prevention cohort. Reproduced with permission.

With these advances, in 1981 the National Institute of Diabetes and Digestive and Kidney Diseases launched the Diabetes Control and Complications Trial (DCCT) to test the glucose hypothesis (DCCT, 1990, 1993). This was a large-scale randomized controlled clinical trial involving 1441 patients enrolled in 29 clinical centers in the United States and Canada and followed for an average of 6.5 years (4 to 9 years). Of these, the 726 patients comprising the primary prevention cohort were free of any microvascular complications (AER ≤ 40 mg/24 h and no retinopathy, among other features); and the 715 patients comprising the secondary intervention cohort may have had minimal preexisting levels of albuminuria (AER < 200 mg/24 h) and mild retinopathy. Patients were randomly assigned to receive either intensive or conventional treatment. Intensive treatment used all available means (self-monitoring four or more times a day with three or more daily insulin injections or a pump, in conjunction with diet and exercise) to obtain levels of HbA1c as close as possible to the normal range (< 6.05%) while attempting to avoid hypoglycemia. Hypoglycemia occurs when the blood glucose level is reduced below a physiologically safe level, resulting in dizziness and possibly coma (unconsciousness) or seizures. Conventional treatment, on the other hand, consisted of one or two daily injections of insulin with less frequent self-monitoring with the goal of maintaining the clinical well-being of the patient, but without specific glucose targets.

Fig. 1.2 Cumulative incidence of microalbuminuria (AER > 40 mg/24 h) over nine years of follow-up in the DCCT secondary intervention cohort. Reproduced with permission.

Figure 1.1 presents the cumulative incidence of microalbuminuria (AER > 40 mg/24 h) among the 724 patients free of microalbuminuria at baseline in the primary cohort (adapted from DCCT, 1993). The average hazard ratio for intensive versus conventional treatment (I:C) over the nine years is 0.66. This corresponds to a 34% risk reduction with intensive therapy, 95% confidence limits (2, 56%) (DCCT, 1993, 1995a). Likewise, Figure 1.2 presents the cumulative incidence of microalbuminuria among the 641 patients free of microalbuminuria at baseline in the secondary cohort. The average hazard ratio is 0.57, corresponding to a 43% (C.I.: 21, 58%) risk reduction with intensive therapy (DCCT, 1995a). These risk reductions are adjusted for the baseline level of log AER using the proportional hazards regression model. A model that also employed a stratified adjustment for primary and secondary cohorts yields a risk reduction of 39% (21, 52%) in the combined cohorts. Similar analyses indicate a reduction of 54% (19, 74%) in the risk of overt albuminuria or proteinuria (AER > 300 mg/24 h) in the combined cohorts. Thus, intensive therapy aimed at near-normal blood glucose levels dramatically reduces the incidence of severe nephropathy, which may ultimately lead to end-stage renal disease.

However, intensive treatment was associated with an increased incidence of severe episodes of hypoglycemia (DCCT, 1993, 1995b, 1997). Over the 4770 patient years of treatment and follow-up in the intensive treatment group, 271 patients experienced 770 episodes of hypoglycemia accompanied by coma and/or seizures, or 16.3 events per 100 patient years (100 PY) of follow-up. In contrast, over the 4732 patient years in the conventional treatment group, 137 patients experienced 257 episodes, or 5.4 per 100 PY. The relative risk is 3.02 with 95% confidence limits of 2.36 to 3.86 (DCCT, 1995b, 1997). Because of substantial overdispersion of the subject-specific event rates, this confidence limit was computed using a random effects or overdispersed Poisson model.
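The crude rates per 100 patient-years and their ratio can be reproduced from the reported totals; small differences from the published figures reflect rounding and the exact patient-years of exposure. A minimal SAS sketch follows (the confidence limits quoted above, however, require the overdispersed Poisson methods of Chapter 8, not this crude calculation):

* Crude hypoglycemia event rates per 100 patient-years and their ratio;
data hyporate;
  rate_int  = 100 * 770 / 4770;   /* intensive group, approx. 16 per 100 PY     */
  rate_conv = 100 * 257 / 4732;   /* conventional group, approx. 5.4 per 100 PY */
  rr = rate_int / rate_conv;      /* crude relative risk, approx. 3.0           */
  put rate_int= rate_conv= rr=;
run;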

Thus, the DCCT demonstrated that a multifaceted intensive treatment aimed at achieving near-normal levels of blood glucose greatly reduces the risk of nephropathy. The ultimate questions, however, were whether these risk reductions are caused principally by the alterations in levels of blood glucose, as opposed to changes in diet or exercise, for example, and whether there is some threshold for hyperglycemia below which there are no further reductions in risk. Thus, analyses were performed using Poisson and proportional hazards regression models, separately in the intensive and conventional treatment groups, using the current mean level of HbA1c since entry into the trial as a time-dependent covariate in conjunction with numerous covariates measured at baseline. Adjusting for 25 other covariates, these models showed that the dominant determinant of the risk of proteinuria is the current level of the log mean HbA1c since entry, with a 71% increase in risk per 10% increase in HbA1c (such as from an HbA1c of 9 to 9.9) in the conventional group, which explains approximately 5% of the variation in risk (DCCT, 1995c). Further analyses demonstrated that there is no statistical breakpoint or threshold in this risk relationship (DCCT, 1996).
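The "per 10% increase" scale follows from a model that is linear in log HbA1c: if β is the coefficient of log HbA1c, the relative risk for a 10% proportional increase (e.g., from 9 to 9.9) is exp(β log 1.1) = 1.1^β, so a 71% increase corresponds to β ≈ log(1.71)/log(1.1) ≈ 5.6. A minimal SAS sketch of this conversion, offered as an illustration of the scale rather than the DCCT model fit itself:

* Convert between a log-HbA1c coefficient and the risk increase per 10% increase in HbA1c;
data hba1c;
  beta  = log(1.71) / log(1.1);   /* coefficient implied by a 71% increase, approx. 5.6 */
  rr_10 = 1.1 ** beta;            /* relative risk per 10% increase, recovers 1.71      */
  put beta= rr_10=;
run;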

These various studies and analyses, all of which concern the absolute and relative risks of discrete outcomes, show that microalbuminuria and proteinuria are associated with structural changes in renal tissue, that an intensive treatment regimen greatly reduces the risk of nephropathy, and that the principal risk factor is the lifetime exposure to hyperglycemia. Given that diabetes is the leading cause of end-stage renal disease, it can be anticipated that implementation of intensive therapy in the wide population with type 1 diabetes will ultimately reduce the progression of nephropathy to end-stage renal disease, with pursuant reductions in the incidence of morbidity and mortality caused by diabetic kidney disease and the costs to the public. The methods used to reach these conclusions, and their statistical basis, are described in the chapters to follow.

Throughout the book, selected programs are cited that perform specific computations, or access a particular data set. These and many more programs are provided at the author’s website for the book at www.bsc.gwu.edu/jml/biostatmethods. This includes all of the data sets employed in the book and the programs used for all examples.

2

Relative Risk Estimates and Tests for Independent Groups

The core of biostatistics relates to the evaluation and comparison of the risks of disease and other health outcomes in specific populations. Among the many different designs, the most basic is the comparison of two independent groups of subjects drawn from two different populations. This could be a cross-sectional study comparing the current health status of those with, versus those without, a specific exposure of interest; or a longitudinal cohort study of the development of health outcomes among a group of subjects exposed to a purported risk factor versus a group not so exposed; or a retrospective study comparing the previous exposure risk among independent (unmatched) samples of cases of the disease versus controls; or perhaps a clinical trial where the health outcomes of subjects are compared among those randomly assigned to receive the experimental treatment versus those assigned to receive the control treatment. Each of these cases will involve a comparison of the proportions with the response or outcome between the two groups.

Many texts provide a review of the methods for comparison of the risks or probabilities of an outcome between groups. These include the classic text by Fleiss (1981) and its updated edition by Fleiss et al. (2003), as well as many texts on statistical methods for epidemiology such as Breslow and Day (1980, 1987), Sahai and Khurshid (1995), Selvin (1996), and Kelsey et al. (1996), among many. Because this book is intended principally as a graduate text, readers are referred to these books for review of other topics not covered herein.

2.1 PROBABILITY AS A MEASURE OF RISK

2.1.1 Prevalence and Incidence

The simplest data structure in biomedical research is a sample of n independent and identically distributed (i.i.d.) Bernoulli observations {yi} from a sample of n subjects (i = 1,…, n) drawn at random from a population with a probability π of a characteristic of interest such as death or worsening, or perhaps survival or improvement. The characteristic of interest is often referred to as the positive response, the outcome, or the event. Thus, Y is a binary random variable such that yi = I{positive response for the ith observation}, where I{·} is the indicator function: I{·} = 1 if true, 0 if not. The total number of subjects in the sample with a positive response is x = Σi yi, and the simple proportion with a positive response in the sample is p = x/n.

The prevalence of a characteristic is the probability π in the population, or the proportion p in a sample, with that characteristic present in a cross section of the population at a specific point in time. For example, the prevalence of adult-onset type 2 diabetes as of 1980 was estimated to be approximately 6.8% of the U.S. population based on the National Health and Nutrition Examination Survey (NHANES) (Harris et al., 1987). Half of those who met the criteria for diabetes on an oral glucose tolerance test (3.4%) were previously undiagnosed. In such a study, n is the total sample size of whom x have the positive characteristic (diabetes).

The incidence of an event (the positive characteristic) is the probability π in the population, or the proportion p in a sample, of acquiring the positive characteristic or experiencing the event over an interval of time among those who were free of the characteristic at baseline. In this case, n is the sample size at risk in a prospective longitudinal follow-up study, of whom x experience the event over a period of time. For example, from the annual National Health Interview Survey (NHIS) it is estimated that the incidence of a new diagnosis of diabetes among adults in the U.S. population is 2.42 new cases per 1000 in the population per year (Kenny et al., 1995).

Such estimates of the prevalence of a characteristic, or the incidence of an event, are usually simple proportions based on a sample of n i.i.d. Bernoulli observations.

2.1.2 Binomial Distribution and Large Sample Approximations

Whether from a cross-sectional study of prevalence or a prospective study of incidence, the number of positive responses X is distributed as binomial with probability π, or

$$P(X = x) = \binom{n}{x}\,\pi^{x}(1-\pi)^{n-x}, \qquad x = 0, 1, \ldots, n, \tag{2.1}$$

where E(X) = nπ and V(X) = nπ(1 ‒ π). Since E(X) = nπ, a natural moment estimate of π is the simple proportion of events p = x/n. This is also the maximum likelihood estimate. From the normal approximation to the binomial, it is well known that X is normally distributed asymptotically (in large samples) as

$$X \;\approx\; N\!\left[\,n\pi,\; n\pi(1-\pi)\,\right] \tag{2.2}$$

from which

$$p \;\approx\; N\!\left[\,\pi,\; \frac{\pi(1-\pi)}{n}\,\right]. \tag{2.3}$$

These expressions follow from the central limit theorem because x can be expressed as the nth partial sum of a potentially infinite series of i.i.d. random variables {yi} (see Section A.2). Thus, p is the mean of a set of i.i.d. random variables, $p = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.

As described in the Appendix, (2.3) is a casual notation for the asymptotic distribution of p. More precisely, we would write

$$\sqrt{n}\,\left(p_n - \pi\right) \;\xrightarrow{\;d\;}\; N\!\left[\,0,\; \pi(1-\pi)\,\right], \tag{2.4}$$

which indicates that as the sample size becomes infinitely large, the proportion pn converges in distribution to the normal distribution and that p is a $\sqrt{n}$-consistent estimator for π. In this notation, the variance is a fixed quantity, whereas in (2.3) the variance ↓ 0 as n → ∞.
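As an informal check of the convergence statement in (2.4), the following short Python sketch simulates binomial samples and compares the empirical variance of $\sqrt{n}(p_n - \pi)$ with $\pi(1-\pi)$. It is only an illustration, not one of the author's SAS programs; the use of NumPy and the chosen values of π, n, and the number of replicates are assumptions of this sketch.

```python
import numpy as np

# Monte Carlo illustration of (2.4): sqrt(n)*(p_n - pi) is approximately N[0, pi*(1 - pi)]
rng = np.random.default_rng(1)              # seed fixed only for reproducibility
pi, n, reps = 0.3, 500, 20000               # arbitrary illustrative values
p_n = rng.binomial(n, pi, size=reps) / n    # simulated sample proportions
z = np.sqrt(n) * (p_n - pi)
print(z.var(), pi * (1 - pi))               # empirical variance should be close to 0.21
```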

The expression for the variance in (2.3), V(p) = π(1 ‒ π)/n, is the large sample variance that is used in practice with finite samples to compute a test of significance or a confidence interval. A test of the null hypothesis that the probability is a specified value, H0: π = π0, against the alternative hypothesis H1: π = π1 ≠ π0, is then provided by a large sample test

$$z = \frac{p - \pi_0}{\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}}}, \tag{2.5}$$

where z is asymptotically distributed as standard normal. We would then reject H0 against the two-sided alternative H1 for values |z| > Z1-α/2, where Z1-α/2 is the upper normal distribution percentile for a two-sided test at level α; for example, for α = 0.05, Z0.975 = 1.96.

Since p is a consistent estimator of π, it follows from Slutsky’s convergence theorem, (A.45) in Section A.4, that a consistent estimate of the variance is $\hat{V}(p) = p(1-p)/n$. This yields the usual large sample confidence interval at level 1 ‒ α for a proportion, with lower and upper confidence limits on π obtained as

$$\left(\hat{\pi}_L,\ \hat{\pi}_U\right) = p \mp Z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}. \tag{2.6}$$

However, these confidence limits are not bounded by (0,1), meaning that for values of p close to 0 or 1, or for small sample sizes, the upper limit may exceed 1 or the lower limit be less than 0.
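As a simple numerical illustration of (2.5) and (2.6), the following Python sketch computes the large sample Z test and the usual (Wald) confidence limits for a proportion. It is not one of the author's SAS programs; the function name, the choice of π0, and the use of SciPy are assumptions of this sketch.

```python
from math import sqrt
from scipy.stats import norm

def wald_test_and_ci(x, n, pi0, alpha=0.05):
    """Large sample Z test of H0: pi = pi0, (2.5), and Wald confidence limits, (2.6)."""
    p = x / n
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)      # test statistic (2.5)
    zq = norm.ppf(1 - alpha / 2)                   # Z_{1-alpha/2}, e.g., 1.96 for alpha = 0.05
    half = zq * sqrt(p * (1 - p) / n)              # half-width of the Wald interval (2.6)
    return p, z, (p - half, p + half)

# x = 2 deaths among n = 46 operations (as in Example 2.1), testing H0: pi = 0.10
print(wald_test_and_ci(2, 46, 0.10))   # limits (-0.0155, 0.1024) are not bounded by (0, 1)
```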

2.1.3 Asymmetric Confidence Limits

2.1.3.1 Exact Confidence Limits

One approach that ensures that the confidence limits are bounded by (0,1) is an exact computation under the binomial distribution, often called the Clopper-Pearson confidence limits (Clopper and Pearson, 1934). In this case the upper confidence limit πU is the solution to the equation

$$\frac{\alpha}{2} = P\left(X \le x \mid \pi_U\right) = \sum_{i=0}^{x} \binom{n}{i}\,\pi_U^{\,i}\,(1-\pi_U)^{\,n-i}, \tag{2.7}$$

and the lower confidence limit πL is the solution to

$$\frac{\alpha}{2} = P\left(X \ge x \mid \pi_L\right) = \sum_{i=x}^{n} \binom{n}{i}\,\pi_L^{\,i}\,(1-\pi_L)^{\,n-i}.$$

Such confidence limits are not centered about p and thus are called asymmetric confidence limits.

A solution of these equations may be obtained by iterative computations. Alternatively, Clopper and Pearson show that these limits may be obtained from the relationship between the cumulative F-distribution and the incomplete beta function, of which the binomial is a special case; see, e.g., Wilks (1962). When np is small, confidence limits may also be obtained from the Poisson approximation to the binomial distribution. Computations of the exact limits are readily obtained using commercial software such as StatXact.
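The relationship with the incomplete beta function makes the exact limits easy to compute in most statistical software. The following is an illustrative sketch only, not the StatXact computation cited in the text; the function name and the use of SciPy's beta quantile function are assumptions.

```python
from scipy.stats import beta

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence limits for pi via beta-distribution quantiles."""
    lower = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
    upper = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
    return lower, upper

print(clopper_pearson(2, 46))   # approximately (0.0053, 0.1484), as in Example 2.1
```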

2.1.3.2 Logit Confidence Limits

Another approach is to consider a function g(π) such that the inverted confidence limits based on g(π) are contained in the interval (0,1). One convenient function is the logit transformation,

$$\theta = g(\pi) = \operatorname{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right), \tag{2.8}$$

where throughout, log is the natural logarithm to the base e. The logit plays a central role in the analysis of binary (Bernoulli) data. The quantity O = π/(1 ‒ π) is the odds of the characteristic of interest or an event in the population, such as O = 2 for an odds of 2:1 when π = 2/3. The inverse logit or logistic function

$$\pi = g^{-1}(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} \tag{2.9}$$

then transforms the odds back to the probability.

Woolf (1955) was among the first to describe the asymptotic distribution of the log odds. Using the delta (δ)-method (see Section A.3), then asymptotically

$$\hat{\theta} = g(p) = \log\left(\frac{p}{1-p}\right) \;\cong\; \log\left(\frac{\pi}{1-\pi}\right) = \theta, \tag{2.10}$$

and thus $\hat{\theta} = g(p)$ provides a consistent estimate of θ = g(π). The large sample variance of the estimate is

$$V\!\left(\hat{\theta}\right) \;\cong\; \left(\frac{dg(\pi)}{d\pi}\right)^{2} V(p) = \left[\frac{1}{\pi(1-\pi)}\right]^{2}\frac{\pi(1-\pi)}{n} = \frac{1}{n\pi(1-\pi)}, \tag{2.11}$$

where ≅ means “asymptotically equal to.” Because p is a consistent estimator of π, it follows from Slutsky’s theorem (A.45) that the variance can be consistently estimated by substituting p for π to yield

$$\hat{V}\!\left(\hat{\theta}\right) = \frac{1}{np(1-p)} = \frac{1}{x} + \frac{1}{n-x}. \tag{2.12}$$

Further, from another tenet of Slutsky’s theorem (A.47) it follows that asymptotically

$$\hat{\theta} \;\approx\; N\!\left[\,\theta,\; V\!\left(\hat{\theta}\right)\right] = N\!\left[\log\left(\frac{\pi}{1-\pi}\right),\; \frac{1}{n\pi(1-\pi)}\right]. \tag{2.13}$$

Further, because $\hat{V}(\hat{\theta})$ is consistent for $V(\hat{\theta})$, it also follows from Slutsky’s theorem (A.44) that asymptotically

$$\frac{\hat{\theta} - \theta}{\sqrt{\hat{V}\!\left(\hat{\theta}\right)}} \;\approx\; N(0,\,1). \tag{2.14}$$

Thus, the symmetric 1 – α confidence limits on the logit θ are

$$\left(\hat{\theta}_L,\ \hat{\theta}_U\right) = \hat{\theta} \mp Z_{1-\alpha/2}\sqrt{\hat{V}\!\left(\hat{\theta}\right)} = \log\left(\frac{p}{1-p}\right) \mp Z_{1-\alpha/2}\sqrt{\frac{1}{x} + \frac{1}{n-x}}. \tag{2.15}$$

Applying the inverse (logistic) function in (2.9) yields the asymmetric confidence limits on π:

$$\left(\hat{\pi}_L,\ \hat{\pi}_U\right) = \left(\frac{e^{\hat{\theta}_L}}{1 + e^{\hat{\theta}_L}},\ \frac{e^{\hat{\theta}_U}}{1 + e^{\hat{\theta}_U}}\right) \tag{2.16}$$

that are bounded by (0,1).
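A minimal Python sketch of the logit-based limits in (2.15) and (2.16) follows; it is an illustration only (the function name and the use of SciPy are assumptions), and it presumes 0 < x < n so that the logit and its variance estimate (2.12) are defined.

```python
from math import exp, log, sqrt
from scipy.stats import norm

def logit_ci(x, n, alpha=0.05):
    """Asymmetric confidence limits for pi from the logit transformation, (2.15)-(2.16)."""
    p = x / n
    theta = log(p / (1 - p))              # sample logit, assumes 0 < x < n
    se = sqrt(1 / x + 1 / (n - x))        # square root of the variance estimate (2.12)
    zq = norm.ppf(1 - alpha / 2)
    lo, hi = theta - zq * se, theta + zq * se
    return exp(lo) / (1 + exp(lo)), exp(hi) / (1 + exp(hi))   # inverse logit (2.9)

print(logit_ci(2, 46))   # approximately (0.0109, 0.158), as in Example 2.1
```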

2.1.3.3 Complementary log-log Confidence Limits

Another convenient function is the complementary log-log transformation

$$\theta = g(\pi) = \log\!\left[\log\!\left(\frac{1}{\pi}\right)\right] = \log\left[-\log(\pi)\right] \tag{2.17}$$

that is commonly used in survival analysis. It can readily be shown (see Problem 2.1) that the 1 ‒ α confidence limits on θ = g(π) obtained from the asymptotic normal distribution of $\hat{\theta} = \log\left[-\log(p)\right]$ are

$$\left(\hat{\theta}_L,\ \hat{\theta}_U\right) = \log\left[-\log(p)\right] \mp Z_{1-\alpha/2}\sqrt{\frac{1-p}{np\left[\log(p)\right]^{2}}}. \tag{2.18}$$

Applying the inverse function yields the asymmetric confidence limits on π

$$\left(\hat{\pi}_L,\ \hat{\pi}_U\right) = \left(\exp\!\left[-e^{\hat{\theta}_U}\right],\ \exp\!\left[-e^{\hat{\theta}_L}\right]\right) \tag{2.19}$$

that are also bounded by (0,1). Note that because the transformation includes a reciprocal of π, it is decreasing in π, so that the lower limit πL is obtained as the inverse transformation of the upper confidence limit in (2.18), and vice versa.
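The complementary log-log limits in (2.18) and (2.19) can be sketched in the same illustrative way (again, not one of the author's SAS programs; the function name is hypothetical and the standard error is the delta-method quantity used in (2.18)).

```python
from math import exp, log, sqrt
from scipy.stats import norm

def cloglog_ci(x, n, alpha=0.05):
    """Asymmetric confidence limits for pi from theta = log[-log(pi)], (2.18)-(2.19)."""
    p = x / n
    theta = log(-log(p))                          # assumes 0 < p < 1
    se = sqrt((1 - p) / (n * p * log(p) ** 2))    # delta-method standard error
    zq = norm.ppf(1 - alpha / 2)
    lo, hi = theta - zq * se, theta + zq * se
    # the transformation is decreasing in pi, so the theta limits reverse roles
    return exp(-exp(hi)), exp(-exp(lo))

print(cloglog_ci(2, 46))   # approximately (0.0080, 0.131), as in Example 2.1
```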

2.1.3.4 Test-Inverted Confidence Limits

Another set of asymmetric confidence limits was suggested by Wilson (1927), based on inverting the two-sided Z-test for a proportion in (2.5) that leads to rejection of H0 for values |z| > Z1-α/2. Thus, setting $z^{2} = \left(Z_{1-\alpha/2}\right)^{2}$ yields a quadratic equation in π0, the roots of which provide the confidence limits for π:

$$\left(\hat{\pi}_L,\ \hat{\pi}_U\right) = \frac{\left(p + \dfrac{Z_{1-\alpha/2}^{2}}{2n}\right) \mp Z_{1-\alpha/2}\sqrt{\dfrac{p(1-p)}{n} + \dfrac{Z_{1-\alpha/2}^{2}}{4n^{2}}}}{1 + \dfrac{Z_{1-\alpha/2}^{2}}{n}}. \tag{2.20}$$

Newcombe (1998), among others, has shown that these limits provide good coverage probabilities relative to the exact Clopper-Pearson limits. Miettinen (1976) also generalized this derivation to confidence limits for odds ratios (see Section 2.6.6).
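The test-inverted limits in (2.20) have a closed form from the roots of the quadratic, as in the following sketch (the function name and the use of SciPy are assumptions of this illustration):

```python
from math import sqrt
from scipy.stats import norm

def wilson_ci(x, n, alpha=0.05):
    """Test-inverted (Wilson, 1927) confidence limits for pi, (2.20)."""
    p = x / n
    zq = norm.ppf(1 - alpha / 2)
    z2 = zq ** 2
    center = (p + z2 / (2 * n)) / (1 + z2 / n)
    half = zq * sqrt(p * (1 - p) / n + z2 / (4 * n ** 2)) / (1 + z2 / n)
    return center - half, center + half

print(wilson_ci(2, 46))   # approximately (0.0120, 0.145), as in Example 2.1
```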

Example 2.1 Hospital Mortality

In a particular hospital, assume that x = 2 patients died postoperatively out of n = 46 patients who underwent coronary artery bypass surgery during a particular month. Then p = 0.04348, with estimated variance $\hat{V}(p) = p(1-p)/n = 0.000904$ and estimated standard error S.E.(p) = 0.030068, that yields 95% large sample confidence limits from (2.6) of (‒0.01545, 0.10241), the lower limit value less than 0 being clearly invalid. The exact computation of (2.7) using StatXact yields limits of (0.00531, 0.1484). The logit transformation yields $\hat{\theta} = \log[p/(1-p)] = -3.091$ with estimated variance $\hat{V}(\hat{\theta}) = 1/2 + 1/44 = 0.5227$ and S.E. = 0.723. From (2.15) this yields 95% confidence limits on θ of (‒4.508, ‒1.674). The logistic function of these limits yields 95% confidence limits for π of (0.0109, 0.1579), that differ slightly from the exact limits. Likewise, the complementary log-log transformation yields $\hat{\theta} = \log[-\log(p)] = 1.143$ with S.E. = 0.221. From (2.18) this yields 95% confidence limits on θ of (0.7105, 1.5751). The inverse function of these limits yields 95% confidence limits for π of (0.00798, 0.13068), that compare favorably to the exact limits. Finally, the test-inverted confidence limits from (2.20) are (0.012005, 0.14533).

With only two events in 46 subjects, clearly the exact limits are preferred. However, even in this case, the large sample approximations are satisfactory, other than the ordinary large sample limits based on the asymptotic normal approximation to the distribution of p itself.

2.1.4 Case of Zero Events

In some cases it is important to describe the confidence limits for a probability based on a sample of n observations of which none have the positive characteristic present, or have experienced an event, such that x and p are both zero. From the expression for the binomial probability,

$$P(X = 0) = \binom{n}{0}\,\pi^{0}(1-\pi)^{n} = (1-\pi)^{n}. \tag{2.21}$$

One then desires a one-sided confidence interval of size 1 – α of the form $(0, \pi_U)$, where the upper confidence limit satisfies the relation

$$\pi_U:\ P(X = 0 \mid \pi_U) = (1 - \pi_U)^{n} = \alpha, \tag{2.22}$$

the “:” meaning “such that.” Solving for π yields

$$\pi_U = 1 - \alpha^{1/n} \tag{2.23}$$

(see Louis, 1981).

For example, if n = 60, the 95% confidence interval for π when x = 0 is (0, 0.0487). Thus, with 95% confidence we must admit the possibility that π could be as large as 0.049, or about 1 in 20. If, on the other hand, we desired an upper confidence limit of 1 in 100, such that $\pi_U = 0.01$, then the total sample size would satisfy the expression $0.01 = 1 - 0.05^{1/n}$, that yields n = 299 (298.07 to be exact) (see Problem 2.3).
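The closed-form result in (2.23) and the sample size calculation above are easy to verify directly, as in this brief sketch (the function names are illustrative only, not from the text):

```python
from math import ceil, log

def zero_event_upper_limit(n, alpha=0.05):
    """One-sided upper confidence limit for pi when x = 0, from (2.23)."""
    return 1 - alpha ** (1 / n)

def n_for_upper_limit(pi_u, alpha=0.05):
    """Smallest n with x = 0 events for which the upper limit is at most pi_u."""
    return ceil(log(alpha) / log(1 - pi_u))

print(zero_event_upper_limit(60))   # about 0.0487, as in the example above
print(n_for_upper_limit(0.01))      # 299 (298.07 before rounding up)
```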

2.2 MEASURES OF DIFFERENTIAL OR RELATIVE RISK

The simplest design to compare two populations is to draw two independent samples of n1 and n2 subjects from the two populations and then to observe the numbers of subjects within each sample, x1 and x2, who have a positive response or characteristic of interest. The resulting data can be summarized in a simple 2×2 table to describe the association between the binary independent variable representing membership in either of two independent groups (i = 1, 2) and a binary dependent variable (the response), where the response of primary interest is denoted as + and its complement as –. This 2×2 table of frequencies can be expressed as:

$$
\begin{array}{c|cc|c}
 & \text{Group 1} & \text{Group 2} & \\
\hline
+ & n_{11} & n_{12} & n_{1\bullet} \\
- & n_{21} & n_{22} & n_{2\bullet} \\
\hline
 & n_{\bullet 1} & n_{\bullet 2} & N
\end{array}
\qquad\qquad
\begin{array}{c|cc|c}
 & \text{Group 1} & \text{Group 2} & \\
\hline
+ & a & b & m_{1} \\
- & c & d & m_{2} \\
\hline
 & n_{1} & n_{2} & N
\end{array}
\tag{2.24}
$$

where the “•” subscript represents summation over the corresponding index for rows or columns. For the most part, we shall use the notation in the last table when it is unambiguous. Throughout, N denotes the total sample size. Within each group (i = 1, 2), the number of positive responses $x_i$ is distributed as binomial with probability πi, from which πi is estimated by the sample proportion $p_i = x_i/n_i$.

We can now define a variety of parameters to describe the differences in risk between the two populations, as shown in Table 2.1.

Table 2.1 Measures of differential or relative risk