A complete guide to the key statistical concepts essential for the design and construction of clinical trials
As the newest major resource in the field of medical research, Methods and Applications of Statistics in Clinical Trials, Volume 1: Concepts, Principles, Trials, and Designs presents a timely and authoritative review of the central statistical concepts used to build clinical trials that obtain the best results. The reference unveils modern approaches vital to understanding, creating, and evaluating data obtained throughout the various stages of clinical trial design and analysis.
Accessible and comprehensive, the first volume in a two-part set includes newly written articles as well as established literature from the Wiley Encyclopedia of Clinical Trials. Illustrating a variety of statistical concepts and principles such as longitudinal data, missing data, covariates, biased-coin randomization, repeated measurements, and simple randomization, the book also provides in-depth coverage of the various trial designs found within phase I-IV trials.
Methods and Applications of Statistics in Clinical Trials, Volume 1: Concepts, Principles, Trials, and Designs also features:
* Detailed chapters on the types of trial designs, such as adaptive, crossover, group-randomized, multicenter, non-inferiority, non-randomized, open-labeled, preference, prevention, and superiority trials
* Over 100 contributions from leading academics, researchers, and practitioners
* An exploration of ongoing, cutting-edge clinical trials on early cancer and heart disease, mother-to-child human immunodeficiency virus transmission trials, and the AIDS Clinical Trials Group
Methods and Applications of Statistics in Clinical Trials, Volume 1: Concepts, Principles, Trials, and Designs is an excellent reference for researchers, practitioners, and students in the fields of clinical trials, pharmaceutics, biostatistics, medical research design, biology, biomedicine, epidemiology, and public health.
Number of pages: 2225
Year of publication: 2014
Contents
Cover
Half Title page
Title page
Copyright page
Contributors
Preface
Chapter 1: Absolute Risk Reduction
1.1 Introduction
1.2 Preliminary Issues
1.3 Point and Interval Estimates for a Single Proportion
1.4 An Unpaired Difference of Proportions
1.5 Number Needed to Treat
1.6 A Paired Difference of Proportions
References
Further Reading
Chapter 2: Accelerated Approval
2.1 Introduction
2.2 Accelerated Development Versus Expanded Access in the U.S.A.
2.3 Sorting the Terminology—Which FDA Initiatives Do What?
2.4 Accelerated Approval Regulations: 21 C.F.R. 314.500, 314.520, 601.40
2.5 Stages of Drug Development and FDA Initiatives
2.6 Accelerated Approval Regulations: 21 CFR 314.500, 314.520, 601.40
2.7 Accelerated Approval with Surrogate Endpoints
2.8 Accelerated Approval with Restricted Distribution
2.9 Phase IV Studies/Post Marketing Surveillance
2.10 Benefit Analysis for Accelerated Approvals Versus Other Illnesses
2.11 Problems, Solutions, and Economic Incentives
2.12 Future Directions
References
Further Reading
Chapter 3: AIDS Clinical Trials Group (ACTG)
3.1 Introduction
3.2 A Brief Primer on HIV/AIDS
3.3 ACTG Overview
3.4 ACTG Scientific Activities
3.5 Development of Potent Antiretroviral Therapy (ART)
3.6 Expert Systems and Infrastructure
References
Chapter 4: Algorithm-Based Designs
4.1 Phase I Dose-Finding Studies
4.2 Accelerated Designs
4.3 Model-Based Approach in the Estimation of MTD
4.4 Exploring Algorithm-Based Designs with Prespecified Targeted Toxicity Levels
References
Chapter 5: Alpha-Spending Function
5.1 Introduction
5.2 Alpha Spending Function Motivation
5.3 The Alpha Spending Function
5.4 Application of the Alpha Spending Function
5.5 Confidence Intervals and Estimation
5.6 Trial Design
5.7 Conclusions
References
Further Reading
Chapter 6: Application of New Designs in Phase I Trials
6.1 Introduction
6.2 Objectives of a Phase I Trial
6.3 Standard Designs and Their Shortcomings
6.4 Some Novel Designs
6.5 Discussion
References
Further Reading
Chapter 7: ASCOT Trial
7.1 Introduction
7.2 Objectives
7.3 Study Design
7.4 Results
7.5 Discussion and Conclusions
References
Chapter 8: Benefit/Risk Assessment in Prevention Trials
8.1 Introduction
8.2 Types of B/RAs Performed in Prevention Trials
8.3 Alternative Structures of the Benefit/Risk Algorithm used in Prevention Trials
8.4 Methodological and Practical Issues with B/RA in Prevention Trials
References
Chapter 9: Biased Coin Randomization
9.1 Randomization Strategies for Overall Treatment Balance
9.2 The Biased Coin Randomization Procedure
9.3 Properties
9.4 Extensions to the Biased Coin Randomization
9.5 Adaptive Biased Coin Randomization
9.6 Urn Models
9.7 Treatment Balance for Covariates
9.8 Application of Biased Coin Designs to Response-Adaptive Randomization
References
Chapter 10: Biological Assay, Overview
10.1 Introduction
10.2 Direct Dilution Assays
10.3 Indirect Dilution Assays
10.4 Indirect Quantal Assays
10.5 Stochastic Approximation in Bioassay
10.6 Radioimmunoassay
10.7 Dosimetry and Bioassay
10.8 Semiparametrics in Bioassays
10.9 Nonparametrics in Bioassays
10.10 Bioavailability and Bioequivalence Models
10.11 Pharmacogenomics in Modern Bioassays
10.12 Complexities in Bioassay Modeling and Analysis
References
Further Reading
Chapter 11: Block Randomization
11.1 Introduction
11.2 Simple Randomization
11.3 Restricted Randomization Through the Use of Blocks
11.4 Schemes Using a Single Block for the Whole Trial
11.5 Use of Unequal and Variable Block Sizes
11.6 Inference and Analysis Following Blocked Randomization
11.7 Miscellaneous Topics Related to Blocked Randomization
References
Further Reading
Chapter 12: Censored Data
12.1 Introduction
12.2 Independent Censoring
12.3 Likelihoods: Noninformative Censoring
12.4 Other Kinds of Incomplete Observation
References
Chapter 13: Clinical Data Coordination
13.1 Introduction
13.2 Study Initiation
13.3 Study Conduct
13.4 Study Closure
13.5 Summary
References
Chapter 14: Clinical Data Management
14.1 Introduction
14.2 How Has Clinical Data Management Evolved?
14.3 Electronic Data Capture
14.4 Regulatory Involvement with Clinical Data Management
14.5 Professional Societies
14.6 Look to the Future
14.7 Conclusion
References
Chapter 15: Clinical Significance
15.1 Introduction
15.2 Historical Background
15.3 Article Outline
15.4 Design and Methodology
15.5 Examples
15.6 Recent Developments
15.7 Concluding Remarks
References
Chapter 16: Clinical Trial Misconduct
16.1 The Scope of this Article
16.2 Why Does Research Misconduct Matter?
16.3 Early Cases
16.4 Definition
16.5 Intent
16.6 What Scientific Misconduct was Not
16.7 The Process
16.8 The Past Decade
16.9 Lessons from the U.S. Experience
16.10 Outside the United States
16.11 Scientific Misconduct During Clinical Trials
16.12 Audit
16.13 Causes
16.14 Prevalence
16.15 Peer Review and Misconduct
16.16 Retractions
16.17 Prevention
References
Chapter 17: Clinical Trials, Early Cancer and Heart Disease
17.1 Introduction
17.2 Developments in Clinical Trials at the National Cancer Institute (NCI)
17.3 Developments in Clinical Trials at the National Heart, Lung, and Blood Institute (NHLBI)
References
Chapter 18: Cluster Randomization
18.1 Introduction
18.2 Examples of Cluster Randomization Trials
18.3 Principles of Experimental Design
18.4 Experimental and Quasi-Experimental Designs
18.5 The Effect of Failing to Replicate
18.6 Sample Size Estimation
18.7 Cluster Level Analyses
18.8 Individual Level Analyses
18.9 Incorporating Repeated Assessments
18.10 Study Reporting
18.11 Meta-Analysis
References
Chapter 19: Coherence in Phase I Clinical Trials
19.1 Introduction
19.2 Coherence: Definitions and Organization
19.3 Coherent Designs
19.4 Compatible Initial Design
19.5 Group Coherence
19.6 Real-Time Coherence
19.7 Discussion
References
Chapter 20: Compliance and Survival Analysis
20.1 Compliance: Cause and Effect
20.2 All-or-Nothing Compliance
20.3 More General Exposure Patterns
20.4 Other Structural Modeling Options
References
Chapter 21: Composite Endpoints in Clinical Trials
21.1 Introduction
21.2 The Rationale for Composite Endpoints
21.3 Formulation of Composite Endpoints
21.4 Examples
21.5 Interpreting Composite Endpoints
21.6 Conclusions
References
Chapter 22: Confounding
22.1 Introduction
22.2 Confounding as a Bias in Effect Estimation
22.3 Confounding and Noncollapsibility
22.4 Confounding in Experimental Design
References
Chapter 23: Control Groups
23.1 Introduction
23.2 History
23.3 Ethics
23.4 Types of Control Groups: Historical Controls
23.5 Types of Control Groups: Randomized Controls
23.6 Conclusion
References
Chapter 24: Coronary Drug Project
24.1 Introduction
24.2 Objectives
24.3 Study Design and Methods
24.4 Results
24.5 Conclusions and Lessons Learned
References
Further Reading
Chapter 25: Covariates
25.1 Universal Character of Covariates
25.2 Use of Covariates in Clinical Trials
25.3 Continuous Covariates: Categorization or Functional Form?
25.4 Reporting and Summary Assessment of Prognostic Markers
References
Chapter 26: Crossover Design
26.1 Introduction
26.2 The Two-Period, Two-Treatment Design
26.3 Higher Order Designs
26.4 Model-Based Analyses
References
Chapter 27: Crossover Trials
27.1 Introduction
27.2 2 × 2 Crossover Trial
27.3 Higher-Order Designs for Two Treatments
27.4 Designs for Three or More Treatments
27.5 Analysis of Continuous Data
27.6 Analysis of Discrete Data
27.7 Concluding Remarks
References
Chapter 28: Diagnostic Studies
28.1 Introduction
28.2 Diagnostic Studies
28.3 Reliability
28.4 Validity
References
Further Reading
Chapter 29: DNA Bank
29.1 Definition and Objectives of DNA Biobanks
29.2 Types of DNA Biobanks
29.3 Types of Samples Stored
29.4 Quality Assurance and Quality Control in DNA Biobanks
29.5 Ethical Issues
29.6 Current Biobank Initiatives
29.7 Conclusions
References
Chapter 30: Up-and-Down and Escalation Designs
30.1 Introduction
30.2 Up-and-Down Designs
30.3 Escalation Designs
30.4 Comparing U&D, Escalation and Model-Based Designs
References
Further Reading
Chapter 31: Dose Ranging Crossover Designs
31.1 Introduction
31.2 Titration Designs and Extension Studies
31.3 Randomized Designs
31.4 Discussion and Conclusion
References
Further Reading
Chapter 32: Flexible Designs
32.1 Introduction
32.2 The General Framework
32.3 Conditional Power and Sample Size Reassessment
32.4 Extending the Flexibility to the Choice of the Number of Stages
32.5 Selection of the Test Statistics
32.6 More General Adaptations and Multiple Hypotheses Testing
32.7 An Example
32.8 Conclusion
References
Chapter 33: Gene Therapy
33.1 Introduction
33.2 Requirements for Successful Therapeutic Intervention
33.3 Pre-Clinical Research
33.4 Translational Challenges of Gene Therapy Trials
33.6 Lessons Learned
33.7 The Way Forward
References
Further Reading
Chapter 34: Global Assessment Variables
34.1 Introduction
34.2 Scientific Questions for Multiple Outcomes
34.3 General Comments on the GST
34.4 Recoding Outcome Measures
34.5 Types of Global Statistical Tests (GSTs)
34.6 Other Considerations
34.7 Other Methods
34.8 Examples of the Application of GST
34.9 Conclusions
References
Chapter 35: Good Clinical Practice (GCP)
35.1 Introduction
35.2 Human Rights and Protections
35.3 Informed Consent
35.4 Investigational Protocol
35.5 Investigator’s Brochure
35.6 Investigational New Drug Application
35.7 Production of the Investigational Drug
35.8 Clinical Testing
35.9 Sponsors
35.10 Contract Research Organization
35.11 Monitors
35.12 Investigators
35.13 Documentation
35.14 Clinical Holds
35.15 Inspections/Audits
References
Further Reading
Chapter 36: Group-Randomized Trials
36.1 Introduction
36.2 Group-Randomized Trials in Context
36.3 The Development of Group-Randomized Trials in Public Health
36.4 The Range of GRTs in Public Health
36.5 Current Design and Analytic Practices in GRTs in Public Health
36.6 The Future of Group-Randomized Trials
36.7 Planning a New Group-Randomized Trial
References
Chapter 37: Group Sequential Designs
37.1 Introduction
37.2 Classical Designs
37.3 The α-Spending Function Approach
37.4 Point Estimates and Confidence Intervals
37.5 Supplements
References
Chapter 38: Hazard Ratio
38.1 Introduction
38.2 Definitions
38.3 Illustration of Hazard Rate, Hazard Ratio and Risk Ratio
38.4 Example on the Use and Usefulness of Hazard Ratios
38.5 Ad-hoc Estimator of the Hazard Ratio
38.6 Confidence Interval of the Ad-hoc Estimator
38.7 Ad-hoc Estimator Stratified for the Covariate Renal Function
38.8 Properties of the Ad-hoc Estimator
38.9 Class of Generalized Rank Estimators of the Hazard Ratio
38.10 Estimation of the Hazard Ratio with Cox’s Proportional Hazards Model
38.11 Discussion
Further Reading
References
Chapter 39: Large Simple Trials
39.1 Large, Simple Trials
39.2 Small but Clinically Important Objective
39.3 Eligibility
39.4 Randomized Assignment
39.5 Outcome Measures
39.6 Conclusions
References
Further Reading – Selected Examples of Large, Simple Trials
Chapter 40: Longitudinal Data
40.1 Definition
40.2 Longitudinal Data from Clinical Trials
40.3 Advantages
40.4 Challenges
40.5 Analysis of Longitudinal Data
References
Further Reading
Chapter 41: Maximum Duration and Information Trials
41.1 Introduction
41.2 Two Paradigms: Duration versus Information
41.3 Sequential Studies: Maximum Duration versus Information Trials
41.4 An Example of a Maximum Information Trial
References
Chapter 42: Missing Data
42.1 Introduction
42.2 Methods in Common Use
42.3 An Alternative Approach to Incomplete Data
42.4 Illustration: Orthodontic Growth Data
42.5 Inverse Probability Weighting
42.6 Multiple Imputation
42.7 Sensitivity Analysis
42.8 Conclusion
References
Chapter 43: Mother to Child Human Immunodeficiency Virus Transmission Trials
43.1 Introduction
43.2 The Pediatric AIDS Clinical Trials Group 076 Trial
43.3 Results
43.4 The European Mode of Delivery Trial
43.5 The HIV Network for Prevention Trials 012 Trial
43.6 The Mashi Trial
References
Further Reading
Chapter 44: Multiple Testing in Clinical Trials
44.1 Introduction
44.2 Concepts of Error Rates
44.3 Union-Intersection Testing
44.4 Closed Testing
44.5 Partition Testing
References
Further Reading
Chapter 45: Multicenter Trials
45.1 Definitions
45.2 History
45.3 Examples
45.4 Organizational and Operational Features
45.5 Strengths
45.6 Counts
Readings
References
Chapter 46: Multiple Endpoints
46.1 Introduction
46.2 Multiple Testing Methods
46.3 Multivariate Global Tests
46.4 Conclusions
References
Chapter 47: Multiple Risk Factor Intervention Trial
47.1 Introduction
47.2 Trial Design
47.3 Trial Screening and Execution
47.4 Findings at the End of Intervention
47.5 Long-Term Follow-Up
47.6 Epidemiologic Findings from Long-Term Follow-up of 361,662 MRFIT Screenees
47.7 Conclusions
References
Further Reading
Chapter 48: N-of-1 Randomized Trials
48.1 Introduction
48.2 Goal of N-of-1 Studies
48.3 Requirements
48.4 Design Choices and Details for N-of-1 Studies
48.5 Statistical Issues
48.6 Other Issues
48.7 Conclusions
References
Chapter 49: Noninferiority Trial
49.1 Introduction
49.2 Essential Elements of Noninferiority Trial Design
49.3 Objectives of Noninferiority Trials
49.4 Measure of Treatment Effect
49.5 Noninferiority Margin
49.6 Statistical Testing for Noninferiority
49.7 Medication Nonadherence and Misclassification/Measurement Error
49.8 Testing Superiority and Noninferiority
49.9 Conclusion
References
Chapter 50: Nonrandomized Trials
50.1 Introduction
50.2 Randomized vs. Nonrandomized Clinical Trials
50.3 Control Groups in Nonrandomized Trials
50.4 Statistical Methods in Design and Analyses
50.5 Conclusion and Discussion
References
Chapter 51: Open-Labeled Trials
51.1 Introduction
51.2 The Importance of Blinding
51.3 Reasons Why Trials Might Have to be Open-Label
51.4 When Open-Label Trials Might be Desirable
51.5 Concluding Comments
References
Further Reading
Chapter 52: Optimizing Schedule of Administration in Phase I Clinical Trials
52.1 Introduction
52.2 Motivating Example
52.3 Design Issues
52.4 Trial Conduct
52.5 Extensions and Related Research
References
Chapter 53: Partially Balanced Designs
53.1 Introduction
53.2 Association Schemes
53.3 Partially Balanced Incomplete Block Designs
53.4 Generalizations of PBIBDs and Related Ideas
References
Chapter 54: Phase I/II Clinical Trials
54.1 Introduction
54.2 Traditional Approach
54.3 Recent Developments
54.4 Illustrations
References
Chapter 55: Phase II/III Trials
55.1 Introduction
55.2 Description and Legal Basis
55.3 Better Dose-Response Studies with Phase 2/3 Designs
55.4 Principles of Phase 2/3 Designs
55.5 Inferential Difficulties
55.6 Summary
References
Further Reading
Chapter 56: Phase I Trials
56.1 Introduction
56.2 Phase I in Healthy Volunteers
56.3 Phase I in Cancer Patients
56.4 Perspectives in the Future of Cancer Phase I Trials
56.5 Discussion
References
Chapter 57: Phase II Trials
57.1 Introduction
57.2 Proof-of-Concept (Phase IIa) Trials
57.3 Dose-Ranging (Phase IIb) Trials
57.4 Efficacy Endpoints
57.5 Oncology Phase II Trials
References
Further Reading
Chapter 58: Phase III Trials
58.1 Introduction
58.2 Research Methodology in Phase III
58.3 Type of Design
58.4 Discussion
References
Chapter 59: Phase IV Trials
59.1 Introduction
59.2 Definitions and Context
59.3 Different Purposes for Phase IV Trials
59.4 Essential and Desirable Features of Phase IV Trials
59.5 Examples of Phase IV Studies
59.6 Conclusion
References
Further Reading
Chapter 60: Phase I Trials in Oncology
60.1 Introduction
60.2 Dose-Limiting Toxicity
60.3 Starting Dose
60.4 Dose Level Selection
60.5 Study Design and General Considerations
60.6 Traditional, Standard, or 3 + 3 Design
60.7 Continual Reassessment Method and Other Designs that Target the MTD
60.8 Start-Up Rule
60.9 Phase I Trials with Long Follow-Up
60.10 Phase I Trials with Multiple Agents
60.11 Phase I Trials with the MTD Defined using Toxicity Grades
References
Further Reading
Chapter 61: Placebos
61.1 History of Placebo
61.2 Definitions
61.3 Magnitude of the Placebo Effect
61.4 Influences on the Placebo Effect
61.5 Ethics of Employing Placebo in Research
61.6 Guidelines for the Use of Placebos in Research
61.7 Innovations to Improve Research Involving Placebo
61.8 Summary
References
Chapter 62: Planning a Group-Randomized Trial
62.1 Introduction
62.2 The Research Question
62.3 The Research Team
62.4 The Research Design
62.5 Potential Design Problems and Methods to Avoid Them
62.6 Potential Analytic Problems and Methods to Avoid Them
62.7 Variables of Interest and Their Measures
62.8 The Intervention
62.9 Power
62.10 Summary
References
Chapter 63: Postmenopausal Estrogen/Progestin Interventions Trial (PEPI)
63.1 Introduction
63.2 Design and Objectives
63.3 Study Design
63.4 Outcomes
63.5 Results
63.6 Conclusions
References
Further Reading
Chapter 64: Preference Trials
64.1 Introduction
64.2 Potential Effects of Preference
64.3 The Patient Preference Design
64.4 Advantages and Disadvantages of the Patient Preference Design
64.5 Alternative Designs
64.6 Discussion
References
Further Reading
Chapter 65: Prevention Trials
65.1 Introduction
65.2 Role Among Possible Research Strategies
65.3 Prevention Trial Planning and Design
65.4 Conduct, Monitoring, and Analysis
References
Chapter 66: Primary Efficacy Endpoint
66.1 Defining the Primary Endpoint
66.2 Fairness of Endpoints
66.3 Specificity of the Primary Endpoint
66.4 Composite Primary Endpoints
66.5 Missing Primary Endpoint Data
66.6 Censored Primary Endpoints
66.7 Surrogate Primary Endpoints
66.8 Multiple Primary Endpoints
66.9 Secondary Endpoints
References
Further Reading
Chapter 67: Prognostic Variables in Clinical Trials
67.1 Introduction
67.2 A General Theory of Prognostic Variables
67.3 Valid Covariates and Recognizable Subsets
67.4 Stratified Randomization and Analysis
67.5 Statistical Importance of Prognostic Factors
References
Chapter 68: Randomization Procedures
68.1 Basics
68.2 General Classes of Randomization: Complete versus Imbalance-Restricted Procedures
68.3 Procedures for Imbalance-Restricted Randomization
68.4 Randomization-Based Analysis and the Validation Transformation
68.5 Conclusions
References
Chapter 69: Randomization Schedule
69.1 Introduction
69.2 Preparing the Schedule
69.3 Schedules for Open-Label Trials
69.4 Schedules to Mitigate Loss of Balance in Treatment Assignments Because of Incomplete Blocks
69.5 Issues Related to the Use of Randomization Schedule
69.6 Summary
References
Further Reading
Chapter 70: Repeated Measurements
70.1 Introduction and Case Study
70.2 Linear Models for Gaussian Data
70.3 Models for Discrete Outcomes
70.4 Design Considerations
70.5 Concluding Remarks
References
Chapter 71: Simple Randomization
71.1 Introduction
71.2 Concept of Randomization
71.3 Why is Randomization Needed?
71.4 Methods: Simple Randomization
71.5 Advantages and Disadvantages of Randomization
71.6 Other Randomization Methods
71.7 Stratified Randomization
References
Further Reading
Chapter 72: Subgroups
72.1 Introduction
72.2 The General Problem
72.3 Definitions
72.4 Subgroup Effects and Interactions
72.5 Tests of Interactions and the Problem of Power
72.6 Subgroups and the Problem of Multiple Comparisons
72.7 Demographic Subgroups
72.8 Physiological Subgroups
72.9 Target Subgroups
72.10 Improper Subgroups
72.11 Summary
References
Chapter 73: Superiority Trials
73.1 Introduction
73.2 Clinicians Ask One-Sided Questions, and Want Immediate Answers
73.3 But Traditional Statistics Is Two-Sided
73.4 The Consequences of Two-Sided Answers to One-Sided Questions
73.5 The Fallacy of the “Negative” Trial
73.6 The Solution Lies in Employing One-Sided Statistics
73.7 Examples of Employing One-Sided Statistics
73.8 One-Sided Statistical Analyses Need to be Specified Ahead of Time
73.9 A Graphic Demonstration of Superiority and Noninferiority
73.10 How to Think about and Incorporate Minimally Important Differences
73.11 Incorporating Confidence Intervals for Treatment Effects
73.12 Why We Should Never Label an “Indeterminate” Trial Result as “Negative” or as Showing “No Effect”
73.13 How Does a Treatment Become “Established Effective Therapy”?
73.14 Most Trials are Too Small to Declare a Treatment “Established Effective Therapy”
73.15 How Do We Achieve a Superiority Result?
73.16 Superiority and Noninferiority Trials when Established Effective Therapy Already Exists
73.17 Exceptions to the Rule that It Is Always Unethical to Substitute Placebos for Established Effective Therapy
73.18 When a Promising New Treatment Might be Added to Established Effective Therapy
73.19 Using Placebos in a Trial Should Not Mean the Absence of Treatment
73.20 Demonstrating Trials of Promising New Treatments Against (or in Addition to) Established Effective Therapy
73.21 Why We Almost Never Find, and Rarely Seek, True “Equivalence”
73.22 The Graphical Demonstration of “Superiority” and “Noninferiority”
73.23 Completing the Circle: Converting One-Sided Clinical Thinking into One-Sided Statistical Analysis
73.24 A Final Note on Superiority and Noninferiority Trials of “Me-Too” Drugs
References
Further Reading
Chapter 74: Surrogate Endpoints
74.1 Introduction
74.2 Illustrations
74.3 Validation of Surrogates
74.4 Auxiliary Variables
74.5 Conclusions
References
Chapter 75: TNT Trial
75.1 Introduction
75.2 Objectives
75.3 Study Design
75.4 Results
75.5 Conclusions
References
Further Reading
Chapter 76: UGDP Trial
76.1 Introduction
76.2 Design and Chronology
76.3 Results
76.4 Conclusion and Discussion
References
Chapter 77: Women’s Health Initiative Hormone Therapy Trials
77.1 Introduction
77.2 Objectives
77.3 Study Design
77.4 Results
77.5 Conclusions
References
Chapter 78: Women’s Health Initiative Dietary Modification Trial: Update and Application of Biomarker Calibration to Self-Report Measures of Diet and Physical Activity
78.1 Rationale for Biomarker Calibration of Self-Report Measures of Diet
78.2 Nutrient Biomarker Study Energy and Protein Calibration
78.3 Measurement Error Properties of 4DFR, 24HR, and FFQ
78.4 Calibration of Self-Report Measures of Physical Activity
78.5 Psychosocial Measures and Biomarker-Calibrated Intake
78.6 Calibrated Energy, Protein, Protein Density, and Cardiovascular Disease Incidence
78.7 Diabetes and Calibrated Consumption
78.8 Cancer and Calibrated Intake
78.9 Associations Between Protein Intake, Frailty, and Renal Function
78.10 Summary and Future Directions
References
Index
Methods and Applications of Statistics in Clinical Trials
WILEY SERIES IN METHODS AND APPLICATIONS OF STATISTICS
Advisory Editor
N. Balakrishnan, McMaster University, Canada
The Wiley Series in Methods and Applications of Statistics is a unique grouping of research that features classic contributions from Wiley’s Encyclopedia of Statistical Sciences, Second Edition (ESS, 2e) alongside newly written articles that explore various problems of interest and their intrinsic connection to statistics. The goal of this collection is to encompass an encyclopedic scope of coverage within individual books that unify the most important and interesting applications of statistics within a specific field of study. Each book in the series successfully upholds the goals of ESS, 2e by combining established literature and newly developed contributions written by leading academics, researchers, and practitioners in a comprehensive and accessible format. The result is a succinct reference that unveils modern, cutting-edge approaches to acquiring, analyzing, and presenting data across diverse subject areas.
Balakrishnan · Methods and Applications of Statistics in the Life and Health Sciences
Balakrishnan · Methods and Applications of Statistics in Business, Finance, and Management Science
Balakrishnan · Methods and Applications of Statistics in Engineering, Quality Control, and the Physical Sciences
Balakrishnan · Methods and Applications of Statistics in the Social and Behavioral Sciences
Balakrishnan · Methods and Applications of Statistics in the Atmospheric and Earth Sciences
Balakrishnan · Methods and Applications of Statistics in Clinical Trials, Volume 1: Concepts, Principles, Trials, and Designs
Balakrishnan · Methods and Applications of Statistics in Clinical Trials, Volume 2: Planning, Analysis, and Inferential Methods
Copyright © 2014 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Methods and applications of statistics in clinical trials / [edited by] N. Balakrishnan. 1 online resource. — (Methods and applications of statistics) Includes bibliographical references and index. Description based on print version record and CIP data provided by publisher; resource not viewed. ISBN 978-1-118-59591-6 (ePub) — ISBN 978-1-118-59592-3 (Adobe PDF) — ISBN 978-1-118-59596-1 (ePub) — ISBN 978-1-118-59597-8 (Adobe PDF) — ISBN 978-1-118-30473-0 (cloth) I. Balakrishnan, N., 1956- editor of compilation. [DNLM: 1. Clinical Trials as Topic. 2. Statistics as Topic. QV 771.4] R853.C55 610.72’4—dc23 2013035130
Ian E. Alexander, Gene Therapy Research Unit of the Children’s Medical Research Institute and The Children’s Hospital at Westmead and University of Sydney, Discipline of Paediatrics and Child Health, Westmead, Australia
Janet W. Andersen, Harvard School of Public Health, Boston, MA
Per Kragh Andersen, University of Copenhagen, Copenhagen, Denmark
Andrew L. Avins, University of California, San Francisco, CA
Rosemary A. Bailey
Peter Bauer, *Deceased, 2002
David B. Barr, Kendle International, Cincinnati, OH
Shari S. Bassuk, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
Jeannette M. Beasley, Albert Einstein College of Medicine, Bronx, NY
Vance W. Berger, National Cancer Institute, Bethesda, MD
Werner Brannath, University of Bremen, Bremen, Germany
Michael Branson, Novartis Pharma AG, Basel, Switzerland
Thomas Braun, University of Michigan, Ann Arbor, MI
Frank Bretz, Novartis Pharma AG, Basel, Switzerland
Louis Cabanilla, Tufts University Center for the Study of Drug Development, Boston, MA
Marion K. Campbell, University of Aberdeen, Aberdeen, UK
Paul L. Canner, Maryland Medical Research Institute, Baltimore, MD
Joseph C. Cappelleri, Pfizer, Inc., Global Research & Development, Groton, CT
Rick Chappell, University of Wisconsin, Madison, WI
Ying-Kuen K. Cheung, Columbia University, New York, NY
Sylvie Chevret, Inserm, Paris, France
Joseph P. Costantino, University of Pittsburgh, Pittsburgh, PA
Simon Day, Roche Products Ltd., Welwyn Garden City, UK
Victor DeGruttola, Harvard School of Public Health, Boston, MA
David L. DeMets, University of Wisconsin-Madison, Madison, WI
Chongzhi Di, Fred Hutchinson Cancer Research Center, Seattle, WA
Alexei Dmitrienko, Eli Lilly and Company, Indianapolis, IN
Allan Donner, University of Western Ontario, London, ON, Canada
Therese Dupin-Spriet
Peter J. Dyck, Mayo Clinic College of Medicine, Rochester, MN
Lynn E. Eberly, University of Minnesota, Minneapolis, MN
Thomas R. Fleming, University of Washington, Seattle, WA
Dean A. Follmann, National Institute of Allergy and Infectious Diseases, Bethesda, MD
Mary A. Foulkes, The George Washington University, Washington, DC
Elizabeth Garrett-Mayer, Johns Hopkins University, Baltimore, MD
Edmund A. Gehan, Georgetown University Medical Center, Washington, DC
Samantha L. Ginn, Gene Therapy Research Unit of the Children’s Medical Research Institute and The Children’s Hospital at Westmead and The University of Sydney, Sydney Medical School, Sydney, Australia
Els Goetghebeur, Ghent University, Ghent, Belgium
Charles H. Goldsmith
Erika Graf, Clinical Trials Unit, University Medical Center Freiburg, Freiburg, Germany
William C. Grant
Stephanie Green, Clinical Biostatistics, Pfizer, Inc., New London, CT
Sander Greenland, University of California, Los Angeles, CA
Scott M. Grundy, University of Texas Southwestern Medical Center, Dallas, TX
Weili He, Merck & Co., Inc., Rahway, NJ
Anne Holbrook, McMaster University, Hamilton, ON, Canada
Jason C. Hsu, Ohio State University, Columbus, OH
Peng Huang, Johns Hopkins University, Baltimore, MD
Ying Huang, Fred Hutchinson Cancer Research Center, Seattle, WA
H. M. James Hung, U.S. Food and Drug Administration, Silver Spring, MD
Anastasia Ivanova, University of North Carolina at Chapel Hill, NC
Sudha K. Iyengar, Case Western Reserve University, Cleveland, OH
Byron Jones, Pfizer Pharmaceuticals, Sandwich, UK
Celia C. Kamath, Health Sciences Research, Mayo Clinic, Rochester, MN
Oliver Keene, GlaxoSmithKline Research and Development, Stockley Park, UK
Michael G. Kenward, London School of Hygiene and Tropical Medicine, London, UK
Kyungmann Kim, University of Wisconsin, Madison, WI
Cheryl Kious, Quintiles Transnational Corporation, Durham, NC
Neil Klar, Cancer Care Ontario, Toronto, ON, Canada
Lewis H. Kuller, University of Pittsburgh, Pittsburgh, PA
Olga M. Kuznetsova, Merck & Co., Inc., Rahway, NJ
K. K. Gordon Lan, Johnson & Johnson, Raritan, NJ, [email protected]
Robert D. Langer, University of Nevada School of Medicine, Las Vegas, NV, [email protected]
John C. Larosa, State University of New York Health, Science Center, Brooklyn, NY
Emmanuel Lesaffre, Catholic University of Leuven, Leuven, Belgium, [email protected]
Mova Leung, Carlo Fidani Peel Regional Cancer Centre, Credit Valley Hospital, Mississauga, ON, Canada
Hung-I Li
Wenjun Li, University of Massachusetts Medical School, Worcester, MA, [email protected]
Zhengqing Li, Global Biometric Science, Bristol-Myers Squibb Company, Wallingford, CT
Jun Liu, Columbia University, New York, NY, [email protected]
Qing Liu, Johnson and Johnson Pharmaceutical, Research and Development, Raritan, NJ, [email protected]
Craig Mallinckrodt, Eli Lilly and Company, Indianapolis, IN, [email protected]
JoAnn E. Manson, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, [email protected]
Ruth McBride, Axio Research, Seattle, WA, [email protected]
Damian McEntegart, ClinPhone Group Ltd., Nottingham, UK, [email protected]
Jesper Mehlsen, Frederiksberg Hospital—Clinical, Physiology & Nuclear Medicine, Frederiksberg, Denmark, [email protected]
Curtis L. Meinert, The Johns Hopkins University, Bloomberg School of Public Health, Department of Epidemiology, Baltimore, MD, [email protected]
Christopher P. Milne, Tufts University Center for the Study of Drug Development Boston, MA
Geert Molenberghs, Hasselt University, Diepenbeek, Belgium, [email protected]
Yasmin Mossavar-Rahmani, Albert Einstein College of Medicine, Bronx, NY, [email protected]
David M. Murray, Ohio State University, Columbus, OH, [email protected]
James D. Neaton, University of Minnesota, Minneapolis, MN, [email protected]
John Neuhaus, University of California, San Francisco, CA, [email protected]
Marian L. Neuhouser, Fred Hutchinson Cancer Research Center, Seattle, WA, [email protected]
Robert Newcomb, Cardiff University, Cardiff, Wales, UK, [email protected]
Peter C. O’Brien, Mayo Clinic College of Medicine, Rochester, MN
Robert O’Neill, U.S. Food and Drug Administration Silver Spring, MD, [email protected].
Assaf P. Oron, Seattle Children’s Research Institute, Seattle, WA, [email protected]
Scott D. Patterson, Wyeth Research & Development, Collegeville, PA, [email protected]
Inna T. Perevozskaya, Pfizer Pharmaceuticals, Philadelphia, PA [email protected]
Martin Posch, Medical University of Vienna, Vienna, Austria, [email protected]
Ross L. Prentice, Fred Hutchinson Cancer Research Center, Seattle, WA, [email protected]
David L. Sackett, Trout Research & Education Centre, Ontario, Canada, [email protected]
A. J. Sankoh, National Cancer Institute, Bethesda, MD
Willi Sauerbrei, University Medical Center Freiburg, Freiburg, Germany, [email protected]
Claudia Schmoor, Clinical Trials Unit, University Medical Center Freiburg, Freiburg, Germany, [email protected]
Marvin A. Schneiderman *Deceased, April 1997
Carsten Schwenke, Schwenke Consulting: Strategies and Solutions in Statistics, Berlin, Germany
Pranab K. Sen, The University of North Carolina at Chapel Hill, Chapel Hill, NC, [email protected]
David E. Shapiro, Harvard School of Public Health, Boston, MA, [email protected]
Pamela Shaw, National Institute of Health, Bethesda, MD, [email protected]
Theru A. Sivakumaran, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, [email protected]
Jeffrey A. Sloan, Health Sciences Research, Mayo Clinic, Rochester, MN, [email protected]
Mike D. Smith, Pfizer Global Research & Development, New London, CT
Jeremiah Stamler, Northwestern University, Chicago, IL, [email protected]
Peter F. Thall, M. D. Anderson Cancer Center Houston, TX, [email protected]
Barbara C. Tilley, Medical University of South Carolina, Charleston, SC, [email protected]
Lesley F. Tinker, Fred Hutchinson Cancer Research Center, Seattle, WA, [email protected]
David J. Torgerson, University of York, York, UK, [email protected]
Geert Verbeke, Catholic University of Leuven, Leuven, Belgium, [email protected]
Sue-J. Wang, U.S. Food and Drug Administration Silver Spring, MD, [email protected]
Gemot Wassmer, University of Cologne, Cologne, Germany, [email protected]
David D. Waters, San Francisco General Hospital, San Francisco, CA, [email protected]
Janet Wittes, Statistics Collaborative, Inc., Washington, DC, [email protected]
Névine Zariffa, GlaxoSmithKline, Philadelphia, PA
Cheng Zheng, University of Washington, Seattle, WA, [email protected]
Sarah Zohar, Saint-Louis Hospital, Paris, France
Preface
Planning, developing, and implementing clinical trials have become an important and integral part of life. More and more effort and care go into conducting clinical trials, as they have been responsible for key advances in medicine and in the treatment of many illnesses. Today, clinical trials are mandatory in the development and evaluation of modern drugs and in identifying the association of risk factors with diseases. Because of the complexity of the issues surrounding clinical trials, regulatory agencies oversee their approval and also ensure impartial review. The main purpose of this two-volume handbook is to provide a detailed exposition of historical developments and also to highlight modern advances in methods and analysis for clinical trials.
It is important to mention that the four-volume Wiley Encyclopedia of Clinical Trials served as a basis for this handbook. While many pertinent entries from this Encyclopedia have been included here, a number of them have been updated to reflect recent developments on their topics. Some new articles detailing modern advances in statistical methods in clinical trials and their applications have also been included.
A volume of this size and nature cannot be successfully completed without the co-operation and support of the contributing authors, and my sincere thanks and gratitude go to all of them. Thanks are also due to Mr. Steve Quigley and Ms. Sari Friedman (of John Wiley & Sons, Inc.) for their keen interest in this project from day one, as well as for their support and constant encouragement (and, of course, occasional nudges, too) throughout the course of this project. The careful and diligent work of Mrs. Debbie Iscoe in the typesetting of this volume and of Angioline Loredo at the production stage is gratefully acknowledged. Partial financial support of the Natural Sciences and Engineering Research Council of Canada also assisted in the preparation of this handbook, and this support is much appreciated.
This is the sixth in a series of handbooks on methods and applications of statistics. While the first handbook focused on life and health sciences, the second on business, finance, and management sciences, the third on engineering, quality control, and physical sciences, the fourth on behavioral and social sciences, and the fifth on atmospheric and earth sciences, the present handbook concentrates on methods and applications of statistics to clinical trials. This is the first of two volumes describing in detail statistical developments concerning clinical trials.
It is my sincere hope that this handbook and the others in the series will become basic reference resources for those involved in these fields of research!
PROF. N. BALAKRISHNAN
McMASTER UNIVERSITY
Hamilton, Canada
December 2013
Robert Newcombe
Many response variables in clinical trials are binary: the treatment was successful or unsuccessful; the adverse effect did or did not occur. Binary variables are summarized by proportions, which may be compared between different arms of a study by calculating either an absolute difference of proportions or a relative measure, the relative risk or the odds ratio. In this article we consider several point and interval estimates for the absolute difference between two proportions, for both unpaired and paired study designs. The simplest methods encounter problems when numerators or denominators are small; accordingly, better methods are introduced. Because confidence interval methods for differences of proportions are derived from related methods for the simpler case of the single proportion, which itself can also be of interest in a clinical trial, this case is also considered in some depth. Illustrative examples relating to data from two clinical trials are shown.
In most clinical trials, the unit of data is the individual, and statistical analyses for efficacy and safety outcomes compare responses between the two (or more) treatment groups. When subjects are randomized between these groups, responses of subjects in one group are independent of those in the other group. This leads to unpaired analyses. Crossover and split-unit designs require paired analyses. These have many features in common with the unpaired analyses and will be described in the final section.
Several effect size measures are widely used for comparison of two independent proportions:
Difference of proportions p1 − p2
Ratio of proportions (risk ratio or relative risk) p1/p2
Odds ratio (p1/(1 − p1))/(p2/(1 − p2))
In this article we consider in particular the difference between two proportions, p1 − p2, as a measure of effect size. This is variously referred to as the absolute risk reduction, risk difference, or success rate difference. Other articles in this work describe the risk ratio or relative risk and the odds ratio. We consider both point and interval estimates, in recognition that “confidence intervals convey information about magnitude and precision of effect simultaneously, keeping these two aspects of measurement closely linked” [1]. In the clinical trial context, a difference between two proportions is often referred to as an absolute risk reduction. However, it should be borne in mind that any term that includes the word “reduction” really presupposes that the direction of the difference will be a reduction in risk—such terminology becomes awkward when the anticipated benefit does not materialize, including the nonsignificant case when the confidence interval for the difference extends beyond the null hypothesis value of zero. The same applies to the relative risk reduction, 1 − p1/p2. Whenever results are presented, it is vitally important that the direction of the observed difference should be made unequivocally clear. Moreover, sometimes confusing labels are used, which might be interpreted to mean something other than p1 − p2; for example, Hashemi et al. [2] refer to p1 − p2 as attributable risk. It is also vital to distinguish between relative and absolute risk reduction.
In clinical trials, as in other prospective and cross-sectional designs, each of the three quantities we have discussed may validly be used as a measure of effect size. The risk difference and risk ratio compare two proportions from different perspectives: a halving of risk will have much greater population impact for a common outcome than for an infrequent one. Schechtman [3] recommends that both a relative and an absolute measure should always be reported, with appropriate confidence intervals.
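As a minimal sketch (the function name `effect_sizes` is illustrative, not from the source), the three measures can be computed directly from the counts in a 2 × 2 table. The first call uses the life-threatening toxicity counts for treatments A and B from Table 1; the other two use hypothetical risks, chosen only to show that the same halving of risk gives very different absolute differences.

```python
def effect_sizes(r1, n1, r2, n2):
    """Return (difference, risk ratio, odds ratio) for r1/n1 versus r2/n2.

    Assumes 0 < r_i < n_i so that the ratio and the odds ratio are finite.
    """
    p1, p2 = r1 / n1, r2 / n2
    diff = p1 - p2                                   # absolute difference p1 - p2
    rr = p1 / p2                                     # risk ratio p1/p2
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))   # (p1/(1 - p1))/(p2/(1 - p2))
    return diff, rr, odds_ratio

# Toxicity: treatment A (2 of 14) versus treatment B (1 of 11).
tox = effect_sizes(2, 14, 1, 11)
print("toxicity: difference %.4f, risk ratio %.4f, odds ratio %.4f" % tox)

# Hypothetical risks (not from the source): the same halving of risk
# for a common outcome and for an infrequent one.
common = effect_sizes(20, 100, 40, 100)   # risk 0.40 -> 0.20
rare = effect_sizes(1, 100, 2, 100)       # risk 0.02 -> 0.01
for label, (d, rr, _) in [("common", common), ("rare", rare)]:
    print(f"{label}: risk ratio {rr:.2f}, difference {d:+.2f}")
```

Both hypothetical comparisons have a risk ratio of 0.5, but the absolute difference is 0.20 for the common outcome and only 0.01 for the infrequent one, which is the reason for reporting a relative and an absolute measure together.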
The odds ratio is discussed at length by Agresti [4]. It is widely regarded as having a special preferred status on account of its role in retrospective case-control studies and in logistic regression and meta-analysis. Nevertheless, it should not be regarded as having gold standard status as a measure of effect size for the 2 × 2 table [3,5].
Before considering the difference between two independent proportions in detail, we first consider some of the issues that arise in relation to the fundamental task of estimating a single proportion. These issues have repercussions for the comparison of proportions because confidence interval methods for p1 − p2 are generally based closely on those for proportions. The single proportion is also relevant to clinical trials in its own right. For example, in a clinical trial comparing surgical versus conservative management, we would be concerned with estimating the incidence of a particular complication of surgery such as postoperative bleeding, even though there is no question of obtaining a contrasting value in the conservative group or of formally comparing these.
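Unfortunately, confidence intervals for proportions and their differences do not achieve their nominal coverage exactly, because the sample space is discrete and bounded. The simple Wald interval for the single proportion, p ± z√(pq/n), where p = r/n is the observed proportion of r successes out of n, q = 1 − p, and z is the appropriate standard normal deviate (1.96 for a 95% interval), has three unfavorable properties [6–9]. These can all be traced to the interval’s simple symmetry about the empirical estimate.
The achieved coverage is much lower than the nominal value. For some values of the population proportion π, the achieved coverage probability is close to zero.
The noncoverage probabilities in the two tails are very different. The location of the interval is too distal, that is, too far out from the center of symmetry of the scale, 1/2. The noncoverage of the interval is predominantly mesial.
Overt aberrations occur: the calculated limits can fall below 0 or above 1 (boundary overflow), and the interval degenerates to zero width when r = 0 or r = n.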
Many improved methods for confidence intervals for proportions have been developed. The properties of these methods are evaluated by choosing suitable parameter space points (here, combinations of n and π), using these to generate large numbers of simulated random samples, and recording how often the resulting confidence interval includes the true value π. The coverage probabilities obtained at the individual parameter space points are then summarized by their mean and their minimum across all the points evaluated.
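The Wald interval and the coverage evaluation just described can be sketched with the Python standard library alone. As an assumption that departs from the simulation approach in the text, coverage at each point (n, π) is computed exactly, by summing binomial probabilities over every possible outcome r = 0, …, n; the grid of points is an arbitrary illustrative choice.

```python
from math import comb, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def wald(r, n, z=Z95):
    """Wald interval p ± z*sqrt(pq/n); limits are deliberately NOT truncated to [0, 1]."""
    p = r / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def coverage(method, n, pi):
    """Exact coverage at (n, pi): total binomial probability of the outcomes r
    whose interval contains pi (used here instead of simulation)."""
    total = 0.0
    for r in range(n + 1):
        lo, hi = method(r, n)
        if lo <= pi <= hi:
            total += comb(n, r) * pi**r * (1 - pi) ** (n - r)
    return total


# Illustrative grid of parameter space points (n, pi); not from the source.
grid = [(n, pi) for n in (10, 14, 25, 50) for pi in (0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.5)]
covs = [coverage(wald, n, pi) for n, pi in grid]
print(f"mean coverage {sum(covs) / len(covs):.3f}, minimum {min(covs):.3f}")
print("Treatment A toxicity (2 of 14):", [round(x, 4) for x in wald(2, 14)])
```

Points with very small π show the near-zero coverage described above: when r = 0 the Wald interval collapses to 0 to 0 and cannot contain π.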
Generally, the improved methods obviate the boundary violation problem, and improve coverage and location. The most widely researched options are as follows.
A continuity correction may be incorporated: p ± {z√(pq/n) + 1/(2n)}. This certainly improves coverage and obviates zero-width intervals but increases the incidence of boundary overflow.
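A standard-library sketch of the continuity-corrected version, reproducing the corresponding treatment A entries of Table 2. The limits are deliberately left untruncated so that overflow below 0 is visible; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def wald_cc(r, n, z=Z95):
    """Continuity-corrected Wald interval p ± {z*sqrt(pq/n) + 1/(2n)}, untruncated."""
    p = r / n
    half = z * sqrt(p * (1 - p) / n) + 1 / (2 * n)
    return p - half, p + half


for r in (0, 2):  # treatment A outcomes from Table 1 (n = 14)
    lo, hi = wald_cc(r, 14)
    flag = " (overflows below 0)" if lo < 0 else ""
    print(f"r = {r}: {lo:.4f} to {hi:.4f}{flag}")
```

With r = 0 the correction removes the zero-width interval, but the lower limit now falls below 0.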
The Wilson score interval is obtained by inverting the score test: its limits are the two roots for π of the quadratic equation (p − π)² = z²π(1 − π)/n. It is easily demonstrated [7] that the resulting interval is symmetrical on the logit scale, the other natural scale for proportions, by considering the product of the two roots for π, and likewise for 1 − π. The resulting interval is boundary respecting and has appropriate mean coverage. In contrast to the Wald interval, its location is rather too mesial.
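Solving that quadratic gives the closed form used in the sketch below (a sketch, not a reference implementation). The final line checks the logit-scale symmetry numerically for treatment A's toxicity data (2 of 14): the logits of the two limits are equidistant from logit(p).

```python
from math import log, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def wilson(r, n, z=Z95):
    """Wilson score interval: the two roots for pi of (p - pi)^2 = z^2 pi (1 - pi) / n."""
    p = r / n
    denom = n + z * z
    centre = (r + z * z / 2) / denom
    half = z * sqrt(n * p * (1 - p) + z * z / 4) / denom
    return centre - half, centre + half


def logit(x):
    return log(x / (1 - x))


lo, hi = wilson(2, 14)
p = 2 / 14
print(f"Wilson interval {lo:.4f} to {hi:.4f}")
# Symmetry on the logit scale: the mean of the limits' logits equals logit(p).
print(f"{(logit(lo) + logit(hi)) / 2:.6f} vs {logit(p):.6f}")
```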
Alternatively, the Bayesian approach described elsewhere in this work may be used. The resulting intervals are best referred to as credible intervals, in recognition that the interpretation is slightly different from that of frequentist confidence intervals such as those previously described.
Bayesian inference starts with a prior distribution for the parameter of interest, in this instance the proportion π. This is then combined with the likelihood function comprising the evidence from the sample to form a posterior distribution that represents beliefs about the parameter after the data have been obtained. When a conjugate prior is chosen from the beta distribution family, the posterior distribution takes a relatively simple form: it is also a beta distribution. If substantial information about π exists, an informative prior may be chosen to encapsulate this information.
More often, an uninformative prior is used. The simplest is the uniform prior B(1,1), which assumes that all possible values of π between 0 and 1 start off equally likely. An alternative uninformative prior with some advantages is the Jeffreys prior B(1/2, 1/2). Both are diffuse priors, which spread the probability thinly across the whole range of possible values from 0 to 1.
The resulting posterior distribution may be displayed graphically, or may be summarized by salient summary statistics such as the posterior mean and median and selected centiles. The 2.5th and 97.5th centiles of the posterior distribution delimit the tail-based 95% credible interval. Alternatively, a highest posterior density interval may be reported. The tail-based interval is considered preferable because it produces equivalent results when a transformed scale (e.g., logit) is used [11].
These Bayesian intervals perform well in a frequentist sense [12]. Hence, it is now appropriate to regard them as confidence interval methods in their own right, with theoretical justification in the Bayesian paradigm but empirical validation from a frequentist standpoint. They may thus be termed beta intervals. They are readily calculated using software for the incomplete beta function, which is included in statistical packages and also spreadsheet software such as Microsoft Excel. As such, they should now be regarded as computationally of “closed form,” though less transparent than Wald methods.
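Assuming SciPy is available, a beta interval reduces to two calls of the inverse incomplete beta function (`scipy.stats.beta.ppf`) on the posterior B(r + a, n − r + b). Setting the lower limit to 0 when r = 0, and the upper limit to 1 when r = n, is an assumed convention that matches the zero lower limits shown in Table 2; the function name is illustrative.

```python
from scipy.stats import beta


def bayes_interval(r, n, a=1.0, b=1.0, alpha=0.05):
    """Tail-based interval from the posterior B(r + a, n - r + b) under a B(a, b) prior.

    Assumption: the lower limit is set to 0 when r = 0 and the upper limit
    to 1 when r = n.
    """
    post = beta(r + a, n - r + b)
    lo = 0.0 if r == 0 else post.ppf(alpha / 2)
    hi = 1.0 if r == n else post.ppf(1 - alpha / 2)
    return lo, hi


priors = {"uniform B(1,1)": (1.0, 1.0), "Jeffreys B(1/2,1/2)": (0.5, 0.5)}
for label, (a, b) in priors.items():
    for r in (0, 2):  # treatment A, n = 14
        lo, hi = bayes_interval(r, 14, a, b)
        print(f"{label}, r = {r}: {lo:.4f} to {hi:.4f}")
```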
Many statisticians consider that a coverage level should represent minimum, not average, coverage. The Clopper-Pearson “exact” or tail-based method [13] achieves this, at the cost of being excessively conservative; intervals are unnecessarily wide. There is a trade-off between coverage and width; it is always possible to increase coverage by widening intervals, and the aim is to attain good coverage without excessive width. A variant on the “exact” method involving a mid-P accumulation of tail probabilities [14,15] aligns mean coverage closely with the nominal 1 − α. Both methods have appropriate location. The Clopper-Pearson interval, but not the mid-P one, is readily programmed as a beta interval, of similar form to Bayes intervals. A variety of shortened intervals have also been developed that maintain minimum coverage but substantially shrink interval length [16,17]. Shortened intervals are much more complex, both computationally and conceptually. They also have the disadvantage that what is optimized is the interval, not the lower and upper limits separately; consequently, they are unsuitable when interest centers on one of the limits rather than the other.
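A sketch of both tail-based methods, again assuming SciPy. The Clopper-Pearson limits are the quantiles of B(r, n − r + 1) and B(r + 1, n − r); the mid-P limits have no such beta form, so they are found here by root finding on binomial tail sums, counting only half the probability of the observed r in each tail. Function names are illustrative.

```python
from scipy.optimize import brentq
from scipy.stats import beta, binom


def clopper_pearson(r, n, alpha=0.05):
    """'Exact' tail-based interval, written as a beta interval."""
    lo = 0.0 if r == 0 else beta.ppf(alpha / 2, r, n - r + 1)
    hi = 1.0 if r == n else beta.ppf(1 - alpha / 2, r + 1, n - r)
    return lo, hi


def mid_p(r, n, alpha=0.05):
    """Mid-P interval, solved numerically with brentq."""
    def upper_tail(pi):  # P(X > r) + P(X = r)/2, increasing in pi
        return binom.sf(r, n, pi) + 0.5 * binom.pmf(r, n, pi)

    def lower_tail(pi):  # P(X < r) + P(X = r)/2, decreasing in pi
        return binom.cdf(r - 1, n, pi) + 0.5 * binom.pmf(r, n, pi)

    eps = 1e-12
    lo = 0.0 if r == 0 else brentq(lambda pi: upper_tail(pi) - alpha / 2, eps, 1 - eps)
    hi = 1.0 if r == n else brentq(lambda pi: lower_tail(pi) - alpha / 2, eps, 1 - eps)
    return lo, hi


for r in (0, 2):  # treatment A, n = 14
    cp, mp = clopper_pearson(r, 14), mid_p(r, 14)
    print(f"r = {r}: Clopper-Pearson {cp[0]:.4f} to {cp[1]:.4f}, "
          f"mid-P {mp[0]:.4f} to {mp[1]:.4f}")
```

The mid-P interval lies inside the Clopper-Pearson interval, reflecting the latter's conservatism.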
Numerical examples illustrating these calculations are based on some results from a very small randomized phase II clinical trial performed by the Eastern Cooperative Oncology Group [18]. Table 1 shows the results for two outcomes, treatment success (defined as shrinkage of the tumor by 50% or more) and life-threatening treatment toxicity, for the two treatment groups A and B.
Table 1: Some Results from a Very Small Randomized Phase II Clinical Trial Performed by the Eastern Cooperative Oncology Group
Outcome | Treatment A | Treatment B
Number of patients | 14 | 11
Number with successful outcome (tumor shrinkage by ≥50%) | 0 | 0
Number with life-threatening treatment toxicity | 2 | 1
Source: Parzen et al. J Comput Graph Stat. 2002: 11; 420–436.
Table 2 shows 95% confidence intervals for both outcomes for treatment A. These examples show how Wald and derived intervals often produce inappropriate limits (see asterisks) in boundary and near-boundary cases.
Table 2: 95% Confidence Intervals for Proportions of Patients with Successful Outcome and with Life-Threatening Toxicity on Treatment A in the Eastern Cooperative Oncology Group Trial
Method | Successful tumor shrinkage | Life-threatening toxicity
Empirical estimate | 0 | 0.1429
Wald interval | 0 to 0* | <0* to 0.3262
Wald interval with continuity correction | <0* to 0.0357 | <0* to 0.3619
Wilson score interval | 0 to 0.2153 | 0.0401 to 0.3994
Agresti-Coull shrinkage estimate | 0.1111 | 0.2222
Agresti-Coull interval | <0* to 0.2563 | 0.0302 to 0.4143
Bayes interval, B(1,1) prior | 0 to 0.2180 | 0.0433 to 0.4046
Bayes interval, B(1/2, 1/2) prior | 0 to 0.1616 | 0.0309 to 0.3849
Clopper-Pearson “exact” interval | 0 to 0.2316 | 0.0178 to 0.4281
Mid-P interval | 0 to 0.1926 | 0.0247 to 0.3974
Note: Asterisks denote boundary violations.
Source: Parzen et al. J Comput Graph Stat. 2002: 11; 420–436.
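The Agresti-Coull rows of Table 2 are not described in the surviving text. As an assumption, the sketch below uses the familiar "add two successes and two failures" form: the estimate is shrunk to (r + 2)/(n + 4), and the Wald formula is applied with n + 4 in place of n. This reproduces the tabulated values, including the overflow below 0 for the shrinkage outcome.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def agresti_coull(r, n, z=Z95):
    """Assumed form: shrinkage estimate (r + 2)/(n + 4) with a Wald-type interval
    around it (four pseudo-observations; limits are not truncated)."""
    n_adj = n + 4
    p_adj = (r + 2) / n_adj
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj, (p_adj - half, p_adj + half)


for r in (0, 2):  # treatment A, n = 14
    est, (lo, hi) = agresti_coull(r, 14)
    print(f"r = {r}: estimate {est:.4f}, interval {lo:.4f} to {hi:.4f}")
```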
We return to the unpaired difference case. As described elsewhere in this work, hypothesis testing for the comparison of two proportions takes a quite different form according to whether the objective of the trial is to ascertain difference or equivalence. When we report the contrast between two proportions with an appropriately constructed confidence interval, this issue is taken into account only when we come to interpret the calculated point and interval estimates. In this respect, in comparison with hypothesis testing, the confidence interval approach leads to much simpler, more flexible patterns of inference.
The Wald interval is calculated as p1 − p2 ± z√(p1q1/n1 + p2q2/n2), where q1 = 1 − p1 and q2 = 1 − p2. It has poor mean and minimum coverage and degenerates to a zero-width interval when each of p1 and p2 equals 0 or 1. Overshoot can occur when one proportion is close to 1 and the other is close to 0, but this situation is expected to occur infrequently in practice. Use of a continuity correction improves mean coverage, but minimum coverage remains low.
Some of the better methods substitute the profile estimate γδ, which is the maximum likelihood estimate of the nuisance parameter γ conditional on a hypothesized value of the difference δ = π1 − π2. These include score-type asymptotic intervals developed by Mee [19] and Miettinen and Nurminen [20]. Newcombe [21] developed tail-based exact and mid-P intervals involving substitution of the profile estimate.
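A minimal standard-library sketch of the unpaired Wald interval, reproducing the Wald row of Table 3, including the degenerate interval for the outcome with no successes in either group (the function name is illustrative):

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def wald_diff(r1, n1, r2, n2, z=Z95):
    """Unpaired Wald interval p1 - p2 ± z*sqrt(p1q1/n1 + p2q2/n2)."""
    p1, p2 = r1 / n1, r2 / n2
    d = p1 - p2
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d - half, d + half


print(wald_diff(2, 14, 1, 11))  # toxicity, treatment A versus B
print(wald_diff(0, 14, 0, 11))  # tumor shrinkage: zero-width interval at 0
```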
All these intervals are boundary respecting. The “exact” method aligns the minimum coverage quite well with the nominal 1 − α; the others align mean coverage well with 1 − α, at the expense of fairly complex iterative calculation.
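Because γ is not spelled out in this section, the sketch below assumes the common parameterization in which π2 is the nuisance parameter and π1 = π2 + δ. It finds the restricted maximum likelihood estimates numerically with SciPy (practical implementations usually solve a closed-form cubic instead) and collects all δ whose score statistic lies within ±z. Multiplying the variance by N/(N − 1), with N = n1 + n2, gives the Miettinen-Nurminen variant; the function and argument names are hypothetical.

```python
from math import sqrt

from scipy.optimize import brentq, minimize_scalar
from scipy.special import xlog1py, xlogy
from scipy.stats import norm


def restricted_mle(r1, n1, r2, n2, delta):
    """MLE of (pi1, pi2) subject to pi1 - pi2 = delta (assumed nuisance: pi2)."""
    lo, hi = max(0.0, -delta), min(1.0, 1.0 - delta)

    def neg_loglik(pi2):
        pi1 = pi2 + delta
        # xlogy/xlog1py give 0*log(0) = 0 for empty cells.
        return -(xlogy(r1, pi1) + xlog1py(n1 - r1, -pi1)
                 + xlogy(r2, pi2) + xlog1py(n2 - r2, -pi2))

    pi2 = minimize_scalar(neg_loglik, bounds=(lo, hi), method="bounded",
                          options={"xatol": 1e-12}).x
    return pi2 + delta, pi2


def score_interval(r1, n1, r2, n2, alpha=0.05, miettinen_nurminen=False):
    """Mee (default) or Miettinen-Nurminen score interval for pi1 - pi2."""
    z = norm.ppf(1 - alpha / 2)
    d = r1 / n1 - r2 / n2
    factor = (n1 + n2) / (n1 + n2 - 1) if miettinen_nurminen else 1.0

    def score(delta):
        pi1, pi2 = restricted_mle(r1, n1, r2, n2, delta)
        var = factor * (pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)
        return (d - delta) / sqrt(max(var, 1e-300))  # guard against var == 0

    eps = 1e-9
    lower = -1.0 if d <= -1 + eps else brentq(lambda t: score(t) - z, -1 + eps, d)
    upper = 1.0 if d >= 1 - eps else brentq(lambda t: score(t) + z, d, 1 - eps)
    return lower, upper


for mn in (False, True):
    name = "Miettinen-Nurminen" if mn else "Mee"
    print(name, score_interval(0, 14, 0, 11, miettinen_nurminen=mn),
          score_interval(2, 14, 1, 11, miettinen_nurminen=mn))
```

For the tumor shrinkage outcome, and for the Mee interval on the toxicity outcome, the results agree with the corresponding rows of Table 3 to the four decimals shown.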
Bayesian intervals for p1 − p2 and other comparative measures may be constructed [2,11], but they are computationally much more complex than in the single proportion case, requiring use of numerical integration or computer-intensive methodology such as Markov chain Monte Carlo (MCMC) methods. It may be more appropriate to incorporate a prior for p1 − p2 itself rather than independent priors for p1 and p2 [22]. The Bayesian formulation is readily adapted to incorporate functional constraints such as δ ≥ 0 [22]. Walters [23] and Agresti and Min [11] have shown that Bayes intervals for p1 − p2 with uninformative beta priors have favorable frequentist properties.
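With independent conjugate beta priors, the posteriors for p1 and p2 are independent beta distributions. That makes plain Monte Carlo sampling (used here in place of numerical integration or MCMC) enough to approximate an equal-tailed interval for p1 − p2. The sketch assumes NumPy; the number of draws and the seed are arbitrary choices, and the limits vary slightly with them.

```python
import numpy as np


def bayes_diff_interval(r1, n1, r2, n2, a=1.0, b=1.0, alpha=0.05,
                        draws=1_000_000, seed=2013):
    """Equal-tailed interval for p1 - p2 with independent B(a, b) priors,
    estimated by Monte Carlo from the two independent beta posteriors."""
    rng = np.random.default_rng(seed)
    p1 = rng.beta(r1 + a, n1 - r1 + b, size=draws)
    p2 = rng.beta(r2 + a, n2 - r2 + b, size=draws)
    lo, hi = np.quantile(p1 - p2, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


print(bayes_diff_interval(0, 14, 0, 11))  # tumor shrinkage, B(1,1) priors
print(bayes_diff_interval(2, 14, 1, 11))  # toxicity, B(1,1) priors
```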
Two computationally simpler, effective approaches have been developed. Newcombe [21] also formulated square-and-add intervals for differences of proportions. The concept is a very simple one. Assuming independence, the variance of a difference between two quantities is the sum of their variances. In other words, standard errors “square and add”—they combine in the same way that differences in x and in y coordinates combine to give the Euclidean distance along the diagonal, as in Pythagoras’ theorem. This is precisely how the Wald interval for p1 − p2 is constructed. The same principle may be applied starting with other, better intervals for p1 and p2 separately. The Wilson score interval is a natural choice as it already involves square roots, though squaring and adding would work equally effectively starting with, for instance, tail-based [24] or Bayes intervals. It is easily demonstrated that the square-and-add process preserves the property of respecting boundaries.
This easily computed interval aligns mean coverage closely with the nominal 1 − α. A continuity correction is readily incorporated, resulting in more conservative coverage. Both intervals tend to be more mesially positioned than the γδ-based intervals discussed previously.
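With Wilson limits (l1, u1) and (l2, u2) for the two separate proportions, the square-and-add rule pairs the lower limit for p1 with the upper limit for p2 to form the lower limit for the difference, and vice versa. A standard-library sketch (function names illustrative) that reproduces the square-and-add Wilson row of Table 3:

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def wilson(r, n, z=Z95):
    """Wilson score interval for a single proportion r/n."""
    p = r / n
    denom = n + z * z
    centre = (r + z * z / 2) / denom
    half = z * sqrt(n * p * (1 - p) + z * z / 4) / denom
    return centre - half, centre + half


def square_and_add(r1, n1, r2, n2, z=Z95):
    """L = d - sqrt((p1 - l1)^2 + (u2 - p2)^2), U = d + sqrt((u1 - p1)^2 + (p2 - l2)^2)."""
    p1, p2 = r1 / n1, r2 / n2
    l1, u1 = wilson(r1, n1, z)
    l2, u2 = wilson(r2, n2, z)
    d = p1 - p2
    return (d - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2),
            d + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2))


print(square_and_add(0, 14, 0, 11))  # tumor shrinkage
print(square_and_add(2, 14, 1, 11))  # toxicity
```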
The square-and-add approach may be applied a second time to obtain a confidence interval for a difference between differences of proportions [25]; this is the linear scale analogue of assessing an interaction effect in logistic regression.
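A sketch of this second application, using hypothetical counts (not from the source) for two subgroups, each with its own treated-versus-control difference. The `combine` step that builds each difference from two Wilson intervals is simply reused on the two resulting difference intervals; all names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def wilson(r, n, z=Z95):
    """Wilson score interval for a single proportion r/n."""
    p = r / n
    denom = n + z * z
    centre = (r + z * z / 2) / denom
    half = z * sqrt(n * p * (1 - p) + z * z / 4) / denom
    return centre - half, centre + half


def combine(est_a, ci_a, est_b, ci_b):
    """Square-and-add interval for est_a - est_b from separate intervals for a and b."""
    (la, ua), (lb, ub) = ci_a, ci_b
    d = est_a - est_b
    return d, (d - sqrt((est_a - la) ** 2 + (ub - est_b) ** 2),
               d + sqrt((ua - est_a) ** 2 + (est_b - lb) ** 2))


def diff_of_props(r1, n1, r2, n2):
    return combine(r1 / n1, wilson(r1, n1), r2 / n2, wilson(r2, n2))


# Hypothetical counts: treated versus control within two subgroups.
d_a, ci_a = diff_of_props(30, 50, 20, 50)   # subgroup A
d_b, ci_b = diff_of_props(25, 60, 22, 60)   # subgroup B
ddd, ci = combine(d_a, ci_a, d_b, ci_b)     # difference between the differences
print(f"{ddd:.4f}, 95% CI {ci[0]:.4f} to {ci[1]:.4f}")
```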
Another simple approach that is a great improvement over the Wald method is the pseudo-frequency method [26,27]. A pseudo-frequency ψ is added to each of the four cells of the 2 × 2 table, resulting in the shrinkage estimator p̃1 − p̃2, where p̃i = (ri + ψ)/(ni + 2ψ) and ri is the number of subjects with the outcome among the ni in group i (i = 1, 2).
The Wald formula then produces the limits
p̃1 − p̃2 ± z√(p̃1(1 − p̃1)/(n1 + 2ψ) + p̃2(1 − p̃2)/(n2 + 2ψ)).
Agresti and Caffo [27] evaluated the effect of choosing different values of ψ, and they reported that adding 1 to each cell is optimal here. So here, just as for the single proportion case, four pseudo-observations are added in total. This approach also aligns mean coverage effectively with 1 − α. Interval location is rather too mesial, very similar to that of the square-and-add method. Zero-width intervals cannot occur. Boundary violation is not ruled out but is expected to be infrequent.
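A standard-library sketch of the pseudo-frequency interval with ψ = 1, which reproduces the Agresti-Caffo rows of Table 3; the parameter name `psi` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def agresti_caffo(r1, n1, r2, n2, psi=1.0, z=Z95):
    """Add psi to each of the four cells, then apply the Wald formula."""
    t1 = (r1 + psi) / (n1 + 2 * psi)
    t2 = (r2 + psi) / (n2 + 2 * psi)
    d = t1 - t2
    half = z * sqrt(t1 * (1 - t1) / (n1 + 2 * psi) + t2 * (1 - t2) / (n2 + 2 * psi))
    return d, (d - half, d + half)


for label, counts in [("shrinkage", (0, 14, 0, 11)), ("toxicity", (2, 14, 1, 11))]:
    est, (lo, hi) = agresti_caffo(*counts)
    print(f"{label}: estimate {est:.4f}, interval {lo:.4f} to {hi:.4f}")
```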
Table 3 shows 95% confidence intervals calculated by these methods, comparing treatments A and B in the Eastern Cooperative Oncology Group trial [18].
Table 3: 95% Confidence Intervals for Differences in Proportions of Patients with Successful Outcome and with Life-Threatening Toxicity between Treatments A and B in the Eastern Cooperative Oncology Group Trial
Method | Successful tumor shrinkage | Life-threatening toxicity
Empirical estimate | 0 | 0.0519
Wald interval | 0* to 0* | -0.1980 to 0.3019
Mee interval | -0.2588 to 0.2153 | -0.2619 to 0.3312
Miettinen-Nurminen interval | -0.2667 to 0.2223 | -0.2693 to 0.3374
Tail-based “exact” interval | -0.2849 to 0.2316 | -0.2721 to 0.3514
Tail-based mid-P interval | -0.2384 to 0.1926 | -0.2539 to 0.3352
Bayes interval, B(1,1) priors for p1 and p2 | -0.2198 to 0.1685 | -0.2432 to 0.2986
Bayes interval, B(1/2, 1/2) priors for p1 and p2 | -0.1768 to 0.1361 | -0.2288 to 0.3008
Square-and-add Wilson interval | -0.2588 to 0.2153 | -0.2524 to 0.3192
Agresti-Caffo shrinkage estimate | -0.0144 | 0.0337
Agresti-Caffo interval | -0.2016 to 0.1728 | -0.2403 to 0.3076
Note: Asterisks denote boundary violations.
Source: Parzen et al. J Comput Graph Stat. 2002: 11; 420–436.
Because the number needed to treat (NNT), the reciprocal of the absolute risk reduction, becomes infinite when the difference is zero, a confidence interval for the difference that includes zero maps onto a set of NNT values that is not an interval. Accordingly, it seems preferable to report absolute risk reductions in percentage rather than reciprocal form. The most appropriate uses of the NNT are in giving simple bottom-line figures to patients (in which situation, usually only the point estimate would be given) and in labeling a secondary axis on a graph.
Crossover and split-unit trial designs lead to paired analyses. Regimes that aim to produce a cure are generally not suitable for evaluation in these designs, because in the event that a treatment is effective, there would be a carryover effect into the next treatment period. For this reason, these designs tend to be used for evaluation of regimes that seek to control symptomatology, and thus most often give rise to continuous outcome measures. Examples of paired analyses of binary data in clinical trials include comparisons of different antinauseant regimes administered in randomized order during different cycles of chemotherapy, comparisons of treatments for headache pain, and split-unit studies in ophthalmology and dermatology. Results can be reported in either risk difference or NNT form, though the latter appears not to be frequently used in this context. Examples from settings other than clinical trials include longitudinal comparison of oral carriage of an organism before and after third molar extraction, and twin studies.
Let a, b, c, and d denote the four cells of the paired contingency table, with n = a + b + c + d pairs in total. Here, b and c are the discordant cells, and interest centers on the difference of marginals, (a + b)/n − (a + c)/n = (b − c)/n.
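The sketch below, using hypothetical absolute risk reductions (not from the source), shows what happens when confidence limits are inverted. Positive differences are assumed to favor the new treatment, and labeling the two branches as benefit and harm follows a common convention that is an assumption here.

```python
def nnt_summary(arr, lo, hi):
    """NNT form of an absolute risk reduction `arr` with confidence limits (lo, hi).

    Assumption: positive values favour the new treatment.
    """
    point = 1 / arr
    if lo > 0 or hi < 0:
        # Interval excludes zero: the NNT limits are the reciprocals, swapped.
        return point, f"{1 / hi:.1f} to {1 / lo:.1f}"
    # Interval includes zero: the NNT values run through infinity, so the
    # result is two disjoint ranges rather than an interval.
    return point, f"NNT(benefit) {1 / hi:.1f} to infinity to NNT(harm) {1 / abs(lo):.1f}"


# Hypothetical absolute risk reductions with 95% confidence limits.
print(nnt_summary(0.05, -0.02, 0.12))
print(nnt_summary(0.10, 0.04, 0.16))
```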
Hypothesis testing is most commonly performed using the McNemar approach [32], using either an asymptotic test statistic expressed as z or chi-square, or an aggregated tail probability. In both situations, inference is conditional on the total number of discordant pairs, b + c.
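The paired quantities can be sketched from hypothetical cell counts (not from the source). The function returns the difference of marginals, the simple Wald interval (shown only for illustration, since it performs poorly [33]), and the asymptotic McNemar z statistic, which uses only the discordant cells b and c; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def paired_summary(a, b, c, d, z=Z95):
    """Paired 2 x 2 table with discordant cells b and c.

    Returns (b - c)/n, its Wald interval, and McNemar's z = (b - c)/sqrt(b + c).
    """
    n = a + b + c + d
    diff = (b - c) / n
    # Wald: estimated variance of (b - c)/n is [(b + c)/n - diff^2]/n.
    half = z * sqrt((b + c) - (b - c) ** 2 / n) / n
    mcnemar_z = (b - c) / sqrt(b + c) if b + c > 0 else 0.0
    return diff, (diff - half, diff + half), mcnemar_z


# Hypothetical counts for illustration only.
diff, (lo, hi), zstat = paired_summary(a=30, b=12, c=4, d=14)
print(f"difference {diff:.4f}, Wald 95% CI {lo:.4f} to {hi:.4f}, McNemar z {zstat:.3f}")
```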
Newcombe [33] reviewed confidence interval methods for the paired difference case. Many of these are closely analogous to unpaired methods. The Wald interval performs poorly. So does a conditional approach, based on an interval for the simple proportion b/(b + c). Exact and tail-based profile methods perform well, although, as before, they are computationally complex. A closed-form square-and-add approach, modified to take account of the nonindependence, also aligns mean coverage with 1 − α, provided that a novel form of continuity correction is incorporated.
