Methods and Applications of Statistics in Clinical Trials, Volume 1 (E-Book)

Description

A complete guide to the key statistical concepts essential for the design and construction of clinical trials

As the newest major resource in the field of medical research, Methods and Applications of Statistics in Clinical Trials, Volume 1: Concepts, Principles, Trials, and Designs presents a timely and authoritative review of the central statistical concepts used to build clinical trials that obtain the best results. The reference unveils modern approaches vital to understanding, creating, and evaluating data obtained throughout the various stages of clinical trial design and analysis.

Accessible and comprehensive, the first volume in a two-part set includes newly written articles as well as established literature from the Wiley Encyclopedia of Clinical Trials. Illustrating a variety of statistical concepts and principles such as longitudinal data, missing data, covariates, biased-coin randomization, repeated measurements, and simple randomization, the book also provides in-depth coverage of the various trial designs found within phase I-IV trials.

Methods and Applications of Statistics in Clinical Trials, Volume 1: Concepts, Principles, Trials, and Designs also features:

* Detailed chapters on the types of trial designs, such as adaptive, crossover, group-randomized, multicenter, non-inferiority, non-randomized, open-labeled, preference, prevention, and superiority trials
* Over 100 contributions from leading academics, researchers, and practitioners
* An exploration of ongoing, cutting-edge clinical trials on early cancer and heart disease, mother-to-child human immunodeficiency virus transmission trials, and the AIDS Clinical Trials Group

Methods and Applications of Statistics in Clinical Trials, Volume 1: Concepts, Principles, Trials, and Designs is an excellent reference for researchers, practitioners, and students in the fields of clinical trials, pharmaceutics, biostatistics, medical research design, biology, biomedicine, epidemiology, and public health.

Page count: 2225

Year of publication: 2014


Contents

Cover

Half Title page

Title page

Copyright page

Contributors

Preface

Chapter 1: Absolute Risk Reduction

1.1 Introduction

1.2 Preliminary Issues

1.3 Point and Interval Estimates for a Single Proportion

1.4 An Unpaired Difference of Proportions

1.5 Number Needed to Treat

1.6 A Paired Difference of Proportions

References

Further Reading

Chapter 2: Accelerated Approval

2.1 Introduction

2.2 Accelerated Development Versus Expanded Access in the U.S.A.

2.3 Sorting the Terminology—Which FDA Initiatives Do What?

2.4 Accelerated Approval Regulations: 21 C.F.R. 314.500, 314.520, 601.40

2.5 Stages of Drug Development and FDA Initiatives

2.6 Accelerated Approval Regulations: 21 CFR 314.500, 314.520, 601.40

2.7 Accelerated Approval with Surrogate Endpoints

2.8 Accelerated Approval with Restricted Distribution

2.9 Phase IV Studies/Post Marketing Surveillance

2.10 Benefit Analysis for Accelerated Approvals Versus Other Illnesses

2.11 Problems, Solutions, and Economic Incentives

2.12 Future Directions

References

Further Reading

Chapter 3: AIDS Clinical Trials Group (ACTG)

3.1 Introduction

3.2 A Brief Primer on HIV/AIDS

3.3 ACTG Overview

3.4 ACTG Scientific Activities

3.5 Development of Potent Antiretroviral Therapy (ART)

3.6 Expert Systems and Infrastructure

References

Chapter 4: Algorithm-Based Designs

4.1 Phase I Dose-Finding Studies

4.2 Accelerated Designs

4.3 Model-Based Approach in the Estimation of MTD

4.4 Exploring Algorithm-Based Designs with Prespecified Targeted Toxicity Levels

References

Chapter 5: Alpha-Spending Function

5.1 Introduction

5.2 Alpha Spending Function Motivation

5.3 The Alpha Spending Function

5.4 Application of the Alpha Spending Function

5.5 Confidence Intervals and Estimation

5.6 Trial Design

5.7 Conclusions

References

Further Reading

Chapter 6: Application of New Designs in Phase I Trials

6.1 Introduction

6.2 Objectives of a Phase I Trial

6.3 Standard Designs and Their Shortcomings

6.4 Some Novel Designs

6.5 Discussion

References

Further Reading

Chapter 7: ASCOT Trial

7.1 Introduction

7.2 Objectives

7.3 Study Design

7.4 Results

7.5 Discussion and Conclusions

References

Chapter 8: Benefit/Risk Assessment in Prevention Trials

8.1 Introduction

8.2 Types of B/RAs Performed in Prevention Trials

8.3 Alternative Structures of the Benefit/Risk Algorithm used in Prevention Trials

8.4 Methodological and Practical Issues with B/RA in Prevention Trials

References

Chapter 9: Biased Coin Randomization

9.1 Randomization Strategies for Overall Treatment Balance

9.2 The Biased Coin Randomization Procedure

9.3 Properties

9.4 Extensions to the Biased Coin Randomization

9.5 Adaptive Biased Coin Randomization

9.6 Urn Models

9.7 Treatment Balance for Covariates

9.8 Application of Biased Coin Designs to Response-Adaptive Randomization

References

Chapter 10: Biological Assay, Overview

10.1 Introduction

10.2 Direct Dilution Assays

10.3 Indirect Dilution Assays

10.4 Indirect Quantal Assays

10.5 Stochastic Approximation in Bioassay

10.6 Radioimmunoassay

10.7 Dosimetry and Bioassay

10.8 Semiparametrics in Bioassays

10.9 Nonparametrics in Bioassays

10.10 Bioavailability and Bioequivalence Models

10.11 Pharmacogenomics in Modern Bioassays

10.12 Complexities in Bioassay Modeling and Analysis

References

Further Reading

Chapter 11: Block Randomization

11.1 Introduction

11.2 Simple Randomization

11.3 Restricted Randomization Through the Use of Blocks

11.4 Schemes Using a Single Block for the Whole Trial

11.5 Use of Unequal and Variable Block Sizes

11.6 Inference and Analysis Following Blocked Randomization

11.7 Miscellaneous Topics Related to Blocked Randomization

References

Further Reading

Chapter 12: Censored Data

12.1 Introduction

12.2 Independent Censoring

12.3 Likelihoods: Noninformative Censoring

12.4 Other Kinds of Incomplete Observation

References

Chapter 13: Clinical Data Coordination

13.1 Introduction

13.2 Study Initiation

13.3 Study Conduct

13.4 Study Closure

13.5 Summary

References

Chapter 14: Clinical Data Management

14.1 Introduction

14.2 How Has Clinical Data Management Evolved?

14.3 Electronic Data Capture

14.4 Regulatory Involvement with Clinical Data Management

14.5 Professional Societies

14.6 Look to the Future

14.7 Conclusion

References

Chapter 15: Clinical Significance

15.1 Introduction

15.2 Historical Background

15.3 Article Outline

15.4 Design and Methodology

15.5 Examples

15.6 Recent Developments

15.7 Concluding Remarks

References

Chapter 16: Clinical Trial Misconduct

16.1 The Scope of this Article

16.2 Why Does Research Misconduct Matter?

16.3 Early Cases

16.4 Definition

16.5 Intent

16.6 What Scientific Misconduct was Not

16.7 The Process

16.8 The Past Decade

16.9 Lessons from the U.S. Experience

16.10 Outside the United States

16.11 Scientific Misconduct During Clinical Trials

16.12 Audit

16.13 Causes

16.14 Prevalence

16.15 Peer Review and Misconduct

16.16 Retractions

16.17 Prevention

References

Chapter 17: Clinical Trials, Early Cancer and Heart Disease

17.1 Introduction

17.2 Developments in Clinical Trials at the National Cancer Institute (NCI)

17.3 Developments in Clinical Trials at the National Heart, Lung, and Blood Institute (NHLBI)

References

Chapter 18: Cluster Randomization

18.1 Introduction

18.2 Examples of Cluster Randomization Trials

18.3 Principles of Experimental Design

18.4 Experimental and Quasi-Experimental Designs

18.5 The Effect of Failing to Replicate

18.6 Sample Size Estimation

18.7 Cluster Level Analyses

18.8 Individual Level Analyses

18.9 Incorporating Repeated Assessments

18.10 Study Reporting

18.11 Meta-Analysis

References

Chapter 19: Coherence in Phase I Clinical Trials

19.1 Introduction

19.2 Coherence: Definitions and Organization

19.3 Coherent Designs

19.4 Compatible Initial Design

19.5 Group Coherence

19.6 Real-Time Coherence

19.7 Discussion

References

Chapter 20: Compliance and Survival Analysis

20.1 Compliance: Cause and Effect

20.2 All-or-Nothing Compliance

20.3 More General Exposure Patterns

20.4 Other Structural Modeling Options

References

Chapter 21: Composite Endpoints in Clinical Trials

21.1 Introduction

21.2 The Rationale for Composite Endpoints

21.3 Formulation of Composite Endpoints

21.4 Examples

21.5 Interpreting Composite Endpoints

21.6 Conclusions

References

Chapter 22: Confounding

22.1 Introduction

22.2 Confounding as a Bias in Effect Estimation

22.3 Confounding and Noncollapsibility

22.4 Confounding in Experimental Design

References

Chapter 23: Control Groups

23.1 Introduction

23.2 History

23.3 Ethics

23.4 Types of Control Groups: Historical Controls

23.5 Types of Control Groups: Randomized Controls

23.6 Conclusion

References

Chapter 24: Coronary Drug Project

24.1 Introduction

24.2 Objectives

24.3 Study Design and Methods

24.4 Results

24.5 Conclusions and Lessons Learned

References

Further Reading

Chapter 25: Covariates

25.1 Universal Character of Covariates

25.2 Use of Covariates in Clinical Trials

25.3 Continuous Covariates: Categorization or Functional Form?

25.4 Reporting and Summary Assessment of Prognostic Markers

References

Chapter 26: Crossover Design

26.1 Introduction

26.2 The Two-Period, Two-Treatment Design

26.3 Higher Order Designs

26.4 Model-Based Analyses

References

Chapter 27: Crossover Trials

27.1 Introduction

27.2 2 × 2 Crossover Trial

27.3 Higher-Order Designs for Two Treatments

27.4 Designs for Three or More Treatments

27.5 Analysis of Continuous Data

27.6 Analysis of Discrete Data

27.7 Concluding Remarks

References

Chapter 28: Diagnostic Studies

28.1 Introduction

28.2 Diagnostic Studies

28.3 Reliability

28.4 Validity

References

Further Reading

Chapter 29: DNA Bank

29.1 Definition and Objectives of DNA Biobanks

29.2 Types of DNA Biobanks

29.3 Types of Samples Stored

29.4 Quality Assurance and Quality Control in DNA Biobanks

29.5 Ethical Issues

29.6 Current Biobank Initiatives

29.7 Conclusions

References

Chapter 30: Up-and-Down and Escalation Designs

30.1 Introduction

30.2 Up-and-Down Designs

30.3 Escalation Designs

30.4 Comparing U&D, Escalation and Model-Based Designs

References

Further Reading

Chapter 31: Dose Ranging Crossover Designs

31.1 Introduction

31.2 Titration Designs and Extension Studies

31.3 Randomized Designs

31.4 Discussion and Conclusion

References

Further Reading

Chapter 32: Flexible Designs

32.1 Introduction

32.2 The General Framework

32.3 Conditional Power and Sample Size Reassessment

32.4 Extending the Flexibility to the Choice of the Number of Stages

32.5 Selection of the Test Statistics

32.6 More General Adaptations and Multiple Hypotheses Testing

32.7 An Example

32.8 Conclusion

References

Chapter 33: Gene Therapy

33.1 Introduction

33.2 Requirements for Successful Therapeutic Intervention

33.3 Pre-Clinical Research

33.4 Translational Challenges of Gene Therapy Trials

33.6 Lessons Learned

33.7 The Way Forward

References

Further Reading

Chapter 34: Global Assessment Variables

34.1 Introduction

34.2 Scientific Questions for Multiple Outcomes

34.3 General Comments on the GST

34.4 Recoding Outcome Measures

34.5 Types of Global Statistical Tests (GSTs)

34.6 Other Considerations

34.7 Other Methods

34.8 Examples of the Application of GST

34.9 Conclusions

References

Chapter 35: Good Clinical Practice (GCP)

35.1 Introduction

35.2 Human Rights and Protections

35.3 Informed Consent

35.4 Investigational Protocol

35.5 Investigator’s Brochure

35.6 Investigational New Drug Application

35.7 Production of the Investigational Drug

35.8 Clinical Testing

35.9 Sponsors

35.10 Contract Research Organization

35.11 Monitors

35.12 Investigators

35.13 Documentation

35.14 Clinical Holds

35.15 Inspections/Audits

References

Further Reading

Chapter 36: Group-Randomized Trials

36.1 Introduction

36.2 Group-Randomized Trials in Context

36.3 The Development of Group-Randomized Trials in Public Health

36.4 The Range of GRTs in Public Health

36.5 Current Design and Analytic Practices in GRTs in Public Health

36.6 The Future of Group-Randomized Trials

36.7 Planning a New Group-Randomized Trial

References

Chapter 37: Group Sequential Designs

37.1 Introduction

37.2 Classical Designs

37.3 The α-Spending Function Approach

37.4 Point Estimates and Confidence Intervals

37.5 Supplements

References

Chapter 38: Hazard Ratio

38.1 Introduction

38.2 Definitions

38.3 Illustration of Hazard Rate, Hazard Ratio and Risk Ratio

38.4 Example on the Use and Usefulness of Hazard Ratios

38.5 Ad-hoc Estimator of the Hazard Ratio

38.6 Confidence Interval of the Ad-hoc Estimator

38.7 Ad-hoc Estimator Stratified for the Covariate Renal Function

38.8 Properties of the Ad-hoc Estimator

38.9 Class of Generalized Rank Estimators of the Hazard Ratio

38.10 Estimation of the Hazard Ratio with Cox’s Proportional Hazards Model

38.11 Discussion

Further Reading

References

Chapter 39: Large Simple Trials

39.1 Large, Simple Trials

39.2 Small but Clinically Important Objective

39.3 Eligibility

39.4 Randomized Assignment

39.5 Outcome Measures

39.6 Conclusions

References

Further Reading – Selected Examples of Large, Simple Trials

Chapter 40: Longitudinal Data

40.1 Definition

40.2 Longitudinal Data from Clinical Trials

40.3 Advantages

40.4 Challenges

40.5 Analysis of Longitudinal Data

References

Further Reading

Chapter 41: Maximum Duration and Information Trials

41.1 Introduction

41.2 Two Paradigms: Duration versus Information

41.3 Sequential Studies: Maximum Duration versus Information Trials

41.4 An Example of a Maximum Information Trial

References

Chapter 42: Missing Data

42.1 Introduction

42.2 Methods in Common Use

42.3 An Alternative Approach to Incomplete Data

42.4 Illustration: Orthodontic Growth Data

42.5 Inverse Probability Weighting

42.6 Multiple Imputation

42.7 Sensitivity Analysis

42.8 Conclusion

References

Chapter 43: Mother to Child Human Immunodeficiency Virus Transmission Trials

43.1 Introduction

43.2 The Pediatric AIDS Clinical Trials Group 076 Trial

43.3 Results

43.4 The European Mode of Delivery Trial

43.5 The HIV Network for Prevention Trials 012 Trial

43.6 The Mashi Trial

References

Further Reading

Chapter 44: Multiple Testing in Clinical Trials

44.1 Introduction

44.2 Concepts of Error Rates

44.3 Union-Intersection Testing

44.4 Closed Testing

44.5 Partition Testing

References

Further Reading

Chapter 45: Multicenter Trials

45.1 Definitions

45.2 History

45.3 Examples

45.4 Organizational and Operational Features

45.5 Strengths

45.6 Counts

Readings

References

Chapter 46: Multiple Endpoints

46.1 Introduction

46.2 Multiple Testing Methods

46.3 Multivariate Global Tests

46.4 Conclusions

References

Chapter 47: Multiple Risk Factor Intervention Trial

47.1 Introduction

47.2 Trial Design

47.3 Trial Screening and Execution

47.4 Findings at the End of Intervention

47.5 Long-Term Follow-Up

47.6 Epidemiologic Findings from Long-Term Follow-up of 361,662 MRFIT Screenees

47.7 Conclusions

References

Further Reading

Chapter 48: N-of-1 Randomized Trials

48.1 Introduction

48.2 Goal of N-of-1 Studies

48.3 Requirements

48.4 Design Choices and Details for N-of-1 Studies

48.5 Statistical Issues

48.6 Other Issues

48.7 Conclusions

References

Chapter 49: Noninferiority Trial

49.1 Introduction

49.2 Essential Elements of Noninferiority Trial Design

49.3 Objectives of Noninferiority Trials

49.4 Measure of Treatment Effect

49.5 Noninferiority Margin

49.6 Statistical Testing for Noninferiority

49.7 Medication Nonadherence and Misclassification/Measurement Error

49.8 Testing Superiority and Noninferiority

49.9 Conclusion

References

Chapter 50: Nonrandomized Trials

50.1 Introduction

50.2 Randomized vs. Nonrandomized Clinical Trials

50.3 Control Groups in Nonrandomized Trials

50.4 Statistical Methods in Design and Analyses

50.5 Conclusion and Discussion

References

Chapter 51: Open-Labeled Trials

51.1 Introduction

51.2 The Importance of Blinding

51.3 Reasons Why Trials Might Have to be Open-Label

51.4 When Open-Label Trials Might be Desirable

51.5 Concluding Comments

References

Further Reading

Chapter 52: Optimizing Schedule of Administration in Phase I Clinical Trials

52.1 Introduction

52.2 Motivating Example

52.3 Design Issues

52.4 Trial Conduct

52.5 Extensions and Related Research

References

Chapter 53: Partially Balanced Designs

53.1 Introduction

53.2 Association Schemes

53.3 Partially Balanced Incomplete Block Designs

53.4 Generalizations of PBIBDs and Related Ideas

References

Chapter 54: Phase I/II Clinical Trials

54.1 Introduction

54.2 Traditional Approach

54.3 Recent Developments

54.4 Illustrations

References

Chapter 55: Phase II/III Trials

55.1 Introduction

55.2 Description and Legal Basis

55.3 Better Dose-Response Studies with Phase 2/3 Designs

55.4 Principles of Phase 2/3 Designs

55.5 Inferential Difficulties

55.6 Summary

References

Further Reading

Chapter 56: Phase I Trials

56.1 Introduction

56.2 Phase I in Healthy Volunteers

56.3 Phase I in Cancer Patients

56.4 Perspectives in the Future of Cancer Phase I Trials

56.5 Discussion

References

Chapter 57: Phase II Trials

57.1 Introduction

57.2 Proof-of-Concept (Phase IIa) Trials

57.3 Dose-Ranging (Phase IIb) Trials

57.4 Efficacy Endpoints

57.5 Oncology Phase II Trials

References

Further Reading

Chapter 58: Phase III Trials

58.1 Introduction

58.2 Research Methodology in Phase III

58.3 Type of Design

58.4 Discussion

References

Chapter 59: Phase IV Trials

59.1 Introduction

59.2 Definitions and Context

59.3 Different Purposes for Phase IV Trials

59.4 Essential and Desirable Features of Phase IV Trials

59.5 Examples of Phase IV Studies

59.6 Conclusion

References

Further Reading

Chapter 60: Phase I Trials in Oncology

60.1 Introduction

60.2 Dose-Limiting Toxicity

60.3 Starting Dose

60.4 Dose Level Selection

60.5 Study Design and General Considerations

60.6 Traditional, Standard, or 3 + 3 Design

60.7 Continual Reassessment Method and Other Designs that Target the MTD

60.8 Start-Up Rule

60.9 Phase I Trials with Long Follow-Up

60.10 Phase I Trials with Multiple Agents

60.11 Phase I Trials with the MTD Defined using Toxicity Grades

References

Further Reading

Chapter 61: Placebos

61.1 History of Placebo

61.2 Definitions

61.3 Magnitude of the Placebo Effect

61.4 Influences on the Placebo Effect

61.5 Ethics of Employing Placebo in Research

61.6 Guidelines for the Use of Placebos in Research

61.7 Innovations to Improve Research Involving Placebo

61.8 Summary

References

Chapter 62: Planning a Group-Randomized Trial

62.1 Introduction

62.2 The Research Question

62.3 The Research Team

62.4 The Research Design

62.5 Potential Design Problems and Methods to Avoid Them

62.6 Potential Analytic Problems and Methods to Avoid Them

62.7 Variables of Interest and Their Measures

62.8 The Intervention

62.9 Power

62.10 Summary

References

Chapter 63: Postmenopausal Estrogen/Progestin Interventions Trial (PEPI)

63.1 Introduction

63.2 Design and Objectives

63.3 Study Design

63.4 Outcomes

63.5 Results

63.6 Conclusions

References

Further Reading

Chapter 64: Preference Trials

64.1 Introduction

64.2 Potential Effects of Preference

64.3 The Patient Preference Design

64.4 Advantages and Disadvantages of the Patient Preference Design

64.5 Alternative Designs

64.6 Discussion

References

Further Reading

Chapter 65: Prevention Trials

65.1 Introduction

65.2 Role Among Possible Research Strategies

65.3 Prevention Trial Planning and Design

65.4 Conduct, Monitoring, and Analysis

References

Chapter 66: Primary Efficacy Endpoint

66.1 Defining the Primary Endpoint

66.2 Fairness of Endpoints

66.3 Specificity of the Primary Endpoint

66.4 Composite Primary Endpoints

66.5 Missing Primary Endpoint Data

66.6 Censored Primary Endpoints

66.7 Surrogate Primary Endpoints

66.8 Multiple Primary Endpoints

66.9 Secondary Endpoints

References

Further Reading

Chapter 67: Prognostic Variables in Clinical Trials

67.1 Introduction

67.2 A General Theory of Prognostic Variables

67.3 Valid Covariates and Recognizable Subsets

67.4 Stratified Randomization and Analysis

67.5 Statistical Importance of Prognostic Factors

References

Chapter 68: Randomization Procedures

68.1 Basics

68.2 General Classes of Randomization: Complete versus Imbalance-Restricted Procedures

68.3 Procedures for Imbalance-Restricted Randomization

68.4 Randomization-Based Analysis and the Validation Transformation

68.5 Conclusions

References

Chapter 69: Randomization Schedule

69.1 Introduction

69.2 Preparing the Schedule

69.3 Schedules for Open-Label Trials

69.4 Schedules to Mitigate Loss of Balance in Treatment Assignments Because of Incomplete Blocks

69.5 Issues Related to the use of Randomization Schedule

69.6 Summary

References

Further Reading

Chapter 70: Repeated Measurements

70.1 Introduction and Case Study

70.2 Linear Models for Gaussian Data

70.3 Models for Discrete Outcomes

70.4 Design Considerations

70.5 Concluding Remarks

References

Chapter 71: Simple Randomization

71.1 Introduction

71.2 Concept of Randomization

71.3 Why is Randomization Needed?

71.4 Methods: Simple Randomization

71.5 Advantages and Disadvantages of Randomization

71.6 Other Randomization Methods

71.7 Stratified Randomization

References

Further Reading

Chapter 72: Subgroups

72.1 Introduction

72.2 The General Problem

72.3 Definitions

72.4 Subgroup Effects and Interactions

72.5 Tests of Interactions and the Problem of Power

72.6 Subgroups and the Problem of Multiple Comparisons

72.7 Demographic Subgroups

72.8 Physiological Subgroups

72.9 Target Subgroups

72.10 Improper Subgroups

72.11 Summary

References

Chapter 73: Superiority Trials

73.1 Introduction

73.2 Clinicians Ask One-Sided Questions, and Want Immediate Answers

73.3 But Traditional Statistics Is Two-Sided

73.4 The Consequences of Two-Sided Answers to One-Sided Questions

73.5 The Fallacy of the “Negative” Trial

73.6 The Solution Lies in Employing One-Sided Statistics

73.7 Examples of Employing One-Sided Statistics

73.8 One-Sided Statistical Analyses Need to be Specified Ahead of Time

73.9 A Graphic Demonstration of Superiority and Noninferiority

73.10 How to Think about and Incorporate Minimally Important Differences

73.11 Incorporating Confidence Intervals for Treatment Effects

73.12 Why We Should Never Label an “Indeterminate” Trial Result as “Negative” or as Showing “No Effect”

73.13 How Does a Treatment Become “Established Effective Therapy”?

73.14 Most Trials are Too Small to Declare a Treatment “Established Effective Therapy”

73.15 How Do We Achieve a Superiority Result?

73.16 Superiority and Noninferiority Trials when Established Effective Therapy Already Exists

73.17 Exceptions to the Rule that It Is Always Unethical to Substitute Placebos for Established Effective Therapy

73.18 When a Promising New Treatment Might be Added to Established Effective Therapy

73.19 Using Placebos in a Trial Should Not Mean the Absence of Treatment

73.20 Demonstrating Trials of Promising New Treatments Against (or in Addition to) Established Effective Therapy

73.21 Why We Almost Never Find, and Rarely Seek, True “Equivalence”

73.22 The Graphical Demonstration of “Superiority” and “Noninferiority”

73.23 Completing the Circle: Converting One-Sided Clinical Thinking into One-Sided Statistical Analysis

73.24 A Final Note on Superiority and Noninferiority Trials of “Me-Too” Drugs

References

Further Reading

Chapter 74: Surrogate Endpoints

74.1 Introduction

74.2 Illustrations

74.3 Validation of Surrogates

74.4 Auxiliary Variables

74.5 Conclusions

References

Chapter 75: TNT Trial

75.1 Introduction

75.2 Objectives

75.3 Study Design

75.4 Results

75.5 Conclusions

References

Further Reading

Chapter 76: UGDP Trial

76.1 Introduction

76.2 Design and Chronology

76.3 Results

76.4 Conclusion and Discussion

References

Chapter 77: Women’s Health Initiative Hormone Therapy Trials

77.1 Introduction

77.2 Objectives

77.3 Study Design

77.4 Results

77.5 Conclusions

References

Chapter 78: Women’s Health Initiative Dietary Modification Trial: Update and Application of Biomarker Calibration to Self-Report Measures of Diet and Physical Activity

78.1 Rationale for Biomarker Calibration of Self-Report Measures of Diet

78.2 Nutrient Biomarker Study Energy and Protein Calibration

78.3 Measurement Error Properties of 4DFR, 24HR, and FFQ

78.4 Calibration of Self-Report Measures of Physical Activity

78.5 Psychosocial Measures and Biomarker-Calibrated Intake

78.6 Calibrated Energy, Protein, Protein Density, and Cardiovascular Disease Incidence

78.7 Diabetes and Calibrated Consumption

78.8 Cancer and Calibrated Intake

78.9 Associations Between Protein Intake, Frailty, and Renal Function

78.10 Summary and Future Directions

References

Index

Methods and Applications of Statistics in Clinical Trials

WILEY SERIES IN METHODS AND APPLICATIONS OF STATISTICS

Advisory Editor

N. Balakrishnan, McMaster University, Canada

The Wiley Series in Methods and Applications of Statistics is a unique grouping of research that features classic contributions from Wiley’s Encyclopedia of Statistical Sciences, Second Edition (ESS, 2e) alongside newly written articles that explore various problems of interest and their intrinsic connection to statistics. The goal of this collection is to encompass an encyclopedic scope of coverage within individual books that unify the most important and interesting applications of statistics within a specific field of study. Each book in the series successfully upholds the goals of ESS, 2e by combining established literature and newly developed contributions written by leading academics, researchers, and practitioners in a comprehensive and accessible format. The result is a succinct reference that unveils modern, cutting-edge approaches to acquiring, analyzing, and presenting data across diverse subject areas.

WILEY SERIES IN METHODS AND APPLICATIONS OF STATISTICS

Balakrishnan · Methods and Applications of Statistics in the Life and Health Sciences

Balakrishnan · Methods and Applications of Statistics in Business, Finance, and Management Science

Balakrishnan · Methods and Applications of Statistics in Engineering, Quality Control, and the Physical Sciences

Balakrishnan · Methods and Applications of Statistics in the Social and Behavioral Sciences

Balakrishnan · Methods and Applications of Statistics in the Atmospheric and Earth Sciences

Balakrishnan · Methods and Applications of Statistics in Clinical Trials, Volume 1: Concepts, Principles, Trials, and Designs

Balakrishnan · Methods and Applications of Statistics in Clinical Trials, Volume 2: Planning, Analysis, and Inferential Methods

Copyright © 2014 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Methods and applications of statistics in clinical trials / [edited by] N. Balakrishnan.
1 online resource. — (Methods and applications of statistics)
Includes bibliographical references and index.
Description based on print version record and CIP data provided by publisher; resource not viewed.
ISBN 978-1-118-59591-6 (ePub) — ISBN 978-1-118-59592-3 (Adobe PDF) — ISBN 978-1-118-59596-1 (ePub) — ISBN 978-1-118-59597-8 (Adobe PDF) — ISBN 978-1-118-30473-0 (cloth)
I. Balakrishnan, N., 1956- editor of compilation.
[DNLM: 1. Clinical Trials as Topic. 2. Statistics as Topic. QV 771.4]
R853.C55 610.72’4—dc23 2013035130

Contributors

Ian E. Alexander, Gene Therapy Research Unit of the Children’s Medical Research Institute and The Children’s Hospital at Westmead and University of Sydney, Discipline of Paediatrics and Child Health, Westmead, Australia

Janet W. Andersen, Harvard School of Public Health, Boston, MA

Per Kragh Andersen, University of Copenhagen, Copenhagen, Denmark

Andrew L. Avins, University of California, San Francisco, CA

Rosemary A. Bailey

Peter Bauer, *Deceased, 2002

David B. Barr, Kendle International, Cincinnati, OH

Shari S. Bassuk, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA

Jeannette M. Beasley, Albert Einstein College of Medicine, Bronx, NY

Vance W. Berger, National Cancer Institute, Bethesda, MD

Werner Brannath, University of Bremen, Bremen, Germany

Michael Branson, Novartis Pharma AG, Basel, Switzerland

Thomas Braun, University of Michigan, Ann Arbor, MI

Frank Bretz, Novartis Pharma AG, Basel, Switzerland

Louis Cabanilla, Tufts University Center for the Study of Drug Development, Boston, MA

Marion K. Campbell, University of Aberdeen, Aberdeen, UK

Paul L. Canner, Maryland Medical Research Institute, Baltimore, MD

Joseph C. Cappelleri, Pfizer, Inc., Global Research & Development, Groton, CT

Rick Chappell, University of Wisconsin, Madison, WI

Ying-Kuen K. Cheung, Columbia University, New York, NY

Sylvie Chevret, Inserm, Paris, France

Joseph P. Costantino, University of Pittsburgh, Pittsburgh, PA

Simon Day, Roche Products Ltd., Welwyn Garden City, UK

Victor DeGruttola, Harvard School of Public Health, Boston, MA

David L. DeMets, University of Wisconsin-Madison, Madison, WI

Chongzhi Di, Fred Hutchinson Cancer Research Center, Seattle, WA

Alexei Dmitrienko, Eli Lilly and Company, Indianapolis, IN

Allan Donner, University of Western Ontario, London, ON, Canada, [email protected]

Therese Dupin-Spriet

Peter J. Dyck, Mayo Clinic College of Medicine, Rochester, MN, [email protected]

Lynn E. Eberly, University of Minnesota, Minneapolis, MN, [email protected]

Thomas R. Fleming, University of Washington, Seattle, WA, [email protected]

Dean A. Follmann, National Institute of Allergy and Infectious Diseases, Bethesda, MD, [email protected]

Mary A. Foulkes, The George Washington University, Washington, DC, [email protected]

Elizabeth Garrett-Mayer, Johns Hopkins University, Baltimore, MD, [email protected]

Edmund A. Gehan, Georgetown University Medical Center, Washington, DC, [email protected]

Samantha L. Ginn, Gene Therapy Research Unit of the Children’s Medical Research Institute and The Children’s Hospital at Westmead and The University of Sydney, Sydney Medical School, Sydney, Australia, [email protected]

Els Goetghebeur, Ghent University, Ghent, Belgium, [email protected]

Charles H. Goldsmith

Erika Graf, Clinical Trials Unit, University Medical Center Freiburg, Freiburg, Germany, [email protected]

William C. Grant

Stephanie Green, Clinical Biostatistics, Pfizer, Inc., New London, CT

Sander Greenland, University of California, Los Angeles, CA, [email protected]

Scott M. Grundy, University of Texas Southwestern Medical Center, Dallas, TX, [email protected]

Weili He, Merck & Co., Inc., Rahway, NJ, [email protected]

Anne Holbrook, McMaster University, Hamilton, ON, Canada, [email protected]

Jason C. Hsu, Ohio State University, Columbus, OH, [email protected]

Peng Huang, Johns Hopkins University, Baltimore, MD, [email protected]

Ying Huang, Fred Hutchinson Cancer Research Center, Seattle, WA, [email protected]

H. M. James Hung, U.S. Food and Drug Administration, Silver Spring, MD, [email protected]

Anastasia Ivanova, University of North Carolina at Chapel Hill, Chapel Hill, NC, [email protected]

Sudha K. Iyengar, Case Western Reserve University, Cleveland, OH, [email protected]

Byron Jones, Pfizer Pharmaceuticals, Sandwich, UK

Celia C. Kamath, Health Sciences Research, Mayo Clinic, Rochester, MN, [email protected]

Oliver Keene, GlaxoSmithKline Research and Development, Stockley Park, UK, [email protected]

Michael G. Kenward, London School of Hygiene and Tropical Medicine, London, UK, [email protected]

Kyungmann Kim, University of Wisconsin, Madison, WI, [email protected]

Cheryl Kious, Quintiles Transnational Corporation, Durham, NC

Neil Klar, Cancer Care Ontario, Toronto, ON, Canada, [email protected]

Lewis H. Kuller, University of Pittsburgh, Pittsburgh, PA, [email protected]

Olga M. Kuznetsova, Merck & Co., Inc., Rahway, NJ, [email protected]

K. K. Gordon Lan, Johnson & Johnson, Raritan, NJ, [email protected]

Robert D. Langer, University of Nevada School of Medicine, Las Vegas, NV, [email protected]

John C. Larosa, State University of New York Health Science Center, Brooklyn, NY

Emmanuel Lesaffre, Catholic University of Leuven, Leuven, Belgium, [email protected]

Mova Leung, Carlo Fidani Peel Regional Cancer Centre, Credit Valley Hospital, Mississauga, ON, Canada

Hung-I Li

Wenjun Li, University of Massachusetts Medical School, Worcester, MA, [email protected]

Zhengqing Li, Global Biometric Science, Bristol-Myers Squibb Company, Wallingford, CT

Jun Liu, Columbia University, New York, NY, [email protected]

Qing Liu, Johnson and Johnson Pharmaceutical Research and Development, Raritan, NJ, [email protected]

Craig Mallinckrodt, Eli Lilly and Company, Indianapolis, IN, [email protected]

JoAnn E. Manson, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, [email protected]

Ruth McBride, Axio Research, Seattle, WA, [email protected]

Damian McEntegart, ClinPhone Group Ltd., Nottingham, UK, [email protected]

Jesper Mehlsen, Frederiksberg Hospital, Clinical Physiology & Nuclear Medicine, Frederiksberg, Denmark, [email protected]

Curtis L. Meinert, The Johns Hopkins University, Bloomberg School of Public Health, Department of Epidemiology, Baltimore, MD, [email protected]

Christopher P. Milne, Tufts University Center for the Study of Drug Development, Boston, MA

Geert Molenberghs, Hasselt University, Diepenbeek, Belgium, [email protected]

Yasmin Mossavar-Rahmani, Albert Einstein College of Medicine, Bronx, NY, [email protected]

David M. Murray, Ohio State University, Columbus, OH, [email protected]

James D. Neaton, University of Minnesota, Minneapolis, MN, [email protected]

John Neuhaus, University of California, San Francisco, CA, [email protected]

Marian L. Neuhouser, Fred Hutchinson Cancer Research Center, Seattle, WA, [email protected]

Robert Newcombe, Cardiff University, Cardiff, Wales, UK, [email protected]

Peter C. O’Brien, Mayo Clinic College of Medicine, Rochester, MN

Robert O’Neill, U.S. Food and Drug Administration, Silver Spring, MD, [email protected]

Assaf P. Oron, Seattle Children’s Research Institute, Seattle, WA, [email protected]

Scott D. Patterson, Wyeth Research & Development, Collegeville, PA, [email protected]

Inna T. Perevozskaya, Pfizer Pharmaceuticals, Philadelphia, PA, [email protected]

Martin Posch, Medical University of Vienna, Vienna, Austria, [email protected]

Ross L. Prentice, Fred Hutchinson Cancer Research Center, Seattle, WA, [email protected]

David L. Sackett, Trout Research & Education Centre, Ontario, Canada, [email protected]

A. J. Sankoh, National Cancer Institute, Bethesda, MD

Willi Sauerbrei, University Medical Center Freiburg, Freiburg, Germany, [email protected]

Claudia Schmoor, Clinical Trials Unit, University Medical Center Freiburg, Freiburg, Germany, [email protected]

Marvin A. Schneiderman, *Deceased, April 1997

Carsten Schwenke, Schwenke Consulting: Strategies and Solutions in Statistics, Berlin, Germany

Pranab K. Sen, The University of North Carolina at Chapel Hill, Chapel Hill, NC, [email protected]

David E. Shapiro, Harvard School of Public Health, Boston, MA, [email protected]

Pamela Shaw, National Institutes of Health, Bethesda, MD, [email protected]

Theru A. Sivakumaran, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, [email protected]

Jeffrey A. Sloan, Health Sciences Research, Mayo Clinic, Rochester, MN, [email protected]

Mike D. Smith, Pfizer Global Research & Development, New London, CT

Jeremiah Stamler, Northwestern University, Chicago, IL, [email protected]

Peter F. Thall, M. D. Anderson Cancer Center, Houston, TX, [email protected]

Barbara C. Tilley, Medical University of South Carolina, Charleston, SC, [email protected]

Lesley F. Tinker, Fred Hutchinson Cancer Research Center, Seattle, WA, [email protected]

David J. Torgerson, University of York, York, UK, [email protected]

Geert Verbeke, Catholic University of Leuven, Leuven, Belgium, [email protected]

Sue-J. Wang, U.S. Food and Drug Administration, Silver Spring, MD, [email protected]

Gernot Wassmer, University of Cologne, Cologne, Germany, [email protected]

David D. Waters, San Francisco General Hospital, San Francisco, CA, [email protected]

Janet Wittes, Statistics Collaborative, Inc., Washington, DC, [email protected]

Névine Zariffa, GlaxoSmithKline, Philadelphia, PA

Cheng Zheng, University of Washington, Seattle, WA, [email protected]

Sarah Zohar, Saint-Louis Hospital, Paris, France

Preface

Planning, developing, and implementing clinical trials have become an important and integral part of life. More and more effort and care go into conducting clinical trials, as they have been responsible for key advances in medicine and in treatments for different illnesses. Today, clinical trials have become mandatory in the development and evaluation of modern drugs and in identifying the association of risk factors with diseases. Owing to the complexity of the issues surrounding clinical trials, regulatory agencies oversee their approval and also ensure impartial review. The main purpose of this two-volume handbook is to provide a detailed exposition of historical developments and also to highlight modern advances in methods and analysis for clinical trials.

It is important to mention that the four-volume Wiley Encyclopedia of Clinical Trials served as a basis for this handbook. While many pertinent entries from this Encyclopedia have been included here, a number of them have been updated to reflect recent developments on their topics. Some new articles detailing modern advances in statistical methods in clinical trials and their applications have also been included.

A volume of this size and nature cannot be successfully completed without the co-operation and support of the contributing authors, and my sincere thanks and gratitude go to all of them. Thanks are also due to Mr. Steve Quigley and Ms. Sari Friedman (of John Wiley & Sons, Inc.) for their keen interest in this project from day one, as well as for their support and constant encouragement (and, of course, occasional nudges, too) throughout the course of this project. The careful and diligent work of Mrs. Debbie Iscoe in the typesetting of this volume and of Angioline Loredo at the production stage is gratefully acknowledged. Partial financial support of the Natural Sciences and Engineering Research Council of Canada also assisted in the preparation of this handbook, and this support is much appreciated.

This is the sixth in a series of handbooks on methods and applications of statistics. The first handbook focused on life and health sciences, the second on business, finance, and management sciences, the third on engineering, quality control, and physical sciences, the fourth on behavioral and social sciences, and the fifth on atmospheric and earth sciences; the present handbook concentrates on methods and applications of statistics in clinical trials. This is the first of two volumes describing in detail the statistical developments concerning clinical trials.

It is my sincere hope that this handbook and the others in the series will become basic reference resources for those involved in these fields of research!

PROF. N. BALAKRISHNAN
McMASTER UNIVERSITY

Hamilton, Canada
December 2013

Chapter 1

Absolute Risk Reduction

Robert Newcombe

1.1 Introduction

Many response variables in clinical trials are binary: the treatment was successful or unsuccessful; the adverse effect did or did not occur. Binary variables are summarized by proportions, which may be compared between different arms of a study by calculating either an absolute difference of proportions or a relative measure, the relative risk or the odds ratio. In this article we consider several point and interval estimates for the absolute difference between two proportions, for both unpaired and paired study designs. The simplest methods encounter problems when numerators or denominators are small; accordingly, better methods are introduced. Because confidence interval methods for differences of proportions are derived from related methods for the simpler case of the single proportion, which itself can also be of interest in a clinical trial, this case is also considered in some depth. Illustrative examples relating to data from two clinical trials are shown.

1.2 Preliminary Issues

In most clinical trials, the unit of data is the individual, and statistical analyses for efficacy and safety outcomes compare responses between the two (or more) treatment groups. When subjects are randomized between these groups, responses of subjects in one group are independent of those in the other group. This leads to unpaired analyses. Crossover and split-unit designs require paired analyses. These have many features in common with the unpaired analyses and will be described in the final section.

Several effect size measures are widely used for comparison of two independent proportions:

Difference of proportions p1 − p2

Ratio of proportions (risk ratio or relative risk) p1/p2

Odds ratio (p1/(1 − p1))/(p2/(1 − p2))
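As a minimal sketch, the three measures listed above can be computed directly from the event counts. The function name is an illustrative assumption, and the counts used are the life-threatening toxicity figures that appear later in Table 1 (2 of 14 on treatment A, 1 of 11 on treatment B).

```python
def effect_sizes(r1, n1, r2, n2):
    """Return (risk difference, risk ratio, odds ratio) for two independent groups.

    r1, r2 are event counts and n1, n2 are group sizes.
    The ratio measures are undefined when p2 is 0 or when either proportion is 1.
    """
    p1, p2 = r1 / n1, r2 / n2
    risk_difference = p1 - p2
    risk_ratio = p1 / p2
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return risk_difference, risk_ratio, odds_ratio


rd, rr, odds = effect_sizes(2, 14, 1, 11)
print(f"risk difference {rd:.4f}, risk ratio {rr:.4f}, odds ratio {odds:.4f}")
```

The three numbers describe the same pair of proportions from different perspectives, which is why the direction of the comparison (here A minus B, A over B) should always be stated alongside them.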

In this article we consider in particular the difference between two proportions, p1 − p2, as a measure of effect size. This is variously referred to as the absolute risk reduction, risk difference, or success rate difference. Other articles in this work describe the risk ratio or relative risk and the odds ratio. We consider both point and interval estimates, in recognition that “confidence intervals convey information about magnitude and precision of effect simultaneously, keeping these two aspects of measurement closely linked” [1]. In the clinical trial context, a difference between two proportions is often referred to as an absolute risk reduction. However, it should be borne in mind that any term that includes the word “reduction” really presupposes that the direction of the difference will be a reduction in risk—such terminology becomes awkward when the anticipated benefit does not materialize, including the nonsignificant case when the confidence interval for the difference extends beyond the null hypothesis value of zero. The same applies to the relative risk reduction, 1 − p1/p2. Whenever results are presented, it is vitally important that the direction of the observed difference should be made unequivocally clear. Moreover, sometimes confusing labels are used, which might be interpreted to mean something other than p1 − p2; for example, Hashemi et al. [2] refer to p1 − p2 as attributable risk. It is also vital to distinguish between relative and absolute risk reduction.

In clinical trials, as in other prospective and cross-sectional designs already described, each of the three quantities we have discussed may validly be used as a measure of effect size. The risk difference and risk ratio compare two proportions from different perspectives. A halving of risk will have much greater population impact for a common outcome than for an infrequent one. Schechtman [3] recommends that both a relative and an absolute measure should always be reported, with appropriate confidence intervals.

The odds ratio is discussed at length by Agresti [4]. It is widely regarded as having a special preferred status on account of its role in retrospective case-control studies and in logistic regression and meta-analysis. Nevertheless, it should not be regarded as having gold standard status as a measure of effect size for the 2 × 2 table [3,5].

1.3 Point and Interval Estimates for a Single Proportion

Before considering the difference between two independent proportions in detail, we first consider some of the issues that arise in relation to the fundamental task of estimating a single proportion. These issues have repercussions for the comparison of proportions because confidence interval methods for p1 − p2 are generally based closely on those for proportions. The single proportion is also relevant to clinical trials in its own right. For example, in a clinical trial comparing surgical versus conservative management, we would be concerned with estimating the incidence of a particular complication of surgery such as postoperative bleeding, even though there is no question of obtaining a contrasting value in the conservative group or of formally comparing these.

Unfortunately, confidence intervals for proportions and their differences do not achieve their nominal coverage exactly, because the sample space is discrete and bounded. The simplest method, the Wald interval p ± z√(pq/n), where p is the empirical proportion, q = 1 − p, n is the sample size, and z is the appropriate standard normal quantile (1.96 for a 95% interval), has three unfavorable properties [6–9]. These can all be traced to the interval’s simple symmetry about the empirical estimate.

The achieved coverage is much lower than the nominal value. For some values of π, the achieved coverage probability is close to zero.

The noncoverage probabilities in the two tails are very different. The location of the interval is too distal, that is, too far out from the center of symmetry of the scale, 1/2. The noncoverage of the interval is predominantly mesial.

The calculated limits can violate the boundaries of the scale, falling below 0 or above 1, and the interval degenerates to zero width when the empirical proportion is 0 or 1.

Many improved methods for confidence intervals for proportions have been developed. The properties of these methods are evaluated by choosing suitable parameter space points (here, combinations of n and π), using these to generate large numbers of simulated random samples, and recording how often the resulting confidence interval includes the true value π. The resulting coverage probabilities are then summarized by calculating the mean coverage and minimum coverage across the simulated datasets.
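
The evaluation described above can be sketched in a few lines of Python. As an assumption made here for brevity and reproducibility, the sketch replaces simulation by exact enumeration over the n + 1 possible outcomes, which gives the coverage probability at each parameter point directly. The grid of π values, the sample size, and the function names are illustrative choices, and the Wald interval is used only as the method under test.

```python
from math import comb, sqrt

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald(r, n, z=Z95):
    """Wald interval for r events among n subjects (not truncated to [0, 1])."""
    p = r / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def coverage(interval, n, pi):
    """Exact probability that interval(r, n) contains pi when r ~ Binomial(n, pi)."""
    total = 0.0
    for r in range(n + 1):
        lower, upper = interval(r, n)
        if lower <= pi <= upper:
            total += comb(n, r) * pi**r * (1 - pi) ** (n - r)
    return total


# Hypothetical evaluation grid: pi = 0.01, 0.02, ..., 0.99 at n = 14.
grid = [i / 100 for i in range(1, 100)]
cov = [coverage(wald, 14, pi) for pi in grid]
print(f"mean coverage {sum(cov) / len(cov):.3f}, minimum coverage {min(cov):.3f}")
```

Any other interval method with the same `(r, n)` signature can be passed to `coverage` in place of `wald`, so the same harness serves to compare methods on a common grid.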

Generally, the improved methods obviate the boundary violation problem, and improve coverage and location. The most widely researched options are as follows.

A continuity correction may be incorporated: p ± {z√(pq/n) + 1/(2n)}. This certainly improves coverage and obviates zero-width intervals but increases the incidence of boundary overflow.
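
A short sketch of the Wald interval with and without the continuity correction follows. The function name and the `continuity` flag are illustrative assumptions; the limits are deliberately left untruncated so that boundary overflow remains visible.

```python
from math import sqrt


def wald_interval(r, n, z=1.959964, continuity=False):
    """Wald interval p +/- z*sqrt(pq/n), optionally widened by 1/(2n)."""
    p = r / n
    half = z * sqrt(p * (1 - p) / n)
    if continuity:
        half += 1 / (2 * n)  # continuity correction
    return p - half, p + half


for r in (0, 2):  # treatment A in the ECOG trial: 0 of 14 and 2 of 14
    print(r, wald_interval(r, 14), wald_interval(r, 14, continuity=True))
```

With r = 0 the uncorrected interval has zero width, and the corrected one extends below zero, which illustrates both defects named in the text.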

The Wilson score interval (see Table 2) takes as its limits the two solutions for π of the equation |p − π| = z√(π(1 − π)/n), which reduces to a quadratic in π. It is easily demonstrated [7] that the resulting interval is symmetrical on the logit scale, the other natural scale for proportions, by considering the product of the two roots for π, and likewise for 1 − π. The resulting interval is boundary respecting and has appropriate mean coverage. In contrast to the Wald interval, location is rather too mesial.
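
The Wilson score interval reported in Table 2 can be sketched as follows, taking its limits as the two roots for π of the quadratic obtained from |p − π| = z√(π(1 − π)/n). The function names are illustrative assumptions, and the clamping to [0, 1] only guards against floating-point rounding at the boundaries. The final lines check the logit-scale symmetry numerically.

```python
from math import log, sqrt


def wilson_interval(r, n, z=1.959964):
    """Wilson score interval: roots of (n + z^2)*pi^2 - (2r + z^2)*pi + r^2/n = 0."""
    centre = (r + z * z / 2) / (n + z * z)
    half = z * sqrt(r * (n - r) / n + z * z / 4) / (n + z * z)
    return max(0.0, centre - half), min(1.0, centre + half)


def logit(x):
    return log(x / (1 - x))


lower, upper = wilson_interval(2, 14)
print(f"{lower:.4f} to {upper:.4f}")  # Table 2 reports 0.0401 to 0.3994

# Symmetry on the logit scale: the limits are equidistant from logit(p).
p = 2 / 14
print(logit(upper) - logit(p), logit(p) - logit(lower))
```

The symmetry follows because the product of the two roots is np²/(n + z²) and the corresponding product for 1 − π is nq²/(n + z²), so the product of the odds at the two limits equals (p/q)².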

Alternatively, the Bayesian approach described elsewhere in this work may be used. The resulting intervals are best referred to as credible intervals, in recognition that the interpretation is slightly different from that of frequentist confidence intervals such as those previously described.

Bayesian inference starts with a prior distribution for the parameter of interest, in this instance the proportion π. This is then combined with the likelihood function comprising the evidence from the sample to form a posterior distribution that represents beliefs about the parameter after the data have been obtained. When a conjugate prior is chosen from the beta distribution family, the posterior distribution takes a relatively simple form: it is also a beta distribution. If substantial information about π exists, an informative prior may be chosen to encapsulate this information.

More often, an uninformative prior is used. The simplest is the uniform prior B(1,1), which assumes that all possible values of π between 0 and 1 start off equally likely. An alternative uninformative prior with some advantages is the Jeffreys prior B(1/2, 1/2). Both are diffuse priors, which spread the probability thinly across the whole range of possible values from 0 to 1.

The resulting posterior distribution may be displayed graphically, or may be summarized by salient summary statistics such as the posterior mean and median and selected centiles. The 2.5 and 97.5 centiles of the posterior distribution delimit the tail-based 95% credible interval. Alternatively, a highest posterior density interval may be reported. The tail-based interval is considered preferable because it produces equivalent results when a transformed scale (e.g., logit) is used [11].

These Bayesian intervals perform well in a frequentist sense [12]. Hence, it is now appropriate to regard them as confidence interval methods in their own right, with theoretical justification in the Bayesian paradigm but empirical validation from a frequentist standpoint. They may thus be termed beta intervals. They are readily calculated using software for the incomplete beta function, which is included in statistical packages and also spreadsheet software such as Microsoft Excel. As such, they should now be regarded as computationally of “closed form,” though less transparent than Wald methods.
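
A minimal sketch of the tail-based beta interval under the uniform prior B(1,1) is given below. With this prior the posterior is B(r + 1, n − r + 1), whose parameters are integers, so the incomplete beta function can be evaluated through its identity with a binomial tail and inverted by bisection using only the standard library. This shortcut is an assumption of the sketch and does not extend to the Jeffreys prior, for which a general incomplete beta routine would be needed; all function names are illustrative.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    return sum(comb(m, k) * x**k * (1 - x) ** (m - k) for k in range(a, m + 1))


def beta_quantile_int(prob, a, b, tol=1e-12):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def uniform_prior_interval(r, n, level=0.95):
    """Tail-based credible interval from the Beta(r + 1, n - r + 1) posterior."""
    a, b = r + 1, n - r + 1
    alpha = 1 - level
    return beta_quantile_int(alpha / 2, a, b), beta_quantile_int(1 - alpha / 2, a, b)


lower, upper = uniform_prior_interval(2, 14)
print(f"{lower:.4f} to {upper:.4f}")  # Table 2 reports 0.0433 to 0.4046
```

For r = 0 the sketch returns a small positive lower centile, whereas Table 2 reports a lower limit of 0; the sketch applies no boundary convention of that kind.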

Many statisticians consider that a coverage level should represent minimum, not average, coverage. The Clopper-Pearson “exact” or tail-based method [13] achieves this, at the cost of being excessively conservative; intervals are unnecessarily wide. There is a trade-off between coverage and width; it is always possible to increase coverage by widening intervals, and the aim is to attain good coverage without excessive width. A variant on the “exact” method involving a mid-P accumulation of tail probabilities [14,15] aligns mean coverage closely with the nominal 1 − α. Both methods have appropriate location. The Clopper-Pearson interval, but not the mid-P one, is readily programmed as a beta interval, of similar form to Bayes intervals. A variety of shortened intervals have also been developed that maintain minimum coverage but substantially shrink interval length [16,17]. Shortened intervals are much more complex, both computationally and conceptually. They also have the disadvantage that what is optimized is the interval, not the lower and upper limits separately; consequently, they are unsuitable when interest centers on one of the limits rather than the other.
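
The Clopper-Pearson and mid-P limits can both be obtained by solving for the value of π at which an accumulated binomial tail probability equals α/2. The sketch below does this by bisection rather than through the beta-function form mentioned above; the function names, the `mid_p` flag, and the tolerance are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, n, pi):
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)


def tail_interval(r, n, level=0.95, mid_p=False, tol=1e-12):
    """Tail-based interval: Clopper-Pearson, or mid-P when mid_p is True."""
    alpha = 1 - level
    w = 0.5 if mid_p else 1.0  # weight given to the observed outcome r

    def upper_tail(pi):  # P(X > r) + w * P(X = r), increasing in pi
        return sum(binom_pmf(k, n, pi) for k in range(r + 1, n + 1)) + w * binom_pmf(r, n, pi)

    def lower_tail(pi):  # P(X < r) + w * P(X = r), decreasing in pi
        return sum(binom_pmf(k, n, pi) for k in range(r)) + w * binom_pmf(r, n, pi)

    def solve(f, increasing):
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if (f(mid) < alpha / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if r == 0 else solve(upper_tail, True)
    upper = 1.0 if r == n else solve(lower_tail, False)
    return lower, upper


print(tail_interval(2, 14))              # Table 2 reports 0.0178 to 0.4281
print(tail_interval(2, 14, mid_p=True))  # Table 2 reports 0.0247 to 0.3974
```

Because the lower and upper limits are solved separately, either one can be quoted on its own, which is the property that the shortened intervals give up.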

Numerical examples illustrating these calculations are based on some results from a very small randomized phase II clinical trial performed by the Eastern Cooperative Oncology Group [18]. Table 1 shows the results for two outcomes, treatment success defined as shrinkage of the tumor by 50% or more, and life-threatening treatment toxicity, for the two treatment groups A and B.

Table 1: Some Results from a Very Small Randomized Phase II Clinical Trial Performed by the Eastern Cooperative Oncology Group

| | Treatment A | Treatment B |
|---|---|---|
| Number of patients | 14 | 11 |
| Number with successful outcome: tumor shrinkage by ≥50% | 0 | 0 |
| Number with life-threatening treatment toxicity | 2 | 1 |

Source: Parzen et al. J Comput Graph Stat. 2002: 11; 420–436.

Table 2 shows 95% confidence intervals for both outcomes for treatment A. These examples show how Wald and derived intervals often produce inappropriate limits (see asterisks) in boundary and near-boundary cases.

Table 2: 95% Confidence Intervals for Proportions of Patients with Successful Outcome and with Life-Threatening Toxicity on Treatment A in the Eastern Cooperative Oncology Group Trial

| Outcome | Successful Tumor Shrinkage | Life-Threatening Toxicity |
|---|---|---|
| Empirical estimate | 0 | 0.1429 |
| Wald interval | 0 to 0* | <0* to 0.3262 |
| Wald interval with continuity correction | <0* to 0.0357 | <0* to 0.3619 |
| Wilson score interval | 0 to 0.2153 | 0.0401 to 0.3994 |
| Agresti-Coull shrinkage estimate | 0.1111 | 0.2222 |
| Agresti-Coull interval | <0* to 0.2563 | 0.0302 to 0.4143 |
| Bayes interval, B(1,1) prior | 0 to 0.2180 | 0.0433 to 0.4046 |
| Bayes interval, B(1/2, 1/2) prior | 0 to 0.1616 | 0.0309 to 0.3849 |
| Clopper-Pearson “exact” interval | 0 to 0.2316 | 0.0178 to 0.4281 |
| Mid-P interval | 0 to 0.1926 | 0.0247 to 0.3974 |

Note: Asterisks denote boundary violations.

Source: Parzen et al. J Comput Graph Stat. 2002: 11; 420–436.
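
The Agresti-Coull rows of Table 2 can be reproduced with the sketch below. It assumes the common "add two successes and two failures" form of the method, that is, four pseudo-observations in total followed by the Wald formula applied to the augmented data; the function name and the `pseudo` parameter are illustrative.

```python
from math import sqrt


def agresti_coull(r, n, z=1.959964, pseudo=2):
    """Shrinkage estimate and Wald-type interval after adding pseudo-observations.

    `pseudo` successes and `pseudo` failures are added (assumed form, see lead-in).
    """
    n_tilde = n + 2 * pseudo
    p_tilde = (r + pseudo) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde, (p_tilde - half, p_tilde + half)


for r in (0, 2):  # treatment A: 0 of 14 successes, 2 of 14 toxicities
    estimate, (lower, upper) = agresti_coull(r, 14)
    print(f"r={r}: estimate {estimate:.4f}, interval {lower:.4f} to {upper:.4f}")
```

The lower limit for r = 0 is negative, which corresponds to the boundary violation flagged by the asterisk in the table.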

1.4 An Unpaired Difference of Proportions

We return to the unpaired difference case. As described elsewhere in this work, hypothesis testing for the comparison of two proportions takes a quite different form according to whether the objective of the trial is to ascertain difference or equivalence. When we report the contrast between two proportions with an appropriately constructed confidence interval, this issue is taken into account only when we come to interpret the calculated point and interval estimates. In this respect, in comparison with hypothesis testing, the confidence interval approach leads to much simpler, more flexible patterns of inference.

The Wald interval is calculated as p1 − p2 ± z√(p1q1/n1 + p2q2/n2). It has poor mean and minimum coverage and fails to produce an interval when both p1 and p2 are 0 or 1. Overshoot can occur when one proportion is close to 1 and the other is close to 0, but this situation is expected to occur infrequently in practice. Use of a continuity correction improves mean coverage, but minimum coverage remains low.
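
The Wald formula for the difference translates directly into code. The sketch below uses an illustrative function name and takes event counts rather than proportions as input.

```python
from math import sqrt


def wald_difference(r1, n1, r2, n2, z=1.959964):
    """Wald interval p1 - p2 +/- z*sqrt(p1*q1/n1 + p2*q2/n2)."""
    p1, p2 = r1 / n1, r2 / n2
    d = p1 - p2
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d, (d - half, d + half)


print(wald_difference(2, 14, 1, 11))  # toxicity, A versus B
print(wald_difference(0, 14, 0, 11))  # tumor shrinkage: zero-width interval
```

The second call shows the degenerate case described in the text: with no events in either arm the estimated standard error is zero and the interval collapses to a point.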

Some of the better methods substitute the profile estimate γδ, which is the maximum likelihood estimate of the nuisance parameter γ conditional on a hypothesized value of the difference δ = π1 − π2. These include score-type asymptotic intervals developed by Mee [19] and Miettinen and Nurminen [20]. Newcombe [21] developed tail-based exact and mid-P intervals involving substitution of the profile estimate.

All these intervals are boundary respecting. The “exact” method aligns the minimum coverage quite well with the nominal 1 − α; the others align mean coverage well with 1 − α, at the expense of fairly complex iterative calculation.

Bayesian intervals for p1 − p2 and other comparative measures may be constructed [2,11], but they are computationally much more complex than in the single proportion case, requiring use of numerical integration or computer-intensive methodology such as Markov chain Monte Carlo (MCMC) methods. It may be more appropriate to incorporate a prior for p1 − p2 itself rather than independent priors for p1 and p2 [22]. The Bayesian formulation is readily adapted to incorporate functional constraints such as δ ≥ 0 [22]. Walters [23] and Agresti and Min [11] have shown that Bayes intervals for p1 − p2 with uninformative beta priors have favorable frequentist properties.

Two computationally simpler, effective approaches have been developed. Newcombe [21] also formulated square-and-add intervals for differences of proportions. The concept is a very simple one. Assuming independence, the variance of a difference between two quantities is the sum of their variances. In other words, standard errors “square and add”—they combine in the same way that differences in x and in y coordinates combine to give the Euclidean distance along the diagonal, as in Pythagoras’ theorem. This is precisely how the Wald interval for p1 − p2 is constructed. The same principle may be applied starting with other, better intervals for p1 and p2 separately. The Wilson score interval is a natural choice as it already involves square roots, though squaring and adding would work equally effectively starting with, for instance, tail-based [24] or Bayes intervals. It is easily demonstrated that the square-and-add process preserves the property of respecting boundaries.

This easily computed interval aligns mean coverage closely with the nominal 1 − α. A continuity correction is readily incorporated, resulting in more conservative coverage. Both intervals tend to be more mesially positioned than the γδ-based intervals discussed previously.
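
A sketch of the square-and-add construction starting from Wilson score intervals is shown below. It assumes the usual form of the method: with (l1, u1) and (l2, u2) the Wilson limits for the two proportions, the lower limit is p1 − p2 − √((p1 − l1)² + (u2 − p2)²) and the upper limit is p1 − p2 + √((u1 − p1)² + (p2 − l2)²). The function names are illustrative, and no continuity correction is applied.

```python
from math import sqrt


def wilson(r, n, z):
    """Wilson score limits for a single proportion."""
    centre = (r + z * z / 2) / (n + z * z)
    half = z * sqrt(r * (n - r) / n + z * z / 4) / (n + z * z)
    return max(0.0, centre - half), min(1.0, centre + half)


def square_and_add(r1, n1, r2, n2, z=1.959964):
    """Square-and-add interval for p1 - p2 built from two Wilson intervals."""
    p1, p2 = r1 / n1, r2 / n2
    l1, u1 = wilson(r1, n1, z)
    l2, u2 = wilson(r2, n2, z)
    d = p1 - p2
    lower = d - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


print(square_and_add(2, 14, 1, 11))  # Table 3 reports -0.2524 to 0.3192
print(square_and_add(0, 14, 0, 11))  # Table 3 reports -0.2588 to 0.2153
```

Each half-width pairs the distance to one limit of the first interval with the distance to the opposite limit of the second, so the resulting interval is generally asymmetric about p1 − p2.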

The square-and-add approach may be applied a second time to obtain a confidence interval for a difference between differences of proportions [25]; this is the linear scale analogue of assessing an interaction effect in logistic regression.

Another simple approach that is a great improvement over the Wald method is the pseudo-frequency method [26,27]. A pseudo-frequency ψ is added to each of the four cells of the 2 × 2 table, resulting in the shrinkage estimator (r1 + ψ)/(n1 + 2ψ) − (r2 + ψ)/(n2 + 2ψ).

The Wald formula then produces the limits

p̃1 − p̃2 ± z√(p̃1(1 − p̃1)/(n1 + 2ψ) + p̃2(1 − p̃2)/(n2 + 2ψ)),

where p̃i = (ri + ψ)/(ni + 2ψ) for i = 1, 2.

Agresti and Caffo [27] evaluated the effect of choosing different values of ψ, and they reported that adding 1 to each cell is optimal here. So here, just as for the single proportion case, in total four pseudo-observations are added. This approach also aligns mean coverage effectively with 1 − α. Interval location is rather too mesial, very similar to that of the square-and-add method. Zero-width intervals cannot occur. Boundary violation is not ruled out but is expected to be infrequent.
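
The pseudo-frequency method can be sketched as follows, with one pseudo-observation added to each of the four cells by default and the Wald formula then applied to the augmented counts. The function name and the `psi` parameter are illustrative assumptions.

```python
from math import sqrt


def agresti_caffo(r1, n1, r2, n2, z=1.959964, psi=1):
    """Shrinkage estimate and Wald-type interval after adding psi to each cell."""
    m1, m2 = n1 + 2 * psi, n2 + 2 * psi
    p1, p2 = (r1 + psi) / m1, (r2 + psi) / m2
    d = p1 - p2
    half = z * sqrt(p1 * (1 - p1) / m1 + p2 * (1 - p2) / m2)
    return d, (d - half, d + half)


print(agresti_caffo(0, 14, 0, 11))  # Table 3: estimate -0.0144, -0.2016 to 0.1728
print(agresti_caffo(2, 14, 1, 11))  # Table 3: estimate 0.0337, -0.2403 to 0.3076
```

When both observed proportions are zero, the unequal group sizes make the two shrunken proportions differ, which is why the shrinkage estimate for tumor shrinkage is slightly negative rather than zero.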

Table 3 shows 95% confidence intervals calculated by these methods, comparing treatments A and B in the ECOG trial [18].

Table 3: 95% Confidence Intervals for Differences in Proportions of Patients with Successful Outcome and with Life-Threatening Toxicity between Treatments A and B in the Eastern Cooperative Oncology Group Trial

| Outcome | Successful Tumor Shrinkage | Life-Threatening Toxicity |
|---|---|---|
| Empirical estimate | 0 | 0.0519 |
| Wald interval | 0* to 0* | -0.1980 to 0.3019 |
| Mee interval | -0.2588 to 0.2153 | -0.2619 to 0.3312 |
| Miettinen-Nurminen interval | -0.2667 to 0.2223 | -0.2693 to 0.3374 |
| Tail-based “exact” interval | -0.2849 to 0.2316 | -0.2721 to 0.3514 |
| Tail-based mid-P interval | -0.2384 to 0.1926 | -0.2539 to 0.3352 |
| Bayes interval, B(1,1) priors for p1 and p2 | -0.2198 to 0.1685 | -0.2432 to 0.2986 |
| Bayes interval, B(1/2, 1/2) priors for p1 and p2 | -0.1768 to 0.1361 | -0.2288 to 0.3008 |
| Square-and-add Wilson interval | -0.2588 to 0.2153 | -0.2524 to 0.3192 |
| Agresti-Caffo shrinkage estimate | -0.0144 | 0.0337 |
| Agresti-Caffo interval | -0.2016 to 0.1728 | -0.2403 to 0.3076 |

Note: Asterisks denote boundary violations.

Source: Parzen et al. J Comput Graph Stat. 2002: 11; 420–436.

1.5 Number Needed to Treat

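The number needed to treat (NNT) is the reciprocal of the absolute risk reduction, 1/(p1 − p2). When the confidence interval for p1 − p2 includes zero, the reciprocals of its limits do not enclose a single finite range of NNT values. Accordingly, it seems preferable to report absolute risk reductions in percentage rather than reciprocal form. The most appropriate uses of the NNT are in giving simple bottom-line figures to patients (in which situation, usually only the point estimate would be given), and in labeling a secondary axis on a graph.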
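
The contrast between the two forms of reporting can be illustrated with a short sketch, taking the NNT as the reciprocal of the risk difference. The function names are illustrative assumptions, and the helper simply declines to return reciprocal limits when the interval for the difference includes zero.

```python
def nnt(risk_difference):
    """Number needed to treat: reciprocal of the (nonzero) risk difference."""
    if risk_difference == 0:
        raise ValueError("NNT is undefined when the risk difference is zero")
    return 1 / risk_difference


def nnt_limits(lower, upper):
    """Reciprocal limits, or None when the interval for the difference spans zero."""
    if lower <= 0 <= upper:
        return None  # reciprocals would not enclose one finite range
    return tuple(sorted((1 / lower, 1 / upper)))


# Toxicity difference from the ECOG trial, with its square-and-add Wilson limits.
print(nnt(2 / 14 - 1 / 11))         # point estimate only
print(nnt_limits(-0.2524, 0.3192))  # None: the interval includes zero
```

For the ECOG toxicity comparison the point estimate can be inverted, but the interval cannot be converted to a single range, which is the awkwardness that reporting in percentage form avoids.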

1.6 A Paired Difference of Proportions

Crossover and split-unit trial designs lead to paired analyses. Regimes that aim to produce a cure are generally not suitable for evaluation in these designs, because in the event that a treatment is effective, there would be a carryover effect into the next treatment period. For this reason, these designs tend to be used for evaluation of regimes that seek to control symptomatology, and thus most often give rise to continuous outcome measures. Examples of paired analyses of binary data in clinical trials include comparisons of different antinauseant regimes administered in randomized order during different cycles of chemotherapy, comparisons of treatments for headache pain, and split-unit studies in ophthalmology and dermatology. Results can be reported in either risk difference or NNT form, though the latter appears not to be frequently used in this context. Other examples in settings other than clinical trials include longitudinal comparison of oral carriage of an organism before and after third molar extraction, and twin studies.

Let a, b, c, and d denote the four cells of the paired contingency table, with n = a + b + c + d pairs in total. Here, b and c are the discordant cells, and interest centers on the difference of marginals: (a + b)/n − (a + c)/n = (b − c)/n.

Hypothesis testing is most commonly performed using the McNemar approach [32], using either an asymptotic test statistic expressed as z or chi-square, or an aggregated tail probability. In both situations, inference is conditional on the total number of discordant pairs, b + c.
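
A minimal sketch of the paired point estimate and the McNemar test follows. The cell counts used are hypothetical and chosen only for illustration, the function names are assumptions, and the exact version shown is a two-sided aggregated tail probability from the binomial distribution with parameters b + c and 1/2, without any continuity correction in the asymptotic statistic.

```python
from math import comb, sqrt


def paired_difference(a, b, c, d):
    """Difference of marginal proportions, (b - c)/n, for a paired 2 x 2 table."""
    return (b - c) / (a + b + c + d)


def mcnemar(b, c):
    """McNemar test conditional on the b + c discordant pairs (requires b + c > 0).

    Returns (z, chi-square, exact two-sided tail probability).
    """
    n_disc = b + c
    z = (b - c) / sqrt(n_disc)
    smaller = min(b, c)
    tail = sum(comb(n_disc, i) for i in range(smaller + 1)) / 2**n_disc
    return z, z * z, min(1.0, 2 * tail)


# Hypothetical paired table: a=20, b=10, c=3, d=7.
print(paired_difference(20, 10, 3, 7))
print(mcnemar(10, 3))
```

Only the discordant cells b and c enter the test, whereas the point estimate also depends on the total number of pairs n.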

Newcombe [33] reviewed confidence interval methods for the paired difference case. Many of these are closely analogous to unpaired methods. The Wald interval performs poorly. So does a conditional approach, based on an interval for the simple proportion b/(b + c). Exact and tail-based profile methods perform well, although, as before, these are computationally complex. A closed-form square-and-add approach, modified to take account of the nonindependence, also aligns mean coverage with 1 − α, provided that a novel form of continuity correction is incorporated.