Analysis of Biomarker Data E-Book

Stephen W. Looney

Publisher: John Wiley & Sons
Category: Science and New Technologies
Language: English

Description

A “how to” guide for applying statistical methods to biomarker data analysis

Presenting a solid foundation for the statistical methods that are used to analyze biomarker data, Analysis of Biomarker Data: A Practical Guide features preferred techniques for biomarker validation. The authors provide descriptions of select elementary statistical methods that are traditionally used to analyze biomarker data with a focus on the proper application of each method, including necessary assumptions, software recommendations, and proper interpretation of computer output. In addition, the book discusses frequently encountered challenges in analyzing biomarker data and how to deal with them, methods for the quality assessment of biomarkers, and biomarker study designs.

Covering a broad range of statistical methods that have been used to analyze biomarker data in published research studies, Analysis of Biomarker Data: A Practical Guide also features:

A greater emphasis on the application of methods as opposed to the underlying statistical and mathematical theory
The use of SAS®, R, and other software throughout to illustrate the presented calculations for each example
Numerous exercises based on real-world data as well as solutions to the problems to aid in reader comprehension
The principles of good research study design and the methods for assessing the quality of a newly proposed biomarker
A companion website that includes a software appendix with multiple types of software and complete data sets from the book’s examples

Analysis of Biomarker Data: A Practical Guide is an ideal upper-undergraduate and graduate-level textbook for courses in the biological or environmental sciences. An excellent reference for statisticians who routinely analyze and interpret biomarker data, the book is also useful for researchers who wish to perform their own analyses of biomarker data, such as toxicologists, pharmacologists, epidemiologists, environmental and clinical laboratory scientists, and other professionals in the health and environmental sciences.

Details

Number of pages: 724

Year of publication: 2015

ANALYSIS OF BIOMARKER DATA

A Practical Guide

STEPHEN W. LOONEY

Department of Biostatistics and Epidemiology, Georgia Regents University, Augusta, Georgia

JOSEPH L. HAGAN

Texas Children's Hospital, Houston, Texas

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data

Looney, Stephen W., author.
Analysis of biomarker data : a practical guide / Stephen W. Looney, Joseph L. Hagan.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-1-118-02755-4 (cloth)
I. Hagan, Joseph L., author. II. Title.
[DNLM: 1. Biological Markers–analysis. 2. Data Interpretation, Statistical. 3. Models, Statistical. QW 541]
R857.B54
610.28′4–dc23

2014024697

To my students –Stephen W. Looney

To S.L., my mentor and friend –Joseph L. Hagan

Preface

Acknowledgements

Chapter 1: Introduction

1.1 What is a Biomarker?

1.2 Biomarkers Versus Surrogate Endpoints

1.3 Organization of this Book

Chapter 2: Designing Biomarker Studies

2.1 Introduction

2.2 Designing the Study

2.3 Designing the Analysis

2.4 Presenting Statistical Results

Problems

Chapter 3: Elementary Statistical Methods for Analyzing Biomarker Data

3.1 Introduction

3.2 Graphical and Tabular Summaries

3.3 Descriptive Statistics

3.4 Describing the Shape of Distributions

3.5 Sampling Distributions

3.6 Introduction to Statistical Inference

3.7 Comparing Means Across Groups

3.8 Correlation Analysis

3.9 Regression Analysis

3.10 Analyzing Cross-Classified Data

Problems

Chapter 4: Frequently Encountered Challenges in Analyzing Biomarker Data and how to Deal with Them

4.1 Introduction

4.2 Non-Normally Distributed Data

4.3 Heterogeneity of Variance

4.4 Dependent Groups

4.5 Correlated Outcomes

4.6 Clustered Data

4.7 Outliers

4.8 Limits of Detection and Non-Detected Observations

4.9 The Analysis of Cross-Classified Categorical Data

Problems

Chapter 5: Validation of Biomarkers

5.1 Overview of Methods for Assessing Characteristics of Biomarkers

5.2 General Description of Measures of Agreement

5.3 Assessing Reliability of a Biomarker

5.4 Assessing Validity

Problems

References

Solutions to Problems

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Index

End User License Agreement

PREFACE

As the title indicates, this book is intended to be a practical guide for the statistical analysis of biomarker data. To us, such a guide should include information on the proper application of statistical methods that are most commonly used to analyze biomarker data, with special emphasis placed on the underlying assumptions. This includes recommendations concerning: (1) preferred methods for determining whether or not the underlying assumptions are valid for a particular set of biomarker data and (2) how to proceed if the underlying assumptions appear to be violated. In addition to emphasizing the underlying assumptions, we have also placed considerable emphasis on computational issues related to the methods most commonly used to analyze biomarker data. To the greatest extent possible, we have provided software (primarily SAS® code) for performing the statistical methods that we recommend. All of this software, along with the complete data sets for all of the examples, is included in the Software Appendix that is available on the companion website for this textbook. For many of the examples, the complete data set, along with the computer code for analyzing the data, is also provided in the text.

Our intention has been to present our descriptions of the statistical methods that are included in this textbook at a fairly low technical level; however, we have provided all of the necessary formulas for those who are interested in the more theoretical aspects of these methods. By including these formulas, we hoped to achieve two goals: (1) to provide sufficient information for those who wish to program these methods themselves, using Excel, R or C++, for example and (2) to provide the mathematical details for those who are interested in the theoretical foundations of these methods. Sections that we feel are unusually technical (in terms of extensive use of formulas and/or theoretical development) are marked with an asterisk (*) and can be safely skipped without loss of continuity. Overall, the presentation in this textbook does not require familiarity with calculus or matrix algebra. Generally speaking, a sufficient background for the material in this book would be the equivalent of a two-semester statistical methods sequence at the advanced undergraduate or introductory graduate level. Minimal requirements include a good working knowledge of basic statistical inference (point and interval estimation and hypothesis testing), as well as some familiarity with correlation analysis, chi-square analysis, and nonparametric statistics. Chapter 3 contains a review of the basic statistical concepts and methods that we consider to be sufficient prerequisites for the material in Chapters 4 and 5.

One may question our choice of statistical methods included in this book. For example, there is very little discussion of multivariable models, although we do provide a very elementary description of multiple linear regression in Chapter 3. We also briefly mention multiple logistic regression in several places. We realize the importance of the application of these and other multivariable methods in biomarker research (e.g., in the analysis of survival data or longitudinal data), but feel that we cannot fully address the numerous complex considerations involved in the appropriate use of these methods within the limited scope of this book. There is very heavy emphasis (some may say too heavy) in this book on testing distributional assumptions, and on nonparametric and exact methods. As illustrated in Section 4.2, investigators often state that they used a data transformation or nonparametric method to analyze their biomarker data due to “extreme skewness” (or similar language); however, very rarely do they provide any statistical justification for following this course of action. Many of the studies involving biomarker data that we have worked on as collaborating statisticians have had very small sample sizes (some as small as 5 or 6), hence the strong emphasis on exact methods. Another area that some may say has received too much emphasis in this book is the assessment of the reliability and validity of a biomarker. We feel that this aspect of biomarker development is too often ignored (or given only minimal attention) in the biomarker literature. Thus, one of our goals in writing this textbook has been to provide sufficient background knowledge and software tools so that it will be easier for investigators to undertake a more thorough assessment of these important characteristics of a newly proposed biomarker.

Wherever possible, we have illustrated the statistical methods included in this textbook with examples that make use of “real” data. Most of these data sets were taken (or adapted) from the biomarker literature, or from unpublished studies we participated in as collaborators. In some instances, we have used hypothetical data (which we hope are also realistic) when the “real” data we needed to illustrate a particular point were not readily available. Most of these hypothetical data sets were constructed so that they retained certain key features of the published study on which they were based.

We have not written this book with the anticipation that it would be used as the primary textbook for a course on the statistical analysis of biomarker data; however, it could be used for that purpose for a summer course, an independent study, or a graduate seminar. With this in mind, we have included a small number of problems (and solutions) at the end of each chapter. We consider our primary audience for this textbook to consist of investigators who wish to perform their own analyses of biomarker data; this includes toxicologists, pharmacologists, epidemiologists, environmental and clinical laboratory scientists, and other professionals in the health and environmental sciences. Statisticians who routinely analyze and interpret biomarker data could also find it to be of interest. Since most of the statistical techniques that we discuss in this book also apply to surrogate markers (or, more accurately, surrogate endpoints), professionals who work in the areas of pharmaceutical and chemical product development could also find this book to be useful. Our goal has been to provide enough practical information (including software code and software recommendations) to facilitate the statistical analysis of biomarker data for all of these individuals.

ACKNOWLEDGEMENTS

We gratefully acknowledge the contributions of Courtney McCracken, who graciously allowed us to use excerpts from her doctoral dissertation as the basis for Section 4.8 of this text, which deals with estimating correlation coefficients when both variables are subject to limits of detection. Jennifer Waller graciously provided basic SAS® code for calculating improved confidence intervals for correlation coefficients, which we expanded into the SAS code we provide in Section 4.5. We are grateful to our collaborators, Ganesan Ramesh, Allison Hunter Buchanan, Mary Stuart, Michelle Reid, Joe Miller, Jack Price, Luis Espinoza, and Jason Goldberg for their kind permission to reproduce their research data here. We also wish to acknowledge the generosity of our colleagues Ioannis Dimakos, Paul Juneau, Nancy Cheng, Stuart Gansky, Allison Deal, Chuck Coleman, and Mark Solak for their permission to reproduce their SAS code. We gratefully acknowledge Cindy Oxford and Teresa McVeigh for their editorial assistance in the final stages of the development of this manuscript.

We wish to express our sincere appreciation to Susanne Steitz-Filler, Senior Editor for Mathematics and Statistics at John Wiley and Sons, for her encouragement, assistance, and most of all, patience, throughout the development of this manuscript. Sari Friedman, Senior Editorial Assistant for Mathematics and Statistics at Wiley, also provided very helpful advice and assistance.

1 INTRODUCTION

1.1 WHAT IS A BIOMARKER?

According to the Dictionary of Epidemiology, a biomarker (or biological marker) is “a cellular, biochemical, or molecular indicator of exposure; of biological, subclinical, or clinical effects; or of possible susceptibility” (Porta 2008, p. 21). As Porta points out, the term “biomarker” is often ambiguous; this is perhaps an indication that there is insufficient understanding of the pathophysiological or mechanistic role of the “marker.”

The ambiguity may also be due to the fact that biomarkers are involved in one way or another with so many different disciplines (clinical trialists, statisticians, regulators, etc.) and clinical research applications. In fact, there is so much potential ambiguity associated with the term “biomarker” that several efforts have been made to provide a formal definition of exactly what a biomarker is.

For example, in a 1987 US National Research Council report, biomarkers were defined to be “indicators signaling events in biological systems or samples.” In 2001, a Biomarkers Definitions Working Group (BDWG), convened by the US National Institutes of Health, proposed the following definition of biological marker (biomarker): “A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.”

As interest in the development, validation, and application of new biomarkers has increased, numerous classification systems for biomarkers have been proposed. These include Type 0–Type 6 biomarkers; Type I and II biomarkers (Mildvan et al. 1997); prognostic and predictive biomarkers; genomic, proteomic, and combinatorial biomarkers; screening and stratification biomarkers, and so on. (See Table 1 of DeCaprio (2006) for details.) Most of these classification systems reflect the intended use of the biomarker data in a particular discipline; however, all biomarkers are related in the sense that each of them is designed to be an “indicator” of something, as noted in the Dictionary of Epidemiology definition cited above. Our primary focus in this book is on markers of exposure, although the statistical techniques we describe can be applied to almost any type of biomarker. By using “real” data taken from published biomarker studies to exemplify the proper application of these techniques, we have tried to illustrate the broad applicability of statistical methods in the analysis of biomarker data, regardless of the particular type of biomarker that is being considered.

1.2 BIOMARKERS VERSUS SURROGATE ENDPOINTS

In their report describing preferred definitions for biomarkers and surrogate endpoints, the BDWG defined a clinical endpoint as: “A characteristic or variable that reflects how a patient feels, functions, or survives.” They then defined a surrogate endpoint as: “A biomarker that is intended to substitute for a clinical endpoint.” A surrogate endpoint is thus expected to “predict clinical benefit (or harm or lack of benefit or harm) based on epidemiologic, therapeutic, pathophysiologic, or other scientific evidence.” As they pointed out, all surrogate endpoints are biomarkers, but not all biomarkers are surrogate endpoints. In fact, “it is likely that only a few biomarkers will achieve surrogate endpoint status.” (Note that they discouraged the use of the term surrogate marker, and advocated the exclusive use of surrogate endpoint instead (BDWG 2001, p. 91).)

Because of the requirement that one must be able to substitute a surrogate endpoint in place of the corresponding clinical endpoint, the process of validating a surrogate endpoint goes far beyond what is usually required when validating a biomarker (see Chapter 5). In fact, the BDWG claimed that the term validation is unsuitable for describing the process of linking biomarkers to clinical endpoints; they proposed that the process of determining surrogate endpoint status be referred to as evaluation. They reserved use of validation to describe the process of addressing what they referred to as the “performance characteristics” (e.g., sensitivity, specificity, and reproducibility) of a measurement process or assay technique. This is consistent with our use of the term biomarker validation in Chapter 5.
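As an illustration of two of the performance characteristics named above, the following minimal Python sketch computes sensitivity and specificity from paired binary results. It is not taken from the book, whose own code is primarily SAS; the function name `sensitivity_specificity` and the ten observations are hypothetical and serve only to show the arithmetic.

```python
def sensitivity_specificity(truth, test):
    """Return (sensitivity, specificity) for paired binary results.

    truth: 1 if the reference standard says the condition is present, else 0.
    test:  1 if the assay or biomarker result is positive, else 0.
    """
    if len(truth) != len(test):
        raise ValueError("truth and test must have the same length")

    # Count the four cells of the 2 x 2 table.
    tp = sum(1 for d, y in zip(truth, test) if d and y)
    fn = sum(1 for d, y in zip(truth, test) if d and not y)
    tn = sum(1 for d, y in zip(truth, test) if not d and not y)
    fp = sum(1 for d, y in zip(truth, test) if not d and y)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true positive case and one true negative case")

    sensitivity = tp / (tp + fn)  # proportion of true cases detected
    specificity = tn / (tn + fp)  # proportion of non-cases correctly ruled out
    return sensitivity, specificity


# Hypothetical data: 4 subjects with the condition, 6 without.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
test = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(truth, test)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up values, three of the four true cases test positive and five of the six non-cases test negative, so the sketch reports a sensitivity of 0.75 and a specificity of about 0.83.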

Because of the complexity involved in evaluating a surrogate endpoint, various approaches have been proposed, almost all of which involve examining the effect of a treatment for the clinical endpoint (typically referred to as the “disease”) on the surrogate for the endpoint. In a landmark paper, Prentice (1989) formulated a definition of surrogate endpoints and defined a set of operational criteria for their evaluation. In their subsequent work, Freedman et al. (1992) proposed that one should focus attention on the proportion of the treatment effect explained by the surrogate for the disease endpoint, whereas Buyse and Molenberghs (1998) proposed that the primary focus should be on the relative effect of the treatment on the surrogate. Various authors also advocated the use of meta-analytic data in the evaluation of a surrogate endpoint (Freedman et al. 1992; Lin et al. 1997; Daniels and Hughes 1997). The application of meta-analytic techniques to surrogate endpoint evaluation was further developed by Buyse et al. (2000); Gail et al. (2000); Molenberghs et al. (2002); and others. The very comprehensive textbook edited by Burzykowski et al. (2005) thoroughly discusses all of these statistical approaches and subsequent developments. The Institute of Medicine report (Micheel and Ball 2010) approaches the evaluation of surrogate endpoints from a more clinical perspective.
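The two summary measures mentioned above can be made concrete with a small numerical sketch. It assumes a simplified linear-model setting, which is an assumption of this illustration and not the formulation used in the cited papers or in this book. Let Z be a binary treatment indicator, S the surrogate, and T the clinical endpoint; let β be the slope of T on Z, α the slope of S on Z, and β_S the coefficient of Z when T is regressed on both Z and S. The proportion of treatment effect explained is then 1 − β_S/β, and the relative effect is β/α. The notation, the function names, and the data below are all hypothetical.

```python
def cross_products(x, y):
    """Sum of centered cross-products of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


def surrogate_summaries(z, s, t):
    """Return (PTE, RE) from least-squares fits of simple linear models.

    z: treatment indicator, s: surrogate, t: clinical endpoint.
    """
    s_zz = cross_products(z, z)
    s_ss = cross_products(s, s)
    s_zs = cross_products(z, s)
    s_zt = cross_products(z, t)
    s_st = cross_products(s, t)

    beta = s_zt / s_zz    # unadjusted treatment effect on the clinical endpoint
    alpha = s_zs / s_zz   # treatment effect on the surrogate
    # Coefficient of z in the two-predictor regression of t on z and s.
    beta_s = (s_ss * s_zt - s_zs * s_st) / (s_zz * s_ss - s_zs ** 2)

    pte = 1 - beta_s / beta   # proportion of treatment effect explained
    rel_eff = beta / alpha    # relative effect
    return pte, rel_eff


# Hypothetical, noise-free data: t = 2*s + z, so the surrogate carries
# most, but not all, of the treatment effect.
z = [0, 0, 0, 0, 1, 1, 1, 1]
s = [1, 2, 3, 4, 3, 4, 5, 6]
t = [2 * si + zi for si, zi in zip(s, z)]

pte, rel_eff = surrogate_summaries(z, s, t)
print(f"PTE = {pte:.2f}, RE = {rel_eff:.2f}")
```

In this constructed example the unadjusted treatment effect is 5, the effect remaining after adjustment for the surrogate is 1, and the effect on the surrogate is 2, giving a proportion explained of 0.80 and a relative effect of 2.50. Real data would also require interval estimates for both quantities, which this sketch does not attempt.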

Although surrogate endpoints are certainly a very important special case of biomarkers, we feel that the specialized techniques developed for evaluating them, especially as these techniques relate to treatment of the clinical endpoint, are beyond the scope of this text. Hence, we do not discuss surrogate endpoints as a separate topic elsewhere in this book. However, the methods that we describe for analyzing biomarker data and validating a biomarker (as defined by BDWG) certainly apply to surrogate endpoints as well.

1.3 ORGANIZATION OF THIS BOOK

In Chapter 1, we define what we mean by a biomarker and then describe our understanding of the differences and similarities between biomarkers and surrogate endpoints.

In Chapter 2, we cover basic principles of effective design of a study that will make use of biomarker data, including selecting the most appropriate type of study design (cross-sectional, case–control, etc.), choosing the appropriate measure of association once the type of design has been selected, designing the statistical analysis that will be applied to the study data once they have been obtained, and choosing the appropriate sample size for the study that is being planned. We also describe several features of what we consider to be the effective presentation of statistical results once the study data have been analyzed.

In Chapter 3, we provide a survey of elementary statistical methods that are widely used when analyzing biomarker data. To be specific, the methods that we cover include: graphical and tabular summaries; descriptive statistics; basic concepts of statistical inference, including point estimation, confidence interval estimation, and hypothesis testing; comparisons of means between two groups and among more than two groups; statistical inference for correlation coefficients; simple and multiple linear regression; and analysis of cross-classified data, including the chi-square test of independence and methods for comparing proportions across two or more groups. Our intention in this chapter is not to provide comprehensive coverage of all elementary statistical methods, but rather to describe selected methods in sufficient detail so that someone who is relatively inexperienced in the application of statistics will be able to carry out these analyses appropriately and with a minimum of effort.

In Chapter 4, we describe various “challenges” that one is likely to encounter in the analysis of biomarker data and offer our recommendations on preferred methods for dealing with them. These challenges include: (1) violations of underlying assumptions (normality, homogeneity of variance), (2) lack of independence between the groups being compared, (3) proper analysis of correlated data, (4) clustered data, (5) contaminated data, (6) non-detectable observations, (7) choosing the appropriate measure of association between predictor and outcome, and (8) choosing the appropriate method of analysis for cross-classified data (i.e., contingency tables). Each of these challenges is illustrated using data from a “real” biomarker study, most of which were taken from the scientific literature.

In Chapter 5, we provide a detailed discussion of the methods we recommend for evaluating the quality of a newly proposed (or existing) biomarker (also called biomarker validation). Our focus is on establishing that the biomarker has adequate reliability and validity.

Throughout Chapters 3–5, we provide what we hope is sufficient mathematical detail for those who are interested, but our primary emphasis is on the proper application of the statistical methods. Sections marked with an asterisk (*) contain a more theoretical treatment of the topic at hand and can safely be omitted without loss of continuity with the remainder of the text.

To the greatest extent possible, we provide software code for performing the statistical methods that we describe. Our software of choice is SAS because of its flexibility and widespread use in industry, government, and academia; but, in some instances, we also indicate how to perform an analysis using R (R Core Development Team 2014) or other statistical software (SPSS, STATA, etc.). The data sets for the fully worked examples in the book are provided along with the code used to analyze them. Shorter segments of code are included in the body of the text and are available on the companion website; longer segments are not provided in the text, but are available on the website.

We do not anticipate that the primary audience for this textbook will be students, so we have not provided extensive problem sets. However, we do recognize that exercises (with solutions) are an effective tool for anyone who is trying to learn how to perform a particular type of statistical analysis for the first time, or for someone who is trying to refresh their memory of statistical methods that they may have studied years ago. From personal experience, we know that exercises with solutions can be extremely helpful for experienced statisticians who are trying to learn about a statistical method they have never used before, or about applications of certain statistical methods in a scientific field that is new to them. Exercises with solutions are also useful as “test cases,” when someone is trying to write their own software to carry out a statistical method for which easily accessible software is not available. With this in mind, we have provided small problem sets at the end of Chapters 3, 4, and 5. These contain exercises that are similar to the worked examples included in the text. To the greatest extent possible, these exercises are based on “real” data taken from published biomarker studies. Solutions to these exercises, including the SAS or R code needed to carry out the analyses, are provided at the end of the text.