An Introduction to Bootstrap Methods with Applications to R

Michael R. Chernick

Description

A comprehensive introduction to bootstrap methods in the R programming environment

Bootstrap methods provide a powerful approach to statistical data analysis, as they have more general applications than standard parametric methods. An Introduction to Bootstrap Methods with Applications to R explores the practicality of this approach and successfully utilizes R to illustrate applications for the bootstrap and other resampling methods. This book provides a modern introduction to bootstrap methods for readers who do not have an extensive background in advanced mathematics. Emphasis throughout is on the use of bootstrap methods as an exploratory tool, including their value in variable selection and other modeling environments.

The authors begin with a description of bootstrap methods and their relationship to other resampling methods, along with an overview of the wide variety of applications of the approach. Subsequent chapters offer coverage of improved confidence set estimation, estimation of error rates in discriminant analysis, and applications to a wide variety of hypothesis testing and estimation problems in areas such as pharmaceuticals, genomics, and economics. To inform readers on the limitations of the method, the book also exhibits counterexamples to the consistency of bootstrap methods.

An introduction to R programming provides the needed preparation to work with the numerous exercises and applications presented throughout the book. A related website houses the book's R subroutines, and an extensive listing of references provides resources for further study.

Discussing the topic at a remarkably practical and accessible level, An Introduction to Bootstrap Methods with Applications to R is an excellent book for introductory courses on bootstrap and resampling methods at the upper-undergraduate and graduate levels. It also serves as an insightful reference for practitioners working with data in engineering, medicine, and the social sciences who would like to acquire a basic understanding of bootstrap methods.




CONTENTS

PREFACE

ACKNOWLEDGMENTS

LIST OF TABLES

1 INTRODUCTION

1.1 HISTORICAL BACKGROUND

1.2 DEFINITION AND RELATIONSHIP TO THE DELTA METHOD AND OTHER RESAMPLING METHODS

1.3 WIDE RANGE OF APPLICATIONS

1.4 THE BOOTSTRAP AND THE R LANGUAGE SYSTEM

1.5 HISTORICAL NOTES

1.6 EXERCISES

REFERENCES

2 ESTIMATION

2.1 ESTIMATING BIAS

2.2 ESTIMATING LOCATION

2.3 ESTIMATING DISPERSION

2.4 LINEAR REGRESSION

2.5 NONLINEAR REGRESSION

2.6 NONPARAMETRIC REGRESSION

2.7 HISTORICAL NOTES

2.8 EXERCISES

REFERENCES

3 CONFIDENCE INTERVALS

3.1 SUBSAMPLING, TYPICAL VALUE THEOREM, AND EFRON’S PERCENTILE METHOD

3.2 BOOTSTRAP-T

3.3 ITERATED BOOTSTRAP

3.4 BIAS-CORRECTED (BC) BOOTSTRAP

3.5 BCA AND ABC

3.6 TILTED BOOTSTRAP

3.7 VARIANCE ESTIMATION WITH SMALL SAMPLE SIZES

3.8 HISTORICAL NOTES

3.9 EXERCISES

REFERENCES

4 HYPOTHESIS TESTING

4.1 RELATIONSHIP TO CONFIDENCE INTERVALS

4.2 WHY TEST HYPOTHESES DIFFERENTLY?

4.3 TENDRIL DX EXAMPLE

4.4 KLINGENBERG EXAMPLE: BINARY DOSE–RESPONSE

4.5 HISTORICAL NOTES

4.6 EXERCISES

REFERENCES

5 TIME SERIES

5.1 FORECASTING METHODS

5.2 TIME DOMAIN MODELS

5.3 CAN BOOTSTRAPPING IMPROVE PREDICTION INTERVALS?

5.4 MODEL-BASED METHODS

5.5 BLOCK BOOTSTRAPPING FOR STATIONARY TIME SERIES

5.6 DEPENDENT WILD BOOTSTRAP (DWB)

5.7 FREQUENCY-BASED APPROACHES FOR STATIONARY TIME SERIES

5.8 SIEVE BOOTSTRAP

5.9 HISTORICAL NOTES

5.10 EXERCISES

REFERENCES

6 BOOTSTRAP VARIANTS

6.1 BAYESIAN BOOTSTRAP

6.2 SMOOTHED BOOTSTRAP

6.3 PARAMETRIC BOOTSTRAP

6.4 DOUBLE BOOTSTRAP

6.5 THE M-OUT-OF-N BOOTSTRAP

6.6 THE WILD BOOTSTRAP

6.7 HISTORICAL NOTES

6.8 EXERCISES

REFERENCES

7 SPECIAL TOPICS

7.1 SPATIAL DATA

7.2 SUBSET SELECTION IN REGRESSION

7.3 DETERMINING THE NUMBER OF DISTRIBUTIONS IN A MIXTURE

7.4 CENSORED DATA

7.5 P-VALUE ADJUSTMENT

7.6 BIOEQUIVALENCE

7.7 PROCESS CAPABILITY INDICES

7.8 MISSING DATA

7.9 POINT PROCESSES

7.10 BOOTSTRAP TO DETECT OUTLIERS

7.11 LATTICE VARIABLES

7.12 COVARIATE ADJUSTMENT OF AREA UNDER THE CURVE ESTIMATES FOR RECEIVER OPERATING CHARACTERISTIC (ROC) CURVES

7.13 BOOTSTRAPPING IN SAS

7.14 HISTORICAL NOTES

7.15 EXERCISES

REFERENCES

8 WHEN THE BOOTSTRAP IS INCONSISTENT AND HOW TO REMEDY IT

8.1 TOO SMALL OF A SAMPLE SIZE

8.2 DISTRIBUTIONS WITH INFINITE SECOND MOMENTS

8.3 ESTIMATING EXTREME VALUES

8.4 SURVEY SAMPLING

8.5 M-DEPENDENT SEQUENCES

8.6 UNSTABLE AUTOREGRESSIVE PROCESSES

8.7 LONG-RANGE DEPENDENCE

8.8 BOOTSTRAP DIAGNOSTICS

8.9 HISTORICAL NOTES

8.10 EXERCISES

REFERENCES

AUTHOR INDEX

SUBJECT INDEX

Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Chernick, Michael R.

 An introduction to bootstrap methods with applications to R / Michael R. Chernick, Robert A. LaBudde.

   p. cm.

  Includes bibliographical references and index.

 ISBN 978-0-470-46704-6 (hardback)

1. Bootstrap (Statistics) 2. R (Computer program language) I. LaBudde, Robert A., 1947– II. Title.

QA276.8.C478 2011

519.5'4–dc22

2011010972

PREFACE

The term “bootstrapping” refers to the concept of “pulling oneself up by one's bootstraps,” a phrase apparently first used in The Singular Travels, Campaigns and Adventures of Baron Munchausen by Rudolph Erich Raspe in 1786. The derivative of the same term is used in a similar manner to describe the process of “booting” a computer by a sequence of software increments loaded into memory at power-up.

In statistics, “bootstrapping” refers to making inferences about a sampling distribution of a statistic by “resampling” the sample itself with replacement, as if it were a finite population. To the degree that the resampling distribution mimics the original sampling distribution, the inferences are accurate. The accuracy improves as the size of the original sample increases, if the central limit theorem applies.

“Resampling” as a concept was first used by R. A. Fisher (1935) in his famous randomization test, and by E. J. G. Pitman (1937, 1938), although in these cases the sampling was done without replacement.

The “bootstrap” as sampling with replacement and its Monte Carlo approximate form was first presented in a Stanford University technical report by Brad Efron in 1977. This report led to his famous paper in the Annals of Statistics in 1979. However, the Monte Carlo approximation may be much older. In fact, it is known that Julian Simon at the University of Maryland proposed the Monte Carlo approximation as an educational tool for teaching probability and statistics. In the 1980s, Simon and Bruce started a company called Resampling Stats that produced a software product to do bootstrap and permutation sampling for both educational and inference purposes.

But it was not until Efron's paper, which related the bootstrap to the jackknife and other resampling plans, that the statistical community got involved. Over the next 20 years, the theory and applications of the bootstrap blossomed, and the Monte Carlo approximation to the bootstrap became a widely practiced approach to making statistical inferences without strong parametric assumptions.

Michael Chernick was a graduate student in statistics at the time of Efron's early research and saw the development of bootstrap methods from its very beginning. However, Chernick did not get seriously involved in bootstrap research until 1984, when he started to find practical applications in nonlinear regression models and classification problems while employed at the Aerospace Corporation.

After meeting Philip Good in the mid-1980s, Chernick and Good set out to accumulate an extensive bibliography on resampling methods and planned a two-volume text, with Chernick writing the bootstrap methods volume and Good the volume on permutation tests. The project, which was contracted by Oxford University Press, was eventually abandoned. Good eventually published his work with Springer-Verlag, and later, Chernick separately published his work with Wiley. Chernick and Good later taught short courses together on resampling methods, first at UC Irvine and later at the Joint Statistical Meetings in Indianapolis in 2000. Since that time, Chernick has taught resampling methods with Peter Bruce and later bootstrap methods for statistics.com.

Robert LaBudde wrote his PhD dissertation in theoretical chemistry on the application of Monte Carlo methods to simulating elementary chemical reactions. Many years later, he took courses in resampling and bootstrap methods from statistics.com, and it was in Chernick's bootstrap methods course that the two met. In later sessions, LaBudde was Chernick's teaching assistant and collaborator in the course and provided exercises in the R programming language. The course was taught using the second edition of Chernick's bootstrap text. However, that text had several deficiencies as a course book, including a lack of homework problems and of software for the applications, and the level of knowledge it required was also variable. The present book is intended as a text for an elementary course in bootstrap methods, including Chernick's statistics.com course.

This book is organized in a similar way as Bootstrap Methods: A Guide for Practitioners and Researchers. Chapter 1 provides an introduction with some historical background, a formal description of the bootstrap and its relationship to other resampling methods, and an overview of the wide variety of applications of the approach. An introduction to R programming is also included to prepare the student for the exercises and applications that require programming using this software system. Chapter 2 covers point estimation, Chapter 3 confidence intervals, and Chapter 4 hypothesis testing. More advanced topics begin with time series in Chapter 5. Chapter 6 covers some of the more important variants of the bootstrap. Chapter 7 covers special topics including spatial data analysis, P-value adjustment in multiple testing, censored data, subset selection in regression models, process capability indices, and some new material on bioequivalence and covariate adjustment to area under the curve for receiver operating characteristics for diagnostic tests. The final chapter, Chapter 8, covers various examples where the bootstrap was found not to work as expected (fails asymptotic consistency requirements). But in every case, modifications have been found that are consistent.

This text is suitable for a one-semester or one-quarter introductory course in bootstrap methods. It is designed for users of statistics more than statisticians. So, students with an interest in engineering, biology, genetics, geology, physics, and even psychology and other social sciences may be interested in this course because of the various applications in their fields. Of course, statisticians needing a basic understanding of the bootstrap and the surrounding literature may find the course useful. But it is not intended for a graduate course in statistics such as those based on Hall (1992), Shao and Tu (1995), or Davison and Hinkley (1997). A shorter introductory course could be taught using just Chapters 1–4. Chapters 1–4 could also be used to incorporate bootstrap methods into a first course on statistical inference. References to the literature are covered in the historical notes section of each chapter. At the end of each chapter is a set of exercises that the instructor may select from for homework assignments.

Initially, it was our goal to create a text similar to Chernick (2007) but more suitable for a full course on bootstrapping with a large number of exercises and examples illustrated in R. Also, the intent was to make the course elementary with technical details left for the interested student to read the original articles or other books. Our belief was that there were few new developments to go beyond the coverage of Chernick (2007). However, we found that with the introduction of “bagging” and “boosting,” a new role developed for the bootstrap, particularly when estimating error rates for classification problems. As a result, we felt that it was appropriate to cover the new topics and applications in more detail. So parts of this text are not at the elementary level.

MICHAEL R. CHERNICK
ROBERT A. LABUDDE

ACKNOWLEDGMENTS

The authors would like to thank our acquisitions editor, Steve Quigley, who has always been enthusiastic about our proposals and always provides good advice. We also thank Jackie Palmieri at Wiley for always politely reminding us when our manuscript was expected and for cheerfully accepting changes when delays became necessary; it is that gentle push that gets us moving to completion. We also thank Dr. Jiajing Sun for her review and editing of the manuscript as well as her help putting the equations into LaTeX. We especially would like to thank Professor Dimitris Politis for a nice and timely review of the entire manuscript. He provided us with several suggestions for improving the text and some additions to the literature to take account of important ideas that we had omitted. He also provided us with a number of references that preceded Bickel, Götze, and van Zwet (1996) on the value of taking bootstrap samples of size m less than n, as well as some other historical details for the sieve bootstrap and subsampling.

M.R.C.
R.A.L.

LIST OF TABLES

1

INTRODUCTION

1.1 HISTORICAL BACKGROUND

The “bootstrap” is one of a number of techniques that are now part of the broad umbrella of nonparametric statistics commonly called resampling methods. Some of these techniques are far older than the bootstrap. Permutation methods go back to Fisher (1935) and Pitman (1937, 1938), and the jackknife started with Quenouille (1949). Bootstrapping was made practical through the use of the Monte Carlo approximation, which itself goes back to the beginning of computers in the early 1940s.

However, 1979 is a critical year for the bootstrap because that is when Brad Efron’s paper in the Annals of Statistics was published (Efron, 1979). Efron had defined a resampling procedure that he named the bootstrap. He constructed it as a simple approximation to the jackknife (an earlier resampling method developed by John Tukey), and his original motivation was to derive properties of the bootstrap to better understand the jackknife. However, in many situations, the bootstrap is as good as or better than the jackknife as a resampling procedure. The jackknife is primarily useful for small samples; it becomes computationally inefficient for larger samples, although it has become more feasible as computer speed increases. A clear description of the jackknife and its connection to the bootstrap can be found in the SIAM monograph Efron (1982). A description of the jackknife is also given in Section 1.2.1.

Although permutation tests were known in the 1930s, an impediment to their use was the large number (i.e., n!) of distinct permutations available for samples of size n. Since ordinary bootstrapping involves sampling with replacement n times for a sample of size n, there are n^n possible distinct ordered bootstrap samples (though some are equivalent under the exchangeability assumption because they are permutations of each other). So, complete enumeration of all the bootstrap samples becomes infeasible except for very small sample sizes. Random sampling from the set of possible bootstrap samples becomes a viable way to approximate the distribution of bootstrap samples. The same problem exists for permutations, and the same remedy is possible. The only difference is that n! does not grow as fast as n^n, and complete enumeration of permutations is possible for larger n than for the bootstrap.
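To get a sense of how quickly these counts grow, the short R computation below (our own illustration, not part of the original text) compares n! with n^n for a few small sample sizes:

# Distinct permutations (n!) versus distinct ordered bootstrap samples (n^n)
n <- c(5, 10, 15, 20)
data.frame(n = n, permutations = factorial(n), bootstrap.samples = n^n)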

The idea of taking several Monte Carlo samples of size n with replacement from the original observations was certainly an important idea expressed by Efron but was clearly known and practiced prior to Efron (1979). Although it may not be the first time it was used, Julian Simon laid claim to priority for the bootstrap based on his use of the Monte Carlo approximation in Simon (1969). But Simon was only recommending the Monte Carlo approach as a way to teach probability and statistics in a more intuitive way that does not require the abstraction of a parametric probability model for the generation of the original sample. After Efron made the bootstrap popular, Simon and Bruce joined the campaign (see Simon and Bruce, 1991, 1995).

Efron, however, starting with Efron (1979), first connected bootstrapping to the jackknife, delta method, cross-validation, and permutation tests. He was the first to show it to be a real competitor to the jackknife and delta method for estimating the standard error of an estimator. Also, quite early on, Efron recognized the broad applicability of bootstrapping for confidence intervals, hypothesis testing, and more complex problems. These ideas were emphasized in Efron and Gong (1983), Diaconis and Efron (1983), Efron and Tibshirani (1986), and the SIAM monograph (Efron 1982). These influential articles along with the SIAM monograph led to a great deal of research during the 1980s and 1990s. The explosion of bootstrap papers grew at an exponential rate. Key probabilistic results appeared in Singh (1981), Bickel and Freedman (1981, 1984), Beran (1982), Martin (1990), Hall (1986, 1988), Hall and Martin (1988), and Navidi (1989).

In a very remarkable paper, Efron (1983) used simulation comparisons to show that the use of bootstrap bias correction could provide better estimates of classification error rate than the very popular cross-validation approach (often called leave-one-out and originally proposed by Lachenbruch and Mickey, 1968). These results applied when the sample size was small, classification was restricted to only two or three classes, and the predicting features had multivariate Gaussian distributions. Efron compared several variants of the bootstrap with cross-validation and the resubstitution method. This led to several follow-up articles that widened the applicability and superiority of a version of the bootstrap called 632. See Chatterjee and Chatterjee (1983), Chernick et al. (1985, 1986, 1988a, b), Jain et al. (1987), and Efron and Tibshirani (1997).

Chernick was a graduate student at Stanford in the late 1970s when the bootstrap activity began on the Stanford and Berkeley campuses. However, oddly, the bootstrap did not catch on with many graduate students. Even Brad Efron’s graduate students chose other topics for their dissertations. Gail Gong was the first student of Efron to do a dissertation on the bootstrap. She did very useful applied work on using the bootstrap in model building (particularly for logistic regression subset selection). See Gong (1986). After Gail Gong, a number of graduate students wrote dissertations on the bootstrap under Efron, including Terry Therneau, Rob Tibshirani, and Tim Hesterberg. Michael Martin visited Stanford while working on his dissertation on bootstrap confidence intervals under Peter Hall. At Berkeley, William Navidi did his thesis on bootstrapping in regression and econometric models under David Freedman.

While exciting theoretical results were developed for the bootstrap in the 1980s and 1990s, there were also negative results showing that the bootstrap estimate is not “consistent” in the probabilistic sense (i.e., it does not approach the true parameter value as the sample size becomes infinite). Examples included the mean when the population distribution does not have a finite variance and the case where the maximum or minimum is taken from a sample. This is illustrated in Athreya (1987a, b), Knight (1989), Angus (1993), and Hall et al. (1993). The first published example of an inconsistent bootstrap estimate appeared in Bickel and Freedman (1981). Shao et al. (2000) showed that a particular approach to bootstrap estimation of individual bioequivalence is also inconsistent. They also provide a modification that is consistent. Generally, the bootstrap is consistent when the central limit theorem applies (a sufficient condition is Lyapunov’s condition, which requires existence of the 2 + δ moment of the population distribution). Consistency results in the literature are based on the existence of Edgeworth expansions; so, additional smoothness conditions for the expansion to exist have also been assumed (but it is not known whether or not they are necessary).

One extension of the bootstrap called m-out-of-n was suggested by Bickel and Ren (1996) in light of previous research on it, and it has been shown to be a method to overcome inconsistency of the bootstrap in several instances. In the m-out-of-n bootstrap, sampling is with replacement from the original sample but with a value of m that is smaller than n. See Bickel et al. (1997), Gine and Zinn (1989), Arcones and Gine (1989), Fukuchi (1994), and Politis et al. (1999).

Some bootstrap approaches in time series have been shown to be inconsistent. Lahiri (2003) covered the use of bootstrap in time series and other dependent cases. He showed that there are remedies for the m-dependent and moving block bootstrap cases (see Section 5.5 for some coverage of moving block bootstrap) that are consistent.

1.2 DEFINITION AND RELATIONSHIP TO THE DELTA METHOD AND OTHER RESAMPLING METHODS

We will first provide an informal definition of bootstrap to provide intuition and understanding before a more formal mathematical definition. The objective of bootstrapping is to estimate a parameter based on the data, such as a mean, median, or standard deviation. We are also interested in the properties of the distribution for the parameter’s estimate and may want to construct confidence intervals. But we do not want to make overly restrictive assumptions about the form of the distribution that the observed data came from.

For the simple case of independent observations coming from the same population distribution, the basic element for bootstrapping is the empirical distribution. The empirical distribution is just the discrete distribution that gives equal weight to each data point (i.e., it assigns probability 1/n to each of the original n observations and shall be denoted Fn).

Now the idea of the bootstrap is to use only what you know from the data and not to introduce extraneous assumptions about the population distribution. The “bootstrap principle” says that when F is the population distribution and T(F) is the functional that defines the parameter we wish to estimate based on a sample of size n, we let Fn play the role of F and let Fn*, the empirical distribution of a bootstrap sample (soon to be defined), play the role of Fn in the resampling process. Note that the original sample is a sample of n independent identically distributed observations from the distribution F, and the sample estimate of the parameter is T(Fn). So, in bootstrapping, we let Fn play the role of F and take n independent and identically distributed observations from Fn. Since Fn is the empirical distribution, this is just sampling randomly with replacement from the original data.

If we repeat this many times and compute the mean of each bootstrap sample, we get a histogram of values for the mean, which we will call the Monte Carlo approximation to the bootstrap distribution. The average of all these values will be very close to the sample mean (6.0 in the example considered), since the theoretical mean of the bootstrap distribution is the sample mean. But from the histogram (i.e., resampling distribution), we can also see the variability of these estimates and can use the histogram to estimate skewness, kurtosis, standard deviation, and confidence intervals.

1. Generate a sample with replacement from the empirical distribution for the data (this is a bootstrap sample).
2. Compute T(Fn*), the bootstrap estimate of T(F), where Fn* is the empirical distribution of the bootstrap sample. The bootstrap sample replaces the original sample, and the bootstrap estimate T(Fn*) takes the place of the sample estimate T(Fn).
3. Repeat steps 1 and 2 M times where M is large, say 100,000.
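To make these three steps concrete, here is a minimal R sketch using illustrative data of our own (chosen so that the sample mean is 6.0, the value mentioned above); it bootstraps the sample mean and summarizes the resampling distribution:

# Minimal Monte Carlo bootstrap of the sample mean (illustrative data)
set.seed(1)
x <- c(2, 4, 5, 6, 6, 7, 8, 10)   # observed sample; mean(x) = 6.0 plays the role of T(Fn)
M <- 10000                        # number of bootstrap replications
boot.means <- replicate(M, mean(sample(x, replace = TRUE)))  # steps 1 and 2, repeated M times (step 3)
hist(boot.means)                  # Monte Carlo approximation to the bootstrap distribution
mean(boot.means)                  # very close to mean(x) = 6.0
sd(boot.means)                    # bootstrap estimate of the standard error of the sample mean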

Now a very important thing to remember is that with the Monte Carlo approximation to the bootstrap, there are two sources of error:

1. the Monte Carlo approximation to the bootstrap distribution, which can be made as small as you like by making M large (see the short R sketch following this list);
2. the approximation of the bootstrap distribution to the population distribution F.
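Of these two sources, the first is easy to examine empirically. The following R sketch (our own, with assumed example data) holds one sample fixed and shows the bootstrap standard error of the mean stabilizing as M grows; only a larger original sample size n can reduce the second source of error.

# Monte Carlo error shrinks as M increases, for a fixed observed sample
set.seed(1)
x <- rexp(30)                              # one observed sample, n = 30
boot.se <- function(M) sd(replicate(M, mean(sample(x, replace = TRUE))))
sapply(c(100, 1000, 10000), boot.se)       # estimates stabilize as M increases
sd(x) / sqrt(length(x))                    # classical standard error of the mean, for comparison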

If T(Fn*) converges to T(F) as n → ∞, then bootstrapping works. It is nice that this works out often, but it is not guaranteed. We know by a theorem called the Glivenko–Cantelli theorem that Fn converges to F uniformly. Often, we know that the sample estimate is consistent (as is the case for the sample mean). So, (1) T(Fn) converges to T(F) as n → ∞. But this is dependent on smoothness conditions on the functional T. So we also need (2) T(Fn*) – T(Fn) to tend to 0 as n → ∞. In proving that bootstrapping works (i.e., the bootstrap estimate is consistent for the population parameter), probability theorists needed to verify (1) and (2). One approach that is commonly used is to verify that smoothness conditions are satisfied for expansions like the Edgeworth and Cornish–Fisher expansions. Then, these expansions are used to prove the limit theorems.

The probability theory associated with the bootstrap is beyond the scope of this text and can be found in books such as Hall (1992). What is important is that we know that consistency of bootstrap estimates has been demonstrated in many cases and examples where certain bootstrap estimates fail to be consistent are also known. There is a middle ground, which are cases where consistency has been neither proved nor disproved. In those cases, simulation studies can be used to confirm or deny the usefulness of the bootstrap estimate. Also, simulation studies can be used when the sample size is too small to count on asymptotic theory, and its use in small to moderate sample sizes needs to be evaluated.

1.2.1 Jackknife

The jackknife was introduced by Quenouille (1949). Quenouille’s aim was to improve an estimate by correcting for its bias. Later on, Tukey (1958) popularized the method and found that a more important use of the jackknife was to estimate standard errors of an estimate. It was Tukey who coined the name jackknife because it was a statistical tool with many purposes. While bootstrapping uses the bootstrap samples to estimate variability, the jackknife uses what are called pseudovalues.

Let θ̂ denote the estimate of a parameter θ based on a sample x1, x2, …, xn, and let θ̂(−i) denote the estimate obtained when the ith observation is deleted, for i = 1, 2, …, n. The jackknife estimate of the variance of θ̂ is

VJ = [(n − 1)/n] Σi (θ̂(−i) − θ̂(·))²,

where θ̂(·) = (1/n) Σi θ̂(−i). The jackknife estimate of standard error for θ̂ is just the square root of VJ. Tukey defined the pseudovalue as θ̃i = nθ̂ − (n − 1)θ̂(−i). Then the jackknife estimate of the parameter θ is θ̂J = (1/n) Σi θ̃i. So the name pseudovalue comes about because the estimate is the average of the pseudovalues. Expressing the estimate of the variance of the estimate in terms of the pseudovalues, we get

VJ = (1/n) [1/(n − 1)] Σi (θ̃i − θ̂J)².

In this form, we see that the variance is the usual estimate for variance of a sample mean. In this case, it is the sample mean of the pseudovalues. Like the bootstrap, the jackknife has been a very useful tool in estimating variances for more complicated estimators such as trimmed or Winsorized means.
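The pseudovalue calculations are easy to program directly. The sketch below (our own, written for any estimator supplied as an R function) follows the formulas above:

# Jackknife estimate and standard error via Tukey's pseudovalues (sketch)
jackknife <- function(x, estimator) {
  n <- length(x)
  theta.hat <- estimator(x)
  theta.minus.i <- sapply(seq_len(n), function(i) estimator(x[-i]))  # leave-one-out estimates
  pseudo <- n * theta.hat - (n - 1) * theta.minus.i                  # Tukey's pseudovalues
  estimate <- mean(pseudo)                                           # jackknife estimate of the parameter
  se <- sqrt(var(pseudo) / n)                                        # SE of the sample mean of the pseudovalues
  list(estimate = estimate, se = se)
}
set.seed(2)
jackknife(rnorm(20), mean)   # for the sample mean, the jackknife SE equals the usual s/sqrt(n)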

One of the great surprises about the bootstrap is that in cases like the trimmed mean, the bootstrap does better than the jackknife (Efron, 1982, pp. 28–29). For the sample median, the bootstrap provides a consistent estimate of the variance but the jackknife does not! See Efron (1982, p. 16 and chapter 6). In that monograph, Efron also showed, using theorem 6.1, that the jackknife estimate of standard error is essentially the bootstrap estimate with the parameter estimate replaced by a linear approximation of it. In this way, there is a close similarity between the two methods, and if the linear approximation is a good approximation, the jackknife and the bootstrap will both be consistent. However, there are complex estimators where this is not the case.

1.2.2 Delta Method

It is often the case that we are interested in the moments of an estimator. In particular, for these various methods, the variance is the moment we are most interested in. To illustrate the delta method, let us define φ = f(α), where the parameters φ and α are both one-dimensional variables and f is a function differentiable with respect to α. So there exists a Taylor series expansion for f at a point, say α0. Carrying it out only to first order, we get

φ = f(α) = f(α0) + f′(α0)(α − α0) + remainder terms,

and dropping the remainder terms leaves

φ ≈ f(α0) + f′(α0)(α − α0),

or

φ − f(α0) ≈ f′(α0)(α − α0).   (1.1)
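Taking variances of both sides of this first-order approximation gives the familiar delta-method formula Var(f(α̂)) ≈ [f′(α0)]² Var(α̂). As a small illustration of our own (not an example from the text), the R sketch below checks this approximation for f(α) = log α, with α̂ the sample mean of exponential data:

# Delta-method variance approximation for log(mean(x)), exponential data (sketch)
set.seed(4)
rate <- 2; n <- 50
alpha0 <- 1 / rate                              # true mean of the exponential distribution
var.mean <- (1 / rate^2) / n                    # Var(sample mean) = sigma^2 / n
delta.var <- (1 / alpha0)^2 * var.mean          # [f'(alpha0)]^2 * Var(alpha.hat), with f = log
sim.var <- var(replicate(10000, log(mean(rexp(n, rate)))))
c(delta = delta.var, simulation = sim.var)      # the two values should be close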

1.2.3 Cross-Validation

Cross-validation is a general procedure used in statistical modeling. It can be used to determine the best model out of alternative choices such as order of an autoregressive time series model, which variables to include in a logistic regression or a multiple linear regression, number of distributions in a mixture model, and the choice of a parametric classification model or for pruning classification trees.

The basic idea of cross-validation is to randomly split the data into two subsets. One is used to fit the model, and the other is used to test the model. The extreme case would be to fit all the data except for a single observation and see how well that model predicts the value of the observation left out. But a test sample of size 1 is not very good for assessment. So, in the case of classification error rate estimation, Lachenbruch and Mickey (1968) proposed the leave-one-out method of assessment. In this case, a model is fit to the n – 1 observations that are included and is tested on the one left out. The model fitting and prediction are then done separately for all n observations: for each case i, the model is fit without observation i and is then used to predict the class of case i. Results are obtained for each i and then averaged. Efron (1983) included a simulation study showing that, for bivariate normal distributions, the “632” variant of the bootstrap does better than leave-one-out. For pruning classification trees, see Breiman et al. (1984).
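A minimal R sketch of the leave-one-out procedure for a classification error rate (made-up two-class data and a logistic regression classifier, our own example rather than the setting of the studies cited above) is given below:

# Leave-one-out estimate of a classification error rate (sketch)
set.seed(5)
n <- 60
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(1.5 * d$x))            # made-up two-class data
loo.errors <- sapply(seq_len(n), function(i) {
  fit <- glm(y ~ x, data = d[-i, ], family = binomial)             # fit without observation i
  pred <- as.numeric(predict(fit, d[i, ], type = "response") > 0.5)
  pred != d$y[i]                                                    # TRUE if case i is misclassified
})
mean(loo.errors)                                  # leave-one-out estimate of the error rate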

1.2.4 Subsampling

The idea of subsampling goes back to Hartigan (1969), who developed a theory of confidence intervals for random subsampling. He proved a theorem called the typical value theorem when M-estimators are used to estimate parameters. We shall see in the chapter on confidence intervals that Hartigan’s results were motivating factors for Efron to introduce the percentile method bootstrap confidence intervals.

More recently, the theory of subsampling has been further developed and related to the bootstrap. It has been applied when the data are independent observations and also when there are dependencies among the data. A good summary of the current literature along with connections to the bootstrap can be found in Politis et al. (1999), and consistency under very minimal assumptions can be found in Politis and Romano (1994). Politis, Romano, and Wolf included applications when the observations are independent and also for dependent situations such as stationary and nonstationary time series, random fields, and marked point processes. The dependent situations are also well covered in section 2.8 of Lahiri (2003).

We shall now define random subsampling. Let S1, S2, …, SB–1 be B – 1 of the 2^n – 1 nonempty subsets of the integers 1, 2, …, n. These B – 1 subsets are selected at random without replacement. So a subset of size 3 might be drawn, and it could contain {1, 3, 5}. Another subset of size 3 that could be drawn is {2, 4, n}. Subsets of other sizes could also be drawn; for example, a subset of size 5 is {1, 7, 9, 12, 13}. There are many subsets to select from. There is only 1 subset of size n, and it contains all the integers from 1 to n. There are n subsets of size n – 1, each of which excludes one and only one of the integers from 1 to n. For more details on this, on M-estimators, and on the typical value theorem, see sections 3.1.1 and 3.1.2 of Chernick (2007).
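A rough R sketch of random subsampling (our own illustration; for simplicity it draws the subsets independently rather than strictly without replacement, since duplicate subsets are extremely unlikely for moderate n) is shown below:

# Random subsampling of the mean (sketch)
set.seed(6)
x <- rnorm(25); n <- length(x); B <- 200
subsample.means <- replicate(B - 1, {
  repeat {                                   # draw a (nearly) uniformly distributed nonempty subset of 1..n
    keep <- runif(n) < 0.5
    if (any(keep)) break
  }
  mean(x[keep])
})
quantile(subsample.means, c(0.025, 0.975))   # crude interval from the subsampling distribution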

1.3 WIDE RANGE OF APPLICATIONS

There is a great deal of temptation to apply the bootstrap in a wide variety of settings. But as we have seen, the bootstrap does not always work. So how do we know when it will work? We either have to prove a consistency theorem under a set of assumptions or we have to verify that it is well behaved through simulations.

In regression problems, there are at least two approaches to bootstrapping. One is called “bootstrapping residuals,” and the other is called “bootstrapping vectors or cases.” In the first approach, we fit a model to the data and compute the residuals from the model. Then we generate a bootstrap sample by resampling with replacement from the model residuals. In the second approach, we resample with replacement from the n vectors of dimension k + 1, (yi, xi1, xi2, …, xik) for i = 1, 2, …, n, where yi is the response and xi1, …, xik are the values of the k predictor variables for the ith case.

In the first approach, the model is fixed. In the second, it is redetermined each time. Both methods can be applied when a parametric regression model is assumed. But in practice, we might not be sure that the parametric form is correct. In such cases, it is better to use the bootstrapping vectors approach.
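Here is a minimal R sketch (our own illustrative data) of both approaches for a simple linear regression, comparing the bootstrap standard errors of the slope:

# Bootstrapping residuals versus bootstrapping vectors (cases) for lm (sketch)
set.seed(7)
n <- 40
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)

# (a) Bootstrapping residuals: the design points x stay fixed
res <- resid(fit); fits <- fitted(fit)
slope.res <- replicate(2000, {
  y.star <- fits + sample(res, replace = TRUE)
  coef(lm(y.star ~ x))[2]
})

# (b) Bootstrapping vectors (cases): resample (y, x) pairs together
slope.case <- replicate(2000, {
  idx <- sample(n, replace = TRUE)
  coef(lm(y[idx] ~ x[idx]))[2]
})

c(se.residuals = sd(slope.res), se.cases = sd(slope.case))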

The bootstrap has also been successfully applied to the estimation of error rates for discriminant functions using bias adjustment as we will see in Chapter 2. The bootstrap and another resampling procedure called “permutation tests,” as described in Good (1994), are attractive because they free the scientists from restrictive parametric assumptions that may not apply in their particular situation.

Sometimes the data can have highly skewed or heavy-tailed distributions or multiple modes. There is no need to simplify the model by, say, a linear approximation when the appropriate model is nonlinear. The estimator can be defined through an algorithm and there does not need to be an analytic expression for the parameters to be estimated.

Another feature of the bootstrap is its simplicity. For almost any problem you can think of, there is a way to construct bootstrap samples. Using the Monte Carlo approximation to the bootstrap estimate, all the work can be done by the computer. Even though it is a computer-intensive method, with the speed of the modern computer, most problems are feasible, and in many cases, up to 100,000 bootstrap samples can be generated without consuming hours of CPU time. But care must be taken. It is not always apparent when the bootstrap will fail, and failure may not be easy to diagnose.

In recent years, we are finding that there are ways to modify the bootstrap so that it will work for problems where the simple (or naive) bootstrap is known to fail. The “m-out-of-n” bootstrap is one such example.

In many situations, the bootstrap can alert the practitioner to variability in his procedures that he otherwise would not be aware of. One example in spatial statistics is the development of pollution level contours based on a smoothing method called “kriging.” By generating bootstrap samples, multiple kriging contour maps can be generated, and the differences in the contours can be determined visually.

Also, the stepwise logistic regression problem that is described in Gong (1986) shows that variable selection can be somewhat of a chance outcome when there are many competing variables. She showed this by bootstrapping the entire stepwise selection procedure and seeing that the number of variables and the choice of variables selected can vary from one bootstrap sample to the next.
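A rough sketch of that idea in R (with hypothetical data and variable names; Gong's study used a particular clinical data set and its own selection routine) is to rerun a stepwise AIC selection on each bootstrap sample and tabulate which variables survive:

# Bootstrapping a stepwise logistic regression selection (sketch in the spirit of Gong, 1986)
set.seed(8)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.8 * d$x1 - 0.5 * d$x2))   # only x1 and x2 truly matter
selected <- replicate(200, {
  db <- d[sample(n, replace = TRUE), ]                                     # bootstrap the cases
  fit <- step(glm(y ~ x1 + x2 + x3 + x4, data = db, family = binomial), trace = 0)
  paste(sort(attr(terms(fit), "term.labels")), collapse = "+")
})
sort(table(selected), decreasing = TRUE)   # how often each subset of variables is chosen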

Babu and Feigelson (1996) applied the bootstrap to astronomy problems. In clinical trials, the bootstrap is used to estimate individual bioequivalence, for P-value adjustment with multiple end points, and even to estimate mean differences when the sample size is not large enough for asymptotic theory to take hold, or when the data are very nonnormal and statistics other than the mean are important.

1.4 THE BOOTSTRAP AND THE R LANGUAGE SYSTEM

In subsequent chapters of this text, we will illustrate examples with calculations and short programs using the R language system and its associated packages.

R is an integrated suite of an object-oriented programming language and software facilities for data manipulation, calculation, and graphical display. Over the last decade, R has become the statistical environment of choice for academics, and probably is now the most used such software system in the world. The number of specialized packages available in R has increased exponentially, and continues to do so. Perhaps the best thing about R (besides its power and breadth) is this: It is completely free to use. You can obtain your own copy of the R system at http://www.cran.r-project.org/.

From this website, you can get not only the executable version of R for Linux, Macs, or Windows, but also the source programs and free books containing documentation. We have found The R Book by Michael J. Crawley a good way to learn how to use R, and it has been an invaluable reference afterward.

There are so many good books and courses from which you can learn R, including courses that are Internet based, such as at http://statistics.com. We will not attempt to teach even the basics of R here. What we will do is show those features of direct applicability, and give program snippets to illustrate examples and the use of currently available R packages for bootstrapping. These snippets will be presented in the Courier typeface to distinguish them from regular text and to maintain spacing in output generated.

At the current time, using R version 2.10.1, the R query (“>” denotes the R command line prompt)

> ?? bootstrap

or

> help.search('bootstrap')

results in

agce::resamp.std Compute the standard deviation by bootstrap.

alr3::boot.case Case bootstrap for regression models

analogue::RMSEP Root mean square error of prediction

analogue::bootstrap Bootstrap estimation and errors

analogue::bootstrap.wa Bootstrap estimation and errors for WA models

analogue::bootstrapObject Bootstrap object description

analogue::getK Extract and set the number of analogues

analogue::performance Transfer function model performance statistics

analogue::screeplot.mat Screeplots of model results

analogue::summary.bootstrap.mat Summarise bootstrap resampling for MAT models

animation::boot.iid Bootstrapping the i.i.d data

ape::boot.phylo Tree Bipartition and Bootstrapping Phylogenies

aplpack::slider.bootstrap.lm.plot interactive bootstapping for lm

bnlearn::bn.boot Parametric and nonparametric bootstrap of Bayesian networks

bnlearn::boot.strength Bootstrap arc strength and direction

boot::nested.corr Functions for Bootstrap Practicals

boot::boot Bootstrap Resampling

boot::boot.array Bootstrap Resampling Arrays

boot::boot.ci Nonparametric Bootstrap Confidence Intervals

boot::cd4.nested Nested Bootstrap of cd4 data

boot::censboot Bootstrap for Censored Data

boot::freq.array Bootstrap Frequency Arrays

boot::jack.after.boot Jackknife-after-Bootstrap Plots

boot::linear.approx Linear Approximation of Bootstrap Replicates

boot::plot.boot Plots of the Output of a Bootstrap Simulation

boot::print.boot Print a Summary of a Bootstrap Object

boot::print.bootci Print Bootstrap Confidence Intervals

boot::saddle Saddlepoint Approximations for Bootstrap Statistics

boot::saddle.distn Saddlepoint Distribution Approximations for Bootstrap Statistics

boot::tilt.boot Non-parametric Tilted Bootstrap

boot::tsboot Bootstrapping of Time Series

BootCL::BootCL.distribution Find the Bootstrap distribution

BootCL::BootCL.plot Display the bootstrap distribution and p-value

BootPR::BootAfterBootPI Bootstrap-after-Bootstrap Prediction

BootPR::BootBC Bootstrap bias-corrected estimation and forecasting for AR models

BootPR::BootPI Bootstrap prediction intevals and point forecasts with no bias-correction

BootPR::BootPR-package Bootstrap Prediction Intervals and Bias-Corrected Forecasting

BootPR::ShamanStine.PI Bootstrap prediction interval using Shaman and Stine bias formula

BootRes::bootRes-package The bootRes Package for Bootstrapped Response and Correlation Functions

BootRes::dendroclim Calculation of Bootstrapped response and correlation functions.

Bootspecdens::specdens Bootstrap for testing equality of spectral densities

BootStepAIC::boot.stepAIC Bootstraps the Stepwise Algorithm of stepAIC() for Choosing a Model by AIC

Bootstrap::bootpred Bootstrap Estimates of Prediction Error

Bootstrap::bootstrap Non-Parametric Bootstrapping

Bootstrap::boott Bootstrap-t Confidence Limits

Bootstrap::ctsub Internal functions of package bootstrap

Bootstrap::lutenhorm Luteinizing Hormone

Bootstrap::scor Open/Closed Book Examination Data

Bootstrap::spatial Spatial Test Data

BSagri::BOOTSimpsonD Simultaneous confidence intervals for Simpson indices

cfa::bcfa Bootstrap-CFA

ChainLadder::BootChainLadder Bootstrap-Chain-Ladder Model

CircStats::vm.bootstrap.ci Bootstrap-Confidence Intervals

circular::mle.vonmises.bootstrap.ci Bootstrap Confidence Intervals

clue::cl_boot Bootstrap Resampling of Clustering Algorithms

CORREP::cor.bootci Bootstrap Confidence Interval for Multivariate Correlation

Daim::Daim.data1 Data set: Artificial Bootstrap data for use with Daim

DCluster::achisq.boot Bootstrap replicates of Pearson’s Chi-square statistic

DCluster::besagnewell.boot Generate boostrap replicates of Besag and Newell’s statistic

DCluster::gearyc.boot Generate bootstrap replicates of Moran’s I autocorrelation statistic

DCluster::kullnagar.boot Generate bootstrap replicates of Kulldorff and Nagarwalla’s statistic

DCluster::moranI.boot Generate bootstrap replicates of Moran’s I autocorrelation statistic

DCluster::pottwhitt.boot Bootstrap replicates of Potthoff-Whittinghill’s statistic

DCluster::stone.boot Generate boostrap replicates of Stone’s statistic

DCluster::tango.boot Generate bootstrap replicated of Tango’s statistic

DCluster::whittermore.boot Generate bootstrap replicates of Whittermore’s statistic

degreenet::rplnmle Rounded Poisson Lognormal Modeling of Discrete Data

degreenet::bsdp Calculate Bootstrap Estimates and Confidence Intervals for the Discrete Pareto Distribution

degreenet::bsnb Calculate Bootstrap Estimates and Confidence Intervals for the Negative Binomial Distribution

degreenet::bspln Calculate Bootstrap Estimates and Confidence Intervals for the Poisson Lognormal Distribution

degreenet::bswar Calculate Bootstrap Estimates and Confidence Intervals for the Waring Distribution

degreenet::bsyule Calculate Bootstrap Estimates and Confidence Intervals for the Yule Distribution

degreenet::degreenet-internal Internal degreenet Objects

delt::eval.bagg Returns a bootstrap aggregation of adaptive histograms

delt::lstseq.bagg Calculates a scale of bootstrap aggregated histograms

depmix::depmix Fitting Dependent Mixture Models

Design::anova.Design Analysis of Variance (Wald and F Statistics)

Design::bootcov Bootstrap Covariance and Distribution for Regression Coefficients

Design::calibrate Resampling Model Calibration

Design::predab.resample Predictive Ability using Resampling

Design::rm.impute Imputation of Repeated Measures

Design::validate Resampling Validation of a Fitted Model’s Indexes of Fit

Design::validate.cph Validation of a Fitted Cox or Parametric Survival Model’s Indexes of Fit

Design::validate.lrm Resampling Validation of a Logistic Model

Design::validate.ols Validation of an Ordinary Linear Model

dynCorr::bootstrapCI Bootstrap Confidence Interval

dynCorr::dynCorrData An example dataset for use in the example calls in the help files for the dynamicCorrelation and bootstrapCI functions

e1071::bootstrap.lca Bootstrap Samples of LCA Results

eba::boot Bootstrap for Elimination-By-Aspects (EBA) Models

EffectiveDose::Boot.CI Bootstrap confidence intervals for ED levels

EffectiveDose::EffectiveDose-package Estimation of the Effective Dose including Bootstrap confindence intervals

el.convex::samp sample from bootstrap

equate::se.boot Bootstrap Standard Errors of Equating

equivalence::equiv.boot Regression-based TOST using bootstrap

extRemes::boot.sequence Bootstrap a sequence.

FactoMineR::simule Simulate by bootstrap

FGN::Boot Generic Bootstrap Function

FitAR::Boot Generic Bootstrap Function

FitAR::Boot.ts Parametric Time Series Bootstrap

fitdistrplus::bootdist Bootstrap simulation of uncertainty for non-censored data

fitdistrplus::bootdistcens Bootstrap simulation of uncertainty for censored data

flexclust::bootFlexclust Bootstrap Flexclust Algorithms

fossil::bootstrap Bootstrap Species Richness Estimator

fractal::surrogate Surrogate data generation

FRB::FRBmultiregGS GS-Estimates for multivariate regression with bootstrap confidence intervals

FRB::FRBmultiregMM MM-Estimates for Multivariate Regression with Bootstrap Inference

FRB::FRBmultiregS S-Estimates for Multivariate Regression with Bootstrap Inference

FRB::FRBpcaMM PCA based on Multivariate MM-estimators with Fast and Robust Bootstrap

FRB::FRBpcaS PCA based on Multivariate S-estimators with Fast and Robust Bootstrap

FRB::GSboot_multireg Fast and Robust Bootstrap for GS-Estimates

FRB::MMboot_loccov Fast and Robust Bootstrap for MM-estimates of Location and Covariance

FRB::MMboot_multireg Fast and Robust Bootstrap for MM-Estimates of Multivariate Regression

FRB::MMboot_twosample Fast and Robust Bootstrap for Two-Sample MM-estimates of Location and Covariance

FRB::Sboot_loccov Fast and Robust Bootstrap for S-estimates of location/covariance

FRB::Sboot_multireg Fast and Robust Bootstrap for S-Estimates of Multivariate Regression

FRB::Sboot_twosample Fast and Robust Bootstrap for Two-Sample S-estimates of Location and Covariance

ftsa::fbootstrap Bootstrap independent and identically distributed functional data

gmvalid::gm.boot.coco Graphical model validation using the bootstrap (CoCo).

gmvalid::gm.boot.mim Graphical model validation using the bootstrap (MIM)

gPdtest::gPd.test Bootstrap goodness-of-fit test for the generalized Pareto distribution

hierfstat::boot.vc Bootstrap confidence intervals for variance components

Hmisc::areg Additive Regression with Optimal Transformations on Both Sides using Canonical Variates

Hmisc::aregImpute Multiple Imputation using Additive Regression, Bootstrapping, and Predictive Mean Matching

Hmisc::bootkm Bootstrap Kaplan-Meier Estimates

Hmisc::find.matches Find Close Matches

Hmisc::rm.boot Bootstrap Repeated Measurements Model

Hmisc::smean.cl.normal Compute Summary Statistics on a Vector

Hmisc::transace Additive Regression and Transformations using ace or avas

Hmisc::transcan Transformations/Imputations using Canonical Variates

homtest::HOMTESTS Homogeneity tests

hopach::boot2fuzzy function to write MapleTree files for viewing bootstrap estimated cluster membership probabilities based on hopach clustering results

hopach::bootplot function to make a barplot of bootstrap estimated cluster membership probabilities

hopach::boothopach functions to perform non-parametric bootstrap resampling of hopach clustering results

ICEinfer::ICEcolor Compute Preference Colors for Outcomes in a Bootstrap ICE Scatter within a Confidence Wedge

ICEinfer::ICEuncrt Compute Bootstrap Distribution of ICE Uncertainty for given Shadow Price of Health, lambda

ICEinfer::plot.ICEcolor Add Economic Preference Colors to Bootstrap Uncertainty Scatters within a Confidence Wedge

ICEinfer::plot.ICEuncrt Display Scatter for a possibly Transformed Bootstrap Distribution of ICE Uncertainty

ICEinfer::print.ICEuncrt Summary Statistics for a possibly Transformed Bootstrap Distribution of ICE Uncertainty

ipred::bootest Bootstrap Error Rate Estimators

maanova::consensus Build consensus tree out of bootstrap cluster result

Matching::ks.boot Bootstrap Kolmogorov-Smirnov

MBESS::ci.reliability.bs Bootstrap the confidence interval for reliability coefficient

MCE::RProj The bootstrap-then-group implementation of the Bootstrap Grouping Prediction Plot for estimating R.

MCE::groupbootMCE The group-then-bootstrap implementation of the Bootstrap Grouping Prediction Plot for estimating MCE

MCE::groupbootR The group-then-bootstrap implementation of the Bootstrap Grouping Prediction Plot for estimating R

MCE::jackafterboot Jackknife-After-Bootstrap Method of MCE estimation

MCE::mceBoot Bootstrap-After-Bootstrap estimate of MCE

MCE::mceProj The bootstrap-then-group implementation of the Bootstrap Grouping Prediction Plot for estimating MCE.

meboot::meboot Generate Maximum Entropy Bootstrapped Time Series Ensemble

meboot::meboot.default Generate Maximum Entropy Bootstrapped Time Series Ensemble

meboot::meboot.pdata.frame Maximum Entropy Bootstrap for Panel Time Series Data

meifly::lmboot Bootstrap linear models

mixreg::bootcomp Perform a bootstrap test for the number of components in a mixture of regressions.

mixstock::genboot Generate bootstrap estimates of mixed stock analyses

mixstock::mixstock.boot Bootstrap samples of mixed stock analysis data

mixtools::boot.comp Performs Parametric Bootstrap for Sequentially Testing the Number of Components in Various Mixture Models

mixtools::boot.se Performs Parametric Bootstrap for Standard Error Approximation

MLDS::simu.6pt Perform Bootstrap Test on 6-point Likelihood for MLDS FIT

MLDS::summary.mlds.bt Method to Extract Bootstrap Values for MLDS Scale Values

msm::boot.msm Bootstrap resampling for multi-state models

mstate::msboot Bootstrap function in multi-state models

multtest::boot.null Non-parametric bootstrap resampling function in package ’multtest’

ncf::mSynch the mean (cross-)correlation (with bootstrap CI) for a panel of spatiotemporal data

nFactors::eigenBootParallel Bootstrapping of the Eigenvalues From a Data Frame

nlstools::nlsBoot Bootstrap resampling

np::b.star Compute Optimal Block Length for Stationary and Circular Bootstrap

nsRFA::HOMTESTS Homogeneity tests

Oncotree::bootstrap.oncotree Bootstrap an oncogenetic tree to assess stability

ouch::browntree Fitted phylogenetic Brownian motion model

ouch::hansentree-methods Methods of the “hansentree” class

pARccs::Boot_CI Bootstrap confidence intervals for (partial) attributable risks (AR and PAR) from case-control data

PCS::PdCSGt.bootstrap.NP2 Non-parametric Bootstrap for computing G-best and d-best PCS

PCS::PdofCSGt.bootstrap5 Parametric bootstrap for computing G-best and d-best PCS

PCS::PofCSLt.bootstrap5 Parametric bootstrap for computing L-best PCS

peperr::complexity.ipec.CoxBoost Interface function for complexity selection for CoxBoost via integrated prediction error curve and the bootstrap

peperr::complexity.ipec.rsf_mtry Interface function for complexity selection for random survival forest via integrated prediction error curve and the bootstrap

pgirmess::difshannonbio Empirical confidence interval of the bootstrap of the difference between two Shannon indices

pgirmess::piankabioboot Bootstrap Pianka’s index

pgirmess::shannonbioboot Boostrap Shannon’s and equitability indices

phangorn::bootstrap.pml Bootstrap

phybase::bootstrap Bootstrap sequences

phybase::bootstrap.mulgene Bootstrap sequences from multiple loci

popbio::boot.transitions Bootstrap observed census transitions

popbio::countCDFxt Count-based extinction probabilities and bootstrap confidence intervals

prabclus::abundtest Parametric bootstrap test for clustering in abundance matrices

prabclus::prabtest Parametric bootstrap test for clustering in presence-absence matrices

pvclust::msfit Curve Fitting for Multiscale Bootstrap Resampling

qgen::dis Bootstrap confidence intervals

qpcR::calib2 Calculation of qPCR efficiency by dilution curve analysis and bootstrapping of dilution curve replicates

qpcR::pcrboot Bootstrapping and jackknifing qPCR data

qtl::plot.scanoneboot Plot results of bootstrap for QTL position

qtl::scanoneboot Bootstrap to get interval estimate of QTL location

qtl::summary.scanoneboot Bootstrap confidence interval for QTL location

QuantPsyc::distInd.ef Complex Mediation for use in Bootstrapping

QuantPsyc::proxInd.ef Simple Mediation for use in Bootstrapping

quantreg::boot.crq Bootstrapping Censored Quantile Regression

quantreg::boot.rq Bootstrapping Quantile Regression

r4ss::SS_splitdat Split apart bootstrap data to make input file.

relaimpo::boot.relimp Functions to Bootstrap Relative Importance Metrics

ResearchMethods::bootSequence A demonstration of how bootstrapping works, taking multiple bootstrap samples and watching how the means of those samples begin to normalize.

ResearchMethods::bootSingle A demonstration of how bootstrapping works step by step for one function.

rms::anova.rms Analysis of Variance (Wald and F Statistics)

rms::bootcov Bootstrap Covariance and Distribution for Regression Coefficients

rms::calibrate Resampling Model Calibration

rms::predab.resample Predictive Ability using Resampling

rms::validate Resampling Validation of a Fitted Model’s Indexes of Fit

rms::validate.cph Validation of a Fitted Cox or Parametric Survival Model’s Indexes of Fit

rms::validate.lrm Resampling Validation of a Logistic Model

rms::validate.ols Validation of an Ordinary Linear Model

robust::rb Robust Bootstrap Standard Errors

rqmcmb2::rqmcmb Markov Chain Marginal Bootstrap for Quantile Regression

sac::BootsChapt Bootstrap (Permutation) Test of Change-Point(s) with One-Change or Epidemic Alternative

sac::BootsModelTest Bootstrap Test of the Validity of the Semiparametric Change-Point Model

SAFD::btest.mean One-sample bootstrap test for the mean of a FRV

SAFD::btest2.mean Two-sample bootstrap test on the equality of mean of two FRVs

SAFD::btestk.mean Multi-sample bootstrap test for the equality of the mean of FRVs

scaleboot::sboptions Options for Multiscale Bootstrap

scaleboot::plot.scaleboot Plot Diagnostics for Multiscale Bootstrap