Understanding Computational Bayesian Statistics

William M. Bolstad

Description

A hands-on introduction to computational statistics from a Bayesian point of view

Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistical models, including the multiple linear regression model, the hierarchical mean model, the logistic regression model, and the proportional hazards model.

The book begins with an outline of the similarities and differences between the Bayesian and likelihood approaches to statistics. Subsequent chapters present key techniques for using computer software to draw Monte Carlo samples from the incompletely known posterior distribution and for performing Bayesian inferences based on these samples. Topics of coverage include:

  • Direct ways to draw a random sample from the posterior by reshaping a random sample drawn from an easily sampled starting distribution
  • The distributions from the one-dimensional exponential family
  • Markov chains and their long-run behavior
  • The Metropolis-Hastings algorithm
  • Gibbs sampling algorithm and methods for speeding up convergence
  • Markov chain Monte Carlo sampling

Using numerous graphs and diagrams, the author emphasizes a step-by-step approach to computational Bayesian statistics. At each step, important aspects of application are detailed, such as how to choose a prior for the logistic regression model, the Poisson regression model, and the proportional hazards model. A related Web site houses R functions and Minitab macros for Bayesian analysis and Monte Carlo simulations, and detailed appendices in the book guide readers through the use of these software packages.

Understanding Computational Bayesian Statistics is an excellent book for courses on computational statistics at the upper-level undergraduate and graduate levels. It is also a valuable reference for researchers and practitioners who use computer programs to conduct statistical analyses of data and solve problems in their everyday work.


Page count: 468

Publication year: 2011




Contents

Preface

1 Introduction to Bayesian Statistics

1.1 THE FREQUENTIST APPROACH TO STATISTICS

1.2 THE BAYESIAN APPROACH TO STATISTICS

1.3 COMPARING LIKELIHOOD AND BAYESIAN APPROACHES TO STATISTICS

1.4 COMPUTATIONAL BAYESIAN STATISTICS

1.5 PURPOSE AND ORGANIZATION OF THIS BOOK

2 Monte Carlo Sampling from the Posterior

2.1 ACCEPTANCE-REJECTION-SAMPLING

2.2 SAMPLING-IMPORTANCE-RESAMPLING

2.3 ADAPTIVE-REJECTION-SAMPLING FROM A LOG-CONCAVE DISTRIBUTION

2.4 WHY DIRECT METHODS ARE INEFFICIENT FOR HIGH-DIMENSION PARAMETER SPACE

3 Bayesian Inference

3.1 BAYESIAN INFERENCE FROM THE NUMERICAL POSTERIOR

3.2 BAYESIAN INFERENCE FROM POSTERIOR RANDOM SAMPLE

4 Bayesian Statistics Using Conjugate Priors

4.1 ONE-DIMENSIONAL EXPONENTIAL FAMILY OF DENSITIES

4.2 DISTRIBUTIONS FOR COUNT DATA

4.3 DISTRIBUTIONS FOR WAITING TIMES

4.4 NORMALLY DISTRIBUTED OBSERVATIONS WITH KNOWN VARIANCE

4.5 NORMALLY DISTRIBUTED OBSERVATIONS WITH KNOWN MEAN

4.6 NORMALLY DISTRIBUTED OBSERVATIONS WITH UNKNOWN MEAN AND VARIANCE

4.7 MULTIVARIATE NORMAL OBSERVATIONS WITH KNOWN COVARIANCE MATRIX

4.8 OBSERVATIONS FROM NORMAL LINEAR REGRESSION MODEL

Appendix: Proof of Poisson Process Theorem

5 Markov Chains

5.1 STOCHASTIC PROCESSES

5.2 MARKOV CHAINS

5.3 TIME-INVARIANT MARKOV CHAINS WITH FINITE STATE SPACE

5.4 CLASSIFICATION OF STATES OF A MARKOV CHAIN

5.5 SAMPLING FROM A MARKOV CHAIN

5.6 TIME-REVERSIBLE MARKOV CHAINS AND DETAILED BALANCE

5.7 MARKOV CHAINS WITH CONTINUOUS STATE SPACE

6 Markov Chain Monte Carlo Sampling from Posterior

6.1 METROPOLIS-HASTINGS ALGORITHM FOR A SINGLE PARAMETER

6.2 METROPOLIS-HASTINGS ALGORITHM FOR MULTIPLE PARAMETERS

6.3 BLOCKWISE METROPOLIS-HASTINGS ALGORITHM

6.4 GIBBS SAMPLING

6.5 SUMMARY

7 Statistical Inference from a Markov Chain Monte Carlo Sample

7.1 MIXING PROPERTIES OF THE CHAIN

7.2 FINDING A HEAVY-TAILED MATCHED CURVATURE CANDIDATE DENSITY

7.3 OBTAINING AN APPROXIMATE RANDOM SAMPLE FOR INFERENCE

Appendix: Procedure for Finding the Matched Curvature Candidate Density for a Multivariate Parameter

8 Logistic Regression

8.1 LOGISTIC REGRESSION MODEL

8.2 COMPUTATIONAL BAYESIAN APPROACH TO THE LOGISTIC REGRESSION MODEL

8.3 MODELLING WITH THE MULTIPLE LOGISTIC REGRESSION MODEL

9 Poisson Regression and Proportional Hazards Model

9.1 POISSON REGRESSION MODEL

9.2 COMPUTATIONAL APPROACH TO POISSON REGRESSION MODEL

9.3 THE PROPORTIONAL HAZARDS MODEL

9.4 COMPUTATIONAL BAYESIAN APPROACH TO PROPORTIONAL HAZARDS MODEL

10 Gibbs Sampling and Hierarchical Models

10.1 GIBBS SAMPLING PROCEDURE

10.2 THE GIBBS SAMPLER FOR THE NORMAL DISTRIBUTION

10.3 HIERARCHICAL MODELS AND GIBBS SAMPLING

10.4 MODELLING RELATED POPULATIONS WITH HIERARCHICAL MODELS

Appendix: Proof That Improper Jeffreys' Prior Distribution for the Hypervariance Can Lead to an Improper Posterior

11 Going Forward with Markov Chain Monte Carlo

A Using the Included Minitab Macros

B Using the Included R Functions

References

WILEY SERIES IN COMPUTATIONAL STATISTICS

Index

WILEY SERIES IN COMPUTATIONAL STATISTICS

Consulting Editors:

Paolo Giudici, University of Pavia, Italy

Geof H. Givens, Colorado State University, USA

Bani K. Mallick, Texas A&M University, USA

The Wiley Series in Computational Statistics comprises practical guides and cutting-edge research books on new developments in computational statistics. It features quality authors with a strong applications focus. The texts in the series provide detailed coverage of statistical concepts, methods, and case studies in areas at the interface of statistics, computing, and numerics.

With sound motivation and a wealth of practical examples, the books show in concrete terms how to select and to use appropriate ranges of statistical computing techniques in particular fields of study. Readers are assumed to have a basic understanding of introductory terminology.

The series concentrates on applications of computational methods in statistics to fields of bioinformatics, genomics, epidemiology, business, engineering, finance and applied statistics.

A complete list of titles in this series appears at the end of the volume.

Copyright © 2010 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Bolstad, William M., 1943–
Understanding Computational Bayesian statistics / William M. Bolstad.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-04609-8 (cloth)
1. Bayesian statistical decision theory—Data processing. I. Title.
QA279.5.B649 2010
519.5′42—dc22
2009025219

This book is dedicated to

Sylvie,

Ben, Rachel,

Mary, and Elizabeth

Preface

In theory, Bayesian statistics is very simple. The posterior is proportional to the prior times the likelihood. This gives the shape of the posterior, but it is not a density, so it cannot be used for inference. The exact scale factor needed to make it a density can be found analytically in only a few special cases. In all other cases, finding the scale factor requires a numerical integration, which may be difficult when there are multiple parameters. So in practice, Bayesian statistics is more difficult, and this has held back its use for applied statistical problems.
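In symbols (the notation here is my own shorthand, not necessarily the book's): writing g(θ) for the prior and f(y|θ) for the likelihood, Bayes' theorem gives, in LaTeX form,

    g(\theta \mid y) = \frac{g(\theta)\, f(y \mid \theta)}{\int g(\theta)\, f(y \mid \theta)\, d\theta} \propto g(\theta)\, f(y \mid \theta)

The integral in the denominator is the scale factor: it rarely has a closed form, and evaluating it numerically becomes difficult when θ has several dimensions.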

Computational Bayesian statistics has changed all this. It rests on the big idea that statistical inferences can be based on a random sample drawn from the posterior. The algorithms that are used allow us to draw samples from the exact posterior even when we know only its shape, not the scale factor needed to make it an exact density. These algorithms include direct methods, in which a random sample drawn from an easily sampled distribution is reshaped by accepting only some of the candidate values into the final sample. More sophisticated algorithms are based on setting up a Markov chain that has the posterior as its long-run distribution. When the chain has been allowed to run a sufficiently long time, a draw from the chain can be considered a random draw from the target (posterior) distribution. These algorithms are particularly well suited to complicated models with many parameters, and they are revolutionizing applied statistics: analyses based on computational Bayesian methods can now be easily accomplished in practice.
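As a concrete illustration of the direct methods just mentioned, here is a minimal acceptance-rejection sketch in R; the target shape, candidate density, and all names are illustrative assumptions of mine, not the book's examples:

    # Target known only up to shape (no scale factor): theta^2 * (1-theta)^4
    # on (0, 1), an unnormalized Beta(3, 5). Candidate: Uniform(0, 1).
    g <- function(theta) theta^2 * (1 - theta)^4
    M <- optimize(g, c(0, 1), maximum = TRUE)$objective  # bound on g/candidate

    n     <- 100000
    cand  <- runif(n)                  # draws from the easy candidate density
    keep  <- runif(n) <= g(cand) / M   # accept with probability g(theta)/M
    draws <- cand[keep]                # a random sample from the exact posterior

    mean(draws)                        # compare with the Beta(3, 5) mean, 3/8

Note that the normalizing constant of the target never appears; only the shape is needed, which is exactly the situation described above.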

Features of the text

This text grew out of a course I developed at Waikato University. My goal for that course and this text is to bring these exciting developments to upper-level undergraduate and first-year graduate students of statistics. The text introduces the big idea to students in a way that lets them develop a strategy for making statistical inferences from samples drawn from the posterior. This requires an understanding of the pitfalls that can arise with this approach, what can be done to avoid them, and how to recognize them when they occur. The practitioner has many choices to make. Poor choices will lead to incorrect inferences; sensible choices will lead to satisfactory inferences in an efficient manner.

This text follows a step-by-step development. In Chapter 1 we learn about the similarities and differences between the Bayesian and likelihood approaches to statistics. This is important because when a flat prior is used, the posterior has the same shape as the likelihood function, yet the two approaches use different methods for inference. The Bayesian approach allows us to interpret the posterior as a probability density, and it is this interpretation that leads to the advantages of the approach. In Chapter 2 we examine direct approaches to drawing a random sample from the posterior when we know only its shape, by reshaping a random sample drawn from another, easily sampled density and accepting only some of the candidates into the final sample. These methods are satisfactory for models with only a few parameters, provided the candidate density has heavier tails than the target. For models with many parameters, direct methods become very inefficient, although they may still have a role as a small step in a larger Markov chain Monte Carlo algorithm. In Chapter 3 we show how statistical inferences can be made from a random sample from the posterior in a way completely analogous to the corresponding inferences taken from a numerically calculated posterior.

In Chapter 4 we study the distributions from the one-dimensional exponential family. When the observations come from a member of this family and the prior is from the conjugate family, the posterior is another member of the conjugate family and can easily be found by simple updating rules. We also look at the normal distribution with unknown mean and variance, which is a member of the two-dimensional exponential family, and at the multivariate normal and normal regression models. These exponential family cases are the only ones where the formula for the posterior can be found analytically; before the development of computing, Bayesian statistics could be done in practice only in these few cases. We will use them as steps in larger models.

In Chapter 5 we introduce Markov chains. An understanding of Markov chains and their long-run behavior is needed before we study the more advanced algorithms in the book, since anything that can happen in a Markov chain can also happen in a Markov chain Monte Carlo (MCMC) model. The chapter finishes with the Metropolis algorithm, which allows us to take a Markov chain and construct from it a new Markov chain that has the target (posterior) as its long-run distribution. In Chapter 6 we introduce the Metropolis-Hastings algorithm and show how its performance depends on whether we use a random-walk or an independent candidate density. We show how, in a multivariate case, we can draw the parameters either all at once or blockwise, and that the Gibbs sampler is a special case of blockwise Metropolis-Hastings. In Chapter 7 we investigate how the mixing properties of the chain depend on the choice of the candidate density. We show how to find a heavy-tailed candidate density starting from the maximum likelihood estimator and the matched curvature covariance matrix, which leads to a very efficient MCMC process. We also investigate several methods for deciding on the burn-in time and thinning required to extract an approximately random sample from the posterior density out of the MCMC output as the basis for inference.

In Chapter 8 we apply this approach to the logistic regression model. This is a generalized linear model, and we find the maximum likelihood estimator and matched curvature covariance matrix using iteratively reweighted least squares. When we have a normal prior, we can find the approximate normal posterior by the simple updating rules studied in Chapter 4. We use its Student's t equivalent as the heavy-tailed independent candidate density for the Metropolis-Hastings algorithm; after burn-in, a draw from the Markov chain will be a random draw from the exact posterior, not from the normal approximation. We discuss how to determine priors for this model, and we investigate strategies for removing variables from the model to get a better prediction model. In Chapter 9 we apply these same ideas to the Poisson regression model. The proportional hazards model turns out to have the same likelihood as a Poisson model, so these ideas apply there as well. In Chapter 10 we investigate the Gibbs sampling algorithm. We demonstrate it on the normal(μ, σ²) model where both parameters are unknown, for both the independent prior case and the joint conjugate prior case. We see that the Gibbs sampler is particularly well suited to hierarchical models; in that case we can draw a directed acyclic graph showing the dependency structure of the parameters, and the conditional distribution of each block of parameters given all the other blocks has a particularly easy form. In Chapter 11 we discuss methods for speeding up convergence in Gibbs sampling and direct the reader to more advanced topics that are beyond the scope of the text.
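To make the Metropolis-Hastings discussion of Chapters 6 and 7 concrete, here is a minimal single-parameter random-walk sketch in R; the target, step size, burn-in, and thinning values are illustrative assumptions, not the book's recommendations:

    # Random-walk Metropolis-Hastings for one parameter, given only the
    # log of the unnormalized posterior (shape, not density).
    log_post <- function(theta) -0.5 * (theta - 1)^2 - 0.1 * theta^4

    n_iter <- 20000
    theta  <- numeric(n_iter)          # chain starts at theta[1] = 0
    for (i in 2:n_iter) {
      prop  <- theta[i - 1] + rnorm(1, 0, 0.8)          # random-walk candidate
      log_a <- log_post(prop) - log_post(theta[i - 1])  # log acceptance ratio
      theta[i] <- if (log(runif(1)) < log_a) prop else theta[i - 1]
    }

    # Discard burn-in and thin (Chapter 7) to get an approximately
    # random sample from the posterior as the basis for inference.
    draws <- theta[seq(5001, n_iter, by = 10)]
    mean(draws); var(draws)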

Software

I have developed Minitab macros that perform the computational methods shown in the text. My colleague, Dr. James Curran, has written corresponding R functions. Both may be downloaded from the following website:

http://www.stats.waikato.ac.nz/publications/bolstad/UnderstandingComputationalBayesianStatistics/

Acknowledgments

I would like to acknowledge the help I have had from many people. First, my students over the past three years, whose enthusiasm with the early drafts encouraged me to continue writing. My colleague, Dr. James Curran, who wrote the R functions and Appendix B on how to implement them, has made a major contribution to this book. I want to thank Dr. Gerry Devlin, the Clinical Director of Cardiology at Waikato Hospital, for letting me use the data from the Waikato District Health Board Cardiac Survival Study; Dr. Neil Swanson and Gaelle Dutu for discussing this dataset with me; and my student Yihong Zhang, who assisted me on this study. I also want to thank Dr. David Fergusson from the Department of Psychological Medicine at the Christchurch School of Medicine and Health Sciences for letting me use the circumcision data from the longitudinal study of a birth cohort. I want to thank my colleagues at the University of Waikato, Dr. Murray Jorgensen, Dr. Judi McWhirter, Dr. Lyn Hunt, Dr. Kevin Broughan, and my former student Dr. Jason Catchpole, who all proofread parts of the manuscript. I appreciated their helpful comments, and any errors that remain are solely my responsibility. I would like to thank Cathy Akritas at Minitab for her help in improving my Minitab macros. I would like to thank Steve Quigley, Jackie Palmieri, Melissa Yanuzzi, and the team at John Wiley & Sons for their support, as well as Amy Hendrickson of TeXnology, Inc. for help with LaTeX.

Finally, last but not least, I wish to thank my wife Sylvie for her constant love and support.

WILLIAM M. "BILL" BOLSTAD

Hamilton, New Zealand

1

Introduction to Bayesian Statistics

In the last few years the use of Bayesian methods in the practice of applied statistics has greatly increased. In this book we will show how the development of computational Bayesian statistics is the key to this major change. For most of the twentieth century, frequentist statistical methods dominated the practice of applied statistics, despite the fact that statisticians have long known that the Bayesian approach offered clear-cut advantages over the frequentist approach. We will see that Bayesian solutions are easy in theory but were difficult in practice: it is easy to find a formula giving the shape of the posterior, but often much harder to find the formula for the exact posterior density. Computational Bayesian statistics changed all this. These methods use algorithms to draw samples from the incompletely known posterior and use these random samples as the basis for inference. In Section 1.1 we look briefly at the ideas of the frequentist approach to statistics. In Section 1.2 we introduce the ideas of Bayesian statistics. In Section 1.3 we show the similarities and differences between the likelihood approach to inference and Bayesian inference. We will see that the different interpretations of the parameters and probabilities lead to the advantages of Bayesian statistics.

1.1 THE FREQUENTIST APPROACH TO STATISTICS

In frequentist statistics, the parameter is considered a fixed but unknown value. The sample space is the set of all possible observation values. Probability is interpreted as long-run relative frequency over all values in the sample space given the unknown parameter. The performance of any statistical procedure is determined by averaging over the sample space. This can be done prior to the experiment and does not depend on the data.
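A small simulation makes this "averaging over the sample space" concrete; the R sketch below and its numbers are illustrative assumptions of mine, not an example from the book:

    # Long-run coverage of a 95% z-interval for a normal mean with known
    # sigma = 1: the parameter stays fixed while the data vary over the
    # sample space, so the 95% is a pre-data, long-run property of the
    # procedure, not a statement about any one observed interval.
    mu <- 5; n <- 25; reps <- 10000
    covered <- replicate(reps, {
      y <- rnorm(n, mu, 1)
      abs(mean(y) - mu) <= qnorm(0.975) / sqrt(n)
    })
    mean(covered)   # approximately 0.95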

There were two main sources of frequentist ideas. R. A. Fisher developed a theory of statistical inference based on the likelihood function. It has the same formula as the joint density of the sample; however, the observations are held fixed at the values that occurred and the parameter(s) are allowed to vary over all possible values. He reduced the complexity of the data through the use of sufficient statistics, which contain all the relevant information about the parameter(s). He developed the theory of maximum likelihood estimators (MLEs) and found their asymptotic distributions. He measured the efficiency of an estimator using the Fisher information, which gives the amount of information available in a single observation. His theory dealt with nuisance parameters by conditioning on an ancillary statistic when one is available. Other topics associated with him include analysis of variance, randomization, significance tests, permutation tests, and fiducial intervals. Fisher himself was a scientist as well as a statistician, making great contributions to genetics as well as to the design of experiments and statistical inference. As a scientist, his views on inference are in tune with science: Occam's razor requires that the simplest explanation (chance) be ruled out before an alternative explanation is sought, and significance testing, in which implausibility of the chance model is required before the alternative is accepted, closely matches this view.
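A two-line R example of the likelihood idea (the data here are an assumption for illustration): for y successes in n binomial trials, hold the observations fixed and vary the parameter:

    # Likelihood: same formula as the joint density of the sample, but with
    # the data fixed at what was observed and the parameter allowed to vary.
    y <- 7; n <- 10
    lik <- function(p) dbinom(y, n, p)
    optimize(lik, c(0, 1), maximum = TRUE)$maximum   # MLE, numerically y/n = 0.7
    curve(lik(x), 0, 1, xlab = "p", ylab = "likelihood(p)")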

Continue reading in the full edition!
