Introduction to Bayesian Statistics

William M. Bolstad

Description

"...this edition is useful and effective in teaching Bayesian inference at both elementary and intermediate levels. It is a well-written book on elementary Bayesian inference, and the material is easily accessible. It is both concise and timely, and provides a good collection of overviews and reviews of important tools used in Bayesian statistical methods." There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian statistics. The authors continue to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inference for discrete random variables, binomial proportions, Poisson, and normal means, and simple linear regression. In addition, more advanced topics in the field are presented in four new chapters: Bayesian inference for a normal with unknown mean and variance; Bayesian inference for a Multivariate Normal mean vector; Bayesian inference for the Multiple Linear Regression Model; and Computational Bayesian Statistics including Markov Chain Monte Carlo. The inclusion of these topics will facilitate readers' ability to advance from a minimal understanding of Statistics to the ability to tackle topics in more applied, advanced level books. Minitab macros and R functions are available on the book's related website to assist with chapter exercises. Introduction to Bayesian Statistics, Third Edition also features: * Topics including the Joint Likelihood function and inference using independent Jeffreys priors and join conjugate prior * The cutting-edge topic of computational Bayesian Statistics in a new chapter, with a unique focus on Markov Chain Monte Carlo methods * Exercises throughout the book that have been updated to reflect new applications and the latest software applications * Detailed appendices that guide readers through the use of R and Minitab software for Bayesian analysis and Monte Carlo simulations, with all related macros available on the book's website Introduction to Bayesian Statistics, Third Edition is a textbook for upper-undergraduate or first-year graduate level courses on introductory statistics course with a Bayesian emphasis. It can also be used as a reference work for statisticians who require a working knowledge of Bayesian statistics.




INTRODUCTION TO BAYESIAN STATISTICS

Third Edition

WILLIAM M. BOLSTAD

JAMES M. CURRAN

Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

ISBN 978-1-118-09315-8

This book is dedicated to Sylvie, Ben, Rachel, Emily, Mary, and Elizabeth

CONTENTS

Preface

Chapter 1 Introduction to Statistical Science

1.1 The Scientific Method: A Process for Learning

1.2 The Role of Statistics in the Scientific Method

1.3 Main Approaches to Statistics

1.4 Purpose and Organization of This Text

Chapter 2 Scientific Data Gathering

2.1 Sampling from a Real Population

2.2 Observational Studies and Designed Experiments

Chapter 3 Displaying and Summarizing Data

3.1 Graphically Displaying a Single Variable

3.2 Graphically Comparing Two Samples

3.3 Measures of Location

3.4 Measures of Spread

3.5 Displaying Relationships Between Two or More Variables

3.6 Measures of Association for Two or More Variables

Exercises

Chapter 4 Logic, Probability, and Uncertainty

4.1 Deductive Logic and Plausible Reasoning

4.2 Probability

4.3 Axioms of Probability

4.4 Joint Probability and Independent Events

4.5 Conditional Probability

4.6 Bayes’ Theorem

4.7 Assigning Probabilities

4.8 Odds and Bayes Factor

4.9 Beat the Dealer

Exercises

Chapter 5 Discrete Random Variables

5.1 Discrete Random Variables

5.2 Probability Distribution of a Discrete Random Variable

5.3 Binomial Distribution

5.4 Hypergeometric Distribution

5.5 Poisson Distribution

5.6 Joint Random Variables

5.7 Conditional Probability for Joint Random Variables

Exercises

Chapter 6 Bayesian Inference for Discrete Random Variables

6.1 Two Equivalent Ways of Using Bayes’ Theorem

6.2 Bayes’ Theorem for Binomial with Discrete Prior

6.3 Important Consequences of Bayes’ Theorem

6.4 Bayes’ Theorem for Poisson with Discrete Prior

Exercises

Computer Exercises

Chapter 7 Continuous Random Variables

7.1 Probability Density Function

7.2 Some Continuous Distributions

7.3 Joint Continuous Random Variables

7.4 Joint Continuous and Discrete Random Variables

Exercises

Chapter 8 Bayesian Inference for Binomial Proportion

8.1 Using a Uniform Prior

8.2 Using a Beta Prior

8.3 Choosing Your Prior

8.4 Summarizing the Posterior Distribution

8.5 Estimating the Proportion

8.6 Bayesian Credible Interval

Exercises

Computer Exercises

Chapter 9 Comparing Bayesian and Frequentist Inferences for Proportion

9.1 Frequentist Interpretation of Probability and Parameters

9.2 Point Estimation

9.3 Comparing Estimators for Proportion

9.4 Interval Estimation

9.5 Hypothesis Testing

9.6 Testing a One-Sided Hypothesis

9.7 Testing a Two-Sided Hypothesis

Exercises

Monte Carlo Exercises

Chapter 10 Bayesian Inference for Poisson

10.1 Some Prior Distributions for Poisson

10.2 Inference for Poisson Parameter

Exercises

Computer Exercises

Chapter 11 Bayesian Inference for Normal Mean

11.1 Bayes’ Theorem for Normal Mean with a Discrete Prior

11.2 Bayes’ Theorem for Normal Mean with a Continuous Prior

11.3 Choosing Your Normal Prior

11.4 Bayesian Credible Interval for Normal Mean

11.5 Predictive Density for Next Observation

Exercises

Computer Exercises

Chapter 12 Comparing Bayesian and Frequentist Inferences for Mean

12.1 Comparing Frequentist and Bayesian Point Estimators

12.2 Comparing Confidence and Credible Intervals for Mean

12.3 Testing a One-Sided Hypothesis about a Normal Mean

12.4 Testing a Two-Sided Hypothesis about a Normal Mean

Exercises

Chapter 13 Bayesian Inference for Difference Between Means

13.1 Independent Random Samples from Two Normal Distributions

13.2 Case 1: Equal Variances

13.3 Case 2: Unequal Variances

13.4 Bayesian Inference for Difference Between Two Proportions Using Normal Approximation

13.5 Normal Random Samples from Paired Experiments

Exercises

Chapter 14 Bayesian Inference for Simple Linear Regression

14.1 Least Squares Regression

14.2 Exponential Growth Model

14.3 Simple Linear Regression Assumptions

14.4 Bayes’ Theorem for the Regression Model

14.5 Predictive Distribution for Future Observation

Exercises

Computer Exercises

Chapter 15 Bayesian Inference for Standard Deviation

15.1 Bayes’ Theorem for Normal Variance with a Continuous Prior

15.2 Some Specific Prior Distributions and the Resulting Posteriors

15.3 Bayesian Inference for Normal Standard Deviation

Exercises

Computer Exercises

Chapter 16 Robust Bayesian Methods

16.1 Effect of Misspecified Prior

16.2 Bayes’ Theorem with Mixture Priors

Exercises

Computer Exercises

Chapter 17 Bayesian Inference for Normal with Unknown Mean and Variance

17.1 The Joint Likelihood Function

17.2 Finding the Posterior when Independent Jeffreys’ Priors for μ and σ² Are Used

17.3 Finding the Posterior when a Joint Conjugate Prior for μ and σ² Is Used

17.4 Difference Between Normal Means with Equal Unknown Variance

17.5 Difference Between Normal Means with Unequal Unknown Variances

Computer Exercises

Appendix: Proof that the Exact Marginal Posterior Distribution of μ Is Student’s t

Chapter 18 Bayesian Inference for Multivariate Normal Mean Vector

18.1 Bivariate Normal Density

18.2 Multivariate Normal Distribution

18.3 The Posterior Distribution of the Multivariate Normal Mean Vector when Covariance Matrix Is Known

18.4 Credible Region for Multivariate Normal Mean Vector when Covariance Matrix Is Known

18.5 Multivariate Normal Distribution with Unknown Covariance Matrix

Computer Exercises

Chapter 19 Bayesian Inference for the Multiple Linear Regression Model

19.1 Least Squares Regression for Multiple Linear Regression Model

19.2 Assumptions of Normal Multiple Linear Regression Model

19.3 Bayes’ Theorem for Normal Multiple Linear Regression Model

19.4 Inference in the Multivariate Normal Linear Regression Model

19.5 The Predictive Distribution for a Future Observation

Computer Exercises

Chapter 20 Computational Bayesian Statistics Including Markov Chain Monte Carlo

20.1 Direct Methods for Sampling from the Posterior

20.2 Sampling Importance Resampling

20.3 Markov Chain Monte Carlo Methods

20.4 Slice Sampling

20.5 Inference from a Posterior Random Sample

20.6 Where to Next?

A Introduction to Calculus

B Use of Statistical Tables

C Using the Included Minitab Macros

D Using the Included R Functions

E Answers to Selected Exercises

References

Index

EULA

List of Tables

Chapter 3

Table 3.1

Table 3.2

Table 3.3

Table 3.4

Chapter 5

Table 5.1

Table 5.2

Table 5.3

Table 5.4

Table 5.5

Table 5.6

Chapter 6

Table 6.1

Table 6.2

Table 6.3

Table 6.4

Table 6.5

Table 6.6

Table 6.7

Table 6.8

Table 6.9

Table 6.10

Table 6.11

Table 6.12

Table 6.13

Table 6.14

Table 6.15

Table 6.16

Chapter 8

Table 8.1

Table 8.2

Table 8.3

Chapter 9

Table 9.1

Table 9.2

Chapter 10

Table 10.1

Table 10.2

Table 10.3

Table 10.4

Chapter 11

Table 11.1

Table 11.2

Table 11.3

Table 11.4

Table 11.5

Chapter 12

Table 12.1

Chapter 13

Table 13.1

Table 13.2

Table 13.3

Chapter 14

Table 14.1

Table 14.2

Chapter 15

Table 15.1

Table 15.2

Table 15.3

Table 15.4

Chapter 17

Table 17.1

Table 17.2

Chapter 19

Table 19.1

Chapter 20

Table 20.1

Table 20.2

Table 20.3

Table 20.4

Table 20.5

B

Table B.1

Table B.2

Table B.3

Table B.4

Table B.5

Table B.6

C

Table C.1

Table C.2

Table C.3

Table C.4

Table C.5

Table C.6

Table C.7

Table C.8

Table C.9

Table C.10

Table C.11

Table C.12

Table C.13

Table C.14

Table C.15

Table C.16

Table C.17

Table C.18

Table C.19

Table C.20

Table C.21

Table C.22

Table C.23

Table C.24

Table C.25

D

Table D.1

Table D.2

Table D.3

Table D.4


PREFACE

Our original goal for this book was to introduce Bayesian statistics at the earliest possible stage to students with a reasonable mathematical background. This entailed coverage of a similar range of topics as an introductory statistics text, but from a Bayesian perspective. The emphasis is on statistical inference. We wanted to show how Bayesian methods can be used for inference and how they compare favorably with the frequentist alternatives. This book is meant to be a good place to start the study of Bayesian statistics. From the many positive comments we have received from users, we think the book succeeded in its goal. A course based on this goal would include Chapters 1-14.

Our feedback also showed that many users were taking up the book at a more intermediate level instead of the introductory level originally envisaged. The topics covered in Chapters 2 and 3 would be old hat for these users, so we would have to include some more advanced material to cater for the needs of that group. The second edition aimed to meet this new goal as well as the original goal. We included more models, mainly with a single parameter. Nuisance parameters were dealt with using approximations. A course based on this goal would include Chapters 4-16.

Changes in the Third Edition

Later feedback showed that some readers with a stronger mathematical and statistical background wanted the text to include more details on how to deal with multi-parameter models. The third edition contains four new chapters to satisfy this additional goal, along with some minor rewriting of the existing chapters. Chapter 17 covers Bayesian inference for normal observations where we do not know either the mean or the variance. This chapter extends the ideas in Chapter 11, and also discusses the two-sample case, which in turn allows the reader to consider inference on the difference between two means. Chapter 18 introduces the multivariate normal distribution, which we need in order to discuss multiple linear regression in Chapter 19. Finally, Chapter 20 takes the user beyond the kind of conjugate analysis considered in most of the book, and into the realm of computational Bayesian inference. The topics in Chapter 20 are covered with an intentionally light touch, but they still give the user valuable information and skills that will allow them to deal with different problems. We have included some new exercises and new computer exercises which use new Minitab macros and R functions. The Minitab macros can be downloaded from the book website: http://introbayes.ac.nz. The new R functions have been incorporated into a new and improved version of the R package Bolstad, which can either be downloaded from a CRAN mirror or installed directly from within R over the internet. Instructions on the use and installation of the Minitab macros and the Bolstad package in R are given in Appendices C and D, respectively. Both of these appendices have been rewritten to accommodate changes in R and Minitab that have occurred since the second edition.
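
For readers working in R, the sketch below shows the direct installation route mentioned above, assuming a standard CRAN mirror is reachable; it is only one way to obtain the package, and the final line simply lists what the installed package provides.

# Install the Bolstad package from CRAN (needed once), then load it for the session.
install.packages("Bolstad")
library(Bolstad)

# List the functions and data sets exported by the package.
ls("package:Bolstad")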

Our Perspective on Bayesian Statistics

A book can be characterized as much by what is left out as by what is included. This book is our attempt to show a coherent view of Bayesian statistics as a good way to do statistical inference. Details that are outside the scope of the text are included in footnotes. Here are some of our reasons behind our choice of the topics we either included or excluded.

In particular, we did not mention decision theory or loss functions when discussing Bayesian statistics. In many books, Bayesian statistics gets compartmentalized into decision theory while inference is presented in the frequentist manner. While decision theory is a very interesting topic in its own right, we want to present the case for Bayesian statistical inference, and did not want to get side-tracked.

We think that in order to get the full benefit of Bayesian statistics, one really has to consider all priors subjective. They are either (1) a summary of what you believe or (2) a summary of all you allow yourself to believe initially. We consider the subjective prior as the relative weights given to each possible parameter value, before looking at the data. Even if we use a flat prior to give all possible values equal prior weight, it is subjective since we chose it. In any case, it gives all values equal weight only in that parameterization, so it can be considered “objective” only in that parameterization. In this book we do not wish to dwell on the problems associated with trying to be objective in Bayesian statistics. We explain why universal objectivity is not possible (in a footnote since we do not want to distract the reader). We want to leave him/her with the “relative weight” idea of the prior in the parameterization in which the problem is posed.

In the first edition we did not mention Jeffreys’ prior explicitly, although the beta prior for the binomial and the flat prior for the normal mean are in fact the Jeffreys’ priors for those respective observation distributions. In the second edition we do mention Jeffreys’ prior for the binomial, Poisson, normal mean, and normal standard deviation. In the third edition we mention the independent Jeffreys’ priors for the normal mean and standard deviation. In particular, we don’t want to get the reader involved with the problems about Jeffreys’ prior, such as the joint Jeffreys’ prior for mean and variance together as opposed to independent Jeffreys’ priors, or the Jeffreys’ prior violating the likelihood principle. These are beyond the level we wish to go. We just want the reader to note the Jeffreys’ priors in these cases as possible priors, the relative weights they give, when they may be appropriate, and how to use them. Mathematically, all parameterizations are equally valid; however, usually only the main one is very meaningful. We want the reader to focus on relative weights for their parameterization as the prior. It should be (a) a summary of their prior belief (a conjugate prior matching their prior beliefs about moments or the median), (b) flat (hence objective) for their parameterization, or (c) some other form that gives reasonable weight over the whole range of possible values. The posteriors will be similar for all priors that have reasonable weight over the whole range of possible values.

The Bayesian inference on the standard deviation of the normal was done where the mean is considered a known parameter. The conjugate prior for the variance is the inverse chi-squared distribution. Our intuition is about the standard deviation, yet we are doing Bayes' theorem on the variance. This required introducing the change of variable formula for the prior density.

In the second edition we considered the mean as known. This avoided the mathematically more advanced case where both the mean and the standard deviation are unknown. In the third edition we now cover this topic in Chapter 17. In earlier editions the Student’s t is presented as the required adjustment to credible intervals for the mean when the variance is estimated from the data. In the third edition we show in Chapter 17 that this is in fact the result when the joint posterior is found and the variance is marginalized out. Chapter 17 also covers inference on the difference between two means. This problem is made substantially harder when one relaxes the assumption that both populations have the same variance. Chapter 17 derives the Bayesian solution to the well-known Behrens-Fisher problem for the difference in two population means with unequal population variances. The function bayes.t.test in the R package for this book actually gives the user a numerical solution using Gibbs sampling. Gibbs sampling is covered in Chapter 20 of this new edition.
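
As a pointer to that numerical route, the sketch below shows one way bayes.t.test might be called on two small samples. The data are invented, and the t.test-style argument list is an assumption rather than something documented here, so consult the package help page (?bayes.t.test) for the exact interface.

set.seed(100)
library(Bolstad)

# Two illustrative samples whose population variances may differ
x <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.5)
y <- c(4.2, 4.6, 4.0, 4.4, 4.7)

# Assumed to accept two numeric vectors in the style of stats::t.test;
# per the preface, it returns a Gibbs-sampling-based posterior for the difference in means.
result <- bayes.t.test(x, y)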

Acknowledgments

WMB would like to thank all the readers who have sent him comments and pointed out misprints in the first and second editions. These have been corrected. WMB would like to thank Cathy Akritas and Gonzalo Ovalles at Minitab for help in improving his Minitab macros. WMB and JMC would like to thank Jon Gurstelle, Steve Quigley, Sari Friedman, Allison McGinniss, and the team at John Wiley & Sons for their support.

Finally, last but not least, WMB wishes to thank his wife Sylvie for her constant love and support.

WILLIAM M. “BILL” BOLSTAD

Hamilton, New Zealand

JAMES M. CURRAN

Auckland, New Zealand

CHAPTER 1

INTRODUCTION TO STATISTICAL SCIENCE

Statistics is the science that relates data to specific questions of interest. This includes devising methods to gather data relevant to the question, methods to summarize and display the data to shed light on the question, and methods that enable us to draw answers to the question that are supported by the data. Data almost always contain uncertainty. This uncertainty may arise from selection of the items to be measured, or it may arise from variability of the measurement process. Drawing general conclusions from data is the basis for increasing knowledge about the world, and is the basis for all rational scientific inquiry. Statistical inference gives us methods and tools for doing this despite the uncertainty in the data. The methods used for analysis depend on the way the data were gathered. It is vitally important that there is a probability model explaining how the uncertainty gets into the data.

Showing a Causal Relationship from Data

Suppose we have observed two variables X and Y. Variable X appears to have an association with variable Y. If high values of X occur with high values of variable Y and low values of X occur with low values of Y, then we say the association is positive. On the other hand, the association could be negative, in which case high values of variable X occur with low values of variable Y. Figure 1.1 shows a schematic diagram where the association is indicated by the dashed curve connecting X and Y. The unshaded area indicates that X and Y are observed variables. The shaded area indicates that there may be additional variables that have not been observed.

Figure 1.1 Association between two variables.

Figure 1.2 Association due to causal relationship.

We would like to determine why the two variables are associated. There are several possible explanations. The association might be a causal one. For example, X might be the cause of Y. This is shown in Figure 1.2, where the causal relationship is indicated by the arrow from X to Y.

On the other hand, there could be an unidentified third variable Z that has a causal effect on both X and Y. They are not related in a direct causal relationship. The association between them is due to the effect of Z. Z is called a lurking variable, since it is hiding in the background and it affects the data. This is shown in Figure 1.3.

Figure 1.3 Association due to lurking variable.

Figure 1.4 Confounded causal and lurking variable effects.

It is possible that a causal effect and a lurking variable may both be contributing to the association. This is shown in Figure 1.4. We say that the causal effect and the effect of the lurking variable are confounded. This means that both effects are included in the association.

Our first goal is to determine which of the possible reasons for the association holds. If we conclude that it is due to a causal effect, then our next goal is to determine the size of the effect. If we conclude that the association is due to causal effect confounded with the effect of a lurking variable, then our next goal becomes determining the sizes of both the effects.

1.1 The Scientific Method: A Process for Learning

In the Middle Ages, science was deduced from principles set down many centuries earlier by authorities such as Aristotle. The idea that scientific theories should be tested against real-world data revolutionized thinking. This way of thinking, known as the scientific method, sparked the Renaissance.

The scientific method rests on the following premises:

A scientific hypothesis can never be shown to be absolutely true.

However, it must potentially be disprovable.

It is a useful model until it is established that it is not true.

Always go for the simplest hypothesis, unless it can be shown to be false.

This last principle, elaborated by William of Ockham in the 13th century, is now known as Ockham’s razor and is firmly embedded in science. It keeps science from developing fanciful, overly elaborate theories. Thus the scientific method directs us through an improving sequence of models, as previous ones get falsified. The scientific method generally proceeds as follows:

Ask a question or pose a problem in terms of the current scientific hypothesis.

Gather all the relevant information that is currently available. This includes the current knowledge about parameters of the model.

Design an investigation or experiment that addresses the question from step 1. The predicted outcome of the experiment should be one thing if the current hypothesis is true, and something else if the hypothesis is false.

Gather data from the experiment.

Draw conclusions given the experimental results. Revise the knowledge about the parameters to take the current results into account.

The scientific method searches for cause-and-effect relationships between an experimental variable and an outcome variable; in other words, it asks how changing the experimental variable results in a change in the outcome variable. Scientific modeling develops mathematical models of these relationships. Both of them need to isolate the experiment from outside factors that could affect the experimental results. All outside factors that can be identified as possibly affecting the results must be controlled. It is no coincidence that the earliest successes for the method were in physics and chemistry, where the few outside factors could be identified and controlled. Thus there were no lurking variables. All other relevant variables could be identified and could then be physically controlled by being held constant. That way they would not affect the results of the experiment, and the effect of the experimental variable on the outcome variable could be determined. In biology, medicine, engineering, technology, and the social sciences it is not that easy to identify the relevant factors that must be controlled. In those fields a different way to control outside factors is needed, because they cannot be identified beforehand and physically controlled.

1.2 The Role of Statistics in the Scientific Method

Statistical methods of inference can be used when there is random variability in the data. The probability model for the data is justified by the design of the investigation or experiment. This can extend the scientific method into situations where the relevant outside factors cannot even be identified. Since we cannot identify these outside factors, we cannot control them directly. The lack of direct control means the outside factors will be affecting the data. There is a danger that the wrong conclusions could be drawn from the experiment due to these uncontrolled outside factors.

The important statistical idea of randomization has been developed to deal with this possibility. The unidentified outside factors can be “averaged out” by randomly assigning each unit to either treatment or control group. This contributes variability to the data. Statistical conclusions always have some uncertainty or error due to variability in the data. We can develop a probability model of the data variability based on the randomization used. Randomization not only reduces this uncertainty due to outside factors, it also allows us to measure the amount of uncertainty that remains using the probability model. Randomization lets us control the outside factors statistically, by averaging out their effects.

Underlying this is the idea of a statistical population, consisting of all possible values of the observations that could be made. The data consists of observations taken from a sample of the population. For valid inferences about the population parameters from the sample statistics, the sample must be “representative” of the population. Amazingly, choosing the sample randomly is the most effective way to get representative samples!
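
Both uses of randomness described above, drawing a simple random sample from a population and randomly assigning units to treatment or control, can be carried out with R's sample function. The sketch below uses made-up population and group sizes purely for illustration.

set.seed(1)

# Simple random sample of 10 units from a labelled population of 100
population <- 1:100
srs <- sample(population, size = 10)

# Random assignment of 20 experimental units, 10 to treatment and 10 to control
unit.id    <- paste0("unit", 1:20)
assignment <- sample(rep(c("treatment", "control"), each = 10))
data.frame(unit.id, assignment)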

1.3 Main Approaches to Statistics

There are two main philosophical approaches to statistics. The first is often referred to as the frequentist approach. Sometimes it is called the classical approach. Procedures are developed by looking at how they perform over all possible random samples. The probabilities do not relate to the particular random sample that was obtained. In many ways this indirect method places the “cart before the horse.”

The alternative approach that we take in this book is the Bayesian approach. It applies the laws of probability directly to the problem. This offers many fundamental advantages over the more commonly used frequentist approach. We will show these advantages over the course of the book.

Frequentist Approach to Statistics

Most introductory statistics books take the frequentist approach to statistics, which is based on the following ideas:

Parameters, the numerical characteristics of the population, are fixed but unknown constants.

Probabilities are always interpreted as long-run relative frequency.

Statistical procedures are judged by how well they perform in the long run over an infinite number of hypothetical repetitions of the experiment.

Probability statements are only allowed for random quantities. The unknown parameters are fixed, not random, so probability statements cannot be made about their value. Instead, a sample is drawn from the population, and a sample statistic is calculated. The probability distribution of the statistic over all possible random samples from the population is determined and is known as the sampling distribution of the statistic. A parameter of the population will also be a parameter of the sampling distribution. The probability statement that can be made about the statistic based on its sampling distribution is converted to a confidence statement about the parameter. The confidence is based on the average behavior of the procedure over all possible samples.

Bayesian Approach to Statistics

The Reverend Thomas Bayes first discovered the theorem that now bears his name. It was written up in a paper, An Essay Towards Solving a Problem in the Doctrine of Chances. This paper was found after his death by his friend Richard Price, who had it published posthumously in the Philosophical Transactions of the Royal Society in 1763. Bayes showed how inverse probability could be used to calculate the probability of antecedent events from the occurrence of the consequent event. His methods were adopted by Laplace and other scientists in the 19th century, but had largely fallen from favor by the early 20th century. By the middle of the 20th century, interest in Bayesian methods had been renewed by de Finetti, Jeffreys, Savage, and Lindley, among others. They developed a complete method of statistical inference based on Bayes’ theorem.

This book introduces the Bayesian approach to statistics. The ideas that form the basis of this approach are:

Since we are uncertain about the true value of the parameters, we will consider them to be random variables.

The rules of probability are used directly to make inferences about the parameters.

Probability statements about parameters must be interpreted as “degree of belief.” The prior distribution must be subjective. Each person can have his/her own prior, which contains the relative weights that person gives to every possible parameter value. It measures how “plausible” the person considers each parameter value to be before observing the data.

We revise our beliefs about parameters after getting the data by using Bayes’ theorem. This gives our posterior distribution, which gives the relative weights we give to each parameter value after analyzing the data. The posterior distribution comes from two sources: the prior distribution and the observed data.

This has a number of advantages over the conventional frequentist approach. Bayes’ theorem is the only consistent way to modify our beliefs about the parameters given the data that actually occurred. This means that the inference is based on the actual occurring data, not all possible data sets that might have occurred but did not! Allowing the parameter to be a random variable lets us make probability statements about it, posterior to the data. This contrasts with the conventional approach, where inference probabilities are based on all possible data sets that could have occurred for the fixed parameter value. Given the actual data, there is nothing random left with a fixed parameter value, so one can only make confidence statements, based on what could have occurred. Bayesian statistics also has a general way of dealing with a nuisance parameter, one that we do not want to make inferences about but that we do not want to interfere with the inferences we are making about the main parameters. Frequentist statistics does not have a general procedure for dealing with nuisance parameters. Bayesian statistics is predictive, unlike conventional frequentist statistics. This means that we can easily find the conditional probability distribution of the next observation given the sample data.

Monte Carlo Studies

In frequentist statistics, the parameter is considered a fixed, but unknown, constant. A statistical procedure such as a particular estimator for the parameter cannot be judged from the value it gives. The parameter is unknown, so we cannot know the value the estimator should be giving. If we knew the value of the parameter, we would not be using an estimator.

Instead, statistical procedures are evaluated by looking at how they perform in the long run over all possible samples of data, for fixed parameter values over some range. For instance, we fix the parameter at some value. The estimator depends on the random sample, so it is considered a random variable having a probability distribution. This distribution is called the sampling distribution of the estimator, since its probability distribution comes from taking all possible random samples. Then we look at how the estimator is distributed around the parameter value. This is called sample space averaging. Essentially it compares the performance of procedures before we take any data.

Bayesian procedures consider the parameter to be a random variable, and its posterior distribution is conditional on the sample data that actually occurred, not all those samples that were possible but did not occur. However, before the experiment, we might want to know how well the Bayesian procedure works at some specific parameter values in the range.

To evaluate the Bayesian procedure using sample space averaging, we have to consider the parameter to be both a random variable and a fixed but unknown value at the same time. We can get past the apparent contradiction in the nature of the parameter because the probability distribution we put on the parameter measures our uncertainty about the true value. It shows the relative belief weights we give to the possible values of the unknown parameter! After looking at the data, our belief distribution over the parameter values has changed. This way we can think of the parameter as a fixed, but unknown, value at the same time as we think of it being a random variable. This allows us to evaluate the Bayesian procedure using sample space averaging. This is called pre-posterior analysis because it can be done before we obtain the data.

In Chapter 4, we will find out that the laws of probability are the best way to model uncertainty. Because of this, Bayesian procedures will be optimal in the post-data setting, given the data that actually occurred. In Chapters 9 and 11, we will see that Bayesian procedures perform very well in the pre-data setting when evaluated using pre-posterior analysis. In fact, it is often the case that Bayesian procedures outperform the usual frequentist procedures even in the pre-data setting.

Monte Carlo studies are a useful way to perform sample space averaging. We draw a large number of samples randomly using the computer and calculate the statistic (frequentist or Bayesian) for each sample. The empirical distribution of the statistic (over the large number of random samples) approximates its sampling distribution (over all possible random samples). We can calculate statistics such as mean and standard deviation on this Monte Carlo sample to approximate the mean and standard deviation of the sampling distribution. Some small-scale Monte Carlo studies are included as exercises.
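
A small Monte Carlo study of the kind described here can be written in a few lines of R. The sketch below fixes a binomial proportion, draws many samples, and approximates the sampling distributions of the frequentist estimator (the sample proportion) and of the Bayesian posterior mean under a uniform prior; the true value, sample size, and number of replicates are arbitrary choices for illustration.

set.seed(42)
pi.true <- 0.3      # fixed "unknown" parameter for the study
n       <- 20       # sample size
M       <- 10000    # number of Monte Carlo replicates

y <- rbinom(M, size = n, prob = pi.true)   # one binomial count per replicate

freq.est  <- y / n                 # sample proportion
bayes.est <- (y + 1) / (n + 2)     # posterior mean under a uniform (Beta(1,1)) prior

# Approximate mean squared error of each estimator at pi.true
mean((freq.est  - pi.true)^2)
mean((bayes.est - pi.true)^2)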

1.4 Purpose and Organization of This Text

A very large proportion of undergraduates are required to take a service course in statistics. Almost all of these courses are based on frequentist ideas. Most of them do not even mention Bayesian ideas. As a statistician, I know that Bayesian methods have great theoretical advantages. I think we should be introducing our best students to Bayesian ideas, from the beginning. There are not many introductory statistics textbooks based on Bayesian ideas. Some other texts include Berry (1996), Press (1989), and Lee (1989).

This book aims to introduce students with a good mathematics background to Bayesian statistics. It covers the same topics as a standard introductory statistics text, only from a Bayesian perspective. Students need reasonable algebra skills to follow this book. Bayesian statistics uses the rules of probability, so competence in manipulating mathematical formulas is required. Students will find that general knowledge of calculus is helpful in reading this book. Specifically they need to know that area under a curve is found by integrating, and that a maximum or minimum of a continuous differentiable function is found where the derivative of the function equals zero. However, the actual calculus used is minimal. The book is self-contained with a calculus appendix that students can refer to.

Chapter 2 introduces some fundamental principles of scientific data gathering to control the effects of unidentified factors. These include the need for drawing samples randomly, along with some random sampling techniques. The reason why there is a difference between the conclusions we can draw from data arising from an observational study and from data arising from a randomized experiment is shown. Completely randomized designs and randomized block designs are discussed.

Chapter 3 covers elementary methods for graphically displaying and summarizing data. Often a good data display is all that is necessary. The principles of designing displays that are true to the data are emphasized.

Chapter 4 shows the difference between deduction and induction. Plausible reasoning is shown to be an extension of logic where there is uncertainty. It turns out that plausible reasoning must follow the same rules as probability. The axioms of probability are introduced and the rules of probability, including conditional probability and Bayes’ theorem are developed.

Chapter 5 covers discrete random variables, including joint and marginal discrete random variables. The binomial, hypergeometric, and Poisson distributions are introduced, and the situations where they arise are characterized.

Chapter 6 covers Bayes’ theorem for discrete random variables using a table. We see that two important consequences of the method are that multiplying the prior by a constant or multiplying the likelihood by a constant does not affect the resulting posterior distribution. This gives us the “proportional form” of Bayes’ theorem. We show that we get the same results when we analyze the observations sequentially, using the posterior after the previous observation as the prior for the next observation, as when we analyze the observations all at once using the joint likelihood and the original prior. We demonstrate Bayes’ theorem for binomial observations with a discrete prior and for Poisson observations with a discrete prior.
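
Both consequences are easy to verify numerically. The sketch below uses a made-up three-point prior for a binomial success probability, applies the proportional form of Bayes' theorem, and checks that sequential updating matches the all-at-once analysis.

p.vals <- c(0.2, 0.5, 0.8)       # possible values of the success probability (illustrative)
prior  <- c(0.25, 0.50, 0.25)    # discrete prior weights

# Proportional form of Bayes' theorem: posterior is proportional to prior times likelihood
update <- function(prior, likelihood) {
  unnorm <- prior * likelihood
  unnorm / sum(unnorm)
}

# Sequential analysis: a success, then a failure
post1 <- update(prior, dbinom(1, size = 1, prob = p.vals))  # after the success
post2 <- update(post1, dbinom(0, size = 1, prob = p.vals))  # then the failure

# All-at-once analysis: 1 success in 2 trials with the original prior
post.joint <- update(prior, dbinom(1, size = 2, prob = p.vals))

all.equal(post2, post.joint)     # the two routes give the same posterior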

Chapter 7 covers continuous random variables, including joint, marginal, and conditional random variables. The beta, gamma, and normal distributions are introduced in this chapter.

Chapter 8 covers Bayes’ theorem for the population proportion (binomial) with a continuous prior. We show how to find the posterior distribution of the population proportion using either a uniform prior or a beta prior. We explain how to choose a suitable prior. We look at ways of summarizing the posterior distribution.
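
A minimal sketch of the beta-prior update: with a Beta(a, b) prior and y successes observed in n trials, the posterior is Beta(a + y, b + n - y), from which summaries such as the posterior mean and an equal-tail credible interval follow directly. The prior parameters and data below are invented for illustration.

a <- 1; b <- 1          # Beta(1, 1) (uniform) prior
y <- 7; n <- 20         # observed: 7 successes in 20 trials

a.post <- a + y         # posterior is Beta(a + y, b + n - y)
b.post <- b + n - y

post.mean <- a.post / (a.post + b.post)               # posterior mean
cred.int  <- qbeta(c(0.025, 0.975), a.post, b.post)   # 95% credible interval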

Chapter 9 compares the Bayesian inferences with the frequentist inferences. We show that the Bayesian estimator (the posterior mean using a uniform prior) has better performance than the frequentist estimator (the sample proportion) in terms of mean squared error over most of the range of possible values. This kind of frequentist analysis is useful before we perform our Bayesian analysis. We see that the Bayesian credible interval has a much more useful interpretation than the frequentist confidence interval for the population proportion. One-sided and two-sided hypothesis tests using Bayesian methods are introduced.

Chapter 10 covers Bayes’ theorem for Poisson observations with a continuous prior. The prior distributions used include the positive uniform, the Jeffreys’ prior, and the gamma prior. Bayesian inference for the Poisson parameter using the resulting posterior includes Bayesian credible intervals and one-sided and two-sided hypothesis tests.
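
A sketch of the conjugate gamma update: with a Gamma(shape = a, rate = b) prior for the Poisson mean and observed counts y1, ..., yn, the posterior is Gamma(a + sum of the yi, b + n). The prior constants and counts below are made up for illustration.

y <- c(3, 5, 2, 4, 6)    # illustrative Poisson counts
a <- 2; b <- 1           # Gamma(shape = 2, rate = 1) prior for the Poisson mean

a.post <- a + sum(y)     # posterior shape
b.post <- b + length(y)  # posterior rate

a.post / b.post                                          # posterior mean
qgamma(c(0.025, 0.975), shape = a.post, rate = b.post)   # 95% credible interval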

Chapter 11 covers Bayes’ theorem for the mean of a normal distribution with known variance. We show how to choose a normal prior. We discuss dealing with nuisance parameters by marginalization. The predictive density of the next observation is found by considering the population mean a nuisance parameter and marginalizing it out.
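
For the normal mean with known variance and a normal prior, the posterior is again normal: the posterior precision (the reciprocal of the variance) is the prior precision plus n times the observation precision, and the posterior mean is the precision-weighted average of the prior mean and the sample mean. A sketch with made-up numbers:

sigma <- 2                        # known observation standard deviation
y     <- c(9.3, 11.1, 10.4, 9.8)  # illustrative sample
n     <- length(y)

m0 <- 8; s0 <- 3                  # normal prior: mean 8, standard deviation 3

prec.post <- 1 / s0^2 + n / sigma^2                            # posterior precision
s.post    <- sqrt(1 / prec.post)                               # posterior standard deviation
m.post    <- (m0 / s0^2 + n * mean(y) / sigma^2) / prec.post   # posterior mean

m.post + qnorm(c(0.025, 0.975)) * s.post   # 95% credible interval for the mean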

Chapter 12 compares Bayesian inferences with the frequentist inferences for the mean of a normal distribution. These comparisons include point and interval estimation and involve hypothesis tests including both the one-sided and the two-sided cases.

Chapter 13 shows how to perform Bayesian inferences for the difference between normal means and how to perform Bayesian inferences for the difference between proportions using the normal approximation.

Chapter 14 introduces the simple linear regression model and shows how to perform Bayesian inferences on the slope of the model. The predictive distribution of the next observation is found by considering both the slope and intercept to be nuisance parameters and marginalizing them out.
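
The inference on the slope has the same precision-weighted form as the update for a normal mean. The sketch below is a simplified illustration that assumes the error standard deviation is known and uses simulated data; it combines a normal prior for the slope with the least squares slope.

set.seed(3)
sigma <- 2                                         # assumed known error standard deviation
x <- 1:12
y <- 3 + 1.5 * x + rnorm(length(x), sd = sigma)    # simulated illustrative data

SSx  <- sum((x - mean(x))^2)
b.ls <- sum((x - mean(x)) * (y - mean(y))) / SSx   # least squares slope

m0 <- 0; s0 <- 10                                  # vague normal prior for the slope
prec.post <- 1 / s0^2 + SSx / sigma^2              # posterior precision for the slope
s.post    <- sqrt(1 / prec.post)
m.post    <- (m0 / s0^2 + (SSx / sigma^2) * b.ls) / prec.post

m.post + qnorm(c(0.025, 0.975)) * s.post           # 95% credible interval for the slope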

Chapter 15 introduces Bayesian inference for the standard deviation σ, when we have a random sample of normal observations with known mean μ. This chapter is at a somewhat higher level than the previous chapters and requires the use of the change-of-variable formula for densities. Priors used include positive uniform for standard deviation, positive uniform for variance, Jeffreys’ prior, and the inverse chi-squared prior. We discuss how to choose an inverse chi-squared prior that matches our prior belief about the median. Bayesian inferences from the resulting posterior include point estimates, credible intervals, and hypothesis tests including both the one-sided and two-sided cases.
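
A sketch of the conjugate update for this setting, assuming the prior for the variance is written as S0 times an inverse chi-squared distribution with kappa0 degrees of freedom and the mean is known; the prior constants and observations are invented for illustration.

mu <- 10                                    # known mean
y  <- c(8.7, 11.2, 9.5, 10.8, 9.9, 10.4)    # illustrative observations

S0 <- 12; kappa0 <- 4                       # prior: S0 times an inverse chi-squared, kappa0 df

SS      <- sum((y - mu)^2)                  # sum of squares about the known mean
S.post  <- S0 + SS                          # posterior: S.post times an inverse chi-squared
kappa.p <- kappa0 + length(y)               # with kappa.p degrees of freedom

# 95% credible interval for sigma: invert the chi-squared quantiles and take square roots
sqrt(S.post / qchisq(c(0.975, 0.025), df = kappa.p))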

Chapter 16 shows how we can make Bayesian inference robust against a misspecified prior by using a mixture prior and marginalizing out the mixture parameter. This chapter is also at a somewhat higher level than the others, but it shows how one of the main dangers of Bayesian analysis can be avoided.

Chapter 17 returns to the problem we discussed in Chapter 11, that is, making inferences about the mean of a normal distribution. In this chapter, however, we explicitly model the unknown population standard deviation and show that the approximations we suggested in Chapter 11 are in fact exact. We also deal with the two-sample case so that inference can be performed on the difference between two means.

Chapter 18 introduces the multivariate normal distribution and extends the theory from Chapters 11 and 17 to the multivariate case. The multivariate normal distribution is essential for the discussion of linear models and, in particular, multiple regression.