"...this edition is useful and effective in teaching Bayesian inference at both elementary and intermediate levels. It is a well-written book on elementary Bayesian inference, and the material is easily accessible. It is both concise and timely, and provides a good collection of overviews and reviews of important tools used in Bayesian statistical methods." There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian statistics. The authors continue to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inference for discrete random variables, binomial proportions, Poisson, and normal means, and simple linear regression. In addition, more advanced topics in the field are presented in four new chapters: Bayesian inference for a normal with unknown mean and variance; Bayesian inference for a Multivariate Normal mean vector; Bayesian inference for the Multiple Linear Regression Model; and Computational Bayesian Statistics including Markov Chain Monte Carlo. The inclusion of these topics will facilitate readers' ability to advance from a minimal understanding of Statistics to the ability to tackle topics in more applied, advanced level books. Minitab macros and R functions are available on the book's related website to assist with chapter exercises. Introduction to Bayesian Statistics, Third Edition also features: * Topics including the Joint Likelihood function and inference using independent Jeffreys priors and join conjugate prior * The cutting-edge topic of computational Bayesian Statistics in a new chapter, with a unique focus on Markov Chain Monte Carlo methods * Exercises throughout the book that have been updated to reflect new applications and the latest software applications * Detailed appendices that guide readers through the use of R and Minitab software for Bayesian analysis and Monte Carlo simulations, with all related macros available on the book's website Introduction to Bayesian Statistics, Third Edition is a textbook for upper-undergraduate or first-year graduate level courses on introductory statistics course with a Bayesian emphasis. It can also be used as a reference work for statisticians who require a working knowledge of Bayesian statistics.
Page count: 892
Year of publication: 2016
WILLIAM M. BOLSTAD
JAMES M. CURRAN
Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
ISBN 978-1-118-09315-8
This book is dedicated to Sylvie, Ben, Rachel, Emily, Mary, and Elizabeth
Preface
Chapter 1 Introduction to Statistical Science
1.1 The Scientific Method: A Process for Learning
1.2 The Role of Statistics in the Scientific Method
1.3 Main Approaches to Statistics
1.4 Purpose and Organization of This Text
Chapter 2 Scientific Data Gathering
2.1 Sampling from a Real Population
2.2 Observational Studies and Designed Experiments
Chapter 3 Displaying and Summarizing Data
3.1 Graphically Displaying a Single Variable
3.2 Graphically Comparing Two Samples
3.3 Measures of Location
3.4 Measures of Spread
3.5 Displaying Relationships Between Two or More Variables
3.6 Measures of Association for Two or More Variables
Exercises
Chapter 4 Logic, Probability, and Uncertainty
4.1 Deductive Logic and Plausible Reasoning
4.2 Probability
4.3 Axioms of Probability
4.4 Joint Probability and Independent Events
4.5 Conditional Probability
4.6 Bayes’ Theorem
4.7 Assigning Probabilities
4.8 Odds and Bayes Factor
4.9 Beat the Dealer
Exercises
Chapter 5 Discrete Random Variables
5.1 Discrete Random Variables
5.2 Probability Distribution of a Discrete Random Variable
5.3 Binomial Distribution
5.4 Hypergeometric Distribution
5.5 Poisson Distribution
5.6 Joint Random Variables
5.7 Conditional Probability for Joint Random Variables
Exercises
Chapter 6 Bayesian Inference for Discrete Random Variables
6.1 Two Equivalent Ways of Using Bayes’ Theorem
6.2 Bayes’ Theorem for Binomial with Discrete Prior
6.3 Important Consequences of Bayes’ Theorem
6.4 Bayes’ Theorem for Poisson with Discrete Prior
Exercises
Computer Exercises
Chapter 7 Continuous Random Variables
7.1 Probability Density Function
7.2 Some Continuous Distributions
7.3 Joint Continuous Random Variables
7.4 Joint Continuous and Discrete Random Variables
Exercises
Chapter 8 Bayesian Inference for Binomial Proportion
8.1 Using a Uniform Prior
8.2 Using a Beta Prior
8.3 Choosing Your Prior
8.4 Summarizing the Posterior Distribution
8.5 Estimating the Proportion
8.6 Bayesian Credible Interval
Exercises
Computer Exercises
Chapter 9 Comparing Bayesian and Frequentist Inferences for Proportion
9.1 Frequentist Interpretation of Probability and Parameters
9.2 Point Estimation
9.3 Comparing Estimators for Proportion
9.4 Interval Estimation
9.5 Hypothesis Testing
9.6 Testing a One-Sided Hypothesis
9.7 Testing a Two-Sided Hypothesis
Exercises
Monte Carlo Exercises
Chapter 10 Bayesian Inference for Poisson
10.1 Some Prior Distributions for Poisson
10.2 Inference for Poisson Parameter
Exercises
Computer Exercises
Chapter 11 Bayesian Inference for Normal Mean
11.1 Bayes’ Theorem for Normal Mean with a Discrete Prior
11.2 Bayes’ Theorem for Normal Mean with a Continuous Prior
11.3 Choosing Your Normal Prior
11.4 Bayesian Credible Interval for Normal Mean
11.5 Predictive Density for Next Observation
Exercises
Computer Exercises
Chapter 12 Comparing Bayesian and Frequentist Inferences for Mean
12.1 Comparing Frequentist and Bayesian Point Estimators
12.2 Comparing Confidence and Credible Intervals for Mean
12.3 Testing a One-Sided Hypothesis about a Normal Mean
12.4 Testing a Two-Sided Hypothesis about a Normal Mean
Exercises
Chapter 13 Bayesian Inference for Difference Between Means
13.1 Independent Random Samples from Two Normal Distributions
13.2 Case 1: Equal Variances
13.3 Case 2: Unequal Variances
13.4 Bayesian Inference for Difference Between Two Proportions Using Normal Approximation
13.5 Normal Random Samples from Paired Experiments
Exercises
Chapter 14 Bayesian Inference for Simple Linear Regression
14.1 Least Squares Regression
14.2 Exponential Growth Model
14.3 Simple Linear Regression Assumptions
14.4 Bayes’ Theorem for the Regression Model
14.5 Predictive Distribution for Future Observation
Exercises
Computer Exercises
Chapter 15 Bayesian Inference for Standard Deviation
15.1 Bayes’ Theorem for Normal Variance with a Continuous Prior
15.2 Some Specific Prior Distributions and the Resulting Posteriors
15.3 Bayesian Inference for Normal Standard Deviation
Exercises
Computer Exercises
Chapter 16 Robust Bayesian Methods
16.1 Effect of Misspecified Prior
16.2 Bayes’ Theorem with Mixture Priors
Exercises
Computer Exercises
Chapter 17 Bayesian Inference for Normal with Unknown Mean and Variance
17.1 The Joint Likelihood Function
17.2 Finding the Posterior when Independent Jeffreys’ Priors for μ and σ² Are Used
17.3 Finding the Posterior when a Joint Conjugate Prior for μ and σ² Is Used
17.4 Difference Between Normal Means with Equal Unknown Variance
17.5 Difference Between Normal Means with Unequal Unknown Variances
Computer Exercises
Appendix: Proof that the Exact Marginal Posterior Distribution of μ Is Student’s t
Chapter 18 Bayesian Inference for Multivariate Normal Mean Vector
18.1 Bivariate Normal Density
18.2 Multivariate Normal Distribution
18.3 The Posterior Distribution of the Multivariate Normal Mean Vector when Covariance Matrix Is Known
18.4 Credible Region for Multivariate Normal Mean Vector when Covariance Matrix Is Known
18.5 Multivariate Normal Distribution with Unknown Covariance Matrix
Computer Exercises
Chapter 19 Bayesian Inference for the Multiple Linear Regression Model
19.1 Least Squares Regression for Multiple Linear Regression Model
19.2 Assumptions of Normal Multiple Linear Regression Model
19.3 Bayes’ Theorem for Normal Multiple Linear Regression Model
19.4 Inference in the Multivariate Normal Linear Regression Model
19.5 The Predictive Distribution for a Future Observation
Computer Exercises
Chapter 20 Computational Bayesian Statistics Including Markov Chain Monte Carlo
20.1 Direct Methods for Sampling from the Posterior
20.2 Sampling Importance Resampling
20.3 Markov Chain Monte Carlo Methods
20.4 Slice Sampling
20.5 Inference from a Posterior Random Sample
20.6 Where to Next?
A Introduction to Calculus
B Use of Statistical Tables
C Using the Included Minitab Macros
D Using the Included R Functions
E Answers to Selected Exercises
References
Index
EULA
Our original goal for this book was to introduce Bayesian statistics at the earliest possible stage to students with a reasonable mathematical background. This entailed coverage of a similar range of topics as an introductory statistics text, but from a Bayesian perspective. The emphasis is on statistical inference. We wanted to show how Bayesian methods can be used for inference and how they compare favorably with the frequentist alternatives. This book is meant to be a good place to start the study of Bayesian statistics. From the many positive comments we have received from users, we think the book has succeeded in this goal. A course based on this goal would include Chapters 1-14.
Our feedback also showed that many users were taking up the book at a more intermediate level instead of the introductory level originally envisaged. The topics covered in Chapters 2 and 3 would be old hat for these users, so we would have to include some more advanced material to cater for the needs of that group. The second edition aimed to meet this new goal as well as the original goal. We included more models, mainly with a single parameter. Nuisance parameters were dealt with using approximations. A course based on this goal would include Chapters 4-16.
Later feedback showed that some readers with a stronger mathematical and statistical background wanted the text to include more details on how to deal with multi-parameter models. The third edition contains four new chapters to satisfy this additional goal, along with some minor rewriting of the existing chapters. Chapter 17 covers Bayesian inference for Normal observations where we do not know either the mean or the variance. This chapter extends the ideas in Chapter 11, and also discusses the two-sample case, which in turn allows the reader to consider inference on the difference between two means. Chapter 18 introduces the Multivariate Normal distribution, which we need in order to discuss multiple linear regression in Chapter 19. Finally, Chapter 20 takes the user beyond the kind of conjugate analysis that is considered in most of the book, and into the realm of computational Bayesian inference. The topics covered in Chapter 20 are treated with an intentionally light touch, but they still give the user valuable information and skills that will allow them to deal with a wider range of problems. We have included some new exercises and new computer exercises which use new Minitab macros and R functions. The Minitab macros can be downloaded from the book website: http://introbayes.ac.nz. The new R functions have been incorporated in a new and improved version of the R package Bolstad, which can either be downloaded from a CRAN mirror or installed directly from within R over the internet. Instructions on the use and installation of the Minitab macros and the Bolstad package in R are given in Appendices C and D, respectively. Both of these appendices have been rewritten to accommodate changes in R and Minitab that have occurred since the second edition.
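As a rough sketch of the second route (Appendix D documents the definitive installation steps and the full list of functions), installing and loading the package from within R looks like this:

# Install the Bolstad package from a CRAN mirror (requires an internet
# connection), then load it and browse what it provides. See Appendix D.
install.packages("Bolstad")
library(Bolstad)
ls("package:Bolstad")        # list the functions the package exports
help(package = "Bolstad")    # open the package help index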
A book can be characterized as much by what is left out as by what is included. This book is our attempt to show a coherent view of Bayesian statistics as a good way to do statistical inference. Details that are outside the scope of the text are included in footnotes. Here are some of the reasons behind our choice of topics to include or exclude.
In particular, we did not mention decision theory or loss functions when discussing Bayesian statistics. In many books, Bayesian statistics gets compartmentalized into decision theory while inference is presented in the frequentist manner. While decision theory is a very interesting topic in its own right, we want to present the case for Bayesian statistical inference, and did not want to get side-tracked.
We think that in order to get the full benefit of Bayesian statistics, one really has to consider all priors subjective. They are either (1) a summary of what you believe or (2) a summary of all you allow yourself to believe initially. We consider the subjective prior as the relative weights given to each possible parameter value, before looking at the data. Even if we use a flat prior to give all possible values equal prior weight, it is subjective since we chose it. In any case, it gives all values equal weight only in that parameterization, so it can be considered “objective” only in that parameterization. In this book we do not wish to dwell on the problems associated with trying to be objective in Bayesian statistics. We explain why universal objectivity is not possible (in a footnote since we do not want to distract the reader). We want to leave the reader with the “relative weight” idea of the prior in the parameterization in which they have posed the problem.
In the first edition we did not mention Jeffreys' prior explicitly, although the beta prior for the binomial and the flat prior for the normal mean are in fact the Jeffreys' priors for those respective observation distributions. In the second edition we do mention Jeffreys' prior for the binomial, Poisson, normal mean, and normal standard deviation. In the third edition we mention the independent Jeffreys' priors for the normal mean and standard deviation. In particular, we don't want to get the reader involved with the problems about Jeffreys' prior, such as the joint Jeffreys' prior for the mean and variance together, as opposed to independent Jeffreys' priors, or the Jeffreys' prior violating the likelihood principle. These are beyond the level we wish to go. We just want the reader to note the Jeffreys' prior in these cases as possible priors, the relative weights they give, when they may be appropriate, and how to use them. Mathematically, all parameterizations are equally valid; however, usually only the main one is very meaningful. We want the reader to focus on relative weights for their parameterization as the prior. It should be (a) a summary of their prior belief (a conjugate prior matching their prior beliefs about moments or the median), (b) flat (hence objective) for their parameterization, or (c) some other form that gives reasonable weight over the whole range of possible values. The posteriors will be similar for all priors that have reasonable weight over the whole range of possible values.
The Bayesian inference on the standard deviation of the normal was done where the mean is considered a known parameter. The conjugate prior for the variance is the inverse chi-squared distribution. Our intuition is about the standard deviation, yet we are doing Bayes' theorem on the variance. This required introducing the change of variable formula for the prior density.
In the second edition we considered the mean as known. This avoided the mathematically more advanced case where both the mean and the standard deviation are unknown. In the third edition we now cover this topic in Chapter 17. In earlier editions Student's t is presented as the required adjustment to credible intervals for the mean when the variance is estimated from the data. In the third edition we show in Chapter 17 that this is in fact the result when the joint posterior is found and the variance is marginalized out. Chapter 17 also covers inference on the difference between two means. This problem is made substantially harder when one relaxes the assumption that both populations have the same variance. Chapter 17 derives the Bayesian solution to the well-known Behrens-Fisher problem for the difference in two population means with unequal population variances. The function bayes.t.test in the R package for this book actually gives the user a numerical solution using Gibbs sampling. Gibbs sampling is covered in Chapter 20 of this new edition.
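As an illustration only, a call to bayes.t.test on two invented samples might look like the sketch below. The call pattern is modeled on R's standard t.test interface, so any details beyond passing the two data vectors are assumptions; the definitive argument list is in Appendix D and the package documentation.

library(Bolstad)

# Two small, made-up samples whose population variances may differ
x <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.4)
y <- c(4.2, 4.6, 4.1, 4.8, 4.4)

# Sketch of a call; options such as priors, the number of Gibbs iterations,
# and equal- versus unequal-variance settings are documented in Appendix D.
result <- bayes.t.test(x, y)
result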
WMB would like to thank all the readers who have sent him comments and pointed out misprints in the first and second editions. These have been corrected. WMB would like to thank Cathy Akritas and Gonzalo Ovalles at Minitab for help in improving his Minitab macros. WMB and JMC would like to thank Jon Gurstelle, Steve Quigley, Sari Friedman, Allison McGinniss, and the team at John Wiley & Sons for their support.
Finally, last but not least, WMB wishes to thank his wife Sylvie for her constant love and support.
WILLIAM M. “BILL” BOLSTAD
Hamilton, New Zealand
JAMES M. CURRAN
Auckland, New Zealand
Statistics is the science that relates data to specific questions of interest. This includes devising methods to gather data relevant to the question, methods to summarize and display the data to shed light on the question, and methods that enable us to draw answers to the question that are supported by the data. Data almost always contain uncertainty. This uncertainty may arise from selection of the items to be measured, or it may arise from variability of the measurement process. Drawing general conclusions from data is the basis for increasing knowledge about the world, and is the basis for all rational scientific inquiry. Statistical inference gives us methods and tools for doing this despite the uncertainty in the data. The methods used for analysis depend on the way the data were gathered. It is vitally important that there is a probability model explaining how the uncertainty gets into the data.
Suppose we have observed two variables X and Y. Variable X appears to have an association with variable Y. If high values of X occur with high values of variable Y and low values of X occur with low values of Y, then we say the association is positive. On the other hand, the association could be negative, in which case high values of variable X occur with low values of variable Y. Figure 1.1 shows a schematic diagram where the association is indicated by the dashed curve connecting X and Y. The unshaded area indicates that X and Y are observed variables. The shaded area indicates that there may be additional variables that have not been observed.
Figure 1.1 Association between two variables.
Figure 1.2 Association due to causal relationship.
We would like to determine why the two variables are associated. There are several possible explanations. The association might be a causal one. For example, X might be the cause of Y. This is shown in Figure 1.2, where the causal relationship is indicated by the arrow from X to Y.
On the other hand, there could be an unidentified third variable Z that has a causal effect on both X and Y. They are not related in a direct causal relationship. The association between them is due to the effect of Z. Z is called a lurking variable, since it is hiding in the background and it affects the data. This is shown in Figure 1.3.
Figure 1.3 Association due to lurking variable.
Figure 1.4 Confounded causal and lurking variable effects.
It is possible that a causal effect and the effect of a lurking variable are both contributing to the association. This is shown in Figure 1.4. We say that the causal effect and the effect of the lurking variable are confounded. This means that both effects are included in the association.
Our first goal is to determine which of the possible reasons for the association holds. If we conclude that it is due to a causal effect, then our next goal is to determine the size of the effect. If we conclude that the association is due to causal effect confounded with the effect of a lurking variable, then our next goal becomes determining the sizes of both the effects.
In the Middle Ages, science was deduced from principles set down many centuries earlier by authorities such as Aristotle. The idea that scientific theories should be tested against real world data revolutionized thinking. This way of thinking known as the scientific method sparked the Renaissance.
The scientific method rests on the following premises:
A scientific hypothesis can never be shown to be absolutely true.
However, it must potentially be disprovable.
It is a useful model until it is established that it is not true.
Always go for the simplest hypothesis, unless it can be shown to be false.
This last principle, elaborated by William of Ockham in the 13th century, is now known as Ockham’s razor and is firmly embedded in science. It keeps science from developing fanciful overly elaborate theories. Thus the scientific method directs us through an improving sequence of models, as previous ones get falsified. The scientific method generally follows the following procedure:
Ask a question or pose a problem in terms of the current scientific hypothesis.
Gather all the relevant information that is currently available. This includes the current knowledge about parameters of the model.
Design an investigation or experiment that addresses the question from step 1. The predicted outcome of the experiment should be one thing if the current hypothesis is true, and something else if the hypothesis is false.
Gather data from the experiment.
Draw conclusions given the experimental results. Revise the knowledge about the parameters to take the current results into account.
The scientific method searches for cause-and-effect relationships between an experimental variable and an outcome variable. In other words, how changing the experimental variable results in a change to the outcome variable. Scientific modeling develops mathematical models of these relationships. Both of them need to isolate the experiment from outside factors that could affect the experimental results. All outside factors that can be identified as possibly affecting the results must be controlled. It is no coincidence that the earliest successes for the method were in physics and chemistry where the few outside factors could be identified and controlled. Thus there were no lurking variables. All other relevant variables could be identified and could then be physically controlled by being held constant. That way they would not affect results of the experiment, and the effect of the experimental variable on the outcome variable could be determined. In biology, medicine, engineering, technology, and the social sciences it is not that easy to identify the relevant factors that must be controlled. In those fields a different way to control outside factors is needed, because they cannot be identified beforehand and physically controlled.
Statistical methods of inference can be used when there is random variability in the data. The probability model for the data is justified by the design of the investigation or experiment. This can extend the scientific method into situations where the relevant outside factors cannot even be identified. Since we cannot identify these outside factors, we cannot control them directly. The lack of direct control means the outside factors will be affecting the data. There is a danger that the wrong conclusions could be drawn from the experiment due to these uncontrolled outside factors.
The important statistical idea of randomization has been developed to deal with this possibility. The unidentified outside factors can be “averaged out” by randomly assigning each unit to either treatment or control group. This contributes variability to the data. Statistical conclusions always have some uncertainty or error due to variability in the data. We can develop a probability model of the data variability based on the randomization used. Randomization not only reduces this uncertainty due to outside factors, it also allows us to measure the amount of uncertainty that remains using the probability model. Randomization lets us control the outside factors statistically, by averaging out their effects.
Underlying this is the idea of a statistical population, consisting of all possible values of the observations that could be made. The data consists of observations taken from a sample of the population. For valid inferences about the population parameters from the sample statistics, the sample must be “representative” of the population. Amazingly, choosing the sample randomly is the most effective way to get representative samples!
There are two main philosophical approaches to statistics. The first is often referred to as the frequentist approach. Sometimes it is called the classical approach. Procedures are developed by looking at how they perform over all possible random samples. The probabilities do not relate to the particular random sample that was obtained. In many ways this indirect method places the “cart before the horse.”
The alternative approach that we take in this book is the Bayesian approach. It applies the laws of probability directly to the problem. This offers many fundamental advantages over the more commonly used frequentist approach. We will show these advantages over the course of the book.
Most introductory statistics books take the frequentist approach to statistics, which is based on the following ideas:
Parameters, the numerical characteristics of the population, are fixed but unknown constants.
Probabilities are always interpreted as long-run relative frequency.
Statistical procedures are judged by how well they perform in the long run over an infinite number of hypothetical repetitions of the experiment.
Probability statements are only allowed for random quantities. The unknown parameters are fixed, not random, so probability statements cannot be made about their value. Instead, a sample is drawn from the population, and a sample statistic is calculated. The probability distribution of the statistic over all possible random samples from the population is determined and is known as the sampling distribution of the statistic. A parameter of the population will also be a parameter of the sampling distribution. The probability statement that can be made about the statistic based on its sampling distribution is converted to a confidence statement about the parameter. The confidence is based on the average behavior of the procedure over all possible samples.
The Reverend Thomas Bayes first discovered the theorem that now bears his name. It was written up in a paper, An Essay Towards Solving a Problem in the Doctrine of Chances. This paper was found after his death by his friend Richard Price, who had it published posthumously in the Philosophical Transactions of the Royal Society in 1763. Bayes showed how inverse probability could be used to calculate the probability of antecedent events from the occurrence of the consequent event. His methods were adopted by Laplace and other scientists in the 19th century, but had largely fallen from favor by the early 20th century. By the middle of the 20th century, interest in Bayesian methods had been renewed by de Finetti, Jeffreys, Savage, and Lindley, among others. They developed a complete method of statistical inference based on Bayes’ theorem.
This book introduces the Bayesian approach to statistics. The ideas that form the basis of this approach are:
Since we are uncertain about the true value of the parameters, we will consider them to be random variables.
The rules of probability are used directly to make inferences about the parameters.
Probability statements about parameters must be interpreted as “degree of belief.” The prior distribution must be subjective. Each person can have his/her own prior, which contains the relative weights that person gives to every possible parameter value. It measures how “plausible” the person considers each parameter value to be before observing the data.
We revise our beliefs about parameters after getting the data by using Bayes’ theorem. This gives our posterior distribution, which gives the relative weights we give to each parameter value after analyzing the data. The posterior distribution comes from two sources: the prior distribution and the observed data.
This has a number of advantages over the conventional frequentist approach. Bayes’ theorem is the only consistent way to modify our beliefs about the parameters given the data that actually occurred. This means that the inference is based on the data that actually occurred, not on all possible data sets that might have occurred but did not! Allowing the parameter to be a random variable lets us make probability statements about it, posterior to the data. This contrasts with the conventional approach, where inference probabilities are based on all possible data sets that could have occurred for the fixed parameter value. Given the actual data, there is nothing random left with a fixed parameter value, so one can only make confidence statements, based on what could have occurred. Bayesian statistics also has a general way of dealing with a nuisance parameter. A nuisance parameter is one that we do not want to make inferences about, but that we do not want to interfere with the inferences we are making about the main parameters. Frequentist statistics does not have a general procedure for dealing with them. Bayesian statistics is predictive, unlike conventional frequentist statistics. This means that we can easily find the conditional probability distribution of the next observation given the sample data.
In frequentist statistics, the parameter is considered a fixed, but unknown, constant. A statistical procedure such as a particular estimator for the parameter cannot be judged from the value it gives. The parameter is unknown, so we cannot know the value the estimator should be giving. If we knew the value of the parameter, we would not be using an estimator.
Instead, statistical procedures are evaluated by looking at how they perform in the long run over all possible samples of data, for fixed parameter values over some range. For instance, we fix the parameter at some value. The estimator depends on the random sample, so it is considered a random variable having a probability distribution. This distribution is called the sampling distribution of the estimator, since its probability distribution comes from taking all possible random samples. Then we look at how the estimator is distributed around the parameter value. This is called sample space averaging. Essentially it compares the performance of procedures before we take any data.
Bayesian procedures consider the parameter to be a random variable, and its posterior distribution is conditional on the sample data that actually occurred, not all those samples that were possible but did not occur. However, before the experiment, we might want to know how well the Bayesian procedure works at some specific parameter values in the range.
To evaluate the Bayesian procedure using sample space averaging, we have to consider the parameter to be both a random variable and a fixed but unknown value at the same time. We can get past the apparent contradiction in the nature of the parameter because the probability distribution we put on the parameter measures our uncertainty about the true value. It shows the relative belief weights we give to the possible values of the unknown parameter! After looking at the data, our belief distribution over the parameter values has changed. This way we can think of the parameter as a fixed, but unknown, value at the same time as we think of it being a random variable. This allows us to evaluate the Bayesian procedure using sample space averaging. This is called pre-posterior analysis because it can be done before we obtain the data.
In Chapter 4, we will find out that the laws of probability are the best way to model uncertainty. Because of this, Bayesian procedures will be optimal in the post-data setting, given the data that actually occurred. In Chapters 9 and 11, we will see that Bayesian procedures perform very well in the pre-data setting when evaluated using pre-posterior analysis. In fact, it is often the case that Bayesian procedures outperform the usual frequentist procedures even in the pre-data setting.
Monte Carlo studies are a useful way to perform sample space averaging. We draw a large number of samples randomly using the computer and calculate the statistic (frequentist or Bayesian) for each sample. The empirical distribution of the statistic (over the large number of random samples) approximates its sampling distribution (over all possible random samples). We can calculate statistics such as mean and standard deviation on this Monte Carlo sample to approximate the mean and standard deviation of the sampling distribution. Some small-scale Monte Carlo studies are included as exercises.
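For example, a minimal Monte Carlo study along these lines could be written in R as below. It uses the Chapter 9 setting, where the frequentist estimator is the sample proportion and the Bayesian estimator is the posterior mean under a uniform prior; the sample size, true proportion, and number of replicates are arbitrary choices, not values from the book's exercises.

set.seed(1)
n     <- 20       # sample size (arbitrary choice)
pi.tr <- 0.3      # fixed "true" value of the proportion (arbitrary choice)
M     <- 10000    # number of Monte Carlo replicates

y <- rbinom(M, size = n, prob = pi.tr)   # one binomial count per simulated sample

freq.est  <- y / n               # frequentist estimator: sample proportion
bayes.est <- (y + 1) / (n + 2)   # posterior mean under a uniform, Beta(1,1), prior

# Approximate each estimator's mean squared error by averaging over the samples
mean((freq.est  - pi.tr)^2)
mean((bayes.est - pi.tr)^2)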
A very large proportion of undergraduates are required to take a service course in statistics. Almost all of these courses are based on frequentist ideas. Most of them do not even mention Bayesian ideas. As a statistician, I know that Bayesian methods have great theoretical advantages. I think we should be introducing our best students to Bayesian ideas from the beginning. There are not many introductory statistics textbooks based on Bayesian ideas. Some other texts include Berry (1996), Press (1989), and Lee (1989).
This book aims to introduce students with a good mathematics background to Bayesian statistics. It covers the same topics as a standard introductory statistics text, only from a Bayesian perspective. Students need reasonable algebra skills to follow this book. Bayesian statistics uses the rules of probability, so competence in manipulating mathematical formulas is required. Students will find that general knowledge of calculus is helpful in reading this book. Specifically they need to know that area under a curve is found by integrating, and that a maximum or minimum of a continuous differentiable function is found where the derivative of the function equals zero. However, the actual calculus used is minimal. The book is self-contained with a calculus appendix that students can refer to.
Chapter 2 introduces some fundamental principles of scientific data gathering to control the effects of unidentified factors. These include the need for drawing samples randomly, along with some random sampling techniques. The reason why there is a difference between the conclusions we can draw from data arising from an observational study and from data arising from a randomized experiment is shown. Completely randomized designs and randomized block designs are discussed.
Chapter 3 covers elementary methods for graphically displaying and summarizing data. Often a good data display is all that is necessary. The principles of designing displays that are true to the data are emphasized.
Chapter 4 shows the difference between deduction and induction. Plausible reasoning is shown to be an extension of logic where there is uncertainty. It turns out that plausible reasoning must follow the same rules as probability. The axioms of probability are introduced and the rules of probability, including conditional probability and Bayes’ theorem are developed.
Chapter 5 covers discrete random variables, including joint and marginal discrete random variables. The binomial, hypergeometric, and Poisson distributions are introduced, and the situations where they arise are characterized.
Chapter 6 covers Bayes’ theorem for discrete random variables using a table. We see that two important consequences of the method are that multiplying the prior by a constant, or multiplying the likelihood by a constant, does not affect the resulting posterior distribution. This gives us the “proportional form” of Bayes’ theorem. We show that we get the same results when we analyze the observations sequentially, using the posterior after the previous observation as the prior for the next observation, as when we analyze the observations all at once using the joint likelihood and the original prior. We demonstrate Bayes’ theorem for binomial observations with a discrete prior and for Poisson observations with a discrete prior.
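The table calculation in proportional form takes only a few lines of R. The discrete prior and the binomial data in the sketch below are invented purely for illustration.

pi.values <- c(0.2, 0.4, 0.6, 0.8)       # candidate values of the proportion
prior     <- c(0.25, 0.25, 0.25, 0.25)   # discrete prior weights (invented)

n <- 10; y <- 7                          # observed: 7 successes in 10 trials (invented)
likelihood <- dbinom(y, size = n, prob = pi.values)

# Proportional form of Bayes' theorem: the posterior is proportional to
# prior times likelihood, rescaled so the probabilities sum to one.
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)
round(posterior, 4)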
Chapter 7 covers continuous random variables, including joint, marginal, and conditional random variables. The beta, gamma, and normal distributions are introduced in this chapter.
Chapter 8 covers Bayes’ theorem for the population proportion (binomial) with a continuous prior. We show how to find the posterior distribution of the population proportion using either a uniform prior or a beta prior. We explain how to choose a suitable prior. We look at ways of summarizing the posterior distribution.
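As a small illustration, assuming the standard conjugate result that a beta(a, b) prior combined with y successes in n trials gives a beta(a + y, b + n - y) posterior, the update and a credible interval can be computed directly in R. The prior constants and data below are invented.

a <- 1; b <- 1     # beta(1, 1), i.e. the uniform prior (illustrative choice)
n <- 20; y <- 13   # invented data: 13 successes in 20 trials

a.post <- a + y        # posterior beta parameters
b.post <- b + n - y

a.post / (a.post + b.post)               # posterior mean of the proportion
qbeta(c(0.025, 0.975), a.post, b.post)   # 95% Bayesian credible interval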
Chapter 9 compares the Bayesian inferences with the frequentist inferences. We show that the Bayesian estimator (the posterior mean using a uniform prior) has better performance than the frequentist estimator (the sample proportion) in terms of mean squared error over most of the range of possible values. This kind of frequentist analysis is useful before we perform our Bayesian analysis. We see that the Bayesian credible interval has a much more useful interpretation than the frequentist confidence interval for the population proportion. One-sided and two-sided hypothesis tests using Bayesian methods are introduced.
Chapter 10 covers Bayes’ theorem for the Poisson observations with a continuous prior. The prior distributions used include the positive uniform, the Jeffreys’ prior, and the gamma prior. Bayesian inference for the Poisson parameter using the resulting posterior include Bayesian credible intervals and two-sided tests of hypothesis, as well as one-sided tests of hypothesis.
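For instance, assuming the usual conjugate result that a gamma(r, v) prior (shape r, rate v) combined with n Poisson counts gives a gamma(r + Σy, v + n) posterior, the calculation in R is short. The prior constants and counts below are invented.

y <- c(3, 1, 4, 2, 0, 3)   # invented Poisson counts
r <- 2; v <- 1             # gamma prior with shape r and rate v (illustrative)

r.post <- r + sum(y)       # posterior shape
v.post <- v + length(y)    # posterior rate

r.post / v.post                                          # posterior mean of the Poisson parameter
qgamma(c(0.025, 0.975), shape = r.post, rate = v.post)   # 95% credible interval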
Chapter 11 covers Bayes’ theorem for the mean of a normal distribution with known variance. We show how to choose a normal prior. We discuss dealing with nuisance parameters by marginalization. The predictive density of the next observation is found by considering the population mean a nuisance parameter and marginalizing it out.
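A minimal sketch of this updating follows, assuming the standard conjugate rule in which precisions (reciprocal variances) add and the posterior mean is a precision-weighted average of the prior mean and the sample mean. All the constants below are invented.

y     <- c(25.1, 27.3, 26.8, 24.9, 26.2)   # invented normal observations
sigma <- 2                                 # known observation standard deviation
m0    <- 25; s0 <- 3                       # normal prior mean and standard deviation

n         <- length(y)
prec.post <- 1 / s0^2 + n / sigma^2        # posterior precision: precisions add
s.post    <- sqrt(1 / prec.post)
m.post    <- (m0 / s0^2 + n * mean(y) / sigma^2) / prec.post

c(mean = m.post, sd = s.post)
qnorm(c(0.025, 0.975), m.post, s.post)     # 95% credible interval for the mean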
Chapter 12 compares Bayesian inferences with the frequentist inferences for the mean of a normal distribution. These comparisons include point and interval estimation and involve hypothesis tests including both the one-sided and the two-sided cases.
Chapter 13 shows how to perform Bayesian inferences for the difference between normal means and how to perform Bayesian inferences for the difference between proportions using the normal approximation.
Chapter 14 introduces the simple linear regression model and shows how to perform Bayesian inferences on the slope of the model. The predictive distribution of the next observation is found by considering both the slope and intercept to be nuisance parameters and marginalizing them out.
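As a rough sketch, if the observation standard deviation is treated as known and a flat prior is put on the slope, the posterior of the slope is normal and centred at the least squares slope. The simulated data below are purely illustrative.

set.seed(3)
x <- 1:12
y <- 2 + 0.5 * x + rnorm(12, sd = 1)   # simulated data (illustrative)
sigma <- 1                             # observation standard deviation, treated as known

SSx     <- sum((x - mean(x))^2)
b.ls    <- sum((x - mean(x)) * (y - mean(y))) / SSx   # least squares slope
sd.post <- sigma / sqrt(SSx)                          # posterior sd under a flat prior

c(mean = b.ls, sd = sd.post)
qnorm(c(0.025, 0.975), b.ls, sd.post)   # 95% credible interval for the slope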
Chapter 15 introduces Bayesian inference for the standard deviation σ, when we have a random sample of normal observations with known mean μ. This chapter is at a somewhat higher level than the previous chapters and requires the use of the change-of-variable formula for densities. Priors used include positive uniform for standard deviation, positive uniform for variance, Jeffreys’ prior, and the inverse chi-squared prior. We discuss how to choose an inverse chi-squared prior that matches our prior belief about the median. Bayesian inferences from the resulting posterior include point estimates, credible intervals, and hypothesis tests including both the one-sided and two-sided cases.
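As a sketch of one such calculation, assume the S times an inverse chi-squared parameterization: with known mean μ, an S0 × inverse chi-squared(κ0) prior on the variance combined with a sum of squares SS about μ gives an (S0 + SS) × inverse chi-squared(κ0 + n) posterior. The simulation below simply draws from that posterior and summarizes σ; all the constants are invented.

set.seed(2)
mu <- 10                                  # known mean
y  <- c(9.2, 11.1, 10.4, 8.7, 10.9, 9.8)  # invented normal observations

S0 <- 4; kappa0 <- 1                      # S0 * inverse chi-squared prior (illustrative)
SS <- sum((y - mu)^2)                     # sum of squares about the known mean

S.post     <- S0 + SS                     # posterior is S.post * inverse chi-squared(kappa.post)
kappa.post <- kappa0 + length(y)

sigma2 <- S.post / rchisq(100000, df = kappa.post)   # draws from the posterior of the variance
sigma  <- sqrt(sigma2)
median(sigma)                             # posterior median of sigma
quantile(sigma, c(0.025, 0.975))          # 95% credible interval for sigma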
Chapter 16 shows how we can make Bayesian inference robust against a misspecified prior by using a mixture prior and marginalizing out the mixture parameter. This chapter is also at a somewhat higher level than the others, but it shows how one of the main dangers of Bayesian analysis can be avoided.
Chapter 17 returns to the problem we discussed in Chapter 11, that is, making inferences about the mean of a normal distribution. In this chapter, however, we explicitly model the unknown population standard deviation and show how the approximations we suggested in Chapter 11 are exactly true. We also deal with the two-sample case so that inference can be performed on the difference between two means.
Chapter 18 introduces the multivariate normal distribution and extends the theory from Chapters 11 and 17 to the multivariate case. The multivariate normal distribution is essential for the discussion of linear models and, in particular, multiple regression.
