Updated classic statistics text, with new problems and examples.

Probability and Statistical Inference, Third Edition helps students grasp the essential concepts of statistics and its probabilistic foundations. The book focuses on developing intuition and understanding through a wealth of examples that illustrate concepts, theorems, and methods, so that the reader comes to understand not just how the introduced material works, but why. In this Third Edition, the reader will find a new chapter on Bayesian statistics, 70 new problems, and an appendix with supporting R code. The book is suitable for upper-level undergraduates and first-year graduate students in statistics and related disciplines, such as mathematics or engineering.

This Third Edition:

* Introduces an all-new chapter on Bayesian statistics and offers thorough explanations of advanced topics in probability and statistics
* Includes 650 problems and over 400 examples, making it an excellent resource for a mathematical statistics course sequence in the increasingly popular "flipped classroom" format
* Offers students in statistics, mathematics, engineering, and related fields a user-friendly resource
* Provides practicing professionals valuable insight into statistical tools

Probability and Statistical Inference takes a unique approach to problems that lets readers integrate the knowledge gained throughout the text, leading to a more complete and honest understanding of the topic.
Page count: 1071
Publication year: 2020
Cover
Wiley Series in Probability and Statistics
Probability and Statistical Inference
Copyright
Dedication
Preface to Third Edition
Preface to Second Edition
About the Companion Website
Chapter 1: Experiments, Sample Spaces, and Events
1.1 Introduction
1.2 Sample Space
1.3 Algebra of Events
1.4 Infinite Operations on Events
Notes
Chapter 2: Probability
2.1 Introduction
2.2 Probability as a Frequency
2.3 Axioms of Probability
2.4 Consequences of the Axioms
2.5 Classical Probability
2.6 Necessity of the Axioms*
2.7 Subjective Probability*
Note
Chapter 3: Counting
3.1 Introduction
3.2 Product Sets, Orderings, and Permutations
3.3 Binomial Coefficients
3.4 Multinomial Coefficients
Notes
Chapter 4: Conditional Probability, Independence, and Markov Chains
4.1 Introduction
4.2 Conditional Probability
4.3 Partitions; Total Probability Formula
4.4 Bayes' Formula
4.5 Independence
4.6 Exchangeability; Conditional Independence
4.7 Markov Chains*
Note
Chapter 5: Random Variables: Univariate Case
5.1 Introduction
5.2 Distributions of Random Variables
5.3 Discrete and Continuous Random Variables
5.4 Functions of Random Variables
5.5 Survival and Hazard Functions
Notes
Chapter 6: Random Variables: Multivariate Case
6.1 Bivariate Distributions
6.2 Marginal Distributions; Independence
6.3 Conditional Distributions
6.4 Bivariate Transformations
6.5 Multidimensional Distributions
Chapter 7: Expectation
7.1 Introduction
7.2 Expected Value
7.3 Expectation as an Integral*
7.4 Properties of Expectation
7.5 Moments
7.6 Variance
7.7 Conditional Expectation
7.8 Inequalities
Chapter 8: Selected Families of Distributions
8.1 Bernoulli Trials and Related Distributions
8.2 Hypergeometric Distribution
8.3 Poisson Distribution and Poisson Process
8.4 Exponential, Gamma, and Related Distributions
8.5 Normal Distribution
8.6 Beta Distribution
Notes
Chapter 9: Random Samples
9.1 Statistics and Sampling Distributions
9.2 Distributions Related to Normal
9.3 Order Statistics
9.4 Generating Random Samples
9.5 Convergence
9.6 Central Limit Theorem
Notes
Chapter 10: Introduction to Statistical Inference
10.1 Overview
10.2 Basic Models
10.3 Sampling
10.4 Measurement Scales
Notes
Chapter 11: Estimation
11.1 Introduction
11.2 Consistency
11.3 Loss, Risk, and Admissibility
11.4 Efficiency
11.5 Methods of Obtaining Estimators
11.6 Sufficiency
11.7 Interval Estimation
Notes
Chapter 12: Testing Statistical Hypotheses
12.1 Introduction
12.2 Intuitive Background
12.3 Most Powerful Tests
12.4 Uniformly Most Powerful Tests
12.5 Unbiased Tests
12.6 Generalized Likelihood Ratio Tests
12.7 Conditional Tests
12.8 Tests and Confidence Intervals
12.9 Review of Tests for Normal Distributions
12.10 Monte Carlo, Bootstrap, and Permutation Tests
Notes
Chapter 13: Linear Models
13.1 Introduction
13.2 Regression of the First and Second Kind
13.3 Distributional Assumptions
13.4 Linear Regression in the Normal Case
13.5 Testing Linearity
13.6 Prediction
13.7 Inverse Regression
13.8 BLUE
13.9 Regression Toward the Mean
13.10 Analysis of Variance
13.11 One‐Way Layout
13.12 Two‐Way Layout
13.13 ANOVA Models with Interaction
13.14 Further Extensions
Notes
Chapter 14: Rank Methods
14.1 Introduction
14.2 Glivenko–Cantelli Theorem
14.3 Kolmogorov–Smirnov Tests
14.4 One‐Sample Rank Tests
14.5 Two‐Sample Rank Tests
14.6 Kruskal–Wallis Test
Note
Chapter 15: Analysis of Categorical Data
15.1 Introduction
15.2 Chi‐Square Tests
15.3 Homogeneity and Independence
15.4 Consistency and Power
15.5 2 x 2 Contingency Tables
15.6 R x C Contingency Tables
Chapter 16: Basics of Bayesian Statistics
16.1 Introduction
16.2 Prior and Posterior Distributions
16.3 Bayesian Inference
16.4 Final Comments
Notes
APPENDIX A: Supporting R Code
APPENDIX B: Statistical Tables
Bibliography
Answers to Odd‐Numbered Problems
Index
End User License Agreement
Chapter 1
Table 1.1 Outcomes on a pair of dice.
Chapter 14
Table 14.1 The data for Example 14.2.
Table 14.2
Table 14.3
Table 14.4
Appendix A
Table A.1 Selected probability distributions
Table A.2 Densities/pdfs and cdfs of selected distributions
Table A.3 Quantiles and random number generation for selected distributions
Table A.4 Basic statistics for data values stored in a vector
Table A.5 Basic graphical methods
Appendix B
Table B.1 Quantiles of the chi‐square distribution for determining the shorte...
Table B.2 Tail probabilities of the Kolmogorov distribution
Chapter 1
Figure 1.1 Possible seatings of persons A and B at a square table.
Figure 1.2 Possible seatings of any two persons at a square table.
Figure 1.3 Possible seatings of one person if the place of the other person ...
Figure 1.4 Scheme of a randomized response.
Figure 1.5 Complement, union, and intersection.
Figure 1.6 The first De Morgan's law.
Figure 1.7 Complement of a Rectangle.
Chapter 2
Figure 2.1 Hitting a target.
Figure 2.2 First solution of Bertrand's problem.
Figure 2.3 Second solution of Bertrand's problem.
Figure 2.4 Third solution of Bertrand's problem.
Figure 2.5 Explanation of Bertrand's paradox.
Figure 2.6 Union of two events.
Figure 2.7 Union of three events.
Chapter 3
Figure 3.1 Pascal's triangle.
Figure 3.2 Process of counting votes.
Figure 3.3 Reflection principle.
Chapter 4
Figure 4.1 Possible results of the two first draws in Example 4.5.
Figure 4.2 Transitions in model of epidemic.
Chapter 5
Figure 5.1 Cdf of the distance from the center of the target.
Figure 5.2 Cdf of the number of heads in three tosses of a coin.
Figure 5.3 Cdf of random variable X.
Figure 5.4 Cdf of a uniform distribution.
Chapter 6
Figure 6.1 Shadows and sections of the domain of integration.
Figure 6.2 Support of the density and the related set.
Figure 6.3 A function that is not a cdf but satisfies (a)–(d).
Figure 6.4 Marginal density.
Figure 6.5 Condition for dependence.
Figure 6.6 Probability of better of two attempts exceeding 0.75.
Figure 6.7 Three‐component system.
Figure 6.8 Joint distribution.
Figure 6.9 Options for marginal densities.
Figure 6.10 Conditional densities.
Figure 6.11 Triangular density.
Figure 6.12 Supports of the distributions.
Figure 6.13 Approximations of two conditioning events.
Figure 6.14 First two generations in the process of grinding.
Chapter 7
Figure 7.1 Interpretation of expected value of a discrete random variable.
Figure 7.2 Interpretation of expected value of a continuous random variable....
Figure 7.3 Approximating sums for Riemann and Lebesgue integrals.
Figure 7.4 Graph of the function and its cdf.
Figure 7.5 Nonintegrable function whose iterated integrals exist and are not...
Figure 7.6 Nonintegrable function whose iterated integrals exist and are equ...
Figure 7.7 Length of 16 feet (Drawing by S. Niewiadomski).
Figure 7.8 Two weightings of A and B.
Figure 7.9 Dependent but uncorrelated random variables.
Chapter 8
Figure 8.1 Flowchart.
Figure 8.2 Series system.
Figure 8.3 Parallel system.
Figure 8.4 Series‐parallel system.
Figure 8.5 Flowchart.
Figure 8.6 Two regression lines.
Figure 8.7 Shapes of beta distributions.
Chapter 10
Figure 10.1 Lifetime T(t*) of a bulb.
Chapter 11
Figure 11.1 Likelihood function for the range in a uniform distribution.
Figure 11.2 Likelihood function for five Bernoulli trials.
Chapter 12
Figure 12.1 Power functions.
Figure 12.2 Power functions of two tests.
Figure 12.3 Power of the two‐sided test.
Figure 12.4 Partition into sets.
Figure 12.5 Power functions of a one‐sided UMP test (dashed line) and a UMPU...
Figure 12.6 Power function of an unbiased test.
Figure 12.7 Golden rectangles.
Chapter 13
Figure 13.1 True regression.
Figure 13.2 Linear regression for a uniform distribution.
Figure 13.3 Ages of Polish kings and their heirs at death.
Figure 13.4 (a) No effects of either factor. (b) No effect of one factor, effect of the other, no in...
Chapter 16
Figure 16.1 Posterior densities (dashed line) for different prior densities ...
Figure 16.2 The 90% central credible interval (dotted line) and the 90% high...
Appendix A
Figure A.1 The densities of BETA(2, 5), BETA(3, 3), and BETA(1, 4) distribut...
Figure A.2 Histogram of the data generated from the BETA(2, 3) distribution ...
Figure A.3 Histogram of the data generated from the BETA(2, 3) distribution ...
Established by Walter A. Shewhart and Samuel S. Wilks
Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay
Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane, Jozef L. Teugels
The Wiley Series in Probability and Statistics is well established and authoritative. It covers many topics of current research interest in both pure and applied statistics and probability theory. Written by leading statisticians and institutions, the titles span both state‐of‐the‐art developments in the field and classical methods.
Reflecting the wide range of current research in statistics, the series encompasses applied, methodological and theoretical statistics, ranging from applications and new techniques made possible by advances in computerized practice to rigorous treatment of theoretical approaches.
This series provides essential and invaluable reading for all statisticians, whether in academia, industry, government, or research.
A complete list of titles in this series can be found at http://www.wiley.com/go/wsps
Third Edition
Magdalena Niewiadomska‐Bugaj
Robert Bartoszyński†
This edition first published 2021
© 2021 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Magdalena Niewiadomska‐Bugaj and Robert Bartoszyński to be identified as the authors of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Niewiadomska‐Bugaj, Magdalena, author. | Bartoszyński, Robert, author.
Title: Probability and statistical inference / Magdalena Niewiadomska‐Bugaj, Robert Bartoszyński.
Description: Third edition. | Hoboken, NJ : Wiley‐Interscience, 2021. | Revised edition of: Probability and statistical inference / Robert Bartoszyński, Magdalena Niewiadomska‐Bugaj. 2nd ed. c2008. | Includes bibliographical references and index.
Identifiers: LCCN 2020021071 (print) | LCCN 2020021072 (ebook) | ISBN 9781119243809 (cloth) | ISBN 9781119243816 (adobe pdf) | ISBN 9781119243823 (epub)
Subjects: LCSH: Probabilities. | Mathematical statistics.
Classification: LCC QA273 .B2584 2021 (print) | LCC QA273 (ebook) | DDC 519.5/4--dc23
LC record available at https://lccn.loc.gov/2020021071
LC ebook record available at https://lccn.loc.gov/2020021072
Cover Design: Wiley
Cover Images: (graph) Courtesy of Magdalena Niewiadomska‐Bugaj, Colorful abstract long exposure pictures © Artur Debat/Getty Images
To my parents
– MNB
You have in front of you the third edition of Probability and Statistical Inference, a text originally published in 1996. I have been using this book in the classroom ever since, and it has always been interesting to see how it serves the students, how they react to it, and what could still be done to make it better. These reflections prompted me to prepare a second edition, published in 2007. But academia is changing quickly: who the students are is changing, and how we should teach to help them learn is changing as well. This is what made me consider a third edition. The response from Wiley Publishing was positive, and my work began.
There were three main changes that I saw as necessary. First, adding a chapter on the basics of Bayesian statistics, as I realized that upper-level undergraduate and graduate students needed an earlier introduction to Bayesian inference. Another change was to make the book more appropriate for the flipped classroom format. I have experimented with this format for three years now, and it is working quite well. The book introduces and illustrates concepts through more than 400 examples. Preparing the material mainly at home gives students more time in class for questions, discussion, and problem solving. I have also added over 70 new problems to make the selection easier for the instructor. A third change was including an appendix with R code that helps students complete projects and homework assignments. My two-semester class based on this text includes three projects. The first one, in the fall semester, has students present applications of selected distributions, including graphics. The two projects for the spring semester involve resampling methods. The necessary R code is included in the appendix.
There are many people to whom I owe my thanks. First, I would like to thank Wiley Editor Jon Gurstelle, who liked the idea of preparing the third edition. After Jon accepted another job elsewhere, the book and I came under the excellent care of the editorial team of Mindy Okura‐Mokrzycki, Kathleen Santoloci, Linda Christina, and Kimberly Monroe‐Hill, who supported me throughout this process. I would also like to thank Carla Koretsky, the Dean of the College of Arts and Sciences at Western Michigan University, and WMU Provost Sue Stapleton for granting me a semester‐long administrative sabbatical leave that significantly sped up the progress of the book.
I am indebted to several of my students for their valuable comments. I am also grateful to my departmental colleagues, especially Hyun Bin Kang and Duy Ngo, who used the text in class and gave me their valuable feedback. Hyun Bin also helped me with the formatting of the R code in the appendix. Finally, I thank my husband, Jerzy, for his support and encouragement.
MNB
November 2020
The first edition of this book was published in 1996. Since then, powerful computers have come into wide use, and it became clear that our text should be revised and material on computer‐intensive methods of statistical inference should be added. To my delight, Steve Quigley, Executive Editor of John Wiley and Sons, agreed with the idea, and work on the second edition began.
Unfortunately, Robert Bartoszyński passed away in 1998, so I was left to carry out this revision by myself. I revised the content by creating a new chapter on random samples, adding sections on Monte Carlo methods, bootstrap estimators and tests, and permutation tests. More problems were added, and existing ones were reorganized. Hopefully nothing was lost of the “spirit” of the book which Robert liked so much and of which he was very proud.
This book is intended for seniors or first‐year graduate students in statistics, mathematics, natural sciences, engineering, and any other major where an intensive exposure to statistics is necessary. The prerequisite is a calculus sequence that includes multivariate calculus. We provide the material for a two‐semester course that starts with the necessary background in probability theory, followed by the theory of statistics.
What distinguishes our book from other texts is the way the material is presented and the aspects that are stressed. To put it succinctly, understanding "why" is prioritized over the skill of "how to." Today, in an era of undreamed‐of computational facilities, reflection aimed at understanding is not a luxury but a necessity.
Probability theory and statistics are presented as self‐contained conceptual structures. Their value as a means of description and inference about real‐life situations lies precisely in their level of abstraction—the more abstract a concept is, the wider is its applicability. The methodology of statistics comes out most clearly if it is introduced as an abstract system illustrated by a variety of real‐life applications, not confined to any single domain.
Depending on the level of the course, the instructor can select topics and examples, both in the theory and in applications. These can range from simple illustrations of concepts, to introductions of whole theories typically not included in comparable textbooks (e.g., prediction, extrapolation, and filtration in time series as examples of use of the concepts of covariance and correlation). Such additional, more advanced, material (e.g., Chapter 5 on Markov Chains) is marked with asterisks. Other examples are the proof of the extension theorem (Theorem 6.2.4), showing that the cumulative distribution function determines the measure on the line; the construction of Lebesgue, Riemann–Stieltjes, and Lebesgue–Stieltjes integrals; and the explanation of the difference between double integral and iterated integrals (Section 8.3).
In the material that is seldom included in other textbooks on mathematical statistics, we stress the consequences of nonuniqueness of a sample space and illustrate, by examples, how the choice of a sample space can facilitate the formulation of some problems (e.g., issues of selection or randomized response). We introduce the concept of conditioning with respect to partition (Section 4.4); we explain the Borel–Kolmogorov paradox by way of the underlying measurement process that provides information on the occurrence of the condition (Example 7.22); we present the Neyman–Scott theory of outliers (Example 10.4); we give a new version of the proof of the relation between mean, median, and standard deviation (Theorem 8.7.3); we show another way of conditioning in the secretary problem (Example 4.10). Among examples of applications, we discuss the strategies of serves in tennis (Problem 4.2.12), and a series of problems (3.2.14–3.2.20) concerning combinatorial analysis of voting power. In Chapter 11, we discuss the renewal paradox, the effects of importance sampling (Example 11.6), and the relevance of measurement theory for statistics (Section 11.6). Chapter 14 provides a discussion of true regression versus linear regression and concentrates mostly on explaining why certain procedures (in regression analysis and ANOVA) work, rather than on computational details. In Chapter 15, we provide a taste of rank methods—one line of research starting with the Glivenko–Cantelli Theorem and leading to Kolmogorov–Smirnov tests, and the other line leading to Mann‐Whitney and Wilcoxon tests. In this chapter, we also show the traps associated with multiple tests of the same hypothesis (Example 15.3). Finally, Chapter 16 contains information on partitioning contingency tables—the method that provides insight into the dependence structure. We also introduce McNemar's test and various indices of association for tables with ordered categories.
The backbone of the book is the examples used to illustrate concepts, theorems, and methods. Some examples raise the possibilities of extensions and generalizations, and some simply point out the relevant subtleties.
Another feature that distinguishes our book from most other texts is the choice of problems. Our strategy was to integrate the knowledge students acquired thus far, rather than to train them in a single skill or concept. The solution to a problem in a given section may require using knowledge from some preceding sections, that is, reaching back into material already covered. Most of the problems are intended to make the students aware of facts they might otherwise overlook. Many of the problems were inspired by our teaching experience and familiarity with students' typical errors and misconceptions.
Finally, we hope that our book will be “friendly” for students at all levels. We provide (hopefully) lucid and convincing explanations and motivations, pointing out the difficulties and pitfalls of arguments. We also do not want good students to be left alone. The material in starred chapters, sections, and examples can be skipped in the main part of the course, but used at will by interested students to complement and enhance their knowledge. The book can also be a useful reference, or source of examples and problems, for instructors who teach courses from other texts.
I am indebted to many people without whom this book would not have reached its current form. First, thank you to many colleagues who contributed to the first edition and whose names are listed there. Comments of many instructors and students who used the first edition were influential in this revision. I wish to express my gratitude to Samuel Kotz for referring me to Stigler's (1986) article about the “right and lawful rood,” which we previously used in the book without reference (Example 8.40). My sincere thanks are due to Jung Chao Wang for his help in creating the R‐code for computer‐intensive procedures that, together with additional examples, can be found on the book's ftp site
ftp://ftp.wiley.com/public/sc_tech_med/probability_statistical.
Particular thanks are due to Katarzyna Bugaj for careful proofreading of the entire manuscript, Łukasz Bugaj for meticulously creating all figures with the Mathematica software, and Agata Bugaj for her help in compiling the index. Changing all those diapers has finally paid off.
I wish to express my appreciation to the anonymous reviewers for supporting the book and providing valuable suggestions, and to Steve Quigley, Executive Editor of John Wiley & Sons, for all his help and guidance in carrying out the revision.
Finally, I would like to thank my family, especially my husband Jerzy, for their encouragement and support during the years this book was being written.
Magdalena Niewiadomska‐Bugaj
October 2007
This book is accompanied by a companion website:
www.wiley.com/go/probabilityandstatisticalinference3e
The website includes the Instructor's Solution Manual and will be live in early 2021.
The consequences of making a decision today often depend on what will happen in the future, at least on the future that is relevant to the decision. The main purpose of using statistical methods is to help in making better decisions under uncertainty.
Judging from the failures of weather forecasts, and from more spectacular prediction failures such as bankruptcies of large companies and stock market crashes, it would appear that statistical methods do not perform very well. However, with the possible exception of weather forecasting, these examples are, at best, only partially statistical predictions. Moreover, failures tend to be better remembered than successes. Whatever the case, statistical methods are at present, and are likely to remain indefinitely, our best and most reliable prediction tools.
To make decisions under uncertainty, one usually needs to collect some data. Data may come from past experiences and observations, or may result from some controlled processes, such as laboratory or field experiments. The data are then used to hypothesize about the laws (often called mechanisms) that govern the fragment of reality of interest. In our book, we are interested in laws expressed in probabilistic terms: They specify directly, or allow us to compute, the chances that some events will occur. Knowledge of these chances is, in most cases, the best one can get with regard to prediction and decisions.
Probability theory is a domain of pure mathematics, and as such it has its own conceptual structure. To enable a variety of applications (typically comprising all areas of human endeavor, ranging from biological, medical, social, and physical sciences to engineering, humanities, business, etc.), this structure must be kept at an abstract level. An application of probability to a particular situation requires a number of initial steps in which the elements of the real situation are interpreted as abstract concepts of probability theory. Such an interpretation is often referred to as building a probabilistic model of the situation at hand. How well this is done is crucial to the success of the application.
One of the main concepts here is that of an experiment—a term used in a broad sense. It means any process that generates data which is influenced, at least in part, by chance.
In analyzing an experiment, one is primarily interested in its outcome—a concept that is not defined (i.e., a primitive concept) but has to be specified in every particular application. This specification may be done in different ways, with the only requirements being that (1) outcomes exclude one another and (2) they exhaust the set of all logical possibilities.
Consider an experiment consisting of two tosses of a regular die. An outcome is most naturally represented by the pair of numbers that turn up on the upper faces of the die, so that an outcome is a pair (i, j), with 1 ≤ i, j ≤ 6 (see Table 1.1).
Table 1.1 Outcomes on a pair of dice.
       1       2       3       4       5       6
1    (1, 1)  (1, 2)  (1, 3)  (1, 4)  (1, 5)  (1, 6)
2    (2, 1)  (2, 2)  (2, 3)  (2, 4)  (2, 5)  (2, 6)
3    (3, 1)  (3, 2)  (3, 3)  (3, 4)  (3, 5)  (3, 6)
4    (4, 1)  (4, 2)  (4, 3)  (4, 4)  (4, 5)  (4, 6)
5    (5, 1)  (5, 2)  (5, 3)  (5, 4)  (5, 5)  (5, 6)
6    (6, 1)  (6, 2)  (6, 3)  (6, 4)  (6, 5)  (6, 6)
In the case of an experiment of tossing a die three times, the outcomes will be triplets (i, j, k), with i, j, and k being integers between 1 and 6.
Since the outcome of an experiment is not known in advance, it is important to determine the set of all possible outcomes. This set, called the sample space, forms the conceptual framework for all further considerations of probability.
Definition 1.2.1 The sample space, denoted by S, is the set of all outcomes of an experiment. The elements of the sample space are called elementary outcomes, or sample points.
In Example 1.1, the sample space has 36 sample points in the case of two tosses, and 216 points in the case of three tosses of a die. The first statement can be verified by a direct counting of the elements of the sample space. Similar verification of the second claim, although possible in principle, would be cumbersome. In Chapter 3, we will introduce some methods of determining the sizes of sets without actually counting sample points.
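These counts can also be checked by brute-force enumeration. Here is a minimal R sketch (not from the book's appendix) that generates both sample spaces:

two.tosses <- expand.grid(first = 1:6, second = 1:6)
three.tosses <- expand.grid(first = 1:6, second = 1:6, third = 1:6)
nrow(two.tosses)     # 36 sample points for two tosses
nrow(three.tosses)   # 216 = 6^3 sample points for three tosses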
Suppose that the only available information about the numbers that turn up on the upper faces of the die is their sum. In such a case, we take as outcomes the 11 possible values of the sum, so that S = {2, 3, ..., 12}.
For instance, all outcomes on the diagonal of Table 1.1—(6, 1), (5, 2), (4, 3), (3, 4), (2, 5), and (1, 6)—are represented by the same value 7.
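In R, this reduced sample space can be obtained by collapsing the 36 pairs to their sums; a sketch continuing the enumeration above (the object two.tosses is assumed from the previous snippet):

sums <- rowSums(two.tosses)   # the sum for each of the 36 pairs
sort(unique(sums))            # the 11 possible outcomes: 2, 3, ..., 12
sum(sums == 7)                # 6 pairs are represented by the value 7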
If we are interested in the number of accidents that occur at a given intersection within a month, the sample space might be taken as the set S = {0, 1, 2, ...} of all nonnegative integers. Realistically, there is a practical limit, say c, on the monthly number of accidents at this particular intersection. Although one may think that it is simpler to take the finite sample space {0, 1, ..., c}, it turns out that it is often much simpler to take the infinite sample space if the "practical bound" c is not very precise.
Since outcomes can be specified in various ways (as illustrated by Examples 1.1 and 1.3), the same experiment can be described in terms of different sample spaces. The choice of a sample space depends on the goal of the description. Moreover, certain sample spaces for the same experiment lead to an easier and simpler analysis. Choosing a "better" sample space requires some skill, which is usually gained through experience. The following two examples illustrate this point.
Let the experiment consist of recording the lifetime of a piece of equipment, say a light bulb. An outcome here is the time t until the bulb burns out. An outcome will typically be represented by a number t ≥ 0 (t = 0 if the bulb is not working at the start), and therefore the sample space is the nonnegative part of the real axis. In practice, t is measured with some precision (in hours, days, etc.), so one might instead take the sample space {0, 1, 2, ...}. Which of these choices is better depends on the type of subsequent analysis.
Two persons enter a cafeteria and sit at a square table, with one chair on each of its sides. Suppose we are interested in the event "they sit at a corner" (as opposed to sitting across from one another). To construct the sample space, we let A and B denote the two persons, and take as the sample space the set of outcomes represented by the 12 ideograms in Figure 1.1. One could argue, however, that such a sample space is unnecessarily large. If we are interested only in the event "they sit at a corner," then there is no need to label the persons as A and B. Accordingly, the sample space may be reduced to the set of six outcomes depicted in Figure 1.2. But even this sample space can be simplified. Indeed, one could use the rotational symmetry of the table and argue that once the first person selects a chair (it does not matter which one), the sample space consists of just the three chairs remaining for the second person (see Figure 1.3).
Figure 1.1 Possible seatings of persons A and B at a square table.
Figure 1.2 Possible seatings of any two persons at a square table.
Figure 1.3 Possible seatings of one person if the place of the other person is fixed.
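The relation between the three sample spaces in this seating example can be verified by enumeration; here is a minimal R sketch (not from the book's appendix; labeling the chairs 1 through 4 is an assumption made for illustration):

chairs <- 1:4
# Persons labeled A and B: ordered pairs of distinct chairs (Figure 1.1)
labeled <- expand.grid(A = chairs, B = chairs)
labeled <- labeled[labeled$A != labeled$B, ]
nrow(labeled)                                # 12 outcomes
# Unlabeled persons: unordered pairs of chairs (Figure 1.2)
nrow(unique(t(apply(labeled, 1, sort))))     # 6 outcomes
# One person's chair fixed: chairs left for the other (Figure 1.3)
length(chairs) - 1                           # 3 outcomes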
Sample spaces can be classified according to the number of sample points they contain. Finite sample spaces contain finitely many outcomes, and the elements of countably infinite sample spaces can be arranged into an infinite sequence; other sample spaces are called uncountable.
The next concept to be introduced is that of an event. Intuitively, an event is anything about which we can tell whether or not it has occurred, as soon as we know the outcome of the experiment. This leads to the following definition:
Definition 1.2.2 An event is a subset of the sample space S.
In Example 1.1, the event "the sum equals 7" contains six outcomes and is a subset of the sample space. In Example 1.3, the same event consists of one outcome, 7.
When an experiment is performed, we observe its outcome. In the interpretation developed in this chapter, this means that we observe a point chosen randomly from the sample space. If this point belongs to the subset representing the event A, we say that the event A has occurred.
We will let events be denoted either by letters A, B, C, ..., possibly with identifiers, such as A1, A2, ..., or by more descriptive means, such as {X = 1} and {a < Y < b}, where X and Y are some numerical attributes of the sample points (formally: random variables, to be discussed in Chapter 5). Events can also be described through verbal phrases, such as "two heads in a row occur before the third tail" in the experiment of repeated tosses of a coin.
In all cases considered thus far, we assumed that an outcome (a point in the sample space) can be observed. To put it more precisely, all sample spaces considered so far were constructed in such a way that their points were observable. Thus, for any event A, we were always able to tell whether it occurred or not.
The following examples show experiments and corresponding sample spaces with sample points that are only partially observable:
Candidates for a certain job are characterized by their level x of the skills required for the job. The actual value of x is not observable, though; what we observe is the candidate's score y on a certain test. Thus, the sample point in S is a pair (x, y), and only one of its coordinates, y, is observable.
The objective might be to find selection thresholds x0 and y0 such that the rule "accept all candidates whose score y exceeds y0" would lead to maximizing the (unobservable) number of accepted persons whose true level of skill x exceeds x0. Naturally, to find such a solution, one needs to understand the statistical relation between the observable y and the unobservable x.
Another example in which the points in the sample space are only partially observable concerns studies of the incidence of activities about which one may hesitate to respond truthfully, or even to respond at all. These are typically studies related to sexual habits or preferences, abortion, law and tax violation, drug use, and so on.
Let Q be the activity analyzed, and assume that the researcher is interested in the frequency of persons who ever participated in activity Q (for simplicity, we will call them Q-persons). It ought to be stressed that the objective is not to identify the Q-persons, but only to find the proportion of such persons in the population.
The direct question, reduced to something like "Are you a Q-person?", is not likely to be answered truthfully, if at all. It is therefore necessary to make the respondent safe, guaranteeing that their response will reveal nothing about them as regards Q. This can be accomplished as follows: The respondent is given a pair of distinguishable dice, for example, one green and one white. She throws them both at the same time, in such a way that the experimenter does not know the results of the toss (e.g., the dice are in a box and only the respondent looks into the box after it is shaken). The instruction is the following: If the green die shows an odd face (1, 3, or 5), then respond to the question "Are you a Q-person?" If the green die shows an even face (2, 4, or 6), then respond to the question, "Does the white die show an ace?" The scheme of this response is summarized by the flowchart in Figure 1.4.
Figure 1.4 Scheme of a randomized response.
The interviewer knows the answer "yes" or "no" but does not know whether it is the answer to the question about Q or the question about the white die. Here a natural sample space consists of points (g, w, q), where g and w are the outcomes on the green and white die, respectively, while q is 1 or 0 depending on whether or not the respondent is a Q-person. The answer is "yes" if q = 1 and g = 1, 3, or 5 for any w, or if w = 1 and g = 2, 4, or 6 for any q. In all other cases, the answer is "no."
One could wonder what possible advantage there is, if any, in not knowing the question asked and observing only the answer. This does not make sense if we need to know the truth about each individual respondent. However, one should remember that we are only after the overall frequency of Q-persons.
We are in fact "contaminating" the question by making the respondent answer either the Q-question or some other auxiliary question. But this is a "controlled contamination": we know how often (on average) the respondents answer the auxiliary question, and how often their answer is "yes." Consequently, as we will see in Chapter 11, we can still make an inference about the proportion of Q-persons based on the observed responses.
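As a preview of that inference: under this scheme a respondent answers "yes" with probability P(yes) = (1/2)p + (1/2)(1/6) = p/2 + 1/12, where p is the unknown proportion of Q-persons, so p = 2(P(yes) - 1/12). The following R sketch (not from the book's appendix; the true value p = 0.3 is assumed only for illustration) simulates the scheme and recovers p:

set.seed(1)
n <- 10000
p <- 0.3                       # true, unobservable proportion of Q-persons
q <- rbinom(n, 1, p)           # 1 if the respondent is a Q-person
green <- sample(1:6, n, replace = TRUE)
white <- sample(1:6, n, replace = TRUE)
# Green odd: truthful answer to the Q-question; green even: "Does the white die show an ace?"
yes <- ifelse(green %% 2 == 1, q == 1, white == 1)
2 * (mean(yes) - 1/12)         # estimate of p; close to 0.3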
1.2.1
List all sample points in sample spaces for the following experiments:
(i)
We toss a balanced coin. If heads come up, we toss a die. Otherwise, we toss the coin two more times.
(ii)
A coin is tossed until the total of two tails occurs, but no more than four times (i.e., a coin is tossed until the second tail or fourth toss, whichever comes first).
1.2.2
Alice, Bob, Carl, and Diana enter the elevator on the first floor of a four‐story building. Each of them leaves the elevator on either the second, third, or fourth floor.
(i)
Describe the sample space without listing all sample points.
(ii)
List all sample points such that Carl and Diana leave the elevator on the third floor.
(iii)
List all sample points if Carl and Diana leave the elevator at the same floor.
1.2.3
An urn contains five chips, labeled 1, 2, 3, 4, and 5. Three chips are selected. List all outcomes included in the event "the second largest number drawn was 3."
1.2.4
In a game of craps, the player rolls a pair of dice. If he gets a total of 7 or 11, he wins at once; if the total is 2, 3, or 12, he loses at once. Otherwise, the sum, say s, is his "point," and he keeps rolling the dice until either he rolls another s (in which case he wins) or he rolls a 7 (in which case he loses). Describe the event "the player wins with a point of 5."
1.2.5
The experiment consists of placing six balls in three boxes. List all outcomes in the sample space if:
(i)
The balls are indistinguishable, but the boxes are distinguishable. (Hint: There are 28 different placements.)
(ii)
Neither the balls nor the boxes are distinguishable.
(iii)
Two balls are white and four are red; the boxes are distinguishable.
1.2.6
John and Mary plan to meet each other. Each of them is to arrive at the meeting place at some time between 5 p.m. and 6 p.m. John is to wait 20 minutes (or until 6 p.m., whichever comes first), and then leave if Mary does not show up. Mary will wait only 5 minutes (or until 6 p.m., whichever comes first), and then leave if John does not show up. Letting x and y denote the arrival times of John and Mary, determine the sample space and describe events (i)–(viii) by drawing pictures, or by appropriate inequalities for x and y. If you think that the description is impossible, say so.
(i)
John arrives before Mary does.
(ii)
John and Mary meet.
(iii)
Either Mary comes first, or they do not meet.
(iv)
Mary comes first, but they do not meet.
(v)
John comes very late.
(vi)
They arrive less than 15 minutes apart, and they do not meet.
(vii)
Mary arrives at 5:15 p.m. and meets John, who is already there.
(viii)
They almost miss one another.
Problems 1.2.7–1.2.8 show the importance of the selection of a sample space.
1.2.7
Let E be the experiment consisting of tossing a coin three times, with H and T standing for heads and tails, respectively. The following set of outcomes is an incomplete list of the points of the sample space S1 of the experiment E: {HHH, HTT, TTT, HHT, HTH, THH}. Use a tree diagram to find the missing outcomes.
An alternative sample space S2 for the same experiment E consists of the following four outcomes: no heads, one head, two heads, and three heads. Which of the following events can be described as subsets of S1 but not as subsets of S2?
More than two heads.
Head on the second toss.
More tails than heads.
At least one tail, with head on the last toss.
At least two faces the same.
Head and tail alternate.
Still another sample space S3 for the experiment E consists of the four outcomes (0, 0), (0, 1), (1, 0), and (1, 1). The first coordinate is 1 if the first two tosses show the same face and 0 otherwise; the second coordinate is 1 if the last two tosses show the same face, and 0 otherwise. For instance, if we observe HHT, the outcome is (1, 0). List the outcomes of S1 that belong to the event (1, 1) of S3.
Which of the following events can be represented as subsets of S1, but cannot be represented as subsets of S3?
First and third toss show the same face.
Heads on all tosses.
All faces the same.
Each face appears at least once.
More heads than tails.
1.2.8
Let E be an experiment consisting of tossing a die twice. Let S1 be the sample space with sample points (i, j), with i and j being the numbers of dots that appear in the first and second toss, respectively.
(i) Let S2 be the sample space for the experiment consisting of all possible sums i + j, so that S2 = {2, 3, ..., 12}. Which of the following events can be defined as subsets of S1 but not of S2?
One face odd, the other even.
Both faces even.
Faces different.
Result on the first toss less than the result on the second.
Product greater than 10.
Product greater than 30.
(ii) Let S3 be the sample space for the experiment consisting of all possible absolute values of the difference |i - j|, so that S3 = {0, 1, 2, 3, 4, 5}. Which of the following events can be defined as subsets of S1 but not of S3?
One face shows twice as many dots as the other.
Faces the same.
One face shows six times as many dots as the other.
One face odd, the other even.
The ratio of the numbers of dots on the faces is different from 1.
1.2.9
Referring to Example 1.9, suppose that we modify it as follows: The respondent tosses a green die (with the outcome unknown to the interviewer). If the outcome is odd, he responds to the Q-question; otherwise, he responds to the question "Were you born in April?" Again, the interviewer observes only the answer "yes" or "no." Apart from the obvious difference in the frequency of the answer "yes" to the auxiliary question (on the average 1 in 12 instead of 1 in 6), are there any essential differences between this scheme and the scheme in Example 1.9? Explain your answer.
Next, we introduce concepts that will allow us to form composite events out of simpler ones. We begin with the relations of inclusion and equality.
Definition 1.3.1 The event A is contained in the event B, or B contains A, if every sample point of A is also a sample point of B. Whenever this is true, we will write A ⊂ B, or equivalently, B ⊃ A.
An alternative terminology here is that A implies (or entails) B.
Definition 1.3.2 Two events A and B are said to be equal, A = B, if A ⊂ B and B ⊂ A.
It follows that two events are equal if they consist of exactly the same sample points.
Consider two tosses of a coin, and the corresponding sample space consisting of four outcomes: HH, HT, TH, and TT. The event "heads in the first toss" = {HH, HT} is contained in the event "at least one head" = {HH, HT, TH}. The events "the results alternate" and "at least one head and one tail" imply one another, and hence are equal.
Definition 1.3.3 The set containing no elements is called the empty set and is denoted by ∅. The event corresponding to ∅ is called a null (impossible) event.
The reader may wonder whether it is correct to use the definite article in the definition above and speak of "the empty set," since it would appear that there may be many different empty sets. For instance, the set of all kings of the United States and the set of all real numbers x such that x² + 1 = 0 are both empty, but one consists of people and the other of numbers, so they cannot be equal. This is not so, however, as is shown by the following formal argument (to appreciate this argument, one needs some training in logic). Suppose that ∅1 and ∅2 are two empty sets. To prove that they are equal, one needs to prove that ∅1 ⊂ ∅2 and ∅2 ⊂ ∅1. Formally, the first inclusion is the implication: "if x belongs to ∅1, then x belongs to ∅2." This implication is true, because its premise is false: there is no x that belongs to ∅1. The same holds for the second implication, so ∅1 = ∅2.
We now give the definitions of three principal operations on events: complementation, union, and intersection.
Definition 1.3.4 The set that contains all sample points that are not in the event A will be called the complement of A and denoted A^c, to be read also as "not A."
Definition 1.3.5 The set that contains all sample points belonging either to A or to B (so possibly to both of them) is called the union of A and B and denoted A ∪ B, to be read as "A or B."
Definition 1.3.6 The set that contains all sample points belonging to both A and B is called the intersection of A and B and denoted A ∩ B.
An alternative notation for a complement is A′ or Ā, whereas in the case of an intersection, one often writes AB instead of A ∩ B.
The operations above have the following interpretations in terms of occurrences of events:
Event
occurs if event
does not occur.
Event
occurs when either
or
or both events occur.
Event
occurs when both
and
occur.
Consider an experiment of tossing a coin three times, with the sample space consisting of outcomes described as HHH, HHT, and so on. Let A and B be the events "heads and tails alternate" and "heads on the last toss," respectively. The event A^c occurs if either heads or tails occur at least twice in a row, so that A^c = {HHH, HHT, HTT, THH, TTH, TTT}, while B^c is "tails on the last toss"; hence, B^c = {HHT, HTT, THT, TTT}. The union A ∪ B is the event "either the results alternate or it is heads on the last toss," meaning A ∪ B = {HTH, THT, HHH, THH, TTH}. Observe that while A has two outcomes and B has four outcomes, their union has only five outcomes, since the outcome HTH appears in both events. This common part is the intersection A ∩ B = {HTH}.
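The bookkeeping in this example is easy to check with R's set operations; a small sketch (not from the book's appendix) that encodes the eight outcomes as strings:

S <- c("HHH", "HHT", "HTH", "HTT", "THH", "THT", "TTH", "TTT")
A <- c("HTH", "THT")                 # heads and tails alternate
B <- S[substr(S, 3, 3) == "H"]       # heads on the last toss
union(A, B)                          # the five outcomes of A or B
intersect(A, B)                      # "HTH", the common part
setdiff(S, A)                        # the complement of A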
Some formulas can be simplified by introducing the operation of the difference of two events.
Definition 1.3.7 The difference, A − B, of events A and B contains all sample points that belong to A but not to B: A − B = A ∩ B^c.
The symmetric difference, A △ B, contains sample points that belong to A or to B, but not to both of them: A △ B = (A − B) ∪ (B − A).
In Example 1.12, the difference A^c − B is described as "at least two identical outcomes in a row and tails on the last toss," which means the event {HHT, HTT, TTT}.
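Continuing the R sketch from Example 1.12 (the objects S, A, and B as defined there), the difference and symmetric difference are computed the same way:

Ac <- setdiff(S, A)                    # at least two identical outcomes in a row
setdiff(Ac, B)                         # the difference A^c - B: "HHT" "HTT" "TTT"
union(setdiff(A, B), setdiff(B, A))    # the symmetric difference of A and B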
Next, we introduce the following important concept:
Definition 1.3.8 If A ∩ B = ∅, then the events A and B are called disjoint, or mutually exclusive.